Logging to experiments/invertedPendulum/invertedPendulum/Mon-21-Nov-2022-03-21-48-PM-CST_invertedPendulum_trpo_iteration_20_seed2431
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7089540958404541
Validation loss = 0.3834456205368042
Validation loss = 0.3627590835094452
Validation loss = 0.32693013548851013
Validation loss = 0.3188958466053009
Validation loss = 0.2898135781288147
Validation loss = 0.2728419303894043
Validation loss = 0.25685980916023254
Validation loss = 0.23062942922115326
Validation loss = 0.2031148076057434
Validation loss = 0.19315718114376068
Validation loss = 0.17057040333747864
Validation loss = 0.17093926668167114
Validation loss = 0.14537568390369415
Validation loss = 0.15163983404636383
Validation loss = 0.13826774060726166
Validation loss = 0.14071117341518402
Validation loss = 0.11880416423082352
Validation loss = 0.11721304059028625
Validation loss = 0.11575796455144882
Validation loss = 0.12041044980287552
Validation loss = 0.09792236983776093
Validation loss = 0.11610810458660126
Validation loss = 0.1187446191906929
Validation loss = 0.0972001850605011
Validation loss = 0.06991343200206757
Validation loss = 0.08384805172681808
Validation loss = 0.0638653039932251
Validation loss = 0.06559774279594421
Validation loss = 0.07425564527511597
Validation loss = 0.08013864606618881
Validation loss = 0.0676497146487236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7326073050498962
Validation loss = 0.385534405708313
Validation loss = 0.37908151745796204
Validation loss = 0.3356960713863373
Validation loss = 0.3116284906864166
Validation loss = 0.29563018679618835
Validation loss = 0.2765827476978302
Validation loss = 0.24368509650230408
Validation loss = 0.22178272902965546
Validation loss = 0.1914709508419037
Validation loss = 0.18762457370758057
Validation loss = 0.17331163585186005
Validation loss = 0.16871874034404755
Validation loss = 0.158061221241951
Validation loss = 0.1350969672203064
Validation loss = 0.12618690729141235
Validation loss = 0.127272367477417
Validation loss = 0.11382181197404861
Validation loss = 0.11139816790819168
Validation loss = 0.12629151344299316
Validation loss = 0.11506783962249756
Validation loss = 0.08749064058065414
Validation loss = 0.09381838142871857
Validation loss = 0.08041524887084961
Validation loss = 0.09421494603157043
Validation loss = 0.07282236963510513
Validation loss = 0.06387745589017868
Validation loss = 0.07006903737783432
Validation loss = 0.05625692382454872
Validation loss = 0.053070344030857086
Validation loss = 0.06142430007457733
Validation loss = 0.048270612955093384
Validation loss = 0.05085252225399017
Validation loss = 0.05942702665925026
Validation loss = 0.04988885670900345
Validation loss = 0.053592316806316376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7083213925361633
Validation loss = 0.3862590491771698
Validation loss = 0.35318276286125183
Validation loss = 0.32697951793670654
Validation loss = 0.3070606589317322
Validation loss = 0.29437947273254395
Validation loss = 0.26956647634506226
Validation loss = 0.2382161021232605
Validation loss = 0.21870528161525726
Validation loss = 0.19272126257419586
Validation loss = 0.17589962482452393
Validation loss = 0.18297512829303741
Validation loss = 0.16800513863563538
Validation loss = 0.1532215029001236
Validation loss = 0.1526152640581131
Validation loss = 0.14909613132476807
Validation loss = 0.15171436965465546
Validation loss = 0.1283617615699768
Validation loss = 0.11362502723932266
Validation loss = 0.09779667854309082
Validation loss = 0.10926071554422379
Validation loss = 0.09610282629728317
Validation loss = 0.0888136550784111
Validation loss = 0.09898387640714645
Validation loss = 0.08023259788751602
Validation loss = 0.07338759303092957
Validation loss = 0.06448517739772797
Validation loss = 0.060077179223299026
Validation loss = 0.06196093559265137
Validation loss = 0.06459175050258636
Validation loss = 0.11937441676855087
Validation loss = 0.09439302980899811
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6928128600120544
Validation loss = 0.3849281072616577
Validation loss = 0.3568195700645447
Validation loss = 0.3312436044216156
Validation loss = 0.3154233992099762
Validation loss = 0.29377683997154236
Validation loss = 0.2663073241710663
Validation loss = 0.24951347708702087
Validation loss = 0.22984082996845245
Validation loss = 0.20726849138736725
Validation loss = 0.20768849551677704
Validation loss = 0.1754886955022812
Validation loss = 0.16966769099235535
Validation loss = 0.14590921998023987
Validation loss = 0.13204023241996765
Validation loss = 0.14244304597377777
Validation loss = 0.1293638050556183
Validation loss = 0.1141614094376564
Validation loss = 0.11765997111797333
Validation loss = 0.10710790008306503
Validation loss = 0.1099015474319458
Validation loss = 0.12901122868061066
Validation loss = 0.09361817687749863
Validation loss = 0.09940186142921448
Validation loss = 0.07982058078050613
Validation loss = 0.08121944963932037
Validation loss = 0.07348322868347168
Validation loss = 0.10714883357286453
Validation loss = 0.08294288069009781
Validation loss = 0.07676924765110016
Validation loss = 0.07514582574367523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7103604674339294
Validation loss = 0.3943321406841278
Validation loss = 0.36229613423347473
Validation loss = 0.3357662558555603
Validation loss = 0.3173280358314514
Validation loss = 0.3064574599266052
Validation loss = 0.2778085172176361
Validation loss = 0.24444691836833954
Validation loss = 0.22228609025478363
Validation loss = 0.1934831291437149
Validation loss = 0.1865917593240738
Validation loss = 0.19140368700027466
Validation loss = 0.17155185341835022
Validation loss = 0.1724245697259903
Validation loss = 0.1496783047914505
Validation loss = 0.13541266322135925
Validation loss = 0.13312435150146484
Validation loss = 0.15556079149246216
Validation loss = 0.14234159886837006
Validation loss = 0.12275146692991257
Validation loss = 0.10860667377710342
Validation loss = 0.08909260481595993
Validation loss = 0.07445613294839859
Validation loss = 0.10357849299907684
Validation loss = 0.07095295190811157
Validation loss = 0.06761197745800018
Validation loss = 0.07724808901548386
Validation loss = 0.058333493769168854
Validation loss = 0.09194119274616241
Validation loss = 0.0730142742395401
Validation loss = 0.058235831558704376
Validation loss = 0.05745528265833855
Validation loss = 0.07686466723680496
Validation loss = 0.05668794363737106
Validation loss = 0.056763842701911926
Validation loss = 0.05021456256508827
Validation loss = 0.051329441368579865
Validation loss = 0.049449943006038666
Validation loss = 0.05793937295675278
Validation loss = 0.06119914352893829
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0276  |
| Iteration     | 0        |
| MaximumReturn | -0.0183  |
| MinimumReturn | -0.0427  |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3478102385997772
Validation loss = 0.20139580965042114
Validation loss = 0.16878336668014526
Validation loss = 0.14282441139221191
Validation loss = 0.13053494691848755
Validation loss = 0.12007999420166016
Validation loss = 0.09696558117866516
Validation loss = 0.09116289764642715
Validation loss = 0.08420289307832718
Validation loss = 0.07290612161159515
Validation loss = 0.0654895007610321
Validation loss = 0.06642838567495346
Validation loss = 0.05695309489965439
Validation loss = 0.05537073686718941
Validation loss = 0.05729130655527115
Validation loss = 0.05578390881419182
Validation loss = 0.06594612449407578
Validation loss = 0.059700071811676025
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3360312283039093
Validation loss = 0.20303714275360107
Validation loss = 0.17767901718616486
Validation loss = 0.13978825509548187
Validation loss = 0.1211167499423027
Validation loss = 0.10057496279478073
Validation loss = 0.09049069881439209
Validation loss = 0.08429922163486481
Validation loss = 0.07094743847846985
Validation loss = 0.06412894278764725
Validation loss = 0.0781523585319519
Validation loss = 0.055895838886499405
Validation loss = 0.053241122514009476
Validation loss = 0.04425203427672386
Validation loss = 0.04172284156084061
Validation loss = 0.04425276443362236
Validation loss = 0.04320135712623596
Validation loss = 0.06330100446939468
Validation loss = 0.08153940737247467
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2572930157184601
Validation loss = 0.1703530251979828
Validation loss = 0.14239011704921722
Validation loss = 0.10592065006494522
Validation loss = 0.08115467429161072
Validation loss = 0.06784962862730026
Validation loss = 0.07293260842561722
Validation loss = 0.052810072898864746
Validation loss = 0.06035889312624931
Validation loss = 0.08091279864311218
Validation loss = 0.06596338003873825
Validation loss = 0.05660327523946762
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.24013228714466095
Validation loss = 0.1624598205089569
Validation loss = 0.12551862001419067
Validation loss = 0.11292541027069092
Validation loss = 0.09042735397815704
Validation loss = 0.08888176083564758
Validation loss = 0.07773125171661377
Validation loss = 0.0732814371585846
Validation loss = 0.07236257940530777
Validation loss = 0.05910533666610718
Validation loss = 0.06002112850546837
Validation loss = 0.05186387151479721
Validation loss = 0.05412569269537926
Validation loss = 0.049400124698877335
Validation loss = 0.05698419362306595
Validation loss = 0.05967268720269203
Validation loss = 0.05053724721074104
Validation loss = 0.07416632026433945
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3364690840244293
Validation loss = 0.20652174949645996
Validation loss = 0.1730751395225525
Validation loss = 0.1390632838010788
Validation loss = 0.1126316636800766
Validation loss = 0.09495449811220169
Validation loss = 0.09229061752557755
Validation loss = 0.07580007612705231
Validation loss = 0.0757443755865097
Validation loss = 0.06398019939661026
Validation loss = 0.0644335150718689
Validation loss = 0.08302313834428787
Validation loss = 0.07071750611066818
Validation loss = 0.05912938341498375
Validation loss = 0.052439481019973755
Validation loss = 0.07527604699134827
Validation loss = 0.05775709077715874
Validation loss = 0.04794566333293915
Validation loss = 0.044985707849264145
Validation loss = 0.039586812257766724
Validation loss = 0.04869205877184868
Validation loss = 0.0398319773375988
Validation loss = 0.03486047312617302
Validation loss = 0.038151469081640244
Validation loss = 0.028624117374420166
Validation loss = 0.036522507667541504
Validation loss = 0.035363756120204926
Validation loss = 0.0430469810962677
Validation loss = 0.03950972110033035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00453 |
| Iteration     | 1        |
| MaximumReturn | -0.00324 |
| MinimumReturn | -0.00615 |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12530851364135742
Validation loss = 0.08765023946762085
Validation loss = 0.08398328721523285
Validation loss = 0.07513954490423203
Validation loss = 0.05578112229704857
Validation loss = 0.04533980414271355
Validation loss = 0.039038415998220444
Validation loss = 0.051944080740213394
Validation loss = 0.04063669592142105
Validation loss = 0.04097408056259155
Validation loss = 0.03816264122724533
Validation loss = 0.03623887151479721
Validation loss = 0.03124321810901165
Validation loss = 0.025352731347084045
Validation loss = 0.028131820261478424
Validation loss = 0.024471942335367203
Validation loss = 0.02167777344584465
Validation loss = 0.03271225467324257
Validation loss = 0.0216988418251276
Validation loss = 0.030525371432304382
Validation loss = 0.03598623722791672
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0982394590973854
Validation loss = 0.06141769513487816
Validation loss = 0.03518657758831978
Validation loss = 0.03770611435174942
Validation loss = 0.035012975335121155
Validation loss = 0.03737596794962883
Validation loss = 0.027058104053139687
Validation loss = 0.02451157569885254
Validation loss = 0.03450808301568031
Validation loss = 0.023817146196961403
Validation loss = 0.023050224408507347
Validation loss = 0.021757084876298904
Validation loss = 0.030708467587828636
Validation loss = 0.027157370001077652
Validation loss = 0.023594718426465988
Validation loss = 0.021830398589372635
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13204526901245117
Validation loss = 0.09834706783294678
Validation loss = 0.08146964013576508
Validation loss = 0.06704513728618622
Validation loss = 0.06257422268390656
Validation loss = 0.04859647899866104
Validation loss = 0.04348297044634819
Validation loss = 0.045219406485557556
Validation loss = 0.0424451045691967
Validation loss = 0.03946009278297424
Validation loss = 0.034928057342767715
Validation loss = 0.027121566236019135
Validation loss = 0.02309582568705082
Validation loss = 0.024394098669290543
Validation loss = 0.02674696035683155
Validation loss = 0.027754543349146843
Validation loss = 0.026638252660632133
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10572133213281631
Validation loss = 0.07534055411815643
Validation loss = 0.05268872156739235
Validation loss = 0.042536042630672455
Validation loss = 0.04894772544503212
Validation loss = 0.027787670493125916
Validation loss = 0.03910904377698898
Validation loss = 0.03624754399061203
Validation loss = 0.023209882900118828
Validation loss = 0.06338588893413544
Validation loss = 0.03230423107743263
Validation loss = 0.028884368017315865
Validation loss = 0.028346285223960876
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11269298195838928
Validation loss = 0.0803174376487732
Validation loss = 0.06665439903736115
Validation loss = 0.06668762862682343
Validation loss = 0.04359792172908783
Validation loss = 0.03472821041941643
Validation loss = 0.039395350962877274
Validation loss = 0.03162620589137077
Validation loss = 0.023551033809781075
Validation loss = 0.02529936097562313
Validation loss = 0.03433699160814285
Validation loss = 0.022699544206261635
Validation loss = 0.027616767212748528
Validation loss = 0.028463289141654968
Validation loss = 0.021477865055203438
Validation loss = 0.025669852271676064
Validation loss = 0.020850105211138725
Validation loss = 0.022343406453728676
Validation loss = 0.021191123872995377
Validation loss = 0.028436923399567604
Validation loss = 0.025490181520581245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00127  |
| Iteration     | 2         |
| MaximumReturn | -0.000969 |
| MinimumReturn | -0.00155  |
| TotalSamples  | 6664      |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03239825740456581
Validation loss = 0.035492174327373505
Validation loss = 0.028316238895058632
Validation loss = 0.0319775827229023
Validation loss = 0.027314042672514915
Validation loss = 0.025731317698955536
Validation loss = 0.030888592824339867
Validation loss = 0.028289789333939552
Validation loss = 0.03478134796023369
Validation loss = 0.023948436602950096
Validation loss = 0.020510444417595863
Validation loss = 0.024576807394623756
Validation loss = 0.02557218074798584
Validation loss = 0.019735703244805336
Validation loss = 0.026958003640174866
Validation loss = 0.01654829829931259
Validation loss = 0.021544359624385834
Validation loss = 0.02514859102666378
Validation loss = 0.021052902564406395
Validation loss = 0.018508395180106163
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0629783421754837
Validation loss = 0.021237626671791077
Validation loss = 0.01883132942020893
Validation loss = 0.037957657128572464
Validation loss = 0.020069008693099022
Validation loss = 0.022282904013991356
Validation loss = 0.026437439024448395
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04813544824719429
Validation loss = 0.03125598654150963
Validation loss = 0.03361928462982178
Validation loss = 0.024803223088383675
Validation loss = 0.021629812195897102
Validation loss = 0.018349025398492813
Validation loss = 0.02565956860780716
Validation loss = 0.026364199817180634
Validation loss = 0.02081787772476673
Validation loss = 0.026026101782917976
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0590578056871891
Validation loss = 0.04208315536379814
Validation loss = 0.029367370530962944
Validation loss = 0.030952079221606255
Validation loss = 0.03635000064969063
Validation loss = 0.02538231573998928
Validation loss = 0.04141593724489212
Validation loss = 0.02061488851904869
Validation loss = 0.03176769241690636
Validation loss = 0.01982009969651699
Validation loss = 0.015297475270926952
Validation loss = 0.019957246258854866
Validation loss = 0.017636490985751152
Validation loss = 0.018669750541448593
Validation loss = 0.031380314379930496
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07722631841897964
Validation loss = 0.03371545672416687
Validation loss = 0.029917536303400993
Validation loss = 0.030073443427681923
Validation loss = 0.029828691855072975
Validation loss = 0.03244946524500847
Validation loss = 0.022528039291501045
Validation loss = 0.02044466696679592
Validation loss = 0.01812654919922352
Validation loss = 0.030494168400764465
Validation loss = 0.02921007014811039
Validation loss = 0.024955786764621735
Validation loss = 0.021690882742404938
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00146 |
| Iteration     | 3        |
| MaximumReturn | -0.00125 |
| MinimumReturn | -0.00204 |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03932594507932663
Validation loss = 0.021403133869171143
Validation loss = 0.016133811324834824
Validation loss = 0.017314523458480835
Validation loss = 0.01329178549349308
Validation loss = 0.02061873860657215
Validation loss = 0.020487284287810326
Validation loss = 0.017922034487128258
Validation loss = 0.016932817175984383
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.038375113159418106
Validation loss = 0.02508356049656868
Validation loss = 0.018299981951713562
Validation loss = 0.015555604360997677
Validation loss = 0.02415967918932438
Validation loss = 0.02289739064872265
Validation loss = 0.023819679394364357
Validation loss = 0.028161248192191124
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03914802148938179
Validation loss = 0.024476423859596252
Validation loss = 0.017178937792778015
Validation loss = 0.026728328317403793
Validation loss = 0.01805885136127472
Validation loss = 0.02271139994263649
Validation loss = 0.017308933660387993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020391015335917473
Validation loss = 0.024995306506752968
Validation loss = 0.03072936274111271
Validation loss = 0.029547609388828278
Validation loss = 0.017359556630253792
Validation loss = 0.01771560125052929
Validation loss = 0.03072301857173443
Validation loss = 0.02031967230141163
Validation loss = 0.01956809125840664
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.030860964208841324
Validation loss = 0.0191867183893919
Validation loss = 0.01945437118411064
Validation loss = 0.014374853111803532
Validation loss = 0.019398165866732597
Validation loss = 0.02528383396565914
Validation loss = 0.016287343576550484
Validation loss = 0.0169514212757349
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000792 |
| Iteration     | 4         |
| MaximumReturn | -0.00066  |
| MinimumReturn | -0.00105  |
| TotalSamples  | 9996      |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.052346836775541306
Validation loss = 0.03522057086229324
Validation loss = 0.030654290691018105
Validation loss = 0.03585205227136612
Validation loss = 0.03028121590614319
Validation loss = 0.02131468430161476
Validation loss = 0.024492673575878143
Validation loss = 0.015598130412399769
Validation loss = 0.0182556863874197
Validation loss = 0.022156165912747383
Validation loss = 0.01782362535595894
Validation loss = 0.025058742612600327
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0282290019094944
Validation loss = 0.0252851452678442
Validation loss = 0.026262950152158737
Validation loss = 0.0173758864402771
Validation loss = 0.01677112467586994
Validation loss = 0.014819773845374584
Validation loss = 0.01708402670919895
Validation loss = 0.018122337758541107
Validation loss = 0.023158028721809387
Validation loss = 0.016530629247426987
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018353229388594627
Validation loss = 0.01925205998122692
Validation loss = 0.011710032820701599
Validation loss = 0.017627421766519547
Validation loss = 0.01440410502254963
Validation loss = 0.014096617698669434
Validation loss = 0.016508128494024277
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03386653959751129
Validation loss = 0.023672277107834816
Validation loss = 0.019350087270140648
Validation loss = 0.014300063252449036
Validation loss = 0.012444185093045235
Validation loss = 0.017977450042963028
Validation loss = 0.019179679453372955
Validation loss = 0.018794940784573555
Validation loss = 0.019043561071157455
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015709619969129562
Validation loss = 0.020689833909273148
Validation loss = 0.016189826652407646
Validation loss = 0.01443551667034626
Validation loss = 0.010392565280199051
Validation loss = 0.013566605746746063
Validation loss = 0.013356899842619896
Validation loss = 0.015498435124754906
Validation loss = 0.01231672428548336
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00501 |
| Iteration     | 5        |
| MaximumReturn | -0.00308 |
| MinimumReturn | -0.0102  |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023219216614961624
Validation loss = 0.017825428396463394
Validation loss = 0.0166209414601326
Validation loss = 0.01445596944540739
Validation loss = 0.01359865814447403
Validation loss = 0.01706044375896454
Validation loss = 0.011620153672993183
Validation loss = 0.011226603761315346
Validation loss = 0.01957700029015541
Validation loss = 0.02136596105992794
Validation loss = 0.01851811632514
Validation loss = 0.012048025615513325
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016434170305728912
Validation loss = 0.013898633420467377
Validation loss = 0.011134126223623753
Validation loss = 0.015334914438426495
Validation loss = 0.013695480301976204
Validation loss = 0.020251119509339333
Validation loss = 0.012654577381908894
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01958831027150154
Validation loss = 0.027226200327277184
Validation loss = 0.016710851341485977
Validation loss = 0.011634711176156998
Validation loss = 0.010302919894456863
Validation loss = 0.01749395579099655
Validation loss = 0.014908929355442524
Validation loss = 0.016471195966005325
Validation loss = 0.011403906159102917
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024866020306944847
Validation loss = 0.012226710096001625
Validation loss = 0.01282828114926815
Validation loss = 0.015229078941047192
Validation loss = 0.010109924711287022
Validation loss = 0.017304660752415657
Validation loss = 0.014960992150008678
Validation loss = 0.014439918100833893
Validation loss = 0.014150622300803661
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0267370343208313
Validation loss = 0.018112551420927048
Validation loss = 0.01066088117659092
Validation loss = 0.014361634850502014
Validation loss = 0.012329788878560066
Validation loss = 0.009886393323540688
Validation loss = 0.009765091352164745
Validation loss = 0.010851382277905941
Validation loss = 0.011503612622618675
Validation loss = 0.013817193917930126
Validation loss = 0.020312141627073288
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00155 |
| Iteration     | 6        |
| MaximumReturn | -0.00107 |
| MinimumReturn | -0.00313 |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010812911204993725
Validation loss = 0.013018970377743244
Validation loss = 0.01495687011629343
Validation loss = 0.008902017958462238
Validation loss = 0.011861736886203289
Validation loss = 0.010322068817913532
Validation loss = 0.015052503906190395
Validation loss = 0.01891152560710907
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01476352009922266
Validation loss = 0.012963567860424519
Validation loss = 0.009179138578474522
Validation loss = 0.023853585124015808
Validation loss = 0.014710180461406708
Validation loss = 0.014864658005535603
Validation loss = 0.017065884545445442
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01906111277639866
Validation loss = 0.012838521040976048
Validation loss = 0.011569575406610966
Validation loss = 0.008731112815439701
Validation loss = 0.013648279011249542
Validation loss = 0.009859985671937466
Validation loss = 0.014371958561241627
Validation loss = 0.012551332823932171
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021030478179454803
Validation loss = 0.014629307202994823
Validation loss = 0.017960671335458755
Validation loss = 0.011219172738492489
Validation loss = 0.01218374166637659
Validation loss = 0.014317783527076244
Validation loss = 0.010571833699941635
Validation loss = 0.009144937619566917
Validation loss = 0.013144631870090961
Validation loss = 0.011295623145997524
Validation loss = 0.009752974845468998
Validation loss = 0.008621654473245144
Validation loss = 0.00823925156146288
Validation loss = 0.010631931014358997
Validation loss = 0.014930781908333302
Validation loss = 0.013668100349605083
Validation loss = 0.012645686976611614
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0249897763133049
Validation loss = 0.010785146616399288
Validation loss = 0.01434270665049553
Validation loss = 0.011583592742681503
Validation loss = 0.01068983692675829
Validation loss = 0.010689993388950825
Validation loss = 0.011761973612010479
Validation loss = 0.010315152816474438
Validation loss = 0.009660658426582813
Validation loss = 0.01856020651757717
Validation loss = 0.011061799712479115
Validation loss = 0.00929191242903471
Validation loss = 0.009225335903465748
Validation loss = 0.010864373296499252
Validation loss = 0.0115978317335248
Validation loss = 0.009892920963466167
Validation loss = 0.0176670141518116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000854 |
| Iteration     | 7         |
| MaximumReturn | -0.000616 |
| MinimumReturn | -0.00118  |
| TotalSamples  | 14994     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01467308308929205
Validation loss = 0.013153769075870514
Validation loss = 0.012242554686963558
Validation loss = 0.009175397455692291
Validation loss = 0.01914195343852043
Validation loss = 0.012231056578457355
Validation loss = 0.013274581171572208
Validation loss = 0.008726710453629494
Validation loss = 0.013594010844826698
Validation loss = 0.010932418517768383
Validation loss = 0.010768922045826912
Validation loss = 0.008101840503513813
Validation loss = 0.016054898500442505
Validation loss = 0.009601511992514133
Validation loss = 0.012498429045081139
Validation loss = 0.009092219173908234
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012969997711479664
Validation loss = 0.012591863982379436
Validation loss = 0.014104525558650494
Validation loss = 0.01480923593044281
Validation loss = 0.016991345211863518
Validation loss = 0.012675642967224121
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01652495004236698
Validation loss = 0.024000259116292
Validation loss = 0.014846302568912506
Validation loss = 0.013261891901493073
Validation loss = 0.02179141528904438
Validation loss = 0.010139047168195248
Validation loss = 0.012840916402637959
Validation loss = 0.012318833731114864
Validation loss = 0.010942153632640839
Validation loss = 0.014809899963438511
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015081866644322872
Validation loss = 0.013667465187609196
Validation loss = 0.012867496348917484
Validation loss = 0.020913604646921158
Validation loss = 0.017056429758667946
Validation loss = 0.023382961750030518
Validation loss = 0.014290863648056984
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019060101360082626
Validation loss = 0.016544159501791
Validation loss = 0.01603998988866806
Validation loss = 0.012665139511227608
Validation loss = 0.010672233067452908
Validation loss = 0.009132652543485165
Validation loss = 0.008827301673591137
Validation loss = 0.015681298449635506
Validation loss = 0.018400272354483604
Validation loss = 0.010229945182800293
Validation loss = 0.006829279009252787
Validation loss = 0.011533894576132298
Validation loss = 0.011704595759510994
Validation loss = 0.00997314602136612
Validation loss = 0.018124710768461227
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000749 |
| Iteration     | 8         |
| MaximumReturn | -0.000557 |
| MinimumReturn | -0.00094  |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009077465161681175
Validation loss = 0.009036080911755562
Validation loss = 0.01747332140803337
Validation loss = 0.008703415282070637
Validation loss = 0.010710719041526318
Validation loss = 0.008005155250430107
Validation loss = 0.010865133255720139
Validation loss = 0.014101007953286171
Validation loss = 0.015938561409711838
Validation loss = 0.017076604068279266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014001252129673958
Validation loss = 0.01209467463195324
Validation loss = 0.010485420934855938
Validation loss = 0.012135639786720276
Validation loss = 0.012161072343587875
Validation loss = 0.014113307930529118
Validation loss = 0.016356561332941055
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012965572997927666
Validation loss = 0.02144285850226879
Validation loss = 0.012131080962717533
Validation loss = 0.01231670193374157
Validation loss = 0.010615918785333633
Validation loss = 0.011653383262455463
Validation loss = 0.010667682625353336
Validation loss = 0.012352215126156807
Validation loss = 0.010609179735183716
Validation loss = 0.010753510519862175
Validation loss = 0.008396374993026257
Validation loss = 0.011481624096632004
Validation loss = 0.010202440433204174
Validation loss = 0.0109903234988451
Validation loss = 0.010991871356964111
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013589451089501381
Validation loss = 0.008873828686773777
Validation loss = 0.009236535988748074
Validation loss = 0.01076959166675806
Validation loss = 0.013453778810799122
Validation loss = 0.011470514349639416
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018276799470186234
Validation loss = 0.01557164266705513
Validation loss = 0.011658682487905025
Validation loss = 0.012533222325146198
Validation loss = 0.010371529497206211
Validation loss = 0.01393607072532177
Validation loss = 0.011523924767971039
Validation loss = 0.009358336217701435
Validation loss = 0.008190895430743694
Validation loss = 0.009176773019134998
Validation loss = 0.009759005159139633
Validation loss = 0.010208374820649624
Validation loss = 0.008840596303343773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00114 |
| Iteration     | 9        |
| MaximumReturn | -0.00086 |
| MinimumReturn | -0.00162 |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018187157809734344
Validation loss = 0.011342830955982208
Validation loss = 0.016938354820013046
Validation loss = 0.008907416835427284
Validation loss = 0.008417938835918903
Validation loss = 0.008884662762284279
Validation loss = 0.012032543309032917
Validation loss = 0.016011420637369156
Validation loss = 0.007504526991397142
Validation loss = 0.006756755523383617
Validation loss = 0.007694586180150509
Validation loss = 0.007021298632025719
Validation loss = 0.007302454672753811
Validation loss = 0.011670430190861225
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014813506044447422
Validation loss = 0.011033516377210617
Validation loss = 0.009148637764155865
Validation loss = 0.01183297112584114
Validation loss = 0.012649934738874435
Validation loss = 0.013094140216708183
Validation loss = 0.010349438525736332
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014513249509036541
Validation loss = 0.012366046197712421
Validation loss = 0.010341830551624298
Validation loss = 0.009328771382570267
Validation loss = 0.008464867249131203
Validation loss = 0.008823150768876076
Validation loss = 0.010155894793570042
Validation loss = 0.00885472260415554
Validation loss = 0.013757963664829731
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010416163131594658
Validation loss = 0.015485944226384163
Validation loss = 0.009094296023249626
Validation loss = 0.013695312663912773
Validation loss = 0.012637739069759846
Validation loss = 0.012981418520212173
Validation loss = 0.010504086501896381
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008984890766441822
Validation loss = 0.008102167397737503
Validation loss = 0.014898169785737991
Validation loss = 0.019168773666024208
Validation loss = 0.008385865017771721
Validation loss = 0.009798971936106682
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000766 |
| Iteration     | 10        |
| MaximumReturn | -0.000528 |
| MinimumReturn | -0.00104  |
| TotalSamples  | 19992     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0156804621219635
Validation loss = 0.009387221187353134
Validation loss = 0.010509367100894451
Validation loss = 0.013293762691318989
Validation loss = 0.011298485100269318
Validation loss = 0.009904465638101101
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011353647336363792
Validation loss = 0.011057424359023571
Validation loss = 0.009403526782989502
Validation loss = 0.007788561284542084
Validation loss = 0.014771290123462677
Validation loss = 0.010839542374014854
Validation loss = 0.01675248146057129
Validation loss = 0.017988422885537148
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010244736447930336
Validation loss = 0.01688925363123417
Validation loss = 0.011702771298587322
Validation loss = 0.009004844352602959
Validation loss = 0.00949160661548376
Validation loss = 0.007785238325595856
Validation loss = 0.00917450524866581
Validation loss = 0.010275004431605339
Validation loss = 0.010159442201256752
Validation loss = 0.011565516702830791
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009654178284108639
Validation loss = 0.010974259115755558
Validation loss = 0.011218089610338211
Validation loss = 0.0160222165286541
Validation loss = 0.014131871052086353
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010690735653042793
Validation loss = 0.00998717825859785
Validation loss = 0.008162358775734901
Validation loss = 0.00813206471502781
Validation loss = 0.01031652744859457
Validation loss = 0.006637664046138525
Validation loss = 0.006109910551458597
Validation loss = 0.006580401211977005
Validation loss = 0.010329702869057655
Validation loss = 0.007644328288733959
Validation loss = 0.0213765911757946
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000744 |
| Iteration     | 11        |
| MaximumReturn | -0.000544 |
| MinimumReturn | -0.000986 |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021944556385278702
Validation loss = 0.009151347912847996
Validation loss = 0.014502902515232563
Validation loss = 0.007593887858092785
Validation loss = 0.010067996568977833
Validation loss = 0.010560366325080395
Validation loss = 0.011334570124745369
Validation loss = 0.0076436325907707214
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016515668481588364
Validation loss = 0.016238555312156677
Validation loss = 0.01874730736017227
Validation loss = 0.010648486204445362
Validation loss = 0.02012045495212078
Validation loss = 0.008153674192726612
Validation loss = 0.010227317921817303
Validation loss = 0.009223921224474907
Validation loss = 0.014819113537669182
Validation loss = 0.01502984482795
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013909107074141502
Validation loss = 0.014115257188677788
Validation loss = 0.009846506640315056
Validation loss = 0.011306042782962322
Validation loss = 0.01145957037806511
Validation loss = 0.009369409643113613
Validation loss = 0.012812614440917969
Validation loss = 0.010150766000151634
Validation loss = 0.010663296096026897
Validation loss = 0.008937975391745567
Validation loss = 0.010675263591110706
Validation loss = 0.009599389508366585
Validation loss = 0.008677626959979534
Validation loss = 0.014467465691268444
Validation loss = 0.009396437555551529
Validation loss = 0.008164086379110813
Validation loss = 0.008293062448501587
Validation loss = 0.007529783993959427
Validation loss = 0.00934144202619791
Validation loss = 0.007230029441416264
Validation loss = 0.007529949303716421
Validation loss = 0.007761751767247915
Validation loss = 0.010194692760705948
Validation loss = 0.008343761786818504
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009524820372462273
Validation loss = 0.009959707967936993
Validation loss = 0.009705825708806515
Validation loss = 0.010302672162652016
Validation loss = 0.0065136440098285675
Validation loss = 0.01342045795172453
Validation loss = 0.02087930031120777
Validation loss = 0.015567365102469921
Validation loss = 0.01674949936568737
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01333983801305294
Validation loss = 0.007206386886537075
Validation loss = 0.008262296207249165
Validation loss = 0.006858055479824543
Validation loss = 0.007200042251497507
Validation loss = 0.006186670623719692
Validation loss = 0.00848337821662426
Validation loss = 0.0067566754296422005
Validation loss = 0.014226390048861504
Validation loss = 0.009687388315796852
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000734 |
| Iteration     | 12        |
| MaximumReturn | -0.00047  |
| MinimumReturn | -0.00122  |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006060834042727947
Validation loss = 0.022410254925489426
Validation loss = 0.008144802413880825
Validation loss = 0.00702156824991107
Validation loss = 0.009755321778357029
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01058474462479353
Validation loss = 0.010135512799024582
Validation loss = 0.010087438859045506
Validation loss = 0.012105535715818405
Validation loss = 0.009351962246000767
Validation loss = 0.009319715201854706
Validation loss = 0.014696776866912842
Validation loss = 0.010665417648851871
Validation loss = 0.013528894633054733
Validation loss = 0.009051811881363392
Validation loss = 0.007879246026277542
Validation loss = 0.009811971336603165
Validation loss = 0.008545540273189545
Validation loss = 0.00973503664135933
Validation loss = 0.007904640398919582
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008650575764477253
Validation loss = 0.008239207789301872
Validation loss = 0.007803157437592745
Validation loss = 0.010718856006860733
Validation loss = 0.008557191118597984
Validation loss = 0.009654466062784195
Validation loss = 0.008236736990511417
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014976021833717823
Validation loss = 0.011881652288138866
Validation loss = 0.013687008060514927
Validation loss = 0.009163948707282543
Validation loss = 0.01285476889461279
Validation loss = 0.007563008461147547
Validation loss = 0.008996597491204739
Validation loss = 0.00721007538959384
Validation loss = 0.00816311314702034
Validation loss = 0.007271382492035627
Validation loss = 0.007608808111399412
Validation loss = 0.010031484998762608
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006959900259971619
Validation loss = 0.00857462827116251
Validation loss = 0.006539991125464439
Validation loss = 0.005691255908459425
Validation loss = 0.028345435857772827
Validation loss = 0.008847486227750778
Validation loss = 0.009932313114404678
Validation loss = 0.010953307151794434
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00193  |
| Iteration     | 13        |
| MaximumReturn | -0.000848 |
| MinimumReturn | -0.0039   |
| TotalSamples  | 24990     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015558954328298569
Validation loss = 0.006617593113332987
Validation loss = 0.005298633594065905
Validation loss = 0.004184276796877384
Validation loss = 0.006555119063705206
Validation loss = 0.005941733252257109
Validation loss = 0.004710338544100523
Validation loss = 0.004170833621174097
Validation loss = 0.006636505480855703
Validation loss = 0.00793684646487236
Validation loss = 0.006329941097646952
Validation loss = 0.00645953556522727
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021516812965273857
Validation loss = 0.007139342371374369
Validation loss = 0.006463995203375816
Validation loss = 0.005027537699788809
Validation loss = 0.006548064295202494
Validation loss = 0.005239955615252256
Validation loss = 0.007372535299509764
Validation loss = 0.007970196194946766
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016441138461232185
Validation loss = 0.00601365277543664
Validation loss = 0.00656511215493083
Validation loss = 0.007702274713665247
Validation loss = 0.006421012803912163
Validation loss = 0.00552230654284358
Validation loss = 0.005199820268899202
Validation loss = 0.00490366667509079
Validation loss = 0.009297032840549946
Validation loss = 0.004780717194080353
Validation loss = 0.004651848692446947
Validation loss = 0.004689512308686972
Validation loss = 0.006769284140318632
Validation loss = 0.004734119866043329
Validation loss = 0.006417181808501482
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011749549768865108
Validation loss = 0.006922316271811724
Validation loss = 0.004908060189336538
Validation loss = 0.009532364085316658
Validation loss = 0.006092362105846405
Validation loss = 0.007011809851974249
Validation loss = 0.0063012209720909595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012314938008785248
Validation loss = 0.005918450653553009
Validation loss = 0.00553023861721158
Validation loss = 0.005988143850117922
Validation loss = 0.005138137377798557
Validation loss = 0.005194921512156725
Validation loss = 0.0050583104602992535
Validation loss = 0.005615584552288055
Validation loss = 0.005617499351501465
Validation loss = 0.004184294026345015
Validation loss = 0.004361144732683897
Validation loss = 0.008172563277184963
Validation loss = 0.00504280673339963
Validation loss = 0.0075631155632436275
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000857 |
| Iteration     | 14        |
| MaximumReturn | -0.000612 |
| MinimumReturn | -0.00176  |
| TotalSamples  | 26656     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007610219996422529
Validation loss = 0.009546558372676373
Validation loss = 0.007904362864792347
Validation loss = 0.005871002562344074
Validation loss = 0.006185907404869795
Validation loss = 0.007175361271947622
Validation loss = 0.00792289711534977
Validation loss = 0.006280347239226103
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006044009700417519
Validation loss = 0.009801249019801617
Validation loss = 0.008275141939520836
Validation loss = 0.009369069710373878
Validation loss = 0.007952069863677025
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010826453566551208
Validation loss = 0.007365541998296976
Validation loss = 0.005800257902592421
Validation loss = 0.007383634801954031
Validation loss = 0.006985357031226158
Validation loss = 0.0059908474795520306
Validation loss = 0.01146763190627098
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005202998407185078
Validation loss = 0.013571239076554775
Validation loss = 0.008144005201756954
Validation loss = 0.005479027051478624
Validation loss = 0.005626867525279522
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01060491893440485
Validation loss = 0.0044121937826275826
Validation loss = 0.00577510055154562
Validation loss = 0.0042207990773022175
Validation loss = 0.006953385192900896
Validation loss = 0.00991675816476345
Validation loss = 0.005715544801205397
Validation loss = 0.004913148004561663
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000921 |
| Iteration     | 15        |
| MaximumReturn | -0.000618 |
| MinimumReturn | -0.0017   |
| TotalSamples  | 28322     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0063392398878932
Validation loss = 0.01234022993594408
Validation loss = 0.008192128501832485
Validation loss = 0.005823817569762468
Validation loss = 0.005526574794203043
Validation loss = 0.005799000151455402
Validation loss = 0.007680402137339115
Validation loss = 0.007387844379991293
Validation loss = 0.00664210319519043
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009399307891726494
Validation loss = 0.006626870017498732
Validation loss = 0.01062257494777441
Validation loss = 0.006481043994426727
Validation loss = 0.005936870817095041
Validation loss = 0.006652276497334242
Validation loss = 0.005234495736658573
Validation loss = 0.00960919912904501
Validation loss = 0.005305440165102482
Validation loss = 0.006273834966123104
Validation loss = 0.006745993159711361
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010411699302494526
Validation loss = 0.010263790376484394
Validation loss = 0.004536558873951435
Validation loss = 0.004548397846519947
Validation loss = 0.006528866942971945
Validation loss = 0.00936073251068592
Validation loss = 0.005220568273216486
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011751650832593441
Validation loss = 0.006588913034647703
Validation loss = 0.007247137371450663
Validation loss = 0.00736598065122962
Validation loss = 0.005542427767068148
Validation loss = 0.005657628178596497
Validation loss = 0.00538496021181345
Validation loss = 0.005572749767452478
Validation loss = 0.008460724726319313
Validation loss = 0.010706870816648006
Validation loss = 0.005782866384834051
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004940663930028677
Validation loss = 0.009983265772461891
Validation loss = 0.004334347788244486
Validation loss = 0.003920839633792639
Validation loss = 0.0037037578877061605
Validation loss = 0.004497085697948933
Validation loss = 0.004616841673851013
Validation loss = 0.005294437985867262
Validation loss = 0.004435306414961815
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00113  |
| Iteration     | 16        |
| MaximumReturn | -0.000586 |
| MinimumReturn | -0.00258  |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006336239632219076
Validation loss = 0.0051923589780926704
Validation loss = 0.007951131090521812
Validation loss = 0.0049946922808885574
Validation loss = 0.006426898296922445
Validation loss = 0.007105355150997639
Validation loss = 0.00567859411239624
Validation loss = 0.006391999777406454
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009257805533707142
Validation loss = 0.005745891015976667
Validation loss = 0.00624797074124217
Validation loss = 0.005411237012594938
Validation loss = 0.01359178964048624
Validation loss = 0.006292147096246481
Validation loss = 0.0065866634249687195
Validation loss = 0.006885033566504717
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00619493005797267
Validation loss = 0.005162739660590887
Validation loss = 0.003986513242125511
Validation loss = 0.004563811235129833
Validation loss = 0.004286530427634716
Validation loss = 0.005164263769984245
Validation loss = 0.010722244158387184
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005138189997524023
Validation loss = 0.004603679291903973
Validation loss = 0.006896767299622297
Validation loss = 0.005631422623991966
Validation loss = 0.009151594713330269
Validation loss = 0.006596884690225124
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0050755711272358894
Validation loss = 0.005234733689576387
Validation loss = 0.006808348000049591
Validation loss = 0.005013265181332827
Validation loss = 0.007930886931717396
Validation loss = 0.0037644996773451567
Validation loss = 0.005044698249548674
Validation loss = 0.005915029440075159
Validation loss = 0.004388083703815937
Validation loss = 0.00400360394269228
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00306 |
| Iteration     | 17       |
| MaximumReturn | -0.00089 |
| MinimumReturn | -0.00862 |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01175210066139698
Validation loss = 0.005496120546013117
Validation loss = 0.006272702943533659
Validation loss = 0.005194014869630337
Validation loss = 0.004720252938568592
Validation loss = 0.0056051891297101974
Validation loss = 0.0049865152686834335
Validation loss = 0.004940013401210308
Validation loss = 0.004572147503495216
Validation loss = 0.004602139815688133
Validation loss = 0.006643922068178654
Validation loss = 0.0057539488188922405
Validation loss = 0.004633770789951086
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006864113733172417
Validation loss = 0.0045790583826601505
Validation loss = 0.007481182925403118
Validation loss = 0.005085272248834372
Validation loss = 0.004362340550869703
Validation loss = 0.004226014483720064
Validation loss = 0.00975809432566166
Validation loss = 0.005539515055716038
Validation loss = 0.010965297929942608
Validation loss = 0.004057425074279308
Validation loss = 0.008112107403576374
Validation loss = 0.007342142518609762
Validation loss = 0.010172516107559204
Validation loss = 0.00557755446061492
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011599811725318432
Validation loss = 0.004957133904099464
Validation loss = 0.004688889719545841
Validation loss = 0.004675060044974089
Validation loss = 0.005409240256994963
Validation loss = 0.004210013430565596
Validation loss = 0.004796029534190893
Validation loss = 0.004593239165842533
Validation loss = 0.00550884148105979
Validation loss = 0.004682436585426331
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009645574726164341
Validation loss = 0.009375762194395065
Validation loss = 0.004827843513339758
Validation loss = 0.004647378344088793
Validation loss = 0.005610212683677673
Validation loss = 0.00775260990485549
Validation loss = 0.004740728531032801
Validation loss = 0.0059218378737568855
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010017194785177708
Validation loss = 0.0036703564692288637
Validation loss = 0.007243076339364052
Validation loss = 0.004429745487868786
Validation loss = 0.0036027675960212946
Validation loss = 0.003401918103918433
Validation loss = 0.007671929430216551
Validation loss = 0.005905247293412685
Validation loss = 0.00569408992305398
Validation loss = 0.0043786861933767796
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00737  |
| Iteration     | 18        |
| MaximumReturn | -0.000637 |
| MinimumReturn | -0.0463   |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005737637635320425
Validation loss = 0.003416213905438781
Validation loss = 0.00710253743454814
Validation loss = 0.004449803382158279
Validation loss = 0.00536130229011178
Validation loss = 0.0036273528821766376
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009490309283137321
Validation loss = 0.006599700544029474
Validation loss = 0.004276335705071688
Validation loss = 0.004128280095756054
Validation loss = 0.004376423545181751
Validation loss = 0.005746049806475639
Validation loss = 0.0052403975278139114
Validation loss = 0.003948901779949665
Validation loss = 0.006012119352817535
Validation loss = 0.008221247233450413
Validation loss = 0.0042555141262710094
Validation loss = 0.0037190697621554136
Validation loss = 0.005087207071483135
Validation loss = 0.004884684458374977
Validation loss = 0.004165424965322018
Validation loss = 0.004981332458555698
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005449381656944752
Validation loss = 0.0037807030603289604
Validation loss = 0.0033997874706983566
Validation loss = 0.006175521295517683
Validation loss = 0.005758065730333328
Validation loss = 0.003507662331685424
Validation loss = 0.003605418372899294
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007166788447648287
Validation loss = 0.006682437378913164
Validation loss = 0.006759223062545061
Validation loss = 0.006155564449727535
Validation loss = 0.004705272149294615
Validation loss = 0.004658553283661604
Validation loss = 0.0038787107914686203
Validation loss = 0.006289289332926273
Validation loss = 0.006074447650462389
Validation loss = 0.006405699998140335
Validation loss = 0.006586045958101749
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010527445003390312
Validation loss = 0.005098607391119003
Validation loss = 0.004123517777770758
Validation loss = 0.0035937251523137093
Validation loss = 0.0039679137989878654
Validation loss = 0.003018048359081149
Validation loss = 0.005758943967521191
Validation loss = 0.003486256580799818
Validation loss = 0.003955277148634195
Validation loss = 0.004123965743929148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00466  |
| Iteration     | 19        |
| MaximumReturn | -0.000631 |
| MinimumReturn | -0.0172   |
| TotalSamples  | 34986     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01674548164010048
Validation loss = 0.005092811770737171
Validation loss = 0.006601483561098576
Validation loss = 0.00980690773576498
Validation loss = 0.005446400493383408
Validation loss = 0.00506415031850338
Validation loss = 0.0041611636988818645
Validation loss = 0.0038512032479047775
Validation loss = 0.003960440866649151
Validation loss = 0.0036223146598786116
Validation loss = 0.005551704205572605
Validation loss = 0.0054319011978805065
Validation loss = 0.004780318588018417
Validation loss = 0.003532824805006385
Validation loss = 0.005550657398998737
Validation loss = 0.004031701013445854
Validation loss = 0.00718151219189167
Validation loss = 0.005417628679424524
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010638686828315258
Validation loss = 0.004964978434145451
Validation loss = 0.0059021953493356705
Validation loss = 0.0037118757609277964
Validation loss = 0.004040831699967384
Validation loss = 0.004853853955864906
Validation loss = 0.004085881169885397
Validation loss = 0.0036597619764506817
Validation loss = 0.004647995810955763
Validation loss = 0.00375970802269876
Validation loss = 0.0070251584984362125
Validation loss = 0.004687040112912655
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005815478507429361
Validation loss = 0.007141986861824989
Validation loss = 0.006297827698290348
Validation loss = 0.007611900568008423
Validation loss = 0.006278627086430788
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006331300362944603
Validation loss = 0.004915629513561726
Validation loss = 0.005218740552663803
Validation loss = 0.003886113641783595
Validation loss = 0.005163085646927357
Validation loss = 0.0035168593749403954
Validation loss = 0.009227931499481201
Validation loss = 0.006973009556531906
Validation loss = 0.005174395628273487
Validation loss = 0.005515415221452713
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003977888263761997
Validation loss = 0.00394599512219429
Validation loss = 0.005824420601129532
Validation loss = 0.0029655382968485355
Validation loss = 0.005556290503591299
Validation loss = 0.003025320591405034
Validation loss = 0.0047944653779268265
Validation loss = 0.003980871289968491
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00394  |
| Iteration     | 20        |
| MaximumReturn | -0.000579 |
| MinimumReturn | -0.0584   |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005264774430543184
Validation loss = 0.00392004381865263
Validation loss = 0.004354290198534727
Validation loss = 0.0037878183647990227
Validation loss = 0.004172184504568577
Validation loss = 0.005438332911580801
Validation loss = 0.0069818878546357155
Validation loss = 0.0043028597719967365
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004997292999178171
Validation loss = 0.005205077119171619
Validation loss = 0.004076220560818911
Validation loss = 0.004526322241872549
Validation loss = 0.004332557786256075
Validation loss = 0.0035681554581969976
Validation loss = 0.004829169251024723
Validation loss = 0.008464056998491287
Validation loss = 0.005004409234970808
Validation loss = 0.004337398800998926
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007474611047655344
Validation loss = 0.004813420586287975
Validation loss = 0.00556949432939291
Validation loss = 0.004029055126011372
Validation loss = 0.003877804148942232
Validation loss = 0.0035916364286094904
Validation loss = 0.004143159836530685
Validation loss = 0.010389136150479317
Validation loss = 0.003946715034544468
Validation loss = 0.004786525387316942
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005464465823024511
Validation loss = 0.004260613117367029
Validation loss = 0.005843976512551308
Validation loss = 0.007018031552433968
Validation loss = 0.004667977336794138
Validation loss = 0.004361885599792004
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005490385927259922
Validation loss = 0.0032412672881036997
Validation loss = 0.003941220697015524
Validation loss = 0.004233387764543295
Validation loss = 0.0035674022510647774
Validation loss = 0.005992753431200981
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.13     |
| Iteration     | 21        |
| MaximumReturn | -0.000567 |
| MinimumReturn | -56.3     |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010683424770832062
Validation loss = 0.005122708156704903
Validation loss = 0.003846136387437582
Validation loss = 0.004342460539191961
Validation loss = 0.003697045845910907
Validation loss = 0.00420420104637742
Validation loss = 0.004275348968803883
Validation loss = 0.0037627064157277346
Validation loss = 0.003534907940775156
Validation loss = 0.004955435171723366
Validation loss = 0.0035541453398764133
Validation loss = 0.003648148849606514
Validation loss = 0.0031913279090076685
Validation loss = 0.0037028538063168526
Validation loss = 0.003242631908506155
Validation loss = 0.005155595485121012
Validation loss = 0.0036186999641358852
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009536538273096085
Validation loss = 0.007152149453759193
Validation loss = 0.00465889647603035
Validation loss = 0.004247615113854408
Validation loss = 0.0032472170423716307
Validation loss = 0.00492903171107173
Validation loss = 0.0035938266664743423
Validation loss = 0.004650650080293417
Validation loss = 0.0037219971418380737
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012367412447929382
Validation loss = 0.004125700797885656
Validation loss = 0.004012269899249077
Validation loss = 0.004218894522637129
Validation loss = 0.003265032544732094
Validation loss = 0.0034343956504017115
Validation loss = 0.003464202396571636
Validation loss = 0.005166393239051104
Validation loss = 0.00316836079582572
Validation loss = 0.003676500404253602
Validation loss = 0.0038520265370607376
Validation loss = 0.004028743132948875
Validation loss = 0.006246984004974365
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009135917760431767
Validation loss = 0.005179811734706163
Validation loss = 0.004560492467135191
Validation loss = 0.0039621819742023945
Validation loss = 0.004172914661467075
Validation loss = 0.003673351602628827
Validation loss = 0.005753220058977604
Validation loss = 0.007507746107876301
Validation loss = 0.0037475537974387407
Validation loss = 0.005063345190137625
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012181511148810387
Validation loss = 0.0036781402304768562
Validation loss = 0.003965000621974468
Validation loss = 0.003543505910784006
Validation loss = 0.005091292783617973
Validation loss = 0.0033458436373621225
Validation loss = 0.006353145930916071
Validation loss = 0.0030484332237392664
Validation loss = 0.002924270462244749
Validation loss = 0.0029948630835860968
Validation loss = 0.0044448343105614185
Validation loss = 0.0034504190552979708
Validation loss = 0.004156719893217087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.63    |
| Iteration     | 22       |
| MaximumReturn | -0.00048 |
| MinimumReturn | -56.5    |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005008204840123653
Validation loss = 0.005597407463937998
Validation loss = 0.006719639059156179
Validation loss = 0.008172014728188515
Validation loss = 0.005773546174168587
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006593909114599228
Validation loss = 0.005294555798172951
Validation loss = 0.006134620402008295
Validation loss = 0.004137976095080376
Validation loss = 0.005189447198063135
Validation loss = 0.003774794517084956
Validation loss = 0.004752343986183405
Validation loss = 0.004243156407028437
Validation loss = 0.002749940613284707
Validation loss = 0.0033296942710876465
Validation loss = 0.005661077797412872
Validation loss = 0.003195512341335416
Validation loss = 0.0021450412459671497
Validation loss = 0.0029506762512028217
Validation loss = 0.0033935036044567823
Validation loss = 0.002992111025378108
Validation loss = 0.003364433301612735
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004808771423995495
Validation loss = 0.004158537834882736
Validation loss = 0.004815160296857357
Validation loss = 0.005032678134739399
Validation loss = 0.005187415983527899
Validation loss = 0.009593700058758259
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0056578912772238255
Validation loss = 0.005784133914858103
Validation loss = 0.008351665921509266
Validation loss = 0.004721059929579496
Validation loss = 0.0063067092560231686
Validation loss = 0.007229191716760397
Validation loss = 0.009124120697379112
Validation loss = 0.0060355388559401035
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005956693086773157
Validation loss = 0.005845417734235525
Validation loss = 0.0038830016274005175
Validation loss = 0.004223008640110493
Validation loss = 0.004272396210581064
Validation loss = 0.0027492812369018793
Validation loss = 0.003442258108407259
Validation loss = 0.002957625314593315
Validation loss = 0.0027477212715893984
Validation loss = 0.00494539737701416
Validation loss = 0.002722762059420347
Validation loss = 0.002843245165422559
Validation loss = 0.003310305532068014
Validation loss = 0.004897586070001125
Validation loss = 0.00529183354228735
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -7.18     |
| Iteration     | 23        |
| MaximumReturn | -0.000467 |
| MinimumReturn | -70       |
| TotalSamples  | 41650     |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003117495682090521
Validation loss = 0.005785660818219185
Validation loss = 0.004853411111980677
Validation loss = 0.00421240646392107
Validation loss = 0.0027352222241461277
Validation loss = 0.003427942516282201
Validation loss = 0.00279433885589242
Validation loss = 0.005458429455757141
Validation loss = 0.003624582663178444
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003064320655539632
Validation loss = 0.0034146804828196764
Validation loss = 0.0030327080748975277
Validation loss = 0.0035045098047703505
Validation loss = 0.0022690552286803722
Validation loss = 0.0030592214316129684
Validation loss = 0.0025846934877336025
Validation loss = 0.003909014631062746
Validation loss = 0.004388578236103058
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007257400546222925
Validation loss = 0.0030562286265194416
Validation loss = 0.0030444597359746695
Validation loss = 0.004001804161816835
Validation loss = 0.006188032682985067
Validation loss = 0.0033267096150666475
Validation loss = 0.00511727761477232
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004170368425548077
Validation loss = 0.004249442368745804
Validation loss = 0.004004247952252626
Validation loss = 0.0037948540411889553
Validation loss = 0.0032995971851050854
Validation loss = 0.005194859579205513
Validation loss = 0.0033407907467335463
Validation loss = 0.003692369908094406
Validation loss = 0.0070509701035916805
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004488359205424786
Validation loss = 0.002732102060690522
Validation loss = 0.0052663288079202175
Validation loss = 0.00325612910091877
Validation loss = 0.0035916194319725037
Validation loss = 0.0025309650227427483
Validation loss = 0.0033419146202504635
Validation loss = 0.0019452102715149522
Validation loss = 0.0022282670252025127
Validation loss = 0.0023743545170873404
Validation loss = 0.0022821566089987755
Validation loss = 0.0017099777469411492
Validation loss = 0.001955694053322077
Validation loss = 0.004170765168964863
Validation loss = 0.00402072723954916
Validation loss = 0.003955527674406767
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00724  |
| Iteration     | 24        |
| MaximumReturn | -0.000605 |
| MinimumReturn | -0.0758   |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031455163843929768
Validation loss = 0.0035658013075590134
Validation loss = 0.005073274951428175
Validation loss = 0.0044123209081590176
Validation loss = 0.004178255330771208
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007943962700664997
Validation loss = 0.00971651915460825
Validation loss = 0.0023499985691159964
Validation loss = 0.002992864465340972
Validation loss = 0.002721822587773204
Validation loss = 0.002741163596510887
Validation loss = 0.005351852159947157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004093510564416647
Validation loss = 0.0029830303974449635
Validation loss = 0.004903721623122692
Validation loss = 0.005770782008767128
Validation loss = 0.004472081083804369
Validation loss = 0.004379434511065483
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004227494355291128
Validation loss = 0.00712191266939044
Validation loss = 0.008115101605653763
Validation loss = 0.004554628860205412
Validation loss = 0.009830295108258724
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004870773293077946
Validation loss = 0.003494973061606288
Validation loss = 0.004151816014200449
Validation loss = 0.007179014850407839
Validation loss = 0.0029761511832475662
Validation loss = 0.0025956667959690094
Validation loss = 0.0029956784565001726
Validation loss = 0.0027549085207283497
Validation loss = 0.002968390239402652
Validation loss = 0.0036007463932037354
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -7.18     |
| Iteration     | 25        |
| MaximumReturn | -0.000593 |
| MinimumReturn | -89.3     |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037691930774599314
Validation loss = 0.0027393633499741554
Validation loss = 0.002673192648217082
Validation loss = 0.0030951879452914
Validation loss = 0.002973168157041073
Validation loss = 0.004873405676335096
Validation loss = 0.0036631994880735874
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004285408649593592
Validation loss = 0.0032565295696258545
Validation loss = 0.004679704550653696
Validation loss = 0.0026526113506406546
Validation loss = 0.0021764812991023064
Validation loss = 0.0019855222199112177
Validation loss = 0.0024111352395266294
Validation loss = 0.00185893545858562
Validation loss = 0.002167043974623084
Validation loss = 0.005481349304318428
Validation loss = 0.0021464198362082243
Validation loss = 0.00602066470310092
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003000501776114106
Validation loss = 0.002879766281694174
Validation loss = 0.002815237734466791
Validation loss = 0.005051297135651112
Validation loss = 0.0029250311199575663
Validation loss = 0.003481443040072918
Validation loss = 0.002337320242077112
Validation loss = 0.0038660429418087006
Validation loss = 0.0026703060138970613
Validation loss = 0.0028415368869900703
Validation loss = 0.003360881470143795
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006565229967236519
Validation loss = 0.005668364930897951
Validation loss = 0.003650328377261758
Validation loss = 0.004778089467436075
Validation loss = 0.004541813395917416
Validation loss = 0.003314081346616149
Validation loss = 0.005764774978160858
Validation loss = 0.0032072439789772034
Validation loss = 0.0034018629230558872
Validation loss = 0.0035458789207041264
Validation loss = 0.0032660558354109526
Validation loss = 0.003688638098537922
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015752993524074554
Validation loss = 0.0026903601828962564
Validation loss = 0.002556164050474763
Validation loss = 0.0026488183066248894
Validation loss = 0.003498617559671402
Validation loss = 0.003014487912878394
Validation loss = 0.0018897771369665861
Validation loss = 0.0014612114755436778
Validation loss = 0.0023033428005874157
Validation loss = 0.0026029276195913553
Validation loss = 0.002380070509389043
Validation loss = 0.003175549441948533
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.08     |
| Iteration     | 26        |
| MaximumReturn | -0.000732 |
| MinimumReturn | -100      |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002976760035380721
Validation loss = 0.0038029709830880165
Validation loss = 0.0021508221980184317
Validation loss = 0.0028981128707528114
Validation loss = 0.004165815655142069
Validation loss = 0.0021847221069037914
Validation loss = 0.004718312993645668
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0033121209125965834
Validation loss = 0.0020235187839716673
Validation loss = 0.0019041160121560097
Validation loss = 0.0029818315524607897
Validation loss = 0.0014221464516595006
Validation loss = 0.00277245813049376
Validation loss = 0.0015632116701453924
Validation loss = 0.0029224108438938856
Validation loss = 0.003774620359763503
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003968985751271248
Validation loss = 0.005679736379534006
Validation loss = 0.002294543432071805
Validation loss = 0.00548503128811717
Validation loss = 0.004036181140691042
Validation loss = 0.001975183142349124
Validation loss = 0.0019850071985274553
Validation loss = 0.003635675646364689
Validation loss = 0.002562675392255187
Validation loss = 0.0032935456838458776
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003005040343850851
Validation loss = 0.003199142636731267
Validation loss = 0.003074231091886759
Validation loss = 0.0030496043618768454
Validation loss = 0.004073909483850002
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018433389486745
Validation loss = 0.004131736699491739
Validation loss = 0.0020586764439940453
Validation loss = 0.001996540930122137
Validation loss = 0.0024002809077501297
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.48     |
| Iteration     | 27        |
| MaximumReturn | -0.000659 |
| MinimumReturn | -76       |
| TotalSamples  | 48314     |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004684110637754202
Validation loss = 0.0035891979932785034
Validation loss = 0.0033081434667110443
Validation loss = 0.002671436406672001
Validation loss = 0.0031296953093260527
Validation loss = 0.001896198489703238
Validation loss = 0.0065633938647806644
Validation loss = 0.0025169437285512686
Validation loss = 0.0017637329874560237
Validation loss = 0.0026553908828645945
Validation loss = 0.004087555687874556
Validation loss = 0.002254315884783864
Validation loss = 0.002561688655987382
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0033996019046753645
Validation loss = 0.002798292553052306
Validation loss = 0.001576205133460462
Validation loss = 0.002831624820828438
Validation loss = 0.0019292518263682723
Validation loss = 0.0017892634496092796
Validation loss = 0.0019261818379163742
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017438181675970554
Validation loss = 0.004084357991814613
Validation loss = 0.001863206154666841
Validation loss = 0.002043030923232436
Validation loss = 0.003008980304002762
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003969361074268818
Validation loss = 0.0030481370631605387
Validation loss = 0.002895328216254711
Validation loss = 0.004397344775497913
Validation loss = 0.002301260596141219
Validation loss = 0.0020949270110577345
Validation loss = 0.004364462569355965
Validation loss = 0.0026773835998028517
Validation loss = 0.0031616520136594772
Validation loss = 0.0031491455156356096
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017913068877533078
Validation loss = 0.007044648285955191
Validation loss = 0.0017212539678439498
Validation loss = 0.0017645087791606784
Validation loss = 0.001871647429652512
Validation loss = 0.001534140552394092
Validation loss = 0.001477123354561627
Validation loss = 0.002011989476159215
Validation loss = 0.0018704201793298125
Validation loss = 0.0021749045699834824
Validation loss = 0.0023520563263446093
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -14.6     |
| Iteration     | 28        |
| MaximumReturn | -0.000805 |
| MinimumReturn | -104      |
| TotalSamples  | 49980     |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003056582994759083
Validation loss = 0.0021948369685560465
Validation loss = 0.004881191998720169
Validation loss = 0.0018049657810479403
Validation loss = 0.0035256289411336184
Validation loss = 0.002056714380159974
Validation loss = 0.002831946359947324
Validation loss = 0.001876313122920692
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027194286230951548
Validation loss = 0.002033177763223648
Validation loss = 0.004826385993510485
Validation loss = 0.0016391410026699305
Validation loss = 0.0021959885489195585
Validation loss = 0.0038673027884215117
Validation loss = 0.0033039532136172056
Validation loss = 0.0027984129264950752
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027291106525808573
Validation loss = 0.002245992189273238
Validation loss = 0.0025264376308768988
Validation loss = 0.002294083358719945
Validation loss = 0.005289175547659397
Validation loss = 0.0022068857215344906
Validation loss = 0.002104006940498948
Validation loss = 0.004342803731560707
Validation loss = 0.00237005646340549
Validation loss = 0.002687778789550066
Validation loss = 0.0015002801083028316
Validation loss = 0.002752533182501793
Validation loss = 0.001847987761721015
Validation loss = 0.0023381165228784084
Validation loss = 0.0017526247538626194
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006496092304587364
Validation loss = 0.0035950893070548773
Validation loss = 0.0030360338278114796
Validation loss = 0.006738064344972372
Validation loss = 0.0038157061208039522
Validation loss = 0.01189197227358818
Validation loss = 0.003719547064974904
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005946050398051739
Validation loss = 0.0022845184430480003
Validation loss = 0.0016498420154675841
Validation loss = 0.0020461559761315584
Validation loss = 0.0017362813232466578
Validation loss = 0.0015379175310954452
Validation loss = 0.0018716939957812428
Validation loss = 0.0016708679031580687
Validation loss = 0.001940440502949059
Validation loss = 0.0018815494840964675
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.22     |
| Iteration     | 29        |
| MaximumReturn | -0.000573 |
| MinimumReturn | -65.9     |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003163809422403574
Validation loss = 0.002191273495554924
Validation loss = 0.00176207663025707
Validation loss = 0.0015916744014248252
Validation loss = 0.0015077904099598527
Validation loss = 0.003185073845088482
Validation loss = 0.0027811676263809204
Validation loss = 0.002605625195428729
Validation loss = 0.0014096933882683516
Validation loss = 0.0017224019393324852
Validation loss = 0.0034769373014569283
Validation loss = 0.0028357284609228373
Validation loss = 0.00196781731210649
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003278222167864442
Validation loss = 0.003344002878293395
Validation loss = 0.0025683606509119272
Validation loss = 0.002292927820235491
Validation loss = 0.0021953233517706394
Validation loss = 0.0022683092392981052
Validation loss = 0.0026783617213368416
Validation loss = 0.002033994998782873
Validation loss = 0.0030015618540346622
Validation loss = 0.002692369744181633
Validation loss = 0.0016173491021618247
Validation loss = 0.002052675001323223
Validation loss = 0.0038310957606881857
Validation loss = 0.0017009035218507051
Validation loss = 0.0025609731674194336
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0020493757911026478
Validation loss = 0.004462321754544973
Validation loss = 0.0024896005634218454
Validation loss = 0.0017989075277000666
Validation loss = 0.0033822068944573402
Validation loss = 0.0024164982605725527
Validation loss = 0.005056628957390785
Validation loss = 0.00207014218904078
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004445699509233236
Validation loss = 0.002426921622827649
Validation loss = 0.002805977128446102
Validation loss = 0.0025681531988084316
Validation loss = 0.001976629951968789
Validation loss = 0.002110402099788189
Validation loss = 0.0031831555534154177
Validation loss = 0.0038164483848959208
Validation loss = 0.007585528306663036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002434278605505824
Validation loss = 0.0019002929329872131
Validation loss = 0.001814858871512115
Validation loss = 0.002804259303957224
Validation loss = 0.0016267836326733232
Validation loss = 0.0017106294399127364
Validation loss = 0.003774631069973111
Validation loss = 0.0018499159486964345
Validation loss = 0.0014503259444609284
Validation loss = 0.0061354124918580055
Validation loss = 0.0017491753678768873
Validation loss = 0.0017195924883708358
Validation loss = 0.0014020232483744621
Validation loss = 0.0018446976318955421
Validation loss = 0.0030713367741554976
Validation loss = 0.0016495910240337253
Validation loss = 0.0017174211097881198
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0616   |
| Iteration     | 30        |
| MaximumReturn | -0.000564 |
| MinimumReturn | -1.39     |
| TotalSamples  | 53312     |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016381402965635061
Validation loss = 0.0037365430034697056
Validation loss = 0.0026579354889690876
Validation loss = 0.0016701811691746116
Validation loss = 0.0018844343721866608
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028396653942763805
Validation loss = 0.0024699256755411625
Validation loss = 0.0011817982885986567
Validation loss = 0.002341513754799962
Validation loss = 0.0017180230934172869
Validation loss = 0.002853189827874303
Validation loss = 0.0017026102868840098
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021795351058244705
Validation loss = 0.0027841306291520596
Validation loss = 0.0017588771879673004
Validation loss = 0.0028143557719886303
Validation loss = 0.00542447017505765
Validation loss = 0.0036653855349868536
Validation loss = 0.0022708133328706026
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0027411074843257666
Validation loss = 0.004220603033900261
Validation loss = 0.0032696272246539593
Validation loss = 0.004412779584527016
Validation loss = 0.002739964285865426
Validation loss = 0.0019285138696432114
Validation loss = 0.0028634134214371443
Validation loss = 0.0018597381422296166
Validation loss = 0.0025238334201276302
Validation loss = 0.003289482556283474
Validation loss = 0.002726443111896515
Validation loss = 0.0025896653532981873
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023038077633827925
Validation loss = 0.0026639909483492374
Validation loss = 0.003920733463019133
Validation loss = 0.0015754182823002338
Validation loss = 0.0016256821108981967
Validation loss = 0.002419652184471488
Validation loss = 0.0014321967028081417
Validation loss = 0.0021815230138599873
Validation loss = 0.005445616785436869
Validation loss = 0.0015829310286790133
Validation loss = 0.0013522225199267268
Validation loss = 0.0015656695468351245
Validation loss = 0.0016558542847633362
Validation loss = 0.0019467503298074007
Validation loss = 0.001458975370042026
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0494  |
| Iteration     | 31       |
| MaximumReturn | -0.00081 |
| MinimumReturn | -0.748   |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00333478394895792
Validation loss = 0.003095233580097556
Validation loss = 0.001584130572155118
Validation loss = 0.0015162262134253979
Validation loss = 0.0022191880270838737
Validation loss = 0.0016318915877491236
Validation loss = 0.008548827841877937
Validation loss = 0.0031883844640105963
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0019683223217725754
Validation loss = 0.007846762426197529
Validation loss = 0.00519604654982686
Validation loss = 0.0031335270032286644
Validation loss = 0.002212885534390807
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006618643645197153
Validation loss = 0.0021495840046554804
Validation loss = 0.001707742572762072
Validation loss = 0.001880328985862434
Validation loss = 0.0023369016125798225
Validation loss = 0.0026334261056035757
Validation loss = 0.00442159129306674
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0026574295479804277
Validation loss = 0.0043830908834934235
Validation loss = 0.0024480849970132113
Validation loss = 0.002627367852255702
Validation loss = 0.0038385880179703236
Validation loss = 0.00212732027284801
Validation loss = 0.0026827752590179443
Validation loss = 0.002705151215195656
Validation loss = 0.0018593951826915145
Validation loss = 0.008679232560098171
Validation loss = 0.004313656594604254
Validation loss = 0.0025242490228265524
Validation loss = 0.0018967203795909882
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023916219361126423
Validation loss = 0.0014322687638923526
Validation loss = 0.0014365734532475471
Validation loss = 0.003057941561564803
Validation loss = 0.0017104876460507512
Validation loss = 0.0016022489871829748
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.08     |
| Iteration     | 32        |
| MaximumReturn | -0.000584 |
| MinimumReturn | -26.8     |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025741325225681067
Validation loss = 0.0016373565886169672
Validation loss = 0.00391624728217721
Validation loss = 0.0013291919603943825
Validation loss = 0.001852616434916854
Validation loss = 0.0016058896435424685
Validation loss = 0.0033651008270680904
Validation loss = 0.0014466751599684358
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014329943805932999
Validation loss = 0.001388518256135285
Validation loss = 0.002932719187811017
Validation loss = 0.002327646827325225
Validation loss = 0.0020315037108957767
Validation loss = 0.0015473986277356744
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001960944151505828
Validation loss = 0.0030085935723036528
Validation loss = 0.0023855448234826326
Validation loss = 0.002346357796341181
Validation loss = 0.0030258246697485447
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003323350567370653
Validation loss = 0.002578387036919594
Validation loss = 0.0017372742295265198
Validation loss = 0.0024929605424404144
Validation loss = 0.005620884709060192
Validation loss = 0.002148490399122238
Validation loss = 0.0018036030232906342
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0034287311136722565
Validation loss = 0.002162115415558219
Validation loss = 0.0017052926123142242
Validation loss = 0.0018686685943976045
Validation loss = 0.0015389604959636927
Validation loss = 0.002588690957054496
Validation loss = 0.0015353175112977624
Validation loss = 0.0014436629135161638
Validation loss = 0.0016479555051773787
Validation loss = 0.011897342279553413
Validation loss = 0.0019506181124597788
Validation loss = 0.0013570071896538138
Validation loss = 0.0014579463750123978
Validation loss = 0.003365445649251342
Validation loss = 0.0015603682259097695
Validation loss = 0.0029242534656077623
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00347  |
| Iteration     | 33        |
| MaximumReturn | -0.000663 |
| MinimumReturn | -0.0127   |
| TotalSamples  | 58310     |
-----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008058199658989906
Validation loss = 0.004782914184033871
Validation loss = 0.0025086551904678345
Validation loss = 0.0032369389664381742
Validation loss = 0.0017655410338193178
Validation loss = 0.002615782432258129
Validation loss = 0.0015319501981139183
Validation loss = 0.0017695175483822823
Validation loss = 0.001521499129012227
Validation loss = 0.001217346522025764
Validation loss = 0.0019089187262579799
Validation loss = 0.002362753264605999
Validation loss = 0.0026166813913732767
Validation loss = 0.0019238272216171026
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015044782776385546
Validation loss = 0.002726527862250805
Validation loss = 0.0013882473576813936
Validation loss = 0.0013752019731327891
Validation loss = 0.0031003488693386316
Validation loss = 0.0014415971236303449
Validation loss = 0.0028949975967407227
Validation loss = 0.0022772729862481356
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0033569885417819023
Validation loss = 0.0017050605965778232
Validation loss = 0.0032520003151148558
Validation loss = 0.001972992205992341
Validation loss = 0.001955565297976136
Validation loss = 0.002642942126840353
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002461344236508012
Validation loss = 0.0026045485865324736
Validation loss = 0.0026704869233071804
Validation loss = 0.0021727338898926973
Validation loss = 0.01535576581954956
Validation loss = 0.001691604615189135
Validation loss = 0.0017829535063356161
Validation loss = 0.002397263888269663
Validation loss = 0.0024370369501411915
Validation loss = 0.002783216768875718
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002632980002090335
Validation loss = 0.002527546603232622
Validation loss = 0.0016797535354271531
Validation loss = 0.0015141337644308805
Validation loss = 0.0017889784649014473
Validation loss = 0.0017966910963878036
Validation loss = 0.004255138803273439
Validation loss = 0.0017787936376407743
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00373  |
| Iteration     | 34        |
| MaximumReturn | -0.000552 |
| MinimumReturn | -0.0221   |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025483816862106323
Validation loss = 0.002875006990507245
Validation loss = 0.002245090901851654
Validation loss = 0.0033578125294297934
Validation loss = 0.004591457080096006
Validation loss = 0.0032733518164604902
Validation loss = 0.0011451378231868148
Validation loss = 0.001456116558983922
Validation loss = 0.002954151714220643
Validation loss = 0.0019438235322013497
Validation loss = 0.0013300548307597637
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021132640540599823
Validation loss = 0.0020212652161717415
Validation loss = 0.0016177447978407145
Validation loss = 0.0020479545928537846
Validation loss = 0.002415083348751068
Validation loss = 0.003747724462300539
Validation loss = 0.0017690359381958842
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0023184900637716055
Validation loss = 0.001433412660844624
Validation loss = 0.0032383764628320932
Validation loss = 0.004696123767644167
Validation loss = 0.003207848174497485
Validation loss = 0.0022974174935370684
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0028818070422858
Validation loss = 0.0017467724392190576
Validation loss = 0.003811323083937168
Validation loss = 0.0013413629494607449
Validation loss = 0.0034752467181533575
Validation loss = 0.0025475930888205767
Validation loss = 0.0024455387610942125
Validation loss = 0.001956288469955325
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002324524335563183
Validation loss = 0.0024795429781079292
Validation loss = 0.0011934045469388366
Validation loss = 0.001650314312428236
Validation loss = 0.003096151165664196
Validation loss = 0.0012996620498597622
Validation loss = 0.0015425420133396983
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.053   |
| Iteration     | 35       |
| MaximumReturn | -0.00112 |
| MinimumReturn | -0.112   |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001352344756014645
Validation loss = 0.00237085809931159
Validation loss = 0.0031882072798907757
Validation loss = 0.0014647141797468066
Validation loss = 0.0017589250346645713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014517647214233875
Validation loss = 0.0035031994339078665
Validation loss = 0.0013546455884352326
Validation loss = 0.005557527299970388
Validation loss = 0.002065496752038598
Validation loss = 0.002179759321734309
Validation loss = 0.005050517152994871
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021103504113852978
Validation loss = 0.0034759631380438805
Validation loss = 0.0018089085351675749
Validation loss = 0.0017938610399141908
Validation loss = 0.00232789502479136
Validation loss = 0.0014329315163195133
Validation loss = 0.0022036246955394745
Validation loss = 0.002723774639889598
Validation loss = 0.0038943577092140913
Validation loss = 0.0018277913331985474
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015456785913556814
Validation loss = 0.0018054395914077759
Validation loss = 0.0024201348423957825
Validation loss = 0.001438471837900579
Validation loss = 0.0020922161638736725
Validation loss = 0.008847606368362904
Validation loss = 0.0014075806830078363
Validation loss = 0.001844048616476357
Validation loss = 0.0023933211341500282
Validation loss = 0.0019595229532569647
Validation loss = 0.0015002215513959527
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013877570163458586
Validation loss = 0.00280921277590096
Validation loss = 0.0026313678827136755
Validation loss = 0.002965959720313549
Validation loss = 0.002646381501108408
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0611  |
| Iteration     | 36       |
| MaximumReturn | -0.00656 |
| MinimumReturn | -0.0991  |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0028280620463192463
Validation loss = 0.0016204045386984944
Validation loss = 0.0011294613359495997
Validation loss = 0.0017806285759434104
Validation loss = 0.0010939721250906587
Validation loss = 0.0017304867506027222
Validation loss = 0.0018776771612465382
Validation loss = 0.003701935289427638
Validation loss = 0.003935190383344889
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015109319938346744
Validation loss = 0.0011209276271983981
Validation loss = 0.0016696996754035354
Validation loss = 0.0020120281260460615
Validation loss = 0.0016791692469269037
Validation loss = 0.0010564984986558557
Validation loss = 0.0033910253550857306
Validation loss = 0.0018588081002235413
Validation loss = 0.0013403131160885096
Validation loss = 0.0011688722297549248
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001968260621652007
Validation loss = 0.002394664566963911
Validation loss = 0.0018484587781131268
Validation loss = 0.0016826058272272348
Validation loss = 0.002333254087716341
Validation loss = 0.003654545173048973
Validation loss = 0.0017942828126251698
Validation loss = 0.005668965633958578
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018402147106826305
Validation loss = 0.0014009678270667791
Validation loss = 0.0020463468972593546
Validation loss = 0.001775725046172738
Validation loss = 0.0019525893731042743
Validation loss = 0.001690525095909834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003695145947858691
Validation loss = 0.0026067926082760096
Validation loss = 0.001957594882696867
Validation loss = 0.002064355881884694
Validation loss = 0.001966926734894514
Validation loss = 0.0019894514698535204
Validation loss = 0.0022929629776626825
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.044    |
| Iteration     | 37        |
| MaximumReturn | -0.000753 |
| MinimumReturn | -0.325    |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010076112812384963
Validation loss = 0.0013636918738484383
Validation loss = 0.0011416538618505
Validation loss = 0.0017253253608942032
Validation loss = 0.002408385742455721
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012036896077916026
Validation loss = 0.0014524550642818213
Validation loss = 0.0022107851691544056
Validation loss = 0.004684697836637497
Validation loss = 0.0013491655699908733
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027496940456330776
Validation loss = 0.0019783740863204002
Validation loss = 0.002550003118813038
Validation loss = 0.002400606870651245
Validation loss = 0.002162378514185548
Validation loss = 0.0014047191943973303
Validation loss = 0.0014505399158224463
Validation loss = 0.00333276204764843
Validation loss = 0.0012215087190270424
Validation loss = 0.001770607428625226
Validation loss = 0.0016247393796220422
Validation loss = 0.0019389772787690163
Validation loss = 0.002442761790007353
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004073458723723888
Validation loss = 0.0012633561855182052
Validation loss = 0.0018457442056387663
Validation loss = 0.0030109286308288574
Validation loss = 0.0019267371390014887
Validation loss = 0.0014116448583081365
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019511760910972953
Validation loss = 0.005360259674489498
Validation loss = 0.0018976596184074879
Validation loss = 0.0030784045811742544
Validation loss = 0.0025101888459175825
Validation loss = 0.0014691839460283518
Validation loss = 0.0013663982972502708
Validation loss = 0.0015048468485474586
Validation loss = 0.004092704970389605
Validation loss = 0.002904335269704461
Validation loss = 0.0010181000689044595
Validation loss = 0.001132617937400937
Validation loss = 0.00174031185451895
Validation loss = 0.0012301185633987188
Validation loss = 0.0019809785299003124
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00641 |
| Iteration     | 38       |
| MaximumReturn | -0.00089 |
| MinimumReturn | -0.0398  |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002466860692948103
Validation loss = 0.001316479523666203
Validation loss = 0.004092606250196695
Validation loss = 0.0017514456994831562
Validation loss = 0.002959694480523467
Validation loss = 0.0021264851093292236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001699084765277803
Validation loss = 0.002186847385019064
Validation loss = 0.0013050073757767677
Validation loss = 0.0012818236136808991
Validation loss = 0.0040261042304337025
Validation loss = 0.0013293051160871983
Validation loss = 0.003851761342957616
Validation loss = 0.002188638551160693
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001735090627335012
Validation loss = 0.0017919073579832911
Validation loss = 0.002470962703227997
Validation loss = 0.0011781950015574694
Validation loss = 0.0014396087499335408
Validation loss = 0.001640158356167376
Validation loss = 0.001952553167939186
Validation loss = 0.0011597307166084647
Validation loss = 0.0020980031695216894
Validation loss = 0.002476290799677372
Validation loss = 0.0012030256912112236
Validation loss = 0.0038946871645748615
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014866464771330357
Validation loss = 0.004665527958422899
Validation loss = 0.002490347484126687
Validation loss = 0.0013486170209944248
Validation loss = 0.0035763715859502554
Validation loss = 0.001625517732463777
Validation loss = 0.0016644308343529701
Validation loss = 0.0010813698172569275
Validation loss = 0.003405320458114147
Validation loss = 0.0014269197126850486
Validation loss = 0.0025839051231741905
Validation loss = 0.0019992971792817116
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005763595923781395
Validation loss = 0.0012708889553323388
Validation loss = 0.00132188037969172
Validation loss = 0.001408302690833807
Validation loss = 0.0012965224450454116
Validation loss = 0.0011167441261932254
Validation loss = 0.0015633607981726527
Validation loss = 0.0009304308914579451
Validation loss = 0.0014426131965592504
Validation loss = 0.001059040310792625
Validation loss = 0.002239006804302335
Validation loss = 0.002198929898440838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00233  |
| Iteration     | 39        |
| MaximumReturn | -0.000604 |
| MinimumReturn | -0.0388   |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015961967874318361
Validation loss = 0.001486736349761486
Validation loss = 0.0011023933766409755
Validation loss = 0.0010361437452957034
Validation loss = 0.001257871394045651
Validation loss = 0.001975201303139329
Validation loss = 0.001383697148412466
Validation loss = 0.0018454993842169642
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012589830439537764
Validation loss = 0.002119834301993251
Validation loss = 0.0033183926716446877
Validation loss = 0.001906615449115634
Validation loss = 0.007131499238312244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022435777354985476
Validation loss = 0.0022964761592447758
Validation loss = 0.001304450212046504
Validation loss = 0.0020611751824617386
Validation loss = 0.0016723425360396504
Validation loss = 0.0017851477023214102
Validation loss = 0.0011406107805669308
Validation loss = 0.0015012832591310143
Validation loss = 0.0016859447350725532
Validation loss = 0.0013333697570487857
Validation loss = 0.001625619363039732
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021643955260515213
Validation loss = 0.0023993756622076035
Validation loss = 0.0015511319506913424
Validation loss = 0.001995156751945615
Validation loss = 0.001402784837409854
Validation loss = 0.0017241149907931685
Validation loss = 0.002764188451692462
Validation loss = 0.0014902850380167365
Validation loss = 0.0014086870942264795
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0029924847185611725
Validation loss = 0.0017485758289694786
Validation loss = 0.0010486043756827712
Validation loss = 0.0013783657923340797
Validation loss = 0.0013694638619199395
Validation loss = 0.0009487565839663148
Validation loss = 0.001725246082060039
Validation loss = 0.003883851459249854
Validation loss = 0.0013436883455142379
Validation loss = 0.0018593557178974152
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0047   |
| Iteration     | 40        |
| MaximumReturn | -0.000647 |
| MinimumReturn | -0.0498   |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014200442237779498
Validation loss = 0.001345828757621348
Validation loss = 0.004601757042109966
Validation loss = 0.004387702327221632
Validation loss = 0.0017281811451539397
Validation loss = 0.00488478085026145
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015869963681325316
Validation loss = 0.006813953630626202
Validation loss = 0.0022369991056621075
Validation loss = 0.0018174083670601249
Validation loss = 0.002449622843414545
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022573198657482862
Validation loss = 0.0014392260927706957
Validation loss = 0.003073287196457386
Validation loss = 0.0024199995677918196
Validation loss = 0.0012755448697134852
Validation loss = 0.001060043228790164
Validation loss = 0.001583523117005825
Validation loss = 0.0027121491730213165
Validation loss = 0.0014506549341604114
Validation loss = 0.0018848100444301963
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0020398865453898907
Validation loss = 0.0015365941217169166
Validation loss = 0.00291193975135684
Validation loss = 0.0015890942886471748
Validation loss = 0.001982552232220769
Validation loss = 0.00486616138368845
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003257920267060399
Validation loss = 0.0014858711510896683
Validation loss = 0.0018632984720170498
Validation loss = 0.001211819238960743
Validation loss = 0.0013935472816228867
Validation loss = 0.0011496208608150482
Validation loss = 0.007039845921099186
Validation loss = 0.0026187452021986246
Validation loss = 0.0013457697350531816
Validation loss = 0.0012934241676703095
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00178  |
| Iteration     | 41        |
| MaximumReturn | -0.000735 |
| MinimumReturn | -0.0225   |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025543549563735723
Validation loss = 0.00135201180819422
Validation loss = 0.0009346221340820193
Validation loss = 0.0046906243078410625
Validation loss = 0.002749354811385274
Validation loss = 0.0010641636326909065
Validation loss = 0.0016182934632524848
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016763879684731364
Validation loss = 0.0012790849432349205
Validation loss = 0.0011938384268432856
Validation loss = 0.0016403819900006056
Validation loss = 0.001944398507475853
Validation loss = 0.0023868235293775797
Validation loss = 0.0021293957252055407
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012900319416075945
Validation loss = 0.0014284311328083277
Validation loss = 0.0017043956322595477
Validation loss = 0.0011709597893059254
Validation loss = 0.001181400497443974
Validation loss = 0.0012236282927915454
Validation loss = 0.0016524773091077805
Validation loss = 0.0021762121468782425
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015625400701537728
Validation loss = 0.0014446722343564034
Validation loss = 0.0015563524793833494
Validation loss = 0.0023375116288661957
Validation loss = 0.0013091502478346229
Validation loss = 0.002214873442426324
Validation loss = 0.0023321351036429405
Validation loss = 0.002633883850648999
Validation loss = 0.004486697725951672
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023691775277256966
Validation loss = 0.0018736926140263677
Validation loss = 0.0018467690097168088
Validation loss = 0.0013364707119762897
Validation loss = 0.001835407572798431
Validation loss = 0.001145559479482472
Validation loss = 0.0011172422673553228
Validation loss = 0.0016423823544755578
Validation loss = 0.0016223941929638386
Validation loss = 0.0019659081008285284
Validation loss = 0.0018576543079689145
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0109   |
| Iteration     | 42        |
| MaximumReturn | -0.000732 |
| MinimumReturn | -0.0421   |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014489602763205767
Validation loss = 0.0014266420621424913
Validation loss = 0.002623604144901037
Validation loss = 0.0010066230315715075
Validation loss = 0.0022934144362807274
Validation loss = 0.001205515582114458
Validation loss = 0.0012369534233585
Validation loss = 0.0013171699829399586
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0023751771077513695
Validation loss = 0.002408111933618784
Validation loss = 0.0012414761586114764
Validation loss = 0.001072185579687357
Validation loss = 0.006778380833566189
Validation loss = 0.0010620640823617578
Validation loss = 0.0014378424966707826
Validation loss = 0.0007966701523400843
Validation loss = 0.0018218462355434895
Validation loss = 0.001399673637934029
Validation loss = 0.0013292547082528472
Validation loss = 0.00289298128336668
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015224665403366089
Validation loss = 0.0014727574307471514
Validation loss = 0.001906251534819603
Validation loss = 0.001233258401043713
Validation loss = 0.001697648549452424
Validation loss = 0.006249187979847193
Validation loss = 0.0016474326839670539
Validation loss = 0.0012243252713233232
Validation loss = 0.0014193190727382898
Validation loss = 0.0019804697949439287
Validation loss = 0.0018702881643548608
Validation loss = 0.00386668904684484
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007946115918457508
Validation loss = 0.0014467007713392377
Validation loss = 0.0015223510563373566
Validation loss = 0.001733484328724444
Validation loss = 0.0013002047780901194
Validation loss = 0.002300043124705553
Validation loss = 0.0015521064633503556
Validation loss = 0.001204908243380487
Validation loss = 0.0023840712383389473
Validation loss = 0.0017876157071441412
Validation loss = 0.0026586696039885283
Validation loss = 0.0029657192062586546
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009201763896271586
Validation loss = 0.002684197388589382
Validation loss = 0.0021236222237348557
Validation loss = 0.0010774186812341213
Validation loss = 0.0009078296716324985
Validation loss = 0.0020725352223962545
Validation loss = 0.0014747031964361668
Validation loss = 0.0010857017477974296
Validation loss = 0.001957728061825037
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00893  |
| Iteration     | 43        |
| MaximumReturn | -0.000556 |
| MinimumReturn | -0.0619   |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003399088280275464
Validation loss = 0.0030890919733792543
Validation loss = 0.0011260659666731954
Validation loss = 0.0020513779018074274
Validation loss = 0.0018171011470258236
Validation loss = 0.0011207822244614363
Validation loss = 0.0015189299592748284
Validation loss = 0.0014983451692387462
Validation loss = 0.0014837108319625258
Validation loss = 0.0017457235371693969
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001242761267349124
Validation loss = 0.0011301767081022263
Validation loss = 0.002378473524004221
Validation loss = 0.001155185280367732
Validation loss = 0.0010286075994372368
Validation loss = 0.0018419984262436628
Validation loss = 0.0008322964422404766
Validation loss = 0.0013102152151986957
Validation loss = 0.0018568604718893766
Validation loss = 0.0016443515196442604
Validation loss = 0.0016556420596316457
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026565417647361755
Validation loss = 0.001400195644237101
Validation loss = 0.0022484452929347754
Validation loss = 0.002096094423905015
Validation loss = 0.0013763741590082645
Validation loss = 0.003175035584717989
Validation loss = 0.0015034074895083904
Validation loss = 0.004012653138488531
Validation loss = 0.002472317311912775
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015674926107749343
Validation loss = 0.002148998435586691
Validation loss = 0.001989607233554125
Validation loss = 0.0019519274355843663
Validation loss = 0.0013322272570803761
Validation loss = 0.003017683746293187
Validation loss = 0.0014627841301262379
Validation loss = 0.0019914822187274694
Validation loss = 0.00131697254255414
Validation loss = 0.0010506915859878063
Validation loss = 0.0021102814935147762
Validation loss = 0.0012865436729043722
Validation loss = 0.00122610863763839
Validation loss = 0.0018157741287723184
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002946866676211357
Validation loss = 0.0009808626491576433
Validation loss = 0.0010168199660256505
Validation loss = 0.0015170100377872586
Validation loss = 0.001045547192916274
Validation loss = 0.0019807815551757812
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00479  |
| Iteration     | 44        |
| MaximumReturn | -0.000627 |
| MinimumReturn | -0.0506   |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015224474482238293
Validation loss = 0.006241854280233383
Validation loss = 0.0020195774268358946
Validation loss = 0.001774546573869884
Validation loss = 0.0018102660542353988
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011404039105400443
Validation loss = 0.00123625120613724
Validation loss = 0.007248293608427048
Validation loss = 0.0010577688226476312
Validation loss = 0.0010631532641127706
Validation loss = 0.0017429885920137167
Validation loss = 0.0017700791358947754
Validation loss = 0.004587128292769194
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001531172194518149
Validation loss = 0.001694409642368555
Validation loss = 0.002392282010987401
Validation loss = 0.0015855106757953763
Validation loss = 0.0024289023131132126
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001957138068974018
Validation loss = 0.002927226945757866
Validation loss = 0.0028344520833343267
Validation loss = 0.0017017885111272335
Validation loss = 0.0023211732041090727
Validation loss = 0.0021757218055427074
Validation loss = 0.0014035841450095177
Validation loss = 0.0014169092755764723
Validation loss = 0.002015291480347514
Validation loss = 0.0022336794063448906
Validation loss = 0.012413584627211094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020699352025985718
Validation loss = 0.0014878365909680724
Validation loss = 0.0015716791385784745
Validation loss = 0.0019303474109619856
Validation loss = 0.001354461768642068
Validation loss = 0.0022483011707663536
Validation loss = 0.0012164738727733493
Validation loss = 0.0017934527713805437
Validation loss = 0.0014643653994426131
Validation loss = 0.002228375757113099
Validation loss = 0.0014013415202498436
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00434  |
| Iteration     | 45        |
| MaximumReturn | -0.000539 |
| MinimumReturn | -0.042    |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002026895061135292
Validation loss = 0.0017838099738582969
Validation loss = 0.0019685886800289154
Validation loss = 0.0011434891493991017
Validation loss = 0.0012654667953029275
Validation loss = 0.007498240564018488
Validation loss = 0.003962609451264143
Validation loss = 0.0019672249909490347
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008105761953629553
Validation loss = 0.0011407271958887577
Validation loss = 0.0013113562017679214
Validation loss = 0.0008146512554958463
Validation loss = 0.001161022111773491
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00250524771399796
Validation loss = 0.0010888476390391588
Validation loss = 0.0009974633576348424
Validation loss = 0.0009989026002585888
Validation loss = 0.00252808490768075
Validation loss = 0.005070776678621769
Validation loss = 0.0012243537930771708
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002600929234176874
Validation loss = 0.001240480924025178
Validation loss = 0.0023822791408747435
Validation loss = 0.0012930717784911394
Validation loss = 0.0019321992294862866
Validation loss = 0.0017151234205812216
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023057996295392513
Validation loss = 0.0011234356788918376
Validation loss = 0.0011142212897539139
Validation loss = 0.0016480067279189825
Validation loss = 0.0011710700346156955
Validation loss = 0.0012726893182843924
Validation loss = 0.003128501819446683
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00597  |
| Iteration     | 46        |
| MaximumReturn | -0.000538 |
| MinimumReturn | -0.0607   |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001234508235938847
Validation loss = 0.0013388802763074636
Validation loss = 0.0015167569508776069
Validation loss = 0.001106745912693441
Validation loss = 0.003113578539341688
Validation loss = 0.0024047920014709234
Validation loss = 0.0029266788624227047
Validation loss = 0.002240872709080577
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011957610258832574
Validation loss = 0.0026620218995958567
Validation loss = 0.0023180698044598103
Validation loss = 0.0007825561915524304
Validation loss = 0.0015676639741286635
Validation loss = 0.004055480007082224
Validation loss = 0.0022787924390286207
Validation loss = 0.0020808775443583727
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012034645769745111
Validation loss = 0.0021596490405499935
Validation loss = 0.0013656567316502333
Validation loss = 0.0014840565854683518
Validation loss = 0.0016910613048821688
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016053782310336828
Validation loss = 0.002236888511106372
Validation loss = 0.00223679025657475
Validation loss = 0.0012733594048768282
Validation loss = 0.0015609149122610688
Validation loss = 0.0009185030939988792
Validation loss = 0.002439146861433983
Validation loss = 0.0020470288582146168
Validation loss = 0.001100609079003334
Validation loss = 0.0011469828896224499
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014536509988829494
Validation loss = 0.0047176070511341095
Validation loss = 0.0020921246614307165
Validation loss = 0.0015685849357396364
Validation loss = 0.0010125281987711787
Validation loss = 0.001536854193545878
Validation loss = 0.001111872959882021
Validation loss = 0.0012680053478106856
Validation loss = 0.001947288983501494
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00712  |
| Iteration     | 47        |
| MaximumReturn | -0.000548 |
| MinimumReturn | -0.0591   |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009586033411324024
Validation loss = 0.0019522799411788583
Validation loss = 0.002217549365013838
Validation loss = 0.0021177141461521387
Validation loss = 0.0017899868544191122
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018454075325280428
Validation loss = 0.0011771570425480604
Validation loss = 0.0017473468324169517
Validation loss = 0.0014660404995083809
Validation loss = 0.0015710151055827737
Validation loss = 0.0028232759796082973
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001133717829361558
Validation loss = 0.001415323931723833
Validation loss = 0.0014213552931323647
Validation loss = 0.0019852996338158846
Validation loss = 0.003171376883983612
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012227296829223633
Validation loss = 0.0014398854691535234
Validation loss = 0.0016964368987828493
Validation loss = 0.0013879865873605013
Validation loss = 0.0022001084871590137
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008911535260267556
Validation loss = 0.000971418630797416
Validation loss = 0.0013646824518218637
Validation loss = 0.003073552157729864
Validation loss = 0.0010602781549096107
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0135   |
| Iteration     | 48        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -0.0833   |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014240358723327518
Validation loss = 0.0015132026746869087
Validation loss = 0.004506619647145271
Validation loss = 0.0022588998544961214
Validation loss = 0.001130080665461719
Validation loss = 0.0011644390178844333
Validation loss = 0.0029481137171387672
Validation loss = 0.0010722114238888025
Validation loss = 0.0009769108146429062
Validation loss = 0.0013468880206346512
Validation loss = 0.0009305395651608706
Validation loss = 0.0013523006346076727
Validation loss = 0.0015685701509937644
Validation loss = 0.0015132831176742911
Validation loss = 0.0020500775426626205
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011306253727525473
Validation loss = 0.00076322938548401
Validation loss = 0.0013114101020619273
Validation loss = 0.0014580648858100176
Validation loss = 0.0020248298533260822
Validation loss = 0.0012457381235435605
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017808470875024796
Validation loss = 0.0020602780859917402
Validation loss = 0.0037844839971512556
Validation loss = 0.009993196465075016
Validation loss = 0.0028804163448512554
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011579530546441674
Validation loss = 0.001810645335353911
Validation loss = 0.0018456915859133005
Validation loss = 0.0010490523418411613
Validation loss = 0.0022431202232837677
Validation loss = 0.003026483813300729
Validation loss = 0.002524584997445345
Validation loss = 0.0017522971611469984
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019114632159471512
Validation loss = 0.0017771775601431727
Validation loss = 0.0012333389604464173
Validation loss = 0.002670684829354286
Validation loss = 0.002135072834789753
Validation loss = 0.001035828609019518
Validation loss = 0.0010203310521319509
Validation loss = 0.0010717582190409303
Validation loss = 0.0010132340248674154
Validation loss = 0.0015291436575353146
Validation loss = 0.0027611595578491688
Validation loss = 0.0010901218047365546
Validation loss = 0.0015216912142932415
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0146   |
| Iteration     | 49        |
| MaximumReturn | -0.000556 |
| MinimumReturn | -0.0828   |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011127268662676215
Validation loss = 0.0012125968933105469
Validation loss = 0.0008737463504076004
Validation loss = 0.0013679887633770704
Validation loss = 0.0018977420404553413
Validation loss = 0.004606795962899923
Validation loss = 0.0032137769740074873
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0023980725090950727
Validation loss = 0.0010165950516238809
Validation loss = 0.0030794881749898195
Validation loss = 0.0009965056087821722
Validation loss = 0.0016098028281703591
Validation loss = 0.001313437707722187
Validation loss = 0.0010778425494208932
Validation loss = 0.0019345303298905492
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014382846420630813
Validation loss = 0.0042967586778104305
Validation loss = 0.003855468239635229
Validation loss = 0.0010611830512061715
Validation loss = 0.0017829230055212975
Validation loss = 0.0016940399073064327
Validation loss = 0.001253772876225412
Validation loss = 0.0017135952366515994
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014195323456078768
Validation loss = 0.0018630499253049493
Validation loss = 0.0014427854912355542
Validation loss = 0.0017983732977882028
Validation loss = 0.001220155623741448
Validation loss = 0.001811943599022925
Validation loss = 0.0010749668581411242
Validation loss = 0.00190892000682652
Validation loss = 0.001318078488111496
Validation loss = 0.0021612620912492275
Validation loss = 0.0010824005585163832
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011040458921343088
Validation loss = 0.0011026744032278657
Validation loss = 0.0022200264502316713
Validation loss = 0.0014533306239172816
Validation loss = 0.0012314723571762443
Validation loss = 0.001167215988971293
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0204   |
| Iteration     | 50        |
| MaximumReturn | -0.000646 |
| MinimumReturn | -0.104    |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0032539814710617065
Validation loss = 0.001165618421509862
Validation loss = 0.0009739965316839516
Validation loss = 0.0013759997673332691
Validation loss = 0.0021879414562135935
Validation loss = 0.000905865163076669
Validation loss = 0.000996859511360526
Validation loss = 0.0019267372554168105
Validation loss = 0.004331337753683329
Validation loss = 0.0010838102316483855
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0026113586500287056
Validation loss = 0.0014692217810079455
Validation loss = 0.0009296510252170265
Validation loss = 0.005529832560569048
Validation loss = 0.0016765970503911376
Validation loss = 0.0011688423110172153
Validation loss = 0.0012183139333501458
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005257012788206339
Validation loss = 0.001961894566193223
Validation loss = 0.0011719308095052838
Validation loss = 0.001136329141445458
Validation loss = 0.0016084318049252033
Validation loss = 0.001382893300615251
Validation loss = 0.001256412360817194
Validation loss = 0.0023476211354136467
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009340450051240623
Validation loss = 0.0015078647993505
Validation loss = 0.0012774706119671464
Validation loss = 0.0013625860447064042
Validation loss = 0.0032278227154165506
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002161589916795492
Validation loss = 0.0009918882278725505
Validation loss = 0.0012447201879695058
Validation loss = 0.0011348556727170944
Validation loss = 0.001117158797569573
Validation loss = 0.0009914375841617584
Validation loss = 0.0014742533676326275
Validation loss = 0.001010117121040821
Validation loss = 0.0012346849543973804
Validation loss = 0.000973262416664511
Validation loss = 0.002928215079009533
Validation loss = 0.0015706177800893784
Validation loss = 0.0008450047462247312
Validation loss = 0.0013994110049679875
Validation loss = 0.0011484298156574368
Validation loss = 0.001080031506717205
Validation loss = 0.0009916782146319747
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.152   |
| Iteration     | 51       |
| MaximumReturn | -0.00202 |
| MinimumReturn | -0.212   |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000990537111647427
Validation loss = 0.0009613802540116012
Validation loss = 0.0008476162911392748
Validation loss = 0.0012119640596210957
Validation loss = 0.0013339974684640765
Validation loss = 0.0030733770690858364
Validation loss = 0.001351668732240796
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001077838707715273
Validation loss = 0.002283468609675765
Validation loss = 0.0016510034911334515
Validation loss = 0.0016500938218086958
Validation loss = 0.0015015401877462864
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013978376518934965
Validation loss = 0.000714021036401391
Validation loss = 0.0011128453770652413
Validation loss = 0.0021046793553978205
Validation loss = 0.002509125741198659
Validation loss = 0.0009245668188668787
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001060732640326023
Validation loss = 0.001007113023661077
Validation loss = 0.0012093507684767246
Validation loss = 0.0016669046599417925
Validation loss = 0.001562342164106667
Validation loss = 0.0016702784923836589
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008641090244054794
Validation loss = 0.0016137707279995084
Validation loss = 0.0007147578289732337
Validation loss = 0.0007587571162730455
Validation loss = 0.0019317320547997952
Validation loss = 0.0011241730535402894
Validation loss = 0.0009324066340923309
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0168   |
| Iteration     | 52        |
| MaximumReturn | -0.000563 |
| MinimumReturn | -0.195    |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013243878493085504
Validation loss = 0.002648446708917618
Validation loss = 0.0009353678906336427
Validation loss = 0.004929170478135347
Validation loss = 0.0031638648360967636
Validation loss = 0.001465176697820425
Validation loss = 0.0015152567066252232
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001705052563920617
Validation loss = 0.0011737840250134468
Validation loss = 0.0010072351433336735
Validation loss = 0.0012048587668687105
Validation loss = 0.003610167885199189
Validation loss = 0.0009091949905268848
Validation loss = 0.0015651980647817254
Validation loss = 0.001073771039955318
Validation loss = 0.0012610222911462188
Validation loss = 0.0009250742150470614
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003687312826514244
Validation loss = 0.0011887757573276758
Validation loss = 0.000942247046623379
Validation loss = 0.0020507636945694685
Validation loss = 0.001396976294927299
Validation loss = 0.001392351696267724
Validation loss = 0.0016769495559856296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010882680071517825
Validation loss = 0.0010889059631153941
Validation loss = 0.0009518389124423265
Validation loss = 0.0015525755006819963
Validation loss = 0.0019468552200123668
Validation loss = 0.0010906795505434275
Validation loss = 0.003763338550925255
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000684201018884778
Validation loss = 0.0017481982940807939
Validation loss = 0.001988850301131606
Validation loss = 0.0011596438707783818
Validation loss = 0.003917704802006483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00888  |
| Iteration     | 53        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -0.103    |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009275747579522431
Validation loss = 0.0012103925691917539
Validation loss = 0.0011664972407743335
Validation loss = 0.001380913075990975
Validation loss = 0.0012849707854911685
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011110362829640508
Validation loss = 0.0008654629928059876
Validation loss = 0.0007333714165724814
Validation loss = 0.0030253648292273283
Validation loss = 0.00194524263497442
Validation loss = 0.0017551715718582273
Validation loss = 0.001521710422821343
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005690651014447212
Validation loss = 0.0042176214046776295
Validation loss = 0.0010561167728155851
Validation loss = 0.0009104475611820817
Validation loss = 0.001102490583434701
Validation loss = 0.0033547705970704556
Validation loss = 0.0017302539199590683
Validation loss = 0.001122387358918786
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003382696071639657
Validation loss = 0.0009751966572366655
Validation loss = 0.0010316953994333744
Validation loss = 0.0011357417097315192
Validation loss = 0.003407497191801667
Validation loss = 0.0018089996883645654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015011547366157174
Validation loss = 0.0009446346084587276
Validation loss = 0.00095783406868577
Validation loss = 0.0011776088504120708
Validation loss = 0.001796070602722466
Validation loss = 0.0009113369742408395
Validation loss = 0.0008193321409635246
Validation loss = 0.001593283494003117
Validation loss = 0.0009957747533917427
Validation loss = 0.0009397707763127983
Validation loss = 0.0018751531606540084
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0155   |
| Iteration     | 54        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.151    |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014034649357199669
Validation loss = 0.0012897546403110027
Validation loss = 0.0009855643147602677
Validation loss = 0.0011471275938674808
Validation loss = 0.0011535420781001449
Validation loss = 0.001394544611684978
Validation loss = 0.0009689952130429447
Validation loss = 0.0024284219834953547
Validation loss = 0.002676961477845907
Validation loss = 0.0009450301877222955
Validation loss = 0.0013538931962102652
Validation loss = 0.0017944987630471587
Validation loss = 0.0014973431825637817
Validation loss = 0.0009037895943038166
Validation loss = 0.0009713807958178222
Validation loss = 0.0012373122153803706
Validation loss = 0.0017997549148276448
Validation loss = 0.0018175477162003517
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009505226626060903
Validation loss = 0.0008434322662651539
Validation loss = 0.0008623601752333343
Validation loss = 0.001048687961883843
Validation loss = 0.0014107165625318885
Validation loss = 0.0011166430776938796
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009078137227334082
Validation loss = 0.0018929834477603436
Validation loss = 0.0015421963762491941
Validation loss = 0.0011547073954716325
Validation loss = 0.0023952883202582598
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013995094923302531
Validation loss = 0.0010473212460055947
Validation loss = 0.0012661429354920983
Validation loss = 0.0014384777750819921
Validation loss = 0.0016797262942418456
Validation loss = 0.0009713665349408984
Validation loss = 0.0013678987743332982
Validation loss = 0.0010150871239602566
Validation loss = 0.0035947745200246572
Validation loss = 0.0016682515852153301
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001451239688321948
Validation loss = 0.000976384908426553
Validation loss = 0.0009541751351207495
Validation loss = 0.0010411760304123163
Validation loss = 0.0013409251114353538
Validation loss = 0.001393436687067151
Validation loss = 0.0009169700788334012
Validation loss = 0.0008287898963317275
Validation loss = 0.0022367173805832863
Validation loss = 0.0010419636964797974
Validation loss = 0.0011303821811452508
Validation loss = 0.0016011077677831054
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.113   |
| Iteration     | 55       |
| MaximumReturn | -0.0182  |
| MinimumReturn | -0.304   |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011158292181789875
Validation loss = 0.0034726508893072605
Validation loss = 0.0015977061120793223
Validation loss = 0.001146085443906486
Validation loss = 0.0010473853908479214
Validation loss = 0.000673013215418905
Validation loss = 0.0010803526965901256
Validation loss = 0.0016801422461867332
Validation loss = 0.00123414711561054
Validation loss = 0.000730577448848635
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016413190169259906
Validation loss = 0.0015509685035794973
Validation loss = 0.0006526274955831468
Validation loss = 0.0007678239489905536
Validation loss = 0.0005965700256638229
Validation loss = 0.0011424827389419079
Validation loss = 0.0023385826498270035
Validation loss = 0.0008027749136090279
Validation loss = 0.001150391180999577
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012485462939366698
Validation loss = 0.0007629739120602608
Validation loss = 0.0007855286821722984
Validation loss = 0.0010648674797266722
Validation loss = 0.0012271907180547714
Validation loss = 0.0007540130172856152
Validation loss = 0.0010962188243865967
Validation loss = 0.0009601111523807049
Validation loss = 0.0026899997610598803
Validation loss = 0.0008033153135329485
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007687061442993581
Validation loss = 0.0036325431428849697
Validation loss = 0.0012640749337151647
Validation loss = 0.0035525488201528788
Validation loss = 0.0010323288151994348
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008742701611481607
Validation loss = 0.000821684196125716
Validation loss = 0.0011603820603340864
Validation loss = 0.0008499044342897832
Validation loss = 0.0011677766451612115
Validation loss = 0.0011816361220553517
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00906  |
| Iteration     | 56        |
| MaximumReturn | -0.000622 |
| MinimumReturn | -0.132    |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004183061886578798
Validation loss = 0.0011415405897423625
Validation loss = 0.0008802576921880245
Validation loss = 0.0010273278458043933
Validation loss = 0.0007877901080064476
Validation loss = 0.001538068987429142
Validation loss = 0.0007394805434159935
Validation loss = 0.0010029865661635995
Validation loss = 0.0008970983908511698
Validation loss = 0.0014050708850845695
Validation loss = 0.0013648681342601776
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016963182715699077
Validation loss = 0.0020051435567438602
Validation loss = 0.0018366887234151363
Validation loss = 0.00065964600071311
Validation loss = 0.0014002796960994601
Validation loss = 0.003253289731219411
Validation loss = 0.0006913850666023791
Validation loss = 0.0008838761714287102
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010933683952316642
Validation loss = 0.0008869476150721312
Validation loss = 0.0008495989604853094
Validation loss = 0.0021292127203196287
Validation loss = 0.0014312263811007142
Validation loss = 0.0020142430439591408
Validation loss = 0.002984953112900257
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004813185892999172
Validation loss = 0.001218399964272976
Validation loss = 0.000755098822992295
Validation loss = 0.0010016957530751824
Validation loss = 0.0032881207298487425
Validation loss = 0.0020634715911000967
Validation loss = 0.0015669033164158463
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011112472275272012
Validation loss = 0.0007683437434025109
Validation loss = 0.0009129823301918805
Validation loss = 0.0012902352027595043
Validation loss = 0.0010252316715195775
Validation loss = 0.0006936888094060123
Validation loss = 0.0010512610897421837
Validation loss = 0.0009465362527407706
Validation loss = 0.002317218342795968
Validation loss = 0.00225739530287683
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00176  |
| Iteration     | 57        |
| MaximumReturn | -0.000575 |
| MinimumReturn | -0.025    |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018964899936690927
Validation loss = 0.0014134986558929086
Validation loss = 0.002845433307811618
Validation loss = 0.0025585638359189034
Validation loss = 0.0010341893648728728
Validation loss = 0.001522919163107872
Validation loss = 0.000963621074333787
Validation loss = 0.0013554907636716962
Validation loss = 0.0008343416848219931
Validation loss = 0.001140184816904366
Validation loss = 0.0017627014312893152
Validation loss = 0.001139477244578302
Validation loss = 0.0008009620360098779
Validation loss = 0.0012712815077975392
Validation loss = 0.0010806877398863435
Validation loss = 0.0010144952684640884
Validation loss = 0.0012891520746052265
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016030723927542567
Validation loss = 0.0013679704861715436
Validation loss = 0.0031676706857979298
Validation loss = 0.000882007007021457
Validation loss = 0.0008211550302803516
Validation loss = 0.007774455938488245
Validation loss = 0.001897537033073604
Validation loss = 0.0009866338223218918
Validation loss = 0.0006904560723342001
Validation loss = 0.001356656663119793
Validation loss = 0.0011199498549103737
Validation loss = 0.0008649567607790232
Validation loss = 0.0011119697010144591
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011174518149346113
Validation loss = 0.0015177000313997269
Validation loss = 0.002223576419055462
Validation loss = 0.0015307307476177812
Validation loss = 0.0008889463497325778
Validation loss = 0.0008892978657968342
Validation loss = 0.0008237610454671085
Validation loss = 0.0008213815744966269
Validation loss = 0.0017529900651425123
Validation loss = 0.0012764636194333434
Validation loss = 0.0020709848031401634
Validation loss = 0.0022263091523200274
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013520075008273125
Validation loss = 0.0011832467280328274
Validation loss = 0.0012423601001501083
Validation loss = 0.0009704965050332248
Validation loss = 0.0014711744152009487
Validation loss = 0.0012765472056344151
Validation loss = 0.0024486645124852657
Validation loss = 0.0009323748527094722
Validation loss = 0.0014207428321242332
Validation loss = 0.0014323461800813675
Validation loss = 0.0015812561614438891
Validation loss = 0.001126904971897602
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001047697034664452
Validation loss = 0.0016716510290279984
Validation loss = 0.0010115032782778144
Validation loss = 0.000808022334240377
Validation loss = 0.0007665494340471923
Validation loss = 0.0008570586796849966
Validation loss = 0.0008163810125552118
Validation loss = 0.0019455348374322057
Validation loss = 0.000869332579895854
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0823  |
| Iteration     | 58       |
| MaximumReturn | -0.0024  |
| MinimumReturn | -0.221   |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007226636516861618
Validation loss = 0.0012167408131062984
Validation loss = 0.0010466778185218573
Validation loss = 0.0010807046201080084
Validation loss = 0.0011284027714282274
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008729667752049863
Validation loss = 0.0006718291551806033
Validation loss = 0.0015643432270735502
Validation loss = 0.0036111855879426003
Validation loss = 0.0009223124361597002
Validation loss = 0.000659992394503206
Validation loss = 0.0010094763711094856
Validation loss = 0.0029513633344322443
Validation loss = 0.0007154859486036003
Validation loss = 0.0007658932590857148
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009860615245997906
Validation loss = 0.0009136753506027162
Validation loss = 0.000840625143609941
Validation loss = 0.001207013032399118
Validation loss = 0.0014777767937630415
Validation loss = 0.0013280794955790043
Validation loss = 0.0009255294571630657
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001096616848371923
Validation loss = 0.0011492219055071473
Validation loss = 0.0008467158186249435
Validation loss = 0.001471299328841269
Validation loss = 0.002105460502207279
Validation loss = 0.0025559442583471537
Validation loss = 0.0007885699160397053
Validation loss = 0.0013700929703190923
Validation loss = 0.0010022387141361833
Validation loss = 0.0013862022897228599
Validation loss = 0.0008127149194478989
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011325075756758451
Validation loss = 0.002236823318526149
Validation loss = 0.0010619531385600567
Validation loss = 0.0010040034539997578
Validation loss = 0.0009145702933892608
Validation loss = 0.001036953297443688
Validation loss = 0.0020172568038105965
Validation loss = 0.0005971243372187018
Validation loss = 0.0013318349374458194
Validation loss = 0.004015189595520496
Validation loss = 0.0010490173008292913
Validation loss = 0.0028262233827263117
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0155   |
| Iteration     | 59        |
| MaximumReturn | -0.000544 |
| MinimumReturn | -0.0817   |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007745504262857139
Validation loss = 0.0028553784359246492
Validation loss = 0.0010709598427638412
Validation loss = 0.0012929465156048536
Validation loss = 0.000631901843007654
Validation loss = 0.0016894895816221833
Validation loss = 0.002192230662330985
Validation loss = 0.0025579379871487617
Validation loss = 0.0019429806852713227
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008189002401195467
Validation loss = 0.0006153107970021665
Validation loss = 0.0015739607624709606
Validation loss = 0.0010899980552494526
Validation loss = 0.0008042420959100127
Validation loss = 0.0009554211283102632
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012624232331290841
Validation loss = 0.0007539769867435098
Validation loss = 0.0011177666019648314
Validation loss = 0.004681408870965242
Validation loss = 0.0018154982244595885
Validation loss = 0.0014921438414603472
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008428861037828028
Validation loss = 0.0015751912724226713
Validation loss = 0.0010303431190550327
Validation loss = 0.0007258803234435618
Validation loss = 0.0011154034873470664
Validation loss = 0.0008188161882571876
Validation loss = 0.001083885901607573
Validation loss = 0.0012935742270201445
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004103883169591427
Validation loss = 0.0007376439752988517
Validation loss = 0.000648514018394053
Validation loss = 0.0010807333746924996
Validation loss = 0.001350456033833325
Validation loss = 0.0006699856603518128
Validation loss = 0.0011229108786210418
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -87.5    |
| Iteration     | 60       |
| MaximumReturn | -50      |
| MinimumReturn | -106     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006034191232174635
Validation loss = 0.001635460532270372
Validation loss = 0.0011460999958217144
Validation loss = 0.0013842096086591482
Validation loss = 0.0034524411894381046
Validation loss = 0.0018680369248613715
Validation loss = 0.0020552347414195538
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006117701064795256
Validation loss = 0.0013632208574563265
Validation loss = 0.0010829647071659565
Validation loss = 0.0011671310057863593
Validation loss = 0.0009499742300249636
Validation loss = 0.0023105954751372337
Validation loss = 0.0008482041303068399
Validation loss = 0.0013156646164134145
Validation loss = 0.0009999580215662718
Validation loss = 0.0005376219633035362
Validation loss = 0.0012334620114415884
Validation loss = 0.000642795639578253
Validation loss = 0.0009553427807986736
Validation loss = 0.001419362029992044
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007561346981674433
Validation loss = 0.002615013625472784
Validation loss = 0.0018641466740518808
Validation loss = 0.0016163226682692766
Validation loss = 0.001604675780981779
Validation loss = 0.0011278638849034905
Validation loss = 0.0011960245901718736
Validation loss = 0.0010697689140215516
Validation loss = 0.0010508993873372674
Validation loss = 0.0009866275358945131
Validation loss = 0.0018503707833588123
Validation loss = 0.0010743931634351611
Validation loss = 0.0009033824317157269
Validation loss = 0.0013125803088769317
Validation loss = 0.0027902191504836082
Validation loss = 0.003026241436600685
Validation loss = 0.0006028834613971412
Validation loss = 0.0008805635152384639
Validation loss = 0.0016131412703543901
Validation loss = 0.0011475092032924294
Validation loss = 0.0006459280848503113
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005967855453491211
Validation loss = 0.001438869396224618
Validation loss = 0.001584176323376596
Validation loss = 0.0012129537062719464
Validation loss = 0.0011224953923374414
Validation loss = 0.0009037412819452584
Validation loss = 0.0012235480826348066
Validation loss = 0.0011151687940582633
Validation loss = 0.0024718500208109617
Validation loss = 0.000982840545475483
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005717055406421423
Validation loss = 0.0015631321584805846
Validation loss = 0.0009889795910567045
Validation loss = 0.0009926606435328722
Validation loss = 0.0027091968804597855
Validation loss = 0.0007303542806766927
Validation loss = 0.0010696040699258447
Validation loss = 0.0009244853281415999
Validation loss = 0.000752207706682384
Validation loss = 0.0006575367297045887
Validation loss = 0.0007379338494502008
Validation loss = 0.0008102057036012411
Validation loss = 0.0007742835441604257
Validation loss = 0.0014285888755694032
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -48.7    |
| Iteration     | 61       |
| MaximumReturn | -0.0382  |
| MinimumReturn | -88.8    |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025029529351741076
Validation loss = 0.0008443693513981998
Validation loss = 0.0011635413393378258
Validation loss = 0.0007929642451927066
Validation loss = 0.0013585606357082725
Validation loss = 0.0019223361741751432
Validation loss = 0.0009232967859134078
Validation loss = 0.0010334114776924253
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003699773456901312
Validation loss = 0.0030545443296432495
Validation loss = 0.0008310119155794382
Validation loss = 0.0013865946093574166
Validation loss = 0.001291553839109838
Validation loss = 0.0008126592729240656
Validation loss = 0.0011184074683114886
Validation loss = 0.0007538683712482452
Validation loss = 0.0008862114627845585
Validation loss = 0.0006604341324418783
Validation loss = 0.001204842235893011
Validation loss = 0.0008456294308416545
Validation loss = 0.0032356034498661757
Validation loss = 0.0009681566152721643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003691531717777252
Validation loss = 0.001301841693930328
Validation loss = 0.0011811377480626106
Validation loss = 0.0007581697427667677
Validation loss = 0.0031715307850390673
Validation loss = 0.0008108178153634071
Validation loss = 0.000951037451159209
Validation loss = 0.0011846505803987384
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0023557287640869617
Validation loss = 0.0010989216389134526
Validation loss = 0.0007914056768640876
Validation loss = 0.0009946103673428297
Validation loss = 0.0011831660522148013
Validation loss = 0.0009622694342397153
Validation loss = 0.001304109231568873
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003157230792567134
Validation loss = 0.0010674293152987957
Validation loss = 0.0008614934631623328
Validation loss = 0.0009592790738679469
Validation loss = 0.0012504637707024813
Validation loss = 0.0013556330231949687
Validation loss = 0.0009850437054410577
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.172   |
| Iteration     | 62       |
| MaximumReturn | -0.12    |
| MinimumReturn | -0.217   |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00166514259763062
Validation loss = 0.0011720305774360895
Validation loss = 0.0007920176722109318
Validation loss = 0.0007254260708577931
Validation loss = 0.001974544022232294
Validation loss = 0.0008922164561226964
Validation loss = 0.0008907766314223409
Validation loss = 0.0015743887051939964
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005805678083561361
Validation loss = 0.0017731873085722327
Validation loss = 0.0011026824358850718
Validation loss = 0.0008054483914747834
Validation loss = 0.0008853374747559428
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011517851380631328
Validation loss = 0.001163590233772993
Validation loss = 0.0007680174894630909
Validation loss = 0.0010700604179874063
Validation loss = 0.0012257300550118089
Validation loss = 0.0014271553372964263
Validation loss = 0.0011204268084838986
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015444090822711587
Validation loss = 0.0010018821340054274
Validation loss = 0.000784802483394742
Validation loss = 0.001451763091608882
Validation loss = 0.0009097829461097717
Validation loss = 0.0007251627393998206
Validation loss = 0.0020047600846737623
Validation loss = 0.0007602815749123693
Validation loss = 0.0015883486485108733
Validation loss = 0.002000808948650956
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014088453026488423
Validation loss = 0.0007837124285288155
Validation loss = 0.0008004563860595226
Validation loss = 0.0007738237618468702
Validation loss = 0.0007041739299893379
Validation loss = 0.001329149235971272
Validation loss = 0.0006442177691496909
Validation loss = 0.0006395659293048084
Validation loss = 0.0009996640728786588
Validation loss = 0.000977379851974547
Validation loss = 0.0013827093644067645
Validation loss = 0.0009510812815278769
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0192   |
| Iteration     | 63        |
| MaximumReturn | -0.000942 |
| MinimumReturn | -0.147    |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013613285264000297
Validation loss = 0.0009699116344563663
Validation loss = 0.0008192510576918721
Validation loss = 0.0010472919093444943
Validation loss = 0.0010071737924590707
Validation loss = 0.0010664603905752301
Validation loss = 0.0010269918711856008
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008893301128409803
Validation loss = 0.0010949743445962667
Validation loss = 0.000543747388292104
Validation loss = 0.0006309266318567097
Validation loss = 0.0010600958485156298
Validation loss = 0.0013973074965178967
Validation loss = 0.0014559022383764386
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007612343761138618
Validation loss = 0.0013424503849819303
Validation loss = 0.002058402867987752
Validation loss = 0.0010478799231350422
Validation loss = 0.0013049254193902016
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001039402442984283
Validation loss = 0.0020710639655590057
Validation loss = 0.0014783981023356318
Validation loss = 0.000817280204501003
Validation loss = 0.0012013411615043879
Validation loss = 0.0007186871371231973
Validation loss = 0.0007402376504614949
Validation loss = 0.002519736299291253
Validation loss = 0.000910338421817869
Validation loss = 0.0016000261530280113
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006585308001376688
Validation loss = 0.0011473969789221883
Validation loss = 0.0008834769250825047
Validation loss = 0.00108917651232332
Validation loss = 0.0009504797635599971
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.411   |
| Iteration     | 64       |
| MaximumReturn | -0.325   |
| MinimumReturn | -0.648   |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006485219928435981
Validation loss = 0.0005669123493134975
Validation loss = 0.0007469889242202044
Validation loss = 0.0008900358807295561
Validation loss = 0.0012936487328261137
Validation loss = 0.0009115642169490457
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006615983438678086
Validation loss = 0.0005405626725405455
Validation loss = 0.0006345831789076328
Validation loss = 0.0008894499624148011
Validation loss = 0.0011791588040068746
Validation loss = 0.0005457951338030398
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009144028299488127
Validation loss = 0.0008428952423855662
Validation loss = 0.0007948076236061752
Validation loss = 0.000748323043808341
Validation loss = 0.0007528034620918334
Validation loss = 0.0010592780308797956
Validation loss = 0.000766300072427839
Validation loss = 0.0009112749830819666
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007693859515711665
Validation loss = 0.0006993236602284014
Validation loss = 0.0008703144267201424
Validation loss = 0.0009390856139361858
Validation loss = 0.0007211352349258959
Validation loss = 0.0015366502339020371
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007774950936436653
Validation loss = 0.0005388864083215594
Validation loss = 0.00067459064302966
Validation loss = 0.0006596588063985109
Validation loss = 0.0005852464819326997
Validation loss = 0.0007614238420501351
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.565   |
| Iteration     | 65       |
| MaximumReturn | -0.309   |
| MinimumReturn | -1.04    |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000701685668900609
Validation loss = 0.0008079959079623222
Validation loss = 0.0006976007716730237
Validation loss = 0.0004579593369271606
Validation loss = 0.0006924983463250101
Validation loss = 0.0007186396978795528
Validation loss = 0.0006981782498769462
Validation loss = 0.0005406077834777534
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005831789458170533
Validation loss = 0.0007370446692220867
Validation loss = 0.0005047102458775043
Validation loss = 0.00045100541319698095
Validation loss = 0.00047025320236571133
Validation loss = 0.00048278755275532603
Validation loss = 0.00043335839291103184
Validation loss = 0.0007665000157430768
Validation loss = 0.0005766036338172853
Validation loss = 0.0014233124675229192
Validation loss = 0.0005219439044594765
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007661895360797644
Validation loss = 0.0005531799979507923
Validation loss = 0.001180345774628222
Validation loss = 0.0006943398038856685
Validation loss = 0.0005543702864088118
Validation loss = 0.0015762379625812173
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009462627349421382
Validation loss = 0.0005929756443947554
Validation loss = 0.0011412310414016247
Validation loss = 0.0009354784851893783
Validation loss = 0.0011038635857403278
Validation loss = 0.0010613396298140287
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002247471595183015
Validation loss = 0.0005475247744470835
Validation loss = 0.0006064492044970393
Validation loss = 0.000508901837747544
Validation loss = 0.0005320290802046657
Validation loss = 0.0005785593530163169
Validation loss = 0.0005026129656471312
Validation loss = 0.000684077967889607
Validation loss = 0.0005030466709285975
Validation loss = 0.0005867136060260236
Validation loss = 0.0005416037747636437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00094  |
| Iteration     | 66        |
| MaximumReturn | -0.000673 |
| MinimumReturn | -0.00132  |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006536253495141864
Validation loss = 0.0011019810335710645
Validation loss = 0.0007963517564348876
Validation loss = 0.0012695521581918001
Validation loss = 0.0007504700333811343
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008943629218265414
Validation loss = 0.0007849751273170114
Validation loss = 0.0008486934239044785
Validation loss = 0.0005460248794406652
Validation loss = 0.00046040958841331303
Validation loss = 0.0008785954560153186
Validation loss = 0.000482629839098081
Validation loss = 0.0008775333990342915
Validation loss = 0.0005306940875016153
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008914793143048882
Validation loss = 0.0006668153801001608
Validation loss = 0.000767954217735678
Validation loss = 0.0006261334638111293
Validation loss = 0.0005593592650257051
Validation loss = 0.0004992525209672749
Validation loss = 0.0012736301869153976
Validation loss = 0.0010554216569289565
Validation loss = 0.0004943058593198657
Validation loss = 0.000775779306422919
Validation loss = 0.0009839445119723678
Validation loss = 0.0005649384111166
Validation loss = 0.000559340522158891
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009648518753238022
Validation loss = 0.0007217212696559727
Validation loss = 0.0007883302168920636
Validation loss = 0.0006243744865059853
Validation loss = 0.000865292560774833
Validation loss = 0.0005362041993066669
Validation loss = 0.0006868457421660423
Validation loss = 0.0006574662402272224
Validation loss = 0.00160952506121248
Validation loss = 0.0007523927488364279
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000619820726569742
Validation loss = 0.0006369418697431684
Validation loss = 0.0006260537193156779
Validation loss = 0.0004952329909428954
Validation loss = 0.0007017239113338292
Validation loss = 0.0005490739713422954
Validation loss = 0.0011353424051776528
Validation loss = 0.0006767140002921224
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0525   |
| Iteration     | 67        |
| MaximumReturn | -0.000514 |
| MinimumReturn | -0.429    |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005051206680946052
Validation loss = 0.0005500791012309492
Validation loss = 0.0008685050997883081
Validation loss = 0.000829680822789669
Validation loss = 0.0008169800275936723
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006253529572859406
Validation loss = 0.0005233996780589223
Validation loss = 0.000546123890671879
Validation loss = 0.0005391830345615745
Validation loss = 0.0005782886873930693
Validation loss = 0.000791116792242974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000562193978112191
Validation loss = 0.0006115930154919624
Validation loss = 0.0006355667719617486
Validation loss = 0.0009826601017266512
Validation loss = 0.0007330781663767993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005609822692349553
Validation loss = 0.001173931872472167
Validation loss = 0.0013658986426889896
Validation loss = 0.0005623920587822795
Validation loss = 0.0008090899209491909
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006224007229320705
Validation loss = 0.000625156273599714
Validation loss = 0.0008359014755114913
Validation loss = 0.00047025375533849
Validation loss = 0.0006241927039809525
Validation loss = 0.0010410267859697342
Validation loss = 0.00047553409240208566
Validation loss = 0.0005335705354809761
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0453   |
| Iteration     | 68        |
| MaximumReturn | -0.000682 |
| MinimumReturn | -0.353    |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001116218394599855
Validation loss = 0.0005409293225966394
Validation loss = 0.0010777699062600732
Validation loss = 0.0007948874845169485
Validation loss = 0.00046575444866903126
Validation loss = 0.0005793037125840783
Validation loss = 0.0013307866174727678
Validation loss = 0.0005103215225972235
Validation loss = 0.0005969175253994763
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006629395065829158
Validation loss = 0.001289129606448114
Validation loss = 0.00040205364348366857
Validation loss = 0.0006950929528102279
Validation loss = 0.0006048429640941322
Validation loss = 0.0008912612684071064
Validation loss = 0.0006777993985451758
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008740568300709128
Validation loss = 0.0006418328848667443
Validation loss = 0.000732643879018724
Validation loss = 0.0006676208577118814
Validation loss = 0.0005670636310242116
Validation loss = 0.0007771178497932851
Validation loss = 0.0007643879507668316
Validation loss = 0.0005022331606596708
Validation loss = 0.0006436851108446717
Validation loss = 0.0006681320955976844
Validation loss = 0.0005888310261070728
Validation loss = 0.0006164354272186756
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005417123902589083
Validation loss = 0.0012220401549711823
Validation loss = 0.0007228680187836289
Validation loss = 0.0006343250861391425
Validation loss = 0.0010241807904094458
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005181130254641175
Validation loss = 0.000637318822555244
Validation loss = 0.0007761329761706293
Validation loss = 0.0004978713113814592
Validation loss = 0.0005333539447747171
Validation loss = 0.001004416262730956
Validation loss = 0.0006202661897987127
Validation loss = 0.0005539465928450227
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0264   |
| Iteration     | 69        |
| MaximumReturn | -0.000668 |
| MinimumReturn | -0.449    |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007296046824194491
Validation loss = 0.0006068572984077036
Validation loss = 0.000744280347134918
Validation loss = 0.0006980111938901246
Validation loss = 0.0006100651226006448
Validation loss = 0.000556720478925854
Validation loss = 0.0009843677980825305
Validation loss = 0.0011937425006181002
Validation loss = 0.0005122193251736462
Validation loss = 0.0005382636445574462
Validation loss = 0.0007124789990484715
Validation loss = 0.000615054159425199
Validation loss = 0.0006185841630212963
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009068564977496862
Validation loss = 0.0006612600991502404
Validation loss = 0.0005108094774186611
Validation loss = 0.0004585148417390883
Validation loss = 0.000435841764556244
Validation loss = 0.0007011092966422439
Validation loss = 0.00045857313671149313
Validation loss = 0.0013003033818677068
Validation loss = 0.00047073213499970734
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001712821307592094
Validation loss = 0.0004564897099044174
Validation loss = 0.0006409808993339539
Validation loss = 0.0005187323549762368
Validation loss = 0.0006672574090771377
Validation loss = 0.0004951269365847111
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012936710845679045
Validation loss = 0.00046592295984737575
Validation loss = 0.0007871890556998551
Validation loss = 0.0010507038095965981
Validation loss = 0.0006901032174937427
Validation loss = 0.0006973603158257902
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004219623515382409
Validation loss = 0.0005964147276245058
Validation loss = 0.0005032999906688929
Validation loss = 0.0006333422497846186
Validation loss = 0.0005077914684079587
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0203  |
| Iteration     | 70       |
| MaximumReturn | -0.00067 |
| MinimumReturn | -0.154   |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004530196310952306
Validation loss = 0.0005830720183439553
Validation loss = 0.0006609093397855759
Validation loss = 0.0004098110948689282
Validation loss = 0.0006310645840130746
Validation loss = 0.0010857353918254375
Validation loss = 0.0006722114630974829
Validation loss = 0.00045948143815621734
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005465963040478528
Validation loss = 0.0011889274464920163
Validation loss = 0.0005439560627564788
Validation loss = 0.0018103097099810839
Validation loss = 0.0009249688591808081
Validation loss = 0.0003814389347098768
Validation loss = 0.00040193801396526396
Validation loss = 0.0003575221635401249
Validation loss = 0.0006695211632177234
Validation loss = 0.000369618006516248
Validation loss = 0.0008024457492865622
Validation loss = 0.0004991950117982924
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009768531890586019
Validation loss = 0.0005081665585748851
Validation loss = 0.0008291353587992489
Validation loss = 0.000493827392347157
Validation loss = 0.0006288199801929295
Validation loss = 0.0005596858682110906
Validation loss = 0.0008635689155198634
Validation loss = 0.0004919341881759465
Validation loss = 0.000474554457468912
Validation loss = 0.0005450634052976966
Validation loss = 0.00048590166261419654
Validation loss = 0.0008486746228300035
Validation loss = 0.0004699020937550813
Validation loss = 0.002172224223613739
Validation loss = 0.0005305540398694575
Validation loss = 0.0005430364399217069
Validation loss = 0.0009054054389707744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007606819272041321
Validation loss = 0.0005482155829668045
Validation loss = 0.000986528117209673
Validation loss = 0.0011840425431728363
Validation loss = 0.0007539587095379829
Validation loss = 0.001018968177959323
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005146221374161541
Validation loss = 0.0007537903147749603
Validation loss = 0.0005317347240634263
Validation loss = 0.0008086869493126869
Validation loss = 0.0004798385489266366
Validation loss = 0.0005576397525146604
Validation loss = 0.0005376967019401491
Validation loss = 0.0006675963522866368
Validation loss = 0.0008900766842998564
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0948   |
| Iteration     | 71        |
| MaximumReturn | -0.000662 |
| MinimumReturn | -0.995    |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005324678495526314
Validation loss = 0.0005334301386028528
Validation loss = 0.0007054066518321633
Validation loss = 0.0006024549365974963
Validation loss = 0.0004219503898639232
Validation loss = 0.0004493693704716861
Validation loss = 0.0009000435238704085
Validation loss = 0.002873623976483941
Validation loss = 0.0004215656081214547
Validation loss = 0.0013379950542002916
Validation loss = 0.0007148431031964719
Validation loss = 0.001762552303262055
Validation loss = 0.0004131255263928324
Validation loss = 0.0005332074942998588
Validation loss = 0.0005368808633647859
Validation loss = 0.001329957041889429
Validation loss = 0.000775910506490618
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000505397270899266
Validation loss = 0.0009279656806029379
Validation loss = 0.0006139905890449882
Validation loss = 0.0008845907868817449
Validation loss = 0.0007221490959636867
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005269416142255068
Validation loss = 0.00041585712460801005
Validation loss = 0.0006440652650780976
Validation loss = 0.0007275961106643081
Validation loss = 0.0004923365777358413
Validation loss = 0.0007571833557449281
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008554650121368468
Validation loss = 0.0005377476918511093
Validation loss = 0.0004225576121825725
Validation loss = 0.0008207530481740832
Validation loss = 0.0006034552352502942
Validation loss = 0.0006833343068137765
Validation loss = 0.0006356949452310801
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004919759230688214
Validation loss = 0.000597260775975883
Validation loss = 0.0007000546320341527
Validation loss = 0.0005720751360058784
Validation loss = 0.0006956170545890927
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0278   |
| Iteration     | 72        |
| MaximumReturn | -0.000701 |
| MinimumReturn | -0.379    |
| TotalSamples  | 123284    |
-----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005546030006371439
Validation loss = 0.0005679301684722304
Validation loss = 0.0005793548189103603
Validation loss = 0.001021736767143011
Validation loss = 0.0004932164447382092
Validation loss = 0.0005681060720235109
Validation loss = 0.0005767859984189272
Validation loss = 0.0006596644525416195
Validation loss = 0.0007088448037393391
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000520095753017813
Validation loss = 0.0006723331171087921
Validation loss = 0.0005990616627968848
Validation loss = 0.0009361967677250504
Validation loss = 0.0006024480098858476
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00204042112454772
Validation loss = 0.00047697845729999244
Validation loss = 0.00043855822877958417
Validation loss = 0.0004278773267287761
Validation loss = 0.0005887033767066896
Validation loss = 0.0009260280057787895
Validation loss = 0.0007356596761383116
Validation loss = 0.0005455453647300601
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005522936698980629
Validation loss = 0.00045095759560354054
Validation loss = 0.0006942531326785684
Validation loss = 0.0005456347716972232
Validation loss = 0.0007789332885295153
Validation loss = 0.0004882191715296358
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000939561112318188
Validation loss = 0.0006071744719520211
Validation loss = 0.0020153343211859465
Validation loss = 0.000798767083324492
Validation loss = 0.002147739753127098
Validation loss = 0.0004883144283667207
Validation loss = 0.0006362430285662413
Validation loss = 0.0007950720028020442
Validation loss = 0.002696316922083497
Validation loss = 0.000388001324608922
Validation loss = 0.00048137002158910036
Validation loss = 0.0006765179568901658
Validation loss = 0.0009605536470189691
Validation loss = 0.0004601907858159393
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00884  |
| Iteration     | 73        |
| MaximumReturn | -0.000572 |
| MinimumReturn | -0.0848   |
| TotalSamples  | 124950    |
-----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003885033365804702
Validation loss = 0.0010568188736215234
Validation loss = 0.0006481633172370493
Validation loss = 0.001171395997516811
Validation loss = 0.0008518521208316088
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007475688471458852
Validation loss = 0.0005976170650683343
Validation loss = 0.00036416889633983374
Validation loss = 0.0012312537292018533
Validation loss = 0.0008898371015675366
Validation loss = 0.0007117557106539607
Validation loss = 0.0020071123726665974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005589264328591526
Validation loss = 0.0007598809897899628
Validation loss = 0.0006120596663095057
Validation loss = 0.0005295192240737379
Validation loss = 0.0004900764906778932
Validation loss = 0.0004856681916862726
Validation loss = 0.0004966623382642865
Validation loss = 0.0004901213687844574
Validation loss = 0.0008079527178779244
Validation loss = 0.0007512530428357422
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005508717149496078
Validation loss = 0.0004938546335324645
Validation loss = 0.000565132184419781
Validation loss = 0.0008420202066190541
Validation loss = 0.0007937822956591845
Validation loss = 0.0006154695875011384
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006919680745340884
Validation loss = 0.0006047806236892939
Validation loss = 0.0007925594691187143
Validation loss = 0.0004728494677692652
Validation loss = 0.0006202522199600935
Validation loss = 0.0007491699070669711
Validation loss = 0.0005240740720182657
Validation loss = 0.0016073908191174269
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00577  |
| Iteration     | 74        |
| MaximumReturn | -0.000658 |
| MinimumReturn | -0.0851   |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000581237196456641
Validation loss = 0.00041246809996664524
Validation loss = 0.0004633770731743425
Validation loss = 0.0005556644755415618
Validation loss = 0.0008352383738383651
Validation loss = 0.0004837285669054836
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010329432552680373
Validation loss = 0.000755476881749928
Validation loss = 0.0004441970377229154
Validation loss = 0.0006750600878149271
Validation loss = 0.0006475793197751045
Validation loss = 0.00046094844583421946
Validation loss = 0.0006121975020505488
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006186082027852535
Validation loss = 0.00066462776158005
Validation loss = 0.0004525519616436213
Validation loss = 0.0005869722226634622
Validation loss = 0.0005326055688783526
Validation loss = 0.0004081014485564083
Validation loss = 0.0008433076436631382
Validation loss = 0.000606513291131705
Validation loss = 0.0005723636131733656
Validation loss = 0.0006678961799480021
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004751636879518628
Validation loss = 0.0005940095870755613
Validation loss = 0.0006873039528727531
Validation loss = 0.0005541898426599801
Validation loss = 0.0005666789948008955
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009548799716867507
Validation loss = 0.0004986728308722377
Validation loss = 0.0008028944139368832
Validation loss = 0.0004799026937689632
Validation loss = 0.0004557326319627464
Validation loss = 0.0005184205365367234
Validation loss = 0.001191176357679069
Validation loss = 0.0008691204711794853
Validation loss = 0.0009458057465963066
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000905 |
| Iteration     | 75        |
| MaximumReturn | -0.000664 |
| MinimumReturn | -0.00139  |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004406701191328466
Validation loss = 0.0005795133183710277
Validation loss = 0.0006117378361523151
Validation loss = 0.0009058403084054589
Validation loss = 0.0005227794172242284
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007892653811722994
Validation loss = 0.0008665442001074553
Validation loss = 0.0007426380179822445
Validation loss = 0.0020287930965423584
Validation loss = 0.0005749181727878749
Validation loss = 0.0008638821309432387
Validation loss = 0.0009276411728933454
Validation loss = 0.0005218576407060027
Validation loss = 0.0005996318650431931
Validation loss = 0.0005914790090173483
Validation loss = 0.0006765634170733392
Validation loss = 0.0005886921426281333
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008517911774106324
Validation loss = 0.000478331494377926
Validation loss = 0.0008798560011200607
Validation loss = 0.0005726640811190009
Validation loss = 0.0007692640065215528
Validation loss = 0.0008531856583431363
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006114586140029132
Validation loss = 0.000496751454193145
Validation loss = 0.0014376300387084484
Validation loss = 0.0006091457325965166
Validation loss = 0.0005596725968644023
Validation loss = 0.0009387157624587417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004469482519198209
Validation loss = 0.0007550743175670505
Validation loss = 0.0006223047967068851
Validation loss = 0.0010213416535407305
Validation loss = 0.0005026605213060975
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0196   |
| Iteration     | 76        |
| MaximumReturn | -0.000622 |
| MinimumReturn | -0.172    |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006188094848766923
Validation loss = 0.0010468646651133895
Validation loss = 0.000627657282166183
Validation loss = 0.0007758481660857797
Validation loss = 0.0006686320994049311
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004192190826870501
Validation loss = 0.0005521529819816351
Validation loss = 0.0006466871709562838
Validation loss = 0.00045805959962308407
Validation loss = 0.0004777224676217884
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007544682594016194
Validation loss = 0.0007038815529085696
Validation loss = 0.000421838863985613
Validation loss = 0.0004912980948574841
Validation loss = 0.00043284002458676696
Validation loss = 0.0005045334692113101
Validation loss = 0.00044089113362133503
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00048233510460704565
Validation loss = 0.0007229180191643536
Validation loss = 0.0005643511540256441
Validation loss = 0.0009729904704727232
Validation loss = 0.0004901019274257123
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005119311972521245
Validation loss = 0.0004527124110609293
Validation loss = 0.0005378480418585241
Validation loss = 0.0011216452112421393
Validation loss = 0.0009572351700626314
Validation loss = 0.0004696448449976742
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0109   |
| Iteration     | 77        |
| MaximumReturn | -0.000698 |
| MinimumReturn | -0.16     |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007764239562675357
Validation loss = 0.0004904823144897819
Validation loss = 0.0005564107559621334
Validation loss = 0.0008685427019372582
Validation loss = 0.0004046153917443007
Validation loss = 0.0006134161958470941
Validation loss = 0.0005832293536514044
Validation loss = 0.00045387292630039155
Validation loss = 0.0007838701130822301
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006927213398739696
Validation loss = 0.00035959985689260066
Validation loss = 0.0004283433663658798
Validation loss = 0.00041504186810925603
Validation loss = 0.0005457927472889423
Validation loss = 0.0004840138426516205
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004892387078143656
Validation loss = 0.0009281146922148764
Validation loss = 0.0005025506252422929
Validation loss = 0.0010697846300899982
Validation loss = 0.0013260067207738757
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006553376442752779
Validation loss = 0.000911045353859663
Validation loss = 0.0005663408082909882
Validation loss = 0.0005733832367695868
Validation loss = 0.0009043681784532964
Validation loss = 0.0005008596344850957
Validation loss = 0.0004199592221993953
Validation loss = 0.0008057707455009222
Validation loss = 0.0007491200813092291
Validation loss = 0.0005401631933636963
Validation loss = 0.0022338649723678827
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006753652123734355
Validation loss = 0.00044599175453186035
Validation loss = 0.0008597160340286791
Validation loss = 0.0005672480328939855
Validation loss = 0.0005124437157064676
Validation loss = 0.0008545659948140383
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0078   |
| Iteration     | 78        |
| MaximumReturn | -0.000748 |
| MinimumReturn | -0.128    |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004421033663675189
Validation loss = 0.000422291544964537
Validation loss = 0.000898537808097899
Validation loss = 0.0016029507387429476
Validation loss = 0.00048030505422502756
Validation loss = 0.0005602987948805094
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005863219848833978
Validation loss = 0.00046062792534939945
Validation loss = 0.0010366193018853664
Validation loss = 0.0007322861929424107
Validation loss = 0.0004985976265743375
Validation loss = 0.0006218172493390739
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004730994696728885
Validation loss = 0.0005315840244293213
Validation loss = 0.00044114113552495837
Validation loss = 0.0004176435468252748
Validation loss = 0.0006861371221020818
Validation loss = 0.0007057701586745679
Validation loss = 0.000843993213493377
Validation loss = 0.0006182492361404002
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006475907284766436
Validation loss = 0.000435168418334797
Validation loss = 0.000844003923702985
Validation loss = 0.0007519364007748663
Validation loss = 0.0008852607570588589
Validation loss = 0.000603923574090004
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005536547978408635
Validation loss = 0.0004394233110360801
Validation loss = 0.0005178343853913248
Validation loss = 0.00048393101315014064
Validation loss = 0.0007493000593967736
Validation loss = 0.0006397843244485557
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00631  |
| Iteration     | 79        |
| MaximumReturn | -0.000628 |
| MinimumReturn | -0.0589   |
| TotalSamples  | 134946    |
-----------------------------
