Logging to experiments/invertedPendulum/invertedPendulum/Mon-21-Nov-2022-03-21-48-PM-CST_invertedPendulum_trpo_iteration_20_seed2531
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7403696775436401
Validation loss = 0.4324909746646881
Validation loss = 0.40178030729293823
Validation loss = 0.3495664894580841
Validation loss = 0.3431769609451294
Validation loss = 0.31206050515174866
Validation loss = 0.28673186898231506
Validation loss = 0.26125651597976685
Validation loss = 0.25953465700149536
Validation loss = 0.2328871786594391
Validation loss = 0.24493521451950073
Validation loss = 0.24551133811473846
Validation loss = 0.2028469741344452
Validation loss = 0.1855122298002243
Validation loss = 0.17703646421432495
Validation loss = 0.17650675773620605
Validation loss = 0.16918160021305084
Validation loss = 0.1660807877779007
Validation loss = 0.1439279019832611
Validation loss = 0.13187064230442047
Validation loss = 0.12438046187162399
Validation loss = 0.12203721702098846
Validation loss = 0.11837977916002274
Validation loss = 0.12614013254642487
Validation loss = 0.12165239453315735
Validation loss = 0.12222232669591904
Validation loss = 0.10820697247982025
Validation loss = 0.11269523203372955
Validation loss = 0.10484295338392258
Validation loss = 0.10012102872133255
Validation loss = 0.10539890825748444
Validation loss = 0.10378220677375793
Validation loss = 0.09672971814870834
Validation loss = 0.08856172114610672
Validation loss = 0.08437073975801468
Validation loss = 0.09369233250617981
Validation loss = 0.08918122947216034
Validation loss = 0.09304571896791458
Validation loss = 0.09119898080825806
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7183374762535095
Validation loss = 0.42226913571357727
Validation loss = 0.38568276166915894
Validation loss = 0.35809561610221863
Validation loss = 0.34304869174957275
Validation loss = 0.3112126886844635
Validation loss = 0.2866155505180359
Validation loss = 0.2510749399662018
Validation loss = 0.2369244545698166
Validation loss = 0.2515285909175873
Validation loss = 0.21593734622001648
Validation loss = 0.2081567794084549
Validation loss = 0.19789797067642212
Validation loss = 0.18869321048259735
Validation loss = 0.1730426698923111
Validation loss = 0.16014313697814941
Validation loss = 0.16550925374031067
Validation loss = 0.14215849339962006
Validation loss = 0.14387346804141998
Validation loss = 0.1340627372264862
Validation loss = 0.14714911580085754
Validation loss = 0.13881054520606995
Validation loss = 0.13397181034088135
Validation loss = 0.13475976884365082
Validation loss = 0.126143217086792
Validation loss = 0.1285729706287384
Validation loss = 0.12233935296535492
Validation loss = 0.13876168429851532
Validation loss = 0.13203652203083038
Validation loss = 0.12695732712745667
Validation loss = 0.12818391621112823
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7185470461845398
Validation loss = 0.4545716345310211
Validation loss = 0.3884587585926056
Validation loss = 0.369038850069046
Validation loss = 0.35317274928092957
Validation loss = 0.33207330107688904
Validation loss = 0.31993886828422546
Validation loss = 0.2893202304840088
Validation loss = 0.2489776313304901
Validation loss = 0.2415103316307068
Validation loss = 0.2099839150905609
Validation loss = 0.24980953335762024
Validation loss = 0.20865687727928162
Validation loss = 0.2136618047952652
Validation loss = 0.19262999296188354
Validation loss = 0.18109260499477386
Validation loss = 0.17062294483184814
Validation loss = 0.1587725281715393
Validation loss = 0.15039338171482086
Validation loss = 0.1412326842546463
Validation loss = 0.140515998005867
Validation loss = 0.12329306453466415
Validation loss = 0.12236886471509933
Validation loss = 0.12379041314125061
Validation loss = 0.13122323155403137
Validation loss = 0.11817855387926102
Validation loss = 0.13614603877067566
Validation loss = 0.1389884352684021
Validation loss = 0.11812303215265274
Validation loss = 0.11805126816034317
Validation loss = 0.12195627391338348
Validation loss = 0.11984768509864807
Validation loss = 0.1077505350112915
Validation loss = 0.09896667301654816
Validation loss = 0.09702806174755096
Validation loss = 0.08987483382225037
Validation loss = 0.1011199951171875
Validation loss = 0.1000448614358902
Validation loss = 0.10439395904541016
Validation loss = 0.08884509652853012
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7323066592216492
Validation loss = 0.4292169213294983
Validation loss = 0.3898681700229645
Validation loss = 0.3763660192489624
Validation loss = 0.354573130607605
Validation loss = 0.3409181535243988
Validation loss = 0.30015358328819275
Validation loss = 0.2945927083492279
Validation loss = 0.26372820138931274
Validation loss = 0.26516103744506836
Validation loss = 0.2269495278596878
Validation loss = 0.21535708010196686
Validation loss = 0.21111489832401276
Validation loss = 0.184263676404953
Validation loss = 0.20129556953907013
Validation loss = 0.18144865334033966
Validation loss = 0.17462360858917236
Validation loss = 0.14781701564788818
Validation loss = 0.1431846171617508
Validation loss = 0.1280139982700348
Validation loss = 0.12502384185791016
Validation loss = 0.12281691282987595
Validation loss = 0.11978057771921158
Validation loss = 0.11105483770370483
Validation loss = 0.10656508058309555
Validation loss = 0.10267842561006546
Validation loss = 0.10447287559509277
Validation loss = 0.09613053500652313
Validation loss = 0.10627623647451401
Validation loss = 0.10232295095920563
Validation loss = 0.10239934921264648
Validation loss = 0.09848971664905548
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7010700106620789
Validation loss = 0.4241600036621094
Validation loss = 0.40875041484832764
Validation loss = 0.3567846715450287
Validation loss = 0.3436986207962036
Validation loss = 0.329889178276062
Validation loss = 0.29426005482673645
Validation loss = 0.27348196506500244
Validation loss = 0.25062739849090576
Validation loss = 0.23199713230133057
Validation loss = 0.23807835578918457
Validation loss = 0.20131026208400726
Validation loss = 0.19522438943386078
Validation loss = 0.19585610926151276
Validation loss = 0.18692217767238617
Validation loss = 0.17891457676887512
Validation loss = 0.18379074335098267
Validation loss = 0.16827720403671265
Validation loss = 0.15434254705905914
Validation loss = 0.139096200466156
Validation loss = 0.13191549479961395
Validation loss = 0.1181398332118988
Validation loss = 0.11675488948822021
Validation loss = 0.1149510070681572
Validation loss = 0.12194772809743881
Validation loss = 0.13286687433719635
Validation loss = 0.11947257816791534
Validation loss = 0.12794256210327148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0187  |
| Iteration     | 0        |
| MaximumReturn | -0.0116  |
| MinimumReturn | -0.0282  |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3305096924304962
Validation loss = 0.23020824790000916
Validation loss = 0.20246417820453644
Validation loss = 0.1528116762638092
Validation loss = 0.1300317645072937
Validation loss = 0.13009889423847198
Validation loss = 0.12048060446977615
Validation loss = 0.11072158813476562
Validation loss = 0.10238412767648697
Validation loss = 0.09420281648635864
Validation loss = 0.09805579483509064
Validation loss = 0.09367071092128754
Validation loss = 0.10359107702970505
Validation loss = 0.10723532736301422
Validation loss = 0.10330752283334732
Validation loss = 0.09378739446401596
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.29556286334991455
Validation loss = 0.22085927426815033
Validation loss = 0.18923430144786835
Validation loss = 0.1699627935886383
Validation loss = 0.13420727849006653
Validation loss = 0.1199832633137703
Validation loss = 0.15182934701442719
Validation loss = 0.11377213895320892
Validation loss = 0.13159061968326569
Validation loss = 0.11143200844526291
Validation loss = 0.1258927285671234
Validation loss = 0.11445614695549011
Validation loss = 0.1217159628868103
Validation loss = 0.10788024216890335
Validation loss = 0.10941692441701889
Validation loss = 0.10596071183681488
Validation loss = 0.14122961461544037
Validation loss = 0.1367543488740921
Validation loss = 0.1160593032836914
Validation loss = 0.11890420317649841
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.363423228263855
Validation loss = 0.2384834587574005
Validation loss = 0.20799441635608673
Validation loss = 0.18183355033397675
Validation loss = 0.15524981915950775
Validation loss = 0.1429843008518219
Validation loss = 0.1213654950261116
Validation loss = 0.12086576968431473
Validation loss = 0.11486871540546417
Validation loss = 0.15911267697811127
Validation loss = 0.13377392292022705
Validation loss = 0.12593068182468414
Validation loss = 0.09786297380924225
Validation loss = 0.10466964542865753
Validation loss = 0.10801141709089279
Validation loss = 0.1131068542599678
Validation loss = 0.09899025410413742
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3267417550086975
Validation loss = 0.18880584836006165
Validation loss = 0.1735686957836151
Validation loss = 0.14422723650932312
Validation loss = 0.14580751955509186
Validation loss = 0.12454673647880554
Validation loss = 0.12468858808279037
Validation loss = 0.10408449918031693
Validation loss = 0.09930302202701569
Validation loss = 0.10165338218212128
Validation loss = 0.10103244334459305
Validation loss = 0.10871845483779907
Validation loss = 0.10407716035842896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3478143513202667
Validation loss = 0.22669446468353271
Validation loss = 0.19876728951931
Validation loss = 0.1780777871608734
Validation loss = 0.1562153697013855
Validation loss = 0.1440035104751587
Validation loss = 0.14598846435546875
Validation loss = 0.12972474098205566
Validation loss = 0.11828380078077316
Validation loss = 0.12412312626838684
Validation loss = 0.1258295476436615
Validation loss = 0.11937100440263748
Validation loss = 0.11121945083141327
Validation loss = 0.11540690809488297
Validation loss = 0.12817704677581787
Validation loss = 0.11849403381347656
Validation loss = 0.11567451804876328
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00265 |
| Iteration     | 1        |
| MaximumReturn | -0.00177 |
| MinimumReturn | -0.00369 |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09665504842996597
Validation loss = 0.07540053874254227
Validation loss = 0.06770972162485123
Validation loss = 0.06591706722974777
Validation loss = 0.06479871273040771
Validation loss = 0.08051066100597382
Validation loss = 0.06174284219741821
Validation loss = 0.058409400284290314
Validation loss = 0.0691097155213356
Validation loss = 0.0608903244137764
Validation loss = 0.060674797743558884
Validation loss = 0.05850997567176819
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11448825895786285
Validation loss = 0.08340863883495331
Validation loss = 0.08437279611825943
Validation loss = 0.07174196094274521
Validation loss = 0.07539370656013489
Validation loss = 0.07545913755893707
Validation loss = 0.06877061724662781
Validation loss = 0.06973907351493835
Validation loss = 0.07118961960077286
Validation loss = 0.07030170410871506
Validation loss = 0.07815039157867432
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13325618207454681
Validation loss = 0.09751248359680176
Validation loss = 0.08525808900594711
Validation loss = 0.08934483677148819
Validation loss = 0.08384329080581665
Validation loss = 0.06319405883550644
Validation loss = 0.07099448144435883
Validation loss = 0.0569220632314682
Validation loss = 0.06620314717292786
Validation loss = 0.06522899866104126
Validation loss = 0.07177447527647018
Validation loss = 0.05826282873749733
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09440393000841141
Validation loss = 0.07604850828647614
Validation loss = 0.07559359073638916
Validation loss = 0.0676681399345398
Validation loss = 0.06243853643536568
Validation loss = 0.06135888397693634
Validation loss = 0.06847324222326279
Validation loss = 0.06781422346830368
Validation loss = 0.05361338332295418
Validation loss = 0.05382903665304184
Validation loss = 0.05741336569190025
Validation loss = 0.05253113433718681
Validation loss = 0.05970217287540436
Validation loss = 0.052977681159973145
Validation loss = 0.05332081764936447
Validation loss = 0.04924032464623451
Validation loss = 0.05616479739546776
Validation loss = 0.05189742147922516
Validation loss = 0.04904373362660408
Validation loss = 0.05938735976815224
Validation loss = 0.05216748267412186
Validation loss = 0.05060931667685509
Validation loss = 0.04937688633799553
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08476700633764267
Validation loss = 0.08444381505250931
Validation loss = 0.07332789897918701
Validation loss = 0.05868683010339737
Validation loss = 0.05748667195439339
Validation loss = 0.0616859532892704
Validation loss = 0.06321100145578384
Validation loss = 0.06321427971124649
Validation loss = 0.0636356770992279
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00212 |
| Iteration     | 2        |
| MaximumReturn | -0.00156 |
| MinimumReturn | -0.0026  |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06494971364736557
Validation loss = 0.04386425390839577
Validation loss = 0.040381692349910736
Validation loss = 0.05673277750611305
Validation loss = 0.04670359566807747
Validation loss = 0.04575434327125549
Validation loss = 0.044840890914201736
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05598055198788643
Validation loss = 0.05128190293908119
Validation loss = 0.043504487723112106
Validation loss = 0.05215206742286682
Validation loss = 0.04606761410832405
Validation loss = 0.048170242458581924
Validation loss = 0.05252208188176155
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04842338338494301
Validation loss = 0.042516034096479416
Validation loss = 0.04380128160119057
Validation loss = 0.04554362967610359
Validation loss = 0.0435909777879715
Validation loss = 0.05080394074320793
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05260494351387024
Validation loss = 0.04755031690001488
Validation loss = 0.04108992591500282
Validation loss = 0.035006407648324966
Validation loss = 0.03683364763855934
Validation loss = 0.03211718052625656
Validation loss = 0.03380743786692619
Validation loss = 0.03763236477971077
Validation loss = 0.0314897857606411
Validation loss = 0.03498736768960953
Validation loss = 0.03729889541864395
Validation loss = 0.03302082046866417
Validation loss = 0.03228994086384773
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07710319012403488
Validation loss = 0.05366268754005432
Validation loss = 0.049002762883901596
Validation loss = 0.043331731110811234
Validation loss = 0.04346352443099022
Validation loss = 0.05003456771373749
Validation loss = 0.041093628853559494
Validation loss = 0.052177559584379196
Validation loss = 0.043682027608156204
Validation loss = 0.047918710857629776
Validation loss = 0.04446086660027504
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000743 |
| Iteration     | 3         |
| MaximumReturn | -0.000547 |
| MinimumReturn | -0.00105  |
| TotalSamples  | 8330      |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03890557587146759
Validation loss = 0.036906227469444275
Validation loss = 0.038970548659563065
Validation loss = 0.03970232605934143
Validation loss = 0.03928213194012642
Validation loss = 0.03869818150997162
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.041214682161808014
Validation loss = 0.04329612851142883
Validation loss = 0.03779037669301033
Validation loss = 0.039302073419094086
Validation loss = 0.03949418291449547
Validation loss = 0.042788513004779816
Validation loss = 0.0418325699865818
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.049527909606695175
Validation loss = 0.03841278329491615
Validation loss = 0.03436584770679474
Validation loss = 0.031238123774528503
Validation loss = 0.04075223580002785
Validation loss = 0.04371993988752365
Validation loss = 0.03639819845557213
Validation loss = 0.04181595891714096
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.036476098001003265
Validation loss = 0.034339018166065216
Validation loss = 0.0285834688693285
Validation loss = 0.04395309090614319
Validation loss = 0.03978729993104935
Validation loss = 0.033367861062288284
Validation loss = 0.0342709943652153
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.055918194353580475
Validation loss = 0.03815225884318352
Validation loss = 0.0395120345056057
Validation loss = 0.03721214830875397
Validation loss = 0.04114139825105667
Validation loss = 0.04210793972015381
Validation loss = 0.03934495523571968
Validation loss = 0.03770508989691734
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000818 |
| Iteration     | 4         |
| MaximumReturn | -0.000626 |
| MinimumReturn | -0.00102  |
| TotalSamples  | 9996      |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03126909211277962
Validation loss = 0.04243091121315956
Validation loss = 0.04145779460668564
Validation loss = 0.044354163110256195
Validation loss = 0.050038449466228485
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04010938107967377
Validation loss = 0.044746462255716324
Validation loss = 0.03929669409990311
Validation loss = 0.03712187707424164
Validation loss = 0.04397151991724968
Validation loss = 0.03850479796528816
Validation loss = 0.0413593128323555
Validation loss = 0.04234629124403
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.034294743090867996
Validation loss = 0.04065123572945595
Validation loss = 0.03059886023402214
Validation loss = 0.032924290746450424
Validation loss = 0.033477358520030975
Validation loss = 0.042138781398534775
Validation loss = 0.032013874500989914
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03234202414751053
Validation loss = 0.02766099013388157
Validation loss = 0.03784774988889694
Validation loss = 0.03434136137366295
Validation loss = 0.031760234385728836
Validation loss = 0.032392989844083786
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03900566324591637
Validation loss = 0.040739960968494415
Validation loss = 0.03475173935294151
Validation loss = 0.034824833273887634
Validation loss = 0.035437360405921936
Validation loss = 0.03940641134977341
Validation loss = 0.042871154844760895
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000754 |
| Iteration     | 5         |
| MaximumReturn | -0.000603 |
| MinimumReturn | -0.000997 |
| TotalSamples  | 11662     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04899708181619644
Validation loss = 0.04230036213994026
Validation loss = 0.039637576788663864
Validation loss = 0.04204198345541954
Validation loss = 0.04479757696390152
Validation loss = 0.03941354900598526
Validation loss = 0.04140724241733551
Validation loss = 0.039133209735155106
Validation loss = 0.042018286883831024
Validation loss = 0.03898162767291069
Validation loss = 0.0378313809633255
Validation loss = 0.03371848538517952
Validation loss = 0.04226914793252945
Validation loss = 0.04174868017435074
Validation loss = 0.03847065940499306
Validation loss = 0.03402069956064224
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04344182834029198
Validation loss = 0.045037876814603806
Validation loss = 0.04411394149065018
Validation loss = 0.04050009325146675
Validation loss = 0.04694249480962753
Validation loss = 0.03798753395676613
Validation loss = 0.05229602009057999
Validation loss = 0.04487266391515732
Validation loss = 0.04155656695365906
Validation loss = 0.04500366374850273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03953900188207626
Validation loss = 0.04460424929857254
Validation loss = 0.04292697831988335
Validation loss = 0.039231278002262115
Validation loss = 0.042371056973934174
Validation loss = 0.036665357649326324
Validation loss = 0.03365352004766464
Validation loss = 0.03893519937992096
Validation loss = 0.0364467091858387
Validation loss = 0.039801474660634995
Validation loss = 0.03160112351179123
Validation loss = 0.04028266668319702
Validation loss = 0.04042840749025345
Validation loss = 0.04167000204324722
Validation loss = 0.039903655648231506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03938871994614601
Validation loss = 0.03650245815515518
Validation loss = 0.033294547349214554
Validation loss = 0.03332803398370743
Validation loss = 0.03061407245695591
Validation loss = 0.04183756560087204
Validation loss = 0.03881955146789551
Validation loss = 0.029364997521042824
Validation loss = 0.04027603566646576
Validation loss = 0.038377128541469574
Validation loss = 0.03683546930551529
Validation loss = 0.03468870744109154
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04400887340307236
Validation loss = 0.040898215025663376
Validation loss = 0.03910139203071594
Validation loss = 0.04148600250482559
Validation loss = 0.03880960866808891
Validation loss = 0.04950062185525894
Validation loss = 0.049216169863939285
Validation loss = 0.03961526229977608
Validation loss = 0.046743541955947876
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000831 |
| Iteration     | 6         |
| MaximumReturn | -0.000595 |
| MinimumReturn | -0.00143  |
| TotalSamples  | 13328     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.036828216165304184
Validation loss = 0.03777458891272545
Validation loss = 0.036945369094610214
Validation loss = 0.0337333120405674
Validation loss = 0.03278112784028053
Validation loss = 0.02900790609419346
Validation loss = 0.03594354912638664
Validation loss = 0.03842044621706009
Validation loss = 0.030856460332870483
Validation loss = 0.03201085329055786
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.046914875507354736
Validation loss = 0.035165105015039444
Validation loss = 0.041107963770627975
Validation loss = 0.03295498713850975
Validation loss = 0.04140303283929825
Validation loss = 0.03298329934477806
Validation loss = 0.03248994052410126
Validation loss = 0.0300118550658226
Validation loss = 0.03387458994984627
Validation loss = 0.043694570660591125
Validation loss = 0.029545994475483894
Validation loss = 0.03135426715016365
Validation loss = 0.02759399451315403
Validation loss = 0.03523951396346092
Validation loss = 0.034379031509160995
Validation loss = 0.027874404564499855
Validation loss = 0.02937108837068081
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.034820590168237686
Validation loss = 0.029797764495015144
Validation loss = 0.03324756771326065
Validation loss = 0.033627357333898544
Validation loss = 0.036131612956523895
Validation loss = 0.046297017484903336
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03547811508178711
Validation loss = 0.030616452917456627
Validation loss = 0.029308542609214783
Validation loss = 0.032332245260477066
Validation loss = 0.03246195614337921
Validation loss = 0.027499042451381683
Validation loss = 0.023813968524336815
Validation loss = 0.031459301710128784
Validation loss = 0.026675792410969734
Validation loss = 0.03642868623137474
Validation loss = 0.0258666779845953
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04004208371043205
Validation loss = 0.03462040796875954
Validation loss = 0.032077934592962265
Validation loss = 0.0515415258705616
Validation loss = 0.03199949115514755
Validation loss = 0.031939368695020676
Validation loss = 0.03211534768342972
Validation loss = 0.028792770579457283
Validation loss = 0.02861226163804531
Validation loss = 0.028515785932540894
Validation loss = 0.033532749861478806
Validation loss = 0.03647705540060997
Validation loss = 0.029944589361548424
Validation loss = 0.031206028535962105
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000814 |
| Iteration     | 7         |
| MaximumReturn | -0.000518 |
| MinimumReturn | -0.00113  |
| TotalSamples  | 14994     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03260304778814316
Validation loss = 0.02329200692474842
Validation loss = 0.029320599511265755
Validation loss = 0.02263179048895836
Validation loss = 0.022812126204371452
Validation loss = 0.023616448044776917
Validation loss = 0.027174832299351692
Validation loss = 0.02386944554746151
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026623031124472618
Validation loss = 0.028242239728569984
Validation loss = 0.02303791604936123
Validation loss = 0.021717693656682968
Validation loss = 0.030878927558660507
Validation loss = 0.028435947373509407
Validation loss = 0.027902569621801376
Validation loss = 0.02461629919707775
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0486505888402462
Validation loss = 0.028639012947678566
Validation loss = 0.025871681049466133
Validation loss = 0.03912277892231941
Validation loss = 0.024852341040968895
Validation loss = 0.023858381435275078
Validation loss = 0.026360558345913887
Validation loss = 0.025205742567777634
Validation loss = 0.027692539617419243
Validation loss = 0.02478352189064026
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02121909335255623
Validation loss = 0.021261628717184067
Validation loss = 0.025428036227822304
Validation loss = 0.022111738100647926
Validation loss = 0.02728729508817196
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02784775011241436
Validation loss = 0.02228449657559395
Validation loss = 0.02396107278764248
Validation loss = 0.0281086266040802
Validation loss = 0.022445034235715866
Validation loss = 0.026510002091526985
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000767 |
| Iteration     | 8         |
| MaximumReturn | -0.000616 |
| MinimumReturn | -0.00098  |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021992236375808716
Validation loss = 0.02745652385056019
Validation loss = 0.027334406971931458
Validation loss = 0.02381272427737713
Validation loss = 0.027723683044314384
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022634124383330345
Validation loss = 0.021261222660541534
Validation loss = 0.020407725125551224
Validation loss = 0.021587328985333443
Validation loss = 0.02227902226150036
Validation loss = 0.024577004835009575
Validation loss = 0.020182717591524124
Validation loss = 0.029727712273597717
Validation loss = 0.026750408113002777
Validation loss = 0.02297499030828476
Validation loss = 0.02376735955476761
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.029044706374406815
Validation loss = 0.02991451695561409
Validation loss = 0.027219224721193314
Validation loss = 0.02512892708182335
Validation loss = 0.024852916598320007
Validation loss = 0.024315889924764633
Validation loss = 0.023795612156391144
Validation loss = 0.025514187291264534
Validation loss = 0.026234768331050873
Validation loss = 0.02192104049026966
Validation loss = 0.01904682070016861
Validation loss = 0.022247206419706345
Validation loss = 0.02134104073047638
Validation loss = 0.02214350923895836
Validation loss = 0.024293871596455574
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027840614318847656
Validation loss = 0.019181756302714348
Validation loss = 0.022678501904010773
Validation loss = 0.02573345974087715
Validation loss = 0.019775183871388435
Validation loss = 0.02238900400698185
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020454691722989082
Validation loss = 0.023495908826589584
Validation loss = 0.026045575737953186
Validation loss = 0.023460285738110542
Validation loss = 0.027776706963777542
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00073  |
| Iteration     | 9         |
| MaximumReturn | -0.000566 |
| MinimumReturn | -0.000881 |
| TotalSamples  | 18326     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.034230466932058334
Validation loss = 0.023050449788570404
Validation loss = 0.022631991654634476
Validation loss = 0.021702565252780914
Validation loss = 0.022687597200274467
Validation loss = 0.022070392966270447
Validation loss = 0.01936490833759308
Validation loss = 0.030069658532738686
Validation loss = 0.022839238867163658
Validation loss = 0.023682620376348495
Validation loss = 0.024226723238825798
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0249614454805851
Validation loss = 0.023904135450720787
Validation loss = 0.02350234054028988
Validation loss = 0.0221702940762043
Validation loss = 0.023195330053567886
Validation loss = 0.020270219072699547
Validation loss = 0.021883226931095123
Validation loss = 0.021621674299240112
Validation loss = 0.023281658068299294
Validation loss = 0.023522982373833656
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023611946031451225
Validation loss = 0.02259724587202072
Validation loss = 0.02604006417095661
Validation loss = 0.02431940659880638
Validation loss = 0.02763046696782112
Validation loss = 0.0187209639698267
Validation loss = 0.026000505313277245
Validation loss = 0.027193041518330574
Validation loss = 0.02278929203748703
Validation loss = 0.02136136032640934
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01982548087835312
Validation loss = 0.02211700938642025
Validation loss = 0.01968557946383953
Validation loss = 0.024210447445511818
Validation loss = 0.0218642670661211
Validation loss = 0.020775973796844482
Validation loss = 0.020078293979167938
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02813766710460186
Validation loss = 0.02121972292661667
Validation loss = 0.026368647813796997
Validation loss = 0.020420774817466736
Validation loss = 0.020937254652380943
Validation loss = 0.019423341378569603
Validation loss = 0.022364541888237
Validation loss = 0.018882233649492264
Validation loss = 0.020878052338957787
Validation loss = 0.017255563288927078
Validation loss = 0.020927220582962036
Validation loss = 0.018813801929354668
Validation loss = 0.021696580573916435
Validation loss = 0.01849345490336418
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000761 |
| Iteration     | 10        |
| MaximumReturn | -0.000557 |
| MinimumReturn | -0.00099  |
| TotalSamples  | 19992     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023250823840498924
Validation loss = 0.025039341300725937
Validation loss = 0.01836547814309597
Validation loss = 0.020635267719626427
Validation loss = 0.021621275693178177
Validation loss = 0.018734581768512726
Validation loss = 0.021490078419446945
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021020084619522095
Validation loss = 0.021098650991916656
Validation loss = 0.019172923639416695
Validation loss = 0.02165367268025875
Validation loss = 0.020293056964874268
Validation loss = 0.019920114427804947
Validation loss = 0.024672243744134903
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02207297272980213
Validation loss = 0.01914013735949993
Validation loss = 0.02867082878947258
Validation loss = 0.021350938826799393
Validation loss = 0.01921999454498291
Validation loss = 0.021125372499227524
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0202217735350132
Validation loss = 0.028792228549718857
Validation loss = 0.02578912116587162
Validation loss = 0.03115043416619301
Validation loss = 0.02464582398533821
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021808668971061707
Validation loss = 0.028185483068227768
Validation loss = 0.021021025255322456
Validation loss = 0.0191192664206028
Validation loss = 0.026227230206131935
Validation loss = 0.01883077248930931
Validation loss = 0.023704303428530693
Validation loss = 0.0216640867292881
Validation loss = 0.020448606461286545
Validation loss = 0.018240544945001602
Validation loss = 0.01992637664079666
Validation loss = 0.01852419599890709
Validation loss = 0.01974603906273842
Validation loss = 0.01779291406273842
Validation loss = 0.021757761016488075
Validation loss = 0.017206251621246338
Validation loss = 0.01773480698466301
Validation loss = 0.022666173055768013
Validation loss = 0.017109861597418785
Validation loss = 0.020528819411993027
Validation loss = 0.019858410581946373
Validation loss = 0.017654499039053917
Validation loss = 0.018432702869176865
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 11        |
| MaximumReturn | -0.000538 |
| MinimumReturn | -0.0027   |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018677538260817528
Validation loss = 0.016567090526223183
Validation loss = 0.01970958709716797
Validation loss = 0.01674247533082962
Validation loss = 0.02297132834792137
Validation loss = 0.022029150277376175
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020889218896627426
Validation loss = 0.0179294403642416
Validation loss = 0.017583508044481277
Validation loss = 0.01501443050801754
Validation loss = 0.016557414084672928
Validation loss = 0.016284847632050514
Validation loss = 0.017247384414076805
Validation loss = 0.015606753528118134
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023119574412703514
Validation loss = 0.016606314107775688
Validation loss = 0.015419738367199898
Validation loss = 0.019654886797070503
Validation loss = 0.018884535878896713
Validation loss = 0.017264392226934433
Validation loss = 0.015476365573704243
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02358008548617363
Validation loss = 0.019170129671692848
Validation loss = 0.01828751526772976
Validation loss = 0.018298733979463577
Validation loss = 0.013827587477862835
Validation loss = 0.016582394018769264
Validation loss = 0.017793748527765274
Validation loss = 0.015529167838394642
Validation loss = 0.01499895565211773
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01824096590280533
Validation loss = 0.01680474542081356
Validation loss = 0.015534764155745506
Validation loss = 0.014654124155640602
Validation loss = 0.014018045738339424
Validation loss = 0.013862247578799725
Validation loss = 0.015060404315590858
Validation loss = 0.015053493902087212
Validation loss = 0.01889878883957863
Validation loss = 0.016685370355844498
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00103  |
| Iteration     | 12        |
| MaximumReturn | -0.000558 |
| MinimumReturn | -0.00246  |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01718008704483509
Validation loss = 0.014012870378792286
Validation loss = 0.0155557282269001
Validation loss = 0.014934617094695568
Validation loss = 0.016719060018658638
Validation loss = 0.012706886976957321
Validation loss = 0.01266616303473711
Validation loss = 0.014038065448403358
Validation loss = 0.014487412758171558
Validation loss = 0.015081848949193954
Validation loss = 0.0170403104275465
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013620266690850258
Validation loss = 0.012927009724080563
Validation loss = 0.016788698732852936
Validation loss = 0.015280384570360184
Validation loss = 0.012145662680268288
Validation loss = 0.014154299162328243
Validation loss = 0.012785687111318111
Validation loss = 0.012861116789281368
Validation loss = 0.013829316943883896
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03043794445693493
Validation loss = 0.015693986788392067
Validation loss = 0.01413759682327509
Validation loss = 0.012752965092658997
Validation loss = 0.013330349698662758
Validation loss = 0.013987156562507153
Validation loss = 0.01970956102013588
Validation loss = 0.014134770259261131
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01565941981971264
Validation loss = 0.019546248018741608
Validation loss = 0.013475144281983376
Validation loss = 0.014674452133476734
Validation loss = 0.013966783881187439
Validation loss = 0.014484966173768044
Validation loss = 0.015005657449364662
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017002588137984276
Validation loss = 0.011974941939115524
Validation loss = 0.012120967730879784
Validation loss = 0.013733177445828915
Validation loss = 0.012416478246450424
Validation loss = 0.011950136162340641
Validation loss = 0.014447465538978577
Validation loss = 0.011461758054792881
Validation loss = 0.01156237069517374
Validation loss = 0.011944115161895752
Validation loss = 0.014853814616799355
Validation loss = 0.012690740637481213
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000912 |
| Iteration     | 13        |
| MaximumReturn | -0.000581 |
| MinimumReturn | -0.00322  |
| TotalSamples  | 24990     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020409800112247467
Validation loss = 0.020146586000919342
Validation loss = 0.012792297638952732
Validation loss = 0.011939909309148788
Validation loss = 0.012423588894307613
Validation loss = 0.013747982680797577
Validation loss = 0.01279605831950903
Validation loss = 0.011886862106621265
Validation loss = 0.01103177946060896
Validation loss = 0.012665211223065853
Validation loss = 0.012908141128718853
Validation loss = 0.017323242500424385
Validation loss = 0.013612688519060612
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01369693223387003
Validation loss = 0.012126470915973186
Validation loss = 0.011557836085557938
Validation loss = 0.0111243212595582
Validation loss = 0.016274169087409973
Validation loss = 0.016567476093769073
Validation loss = 0.017045702785253525
Validation loss = 0.012415721081197262
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01659810170531273
Validation loss = 0.015405469574034214
Validation loss = 0.013977131806313992
Validation loss = 0.012269090861082077
Validation loss = 0.01386395376175642
Validation loss = 0.017191385850310326
Validation loss = 0.012184792198240757
Validation loss = 0.013059749267995358
Validation loss = 0.012036129832267761
Validation loss = 0.012466582469642162
Validation loss = 0.014109928160905838
Validation loss = 0.01176313403993845
Validation loss = 0.015199991874396801
Validation loss = 0.016240224242210388
Validation loss = 0.014011888764798641
Validation loss = 0.011864599771797657
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013693023473024368
Validation loss = 0.01599179022014141
Validation loss = 0.016965188086032867
Validation loss = 0.012883505783975124
Validation loss = 0.017028214409947395
Validation loss = 0.015417029149830341
Validation loss = 0.012326319701969624
Validation loss = 0.012318000197410583
Validation loss = 0.011793076992034912
Validation loss = 0.012807772494852543
Validation loss = 0.014439788646996021
Validation loss = 0.01309354230761528
Validation loss = 0.015550331212580204
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014572215266525745
Validation loss = 0.013155993074178696
Validation loss = 0.011459817178547382
Validation loss = 0.011981699615716934
Validation loss = 0.0141603359952569
Validation loss = 0.01324899960309267
Validation loss = 0.013279096223413944
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000873 |
| Iteration     | 14        |
| MaximumReturn | -0.000629 |
| MinimumReturn | -0.00159  |
| TotalSamples  | 26656     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015005952678620815
Validation loss = 0.01132439449429512
Validation loss = 0.018408022820949554
Validation loss = 0.011500915512442589
Validation loss = 0.011626060120761395
Validation loss = 0.013374540954828262
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015018965117633343
Validation loss = 0.01112686563283205
Validation loss = 0.0133164431899786
Validation loss = 0.010654333978891373
Validation loss = 0.010098254308104515
Validation loss = 0.01241135410964489
Validation loss = 0.010672900825738907
Validation loss = 0.012322739697992802
Validation loss = 0.010543524287641048
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0172512698918581
Validation loss = 0.012135193683207035
Validation loss = 0.013244346715509892
Validation loss = 0.010638401843607426
Validation loss = 0.016685372218489647
Validation loss = 0.011722116731107235
Validation loss = 0.01495734415948391
Validation loss = 0.011379379779100418
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014263328164815903
Validation loss = 0.012421691790223122
Validation loss = 0.010835319757461548
Validation loss = 0.013947518542408943
Validation loss = 0.013466794043779373
Validation loss = 0.014677131548523903
Validation loss = 0.011072848923504353
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011260386556386948
Validation loss = 0.01071222499012947
Validation loss = 0.010098851285874844
Validation loss = 0.010428447276353836
Validation loss = 0.013705987483263016
Validation loss = 0.009821712039411068
Validation loss = 0.01188442762941122
Validation loss = 0.01137884147465229
Validation loss = 0.010637739673256874
Validation loss = 0.013334067538380623
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000949 |
| Iteration     | 15        |
| MaximumReturn | -0.000505 |
| MinimumReturn | -0.00318  |
| TotalSamples  | 28322     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013787862844765186
Validation loss = 0.010644548572599888
Validation loss = 0.01052809040993452
Validation loss = 0.013348205015063286
Validation loss = 0.01399295125156641
Validation loss = 0.010379357263445854
Validation loss = 0.014579293318092823
Validation loss = 0.021038716658949852
Validation loss = 0.010025033727288246
Validation loss = 0.011062330566346645
Validation loss = 0.008215882815420628
Validation loss = 0.011365176178514957
Validation loss = 0.009967593476176262
Validation loss = 0.010799328796565533
Validation loss = 0.010467142798006535
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014074833132326603
Validation loss = 0.010342894122004509
Validation loss = 0.01006060279905796
Validation loss = 0.01466129720211029
Validation loss = 0.01313022430986166
Validation loss = 0.01078247744590044
Validation loss = 0.011240186169743538
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010024060495197773
Validation loss = 0.010540368035435677
Validation loss = 0.009665040299296379
Validation loss = 0.011727996170520782
Validation loss = 0.015262932516634464
Validation loss = 0.010082485154271126
Validation loss = 0.009888209402561188
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010490210726857185
Validation loss = 0.013445199467241764
Validation loss = 0.010346530936658382
Validation loss = 0.010120434686541557
Validation loss = 0.015074707567691803
Validation loss = 0.012373547069728374
Validation loss = 0.011775800958275795
Validation loss = 0.009925996884703636
Validation loss = 0.010886574164032936
Validation loss = 0.009370303712785244
Validation loss = 0.012882664799690247
Validation loss = 0.012575657106935978
Validation loss = 0.0104575976729393
Validation loss = 0.010079676285386086
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010759635828435421
Validation loss = 0.010927102528512478
Validation loss = 0.010238475166261196
Validation loss = 0.010902266018092632
Validation loss = 0.009749011136591434
Validation loss = 0.013206668198108673
Validation loss = 0.009167976677417755
Validation loss = 0.011708018369972706
Validation loss = 0.009078264236450195
Validation loss = 0.011104227975010872
Validation loss = 0.010311602614820004
Validation loss = 0.010500460863113403
Validation loss = 0.00983657967299223
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00083  |
| Iteration     | 16        |
| MaximumReturn | -0.000574 |
| MinimumReturn | -0.00222  |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009704953990876675
Validation loss = 0.01066757831722498
Validation loss = 0.009098740294575691
Validation loss = 0.010660860687494278
Validation loss = 0.012233138084411621
Validation loss = 0.009841260500252247
Validation loss = 0.010818873532116413
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012778805568814278
Validation loss = 0.011732155457139015
Validation loss = 0.012802896089851856
Validation loss = 0.012748697772622108
Validation loss = 0.010440023615956306
Validation loss = 0.01188684906810522
Validation loss = 0.012760281562805176
Validation loss = 0.012385958805680275
Validation loss = 0.010423216968774796
Validation loss = 0.011043927632272243
Validation loss = 0.010296096093952656
Validation loss = 0.011792810633778572
Validation loss = 0.009505937807261944
Validation loss = 0.01010596938431263
Validation loss = 0.013058507815003395
Validation loss = 0.010969112627208233
Validation loss = 0.01101572997868061
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010166623629629612
Validation loss = 0.0126072708517313
Validation loss = 0.010324167087674141
Validation loss = 0.011748881079256535
Validation loss = 0.008670876733958721
Validation loss = 0.01376192457973957
Validation loss = 0.009459195658564568
Validation loss = 0.012052921578288078
Validation loss = 0.0109922019764781
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012745469808578491
Validation loss = 0.010803135111927986
Validation loss = 0.010675921104848385
Validation loss = 0.010432714596390724
Validation loss = 0.009682679548859596
Validation loss = 0.011638086289167404
Validation loss = 0.009598557837307453
Validation loss = 0.010047649033367634
Validation loss = 0.011577634140849113
Validation loss = 0.00967418123036623
Validation loss = 0.010110964067280293
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010475071147084236
Validation loss = 0.009650710970163345
Validation loss = 0.013065235689282417
Validation loss = 0.011649517342448235
Validation loss = 0.010522810742259026
Validation loss = 0.009887291118502617
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00125  |
| Iteration     | 17        |
| MaximumReturn | -0.000471 |
| MinimumReturn | -0.00345  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014015989378094673
Validation loss = 0.0089141009375453
Validation loss = 0.008328564465045929
Validation loss = 0.011762866750359535
Validation loss = 0.010585829615592957
Validation loss = 0.009919154457747936
Validation loss = 0.01257635559886694
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017190584912896156
Validation loss = 0.009903683327138424
Validation loss = 0.009990730322897434
Validation loss = 0.0092763751745224
Validation loss = 0.010124018415808678
Validation loss = 0.010390093550086021
Validation loss = 0.009663022123277187
Validation loss = 0.0086820088326931
Validation loss = 0.010490362532436848
Validation loss = 0.009122069925069809
Validation loss = 0.00989639200270176
Validation loss = 0.008992914110422134
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012369940057396889
Validation loss = 0.010324412025511265
Validation loss = 0.011309105902910233
Validation loss = 0.008876452222466469
Validation loss = 0.009693942032754421
Validation loss = 0.011256584897637367
Validation loss = 0.01476853247731924
Validation loss = 0.009229405783116817
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011172031983733177
Validation loss = 0.009596885181963444
Validation loss = 0.011630571447312832
Validation loss = 0.009093135595321655
Validation loss = 0.009103263728320599
Validation loss = 0.010945022106170654
Validation loss = 0.008981699123978615
Validation loss = 0.009369775652885437
Validation loss = 0.01822606287896633
Validation loss = 0.011436311528086662
Validation loss = 0.009252901189029217
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013423826545476913
Validation loss = 0.009991111233830452
Validation loss = 0.012425362132489681
Validation loss = 0.010452532209455967
Validation loss = 0.010774769820272923
Validation loss = 0.010701265186071396
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00165  |
| Iteration     | 18        |
| MaximumReturn | -0.000566 |
| MinimumReturn | -0.0096   |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012798424810171127
Validation loss = 0.01413639448583126
Validation loss = 0.01068667322397232
Validation loss = 0.012335069477558136
Validation loss = 0.009655616246163845
Validation loss = 0.010812098160386086
Validation loss = 0.009329275228083134
Validation loss = 0.008804298005998135
Validation loss = 0.011000248603522778
Validation loss = 0.009868146851658821
Validation loss = 0.008878661319613457
Validation loss = 0.010558187030255795
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007893525995314121
Validation loss = 0.008813976310193539
Validation loss = 0.009347842074930668
Validation loss = 0.008418253622949123
Validation loss = 0.009813426062464714
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00827411562204361
Validation loss = 0.009255068376660347
Validation loss = 0.008486920967698097
Validation loss = 0.01148348767310381
Validation loss = 0.00980139710009098
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009773528203368187
Validation loss = 0.009072034619748592
Validation loss = 0.01508118212223053
Validation loss = 0.00834912620484829
Validation loss = 0.009104127995669842
Validation loss = 0.010757374577224255
Validation loss = 0.008146706037223339
Validation loss = 0.009081251919269562
Validation loss = 0.007825113832950592
Validation loss = 0.011974948458373547
Validation loss = 0.008448555134236813
Validation loss = 0.007380375638604164
Validation loss = 0.017973503097891808
Validation loss = 0.009970941580832005
Validation loss = 0.008151747286319733
Validation loss = 0.006693209987133741
Validation loss = 0.008282730355858803
Validation loss = 0.007967930287122726
Validation loss = 0.012030206620693207
Validation loss = 0.009487439878284931
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009738960303366184
Validation loss = 0.013007274828851223
Validation loss = 0.010565937496721745
Validation loss = 0.009316056966781616
Validation loss = 0.009158765897154808
Validation loss = 0.008354631252586842
Validation loss = 0.009098879992961884
Validation loss = 0.008542196825146675
Validation loss = 0.00965912640094757
Validation loss = 0.00987776555120945
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00136  |
| Iteration     | 19        |
| MaximumReturn | -0.000634 |
| MinimumReturn | -0.00917  |
| TotalSamples  | 34986     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011177089996635914
Validation loss = 0.008627980947494507
Validation loss = 0.009245731867849827
Validation loss = 0.007909249514341354
Validation loss = 0.008631395176053047
Validation loss = 0.008371447212994099
Validation loss = 0.007830976508557796
Validation loss = 0.009725356474518776
Validation loss = 0.011267206631600857
Validation loss = 0.0077683390118181705
Validation loss = 0.01112815085798502
Validation loss = 0.009731806814670563
Validation loss = 0.008206808939576149
Validation loss = 0.009259204380214214
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009529032744467258
Validation loss = 0.009084869176149368
Validation loss = 0.009892802685499191
Validation loss = 0.007662659976631403
Validation loss = 0.008112962357699871
Validation loss = 0.009264877066016197
Validation loss = 0.012532176449894905
Validation loss = 0.00840647704899311
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007560197729617357
Validation loss = 0.008913755416870117
Validation loss = 0.009041918441653252
Validation loss = 0.014097362756729126
Validation loss = 0.008671903982758522
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008960623294115067
Validation loss = 0.009776255115866661
Validation loss = 0.008789186365902424
Validation loss = 0.008634059689939022
Validation loss = 0.008471816778182983
Validation loss = 0.012464448809623718
Validation loss = 0.009077665396034718
Validation loss = 0.008161251433193684
Validation loss = 0.014370810240507126
Validation loss = 0.013029079884290695
Validation loss = 0.008718786761164665
Validation loss = 0.008833729662001133
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008357802405953407
Validation loss = 0.007663837168365717
Validation loss = 0.008677458390593529
Validation loss = 0.009050087071955204
Validation loss = 0.008842352777719498
Validation loss = 0.00788121297955513
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00399  |
| Iteration     | 20        |
| MaximumReturn | -0.000553 |
| MinimumReturn | -0.0593   |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007275443058460951
Validation loss = 0.009885791689157486
Validation loss = 0.007821973413228989
Validation loss = 0.007207855582237244
Validation loss = 0.007110688369721174
Validation loss = 0.00922344345599413
Validation loss = 0.009957736358046532
Validation loss = 0.007260700222104788
Validation loss = 0.007666175719350576
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008242983371019363
Validation loss = 0.007538969162851572
Validation loss = 0.009190070442855358
Validation loss = 0.008103784173727036
Validation loss = 0.008361930027604103
Validation loss = 0.006712257396429777
Validation loss = 0.007045289035886526
Validation loss = 0.009478083811700344
Validation loss = 0.007748185656964779
Validation loss = 0.006987933535128832
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011455010622739792
Validation loss = 0.023009218275547028
Validation loss = 0.01137460395693779
Validation loss = 0.0069636935368180275
Validation loss = 0.009933690540492535
Validation loss = 0.010261595249176025
Validation loss = 0.008472387678921223
Validation loss = 0.009074735455214977
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009196934290230274
Validation loss = 0.00865807756781578
Validation loss = 0.008462945930659771
Validation loss = 0.009333249181509018
Validation loss = 0.007710588630288839
Validation loss = 0.011574101634323597
Validation loss = 0.009137192741036415
Validation loss = 0.009576954878866673
Validation loss = 0.007794301491230726
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00855647586286068
Validation loss = 0.007548586465418339
Validation loss = 0.007301337085664272
Validation loss = 0.00817475002259016
Validation loss = 0.008863269351422787
Validation loss = 0.007855962961912155
Validation loss = 0.008067687973380089
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00199  |
| Iteration     | 21        |
| MaximumReturn | -0.000648 |
| MinimumReturn | -0.0282   |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011012275703251362
Validation loss = 0.009809687733650208
Validation loss = 0.007602226454764605
Validation loss = 0.009760897606611252
Validation loss = 0.009761178866028786
Validation loss = 0.008759133517742157
Validation loss = 0.008460147306323051
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008873986080288887
Validation loss = 0.006974852178245783
Validation loss = 0.008033188991248608
Validation loss = 0.007214737590402365
Validation loss = 0.009519284591078758
Validation loss = 0.011139271780848503
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012160555459558964
Validation loss = 0.00803937204182148
Validation loss = 0.007832816801965237
Validation loss = 0.008833318948745728
Validation loss = 0.00895067397505045
Validation loss = 0.008531371131539345
Validation loss = 0.008130582980811596
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016682861372828484
Validation loss = 0.010055018588900566
Validation loss = 0.008428401313722134
Validation loss = 0.006537522189319134
Validation loss = 0.006700974423438311
Validation loss = 0.0069466023705899715
Validation loss = 0.0065429019741714
Validation loss = 0.009418410249054432
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013700124807655811
Validation loss = 0.007391826715320349
Validation loss = 0.006915865000337362
Validation loss = 0.009006209671497345
Validation loss = 0.008227578364312649
Validation loss = 0.008477083407342434
Validation loss = 0.011022848077118397
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 22        |
| MaximumReturn | -0.000575 |
| MinimumReturn | -0.00476  |
| TotalSamples  | 39984     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007310196757316589
Validation loss = 0.007314021699130535
Validation loss = 0.009277290664613247
Validation loss = 0.008235776796936989
Validation loss = 0.007316281087696552
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011591210961341858
Validation loss = 0.007009924855083227
Validation loss = 0.007331588305532932
Validation loss = 0.009630216285586357
Validation loss = 0.007505373097956181
Validation loss = 0.008541271090507507
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008661623112857342
Validation loss = 0.009919139556586742
Validation loss = 0.007225611247122288
Validation loss = 0.013537406921386719
Validation loss = 0.0075982785783708096
Validation loss = 0.007731757126748562
Validation loss = 0.007268392946571112
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009592498652637005
Validation loss = 0.0072496188804507256
Validation loss = 0.0069696903228759766
Validation loss = 0.006399109959602356
Validation loss = 0.011089691892266273
Validation loss = 0.008722136728465557
Validation loss = 0.0073882536962628365
Validation loss = 0.007156442850828171
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008795855566859245
Validation loss = 0.00901547446846962
Validation loss = 0.007553232368081808
Validation loss = 0.0070052118971943855
Validation loss = 0.007119455374777317
Validation loss = 0.007671964820474386
Validation loss = 0.008270394057035446
Validation loss = 0.007879513315856457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00524 |
| Iteration     | 23       |
| MaximumReturn | -0.00062 |
| MinimumReturn | -0.0348  |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021006982773542404
Validation loss = 0.006596796214580536
Validation loss = 0.007565496023744345
Validation loss = 0.007690934929996729
Validation loss = 0.007867133244872093
Validation loss = 0.006613603793084621
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009180709719657898
Validation loss = 0.007080955896526575
Validation loss = 0.00572297815233469
Validation loss = 0.005995396990329027
Validation loss = 0.008281320333480835
Validation loss = 0.006905537098646164
Validation loss = 0.006112777628004551
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014480307698249817
Validation loss = 0.00662650540471077
Validation loss = 0.006953124888241291
Validation loss = 0.008429608307778835
Validation loss = 0.006836462765932083
Validation loss = 0.006786820944398642
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008468022570014
Validation loss = 0.0064133270643651485
Validation loss = 0.007002974860370159
Validation loss = 0.006547164171934128
Validation loss = 0.00575617840513587
Validation loss = 0.010099795646965504
Validation loss = 0.007344911340624094
Validation loss = 0.006825659424066544
Validation loss = 0.006177736911922693
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010400011204183102
Validation loss = 0.007406644523143768
Validation loss = 0.008743127807974815
Validation loss = 0.006920506712049246
Validation loss = 0.006852520164102316
Validation loss = 0.006766514386981726
Validation loss = 0.00629271799698472
Validation loss = 0.008568400517106056
Validation loss = 0.006179383024573326
Validation loss = 0.007321419660001993
Validation loss = 0.00682148477062583
Validation loss = 0.00599080603569746
Validation loss = 0.00652608647942543
Validation loss = 0.008081278763711452
Validation loss = 0.008482119999825954
Validation loss = 0.007949953898787498
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00298  |
| Iteration     | 24        |
| MaximumReturn | -0.000533 |
| MinimumReturn | -0.0349   |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006771240383386612
Validation loss = 0.010993972420692444
Validation loss = 0.008023486472666264
Validation loss = 0.006418813019990921
Validation loss = 0.00891081616282463
Validation loss = 0.007532877381891012
Validation loss = 0.009597334079444408
Validation loss = 0.005342846270650625
Validation loss = 0.006659692153334618
Validation loss = 0.006322209257632494
Validation loss = 0.006329337600618601
Validation loss = 0.00761992996558547
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006058295723050833
Validation loss = 0.00718602305278182
Validation loss = 0.007011273875832558
Validation loss = 0.0073599787428975105
Validation loss = 0.006032629869878292
Validation loss = 0.006391530856490135
Validation loss = 0.007227847818285227
Validation loss = 0.007791938725858927
Validation loss = 0.007587026339024305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00874060858041048
Validation loss = 0.005659342277795076
Validation loss = 0.009993976913392544
Validation loss = 0.008031796663999557
Validation loss = 0.011384589597582817
Validation loss = 0.011540965177118778
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006743825506418943
Validation loss = 0.0072954511269927025
Validation loss = 0.007385378237813711
Validation loss = 0.005827180575579405
Validation loss = 0.0062618679367005825
Validation loss = 0.005386819131672382
Validation loss = 0.006775337271392345
Validation loss = 0.009000692516565323
Validation loss = 0.00563565269112587
Validation loss = 0.006672219838947058
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007340969052165747
Validation loss = 0.007115705870091915
Validation loss = 0.007750056218355894
Validation loss = 0.0067376792430877686
Validation loss = 0.0074493493884801865
Validation loss = 0.006250293459743261
Validation loss = 0.007778410334140062
Validation loss = 0.005944209173321724
Validation loss = 0.006022410001605749
Validation loss = 0.007986379787325859
Validation loss = 0.006754934787750244
Validation loss = 0.0072305151261389256
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00327  |
| Iteration     | 25        |
| MaximumReturn | -0.000543 |
| MinimumReturn | -0.0162   |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005232896190136671
Validation loss = 0.006161356344819069
Validation loss = 0.006938399281352758
Validation loss = 0.005186615977436304
Validation loss = 0.005643968004733324
Validation loss = 0.006353850942105055
Validation loss = 0.006294459570199251
Validation loss = 0.005674440413713455
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007068179082125425
Validation loss = 0.0054407874122262
Validation loss = 0.005080285482108593
Validation loss = 0.0052546607330441475
Validation loss = 0.008523701690137386
Validation loss = 0.006033419165760279
Validation loss = 0.005568399094045162
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009506668895483017
Validation loss = 0.008063663728535175
Validation loss = 0.008257999084889889
Validation loss = 0.00969333853572607
Validation loss = 0.0064902096055448055
Validation loss = 0.008457805030047894
Validation loss = 0.007110673002898693
Validation loss = 0.006120072212070227
Validation loss = 0.006884847301989794
Validation loss = 0.005141118541359901
Validation loss = 0.007926801219582558
Validation loss = 0.0070101674646139145
Validation loss = 0.006371196825057268
Validation loss = 0.00525699881836772
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0057789054699242115
Validation loss = 0.009263158775866032
Validation loss = 0.0081406868994236
Validation loss = 0.005313970614224672
Validation loss = 0.007124241907149553
Validation loss = 0.0051323771476745605
Validation loss = 0.00492985500022769
Validation loss = 0.005808162968605757
Validation loss = 0.00577351450920105
Validation loss = 0.0070604016073048115
Validation loss = 0.006040414795279503
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007521354127675295
Validation loss = 0.005713570397347212
Validation loss = 0.009720421396195889
Validation loss = 0.007427325937896967
Validation loss = 0.006521554663777351
Validation loss = 0.008126260712742805
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00362  |
| Iteration     | 26        |
| MaximumReturn | -0.000575 |
| MinimumReturn | -0.0305   |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005948700942099094
Validation loss = 0.005740463733673096
Validation loss = 0.007518684957176447
Validation loss = 0.005662651266902685
Validation loss = 0.006745032500475645
Validation loss = 0.00658384757116437
Validation loss = 0.008010194636881351
Validation loss = 0.007083345204591751
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006554514169692993
Validation loss = 0.004776909947395325
Validation loss = 0.005294387694448233
Validation loss = 0.006074877455830574
Validation loss = 0.0050463550724089146
Validation loss = 0.005538915283977985
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005705850198864937
Validation loss = 0.005767920520156622
Validation loss = 0.00580421881750226
Validation loss = 0.006811917293816805
Validation loss = 0.005704091861844063
Validation loss = 0.004763331264257431
Validation loss = 0.005093270447105169
Validation loss = 0.007166791707277298
Validation loss = 0.008170231245458126
Validation loss = 0.006539322901517153
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00739726796746254
Validation loss = 0.005007950589060783
Validation loss = 0.005885109305381775
Validation loss = 0.006040581036359072
Validation loss = 0.009115539491176605
Validation loss = 0.006449862848967314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007028857711702585
Validation loss = 0.006572011858224869
Validation loss = 0.006020468194037676
Validation loss = 0.006853570695966482
Validation loss = 0.0060506113804876804
Validation loss = 0.00860331766307354
Validation loss = 0.006457965821027756
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0024   |
| Iteration     | 27        |
| MaximumReturn | -0.000532 |
| MinimumReturn | -0.0157   |
| TotalSamples  | 48314     |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005840931553393602
Validation loss = 0.0050791664980351925
Validation loss = 0.005509961862117052
Validation loss = 0.005329901352524757
Validation loss = 0.005391773302108049
Validation loss = 0.0052487850189208984
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005493320059031248
Validation loss = 0.00666792830452323
Validation loss = 0.004849608987569809
Validation loss = 0.005185633897781372
Validation loss = 0.006351693067699671
Validation loss = 0.005639967974275351
Validation loss = 0.006647169589996338
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005161189939826727
Validation loss = 0.005769567564129829
Validation loss = 0.00928776990622282
Validation loss = 0.00561515474691987
Validation loss = 0.006557750049978495
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007259551435709
Validation loss = 0.005941326264292002
Validation loss = 0.011971167288720608
Validation loss = 0.005522457417100668
Validation loss = 0.0056327530182898045
Validation loss = 0.004422094207257032
Validation loss = 0.005243530962616205
Validation loss = 0.006119364872574806
Validation loss = 0.004692280199378729
Validation loss = 0.004691803362220526
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011971880681812763
Validation loss = 0.006631717551499605
Validation loss = 0.0066857910715043545
Validation loss = 0.006379798520356417
Validation loss = 0.0075277164578437805
Validation loss = 0.0066095688380301
Validation loss = 0.006985091138631105
Validation loss = 0.007121395319700241
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00414  |
| Iteration     | 28        |
| MaximumReturn | -0.000673 |
| MinimumReturn | -0.0352   |
| TotalSamples  | 49980     |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0067238761112093925
Validation loss = 0.005322394892573357
Validation loss = 0.006048263981938362
Validation loss = 0.006260295398533344
Validation loss = 0.0063554998487234116
Validation loss = 0.006006291601806879
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005585698876529932
Validation loss = 0.0056611280888319016
Validation loss = 0.009909447282552719
Validation loss = 0.00513465516269207
Validation loss = 0.00610794173553586
Validation loss = 0.005190852098166943
Validation loss = 0.00556940445676446
Validation loss = 0.005049673840403557
Validation loss = 0.00650318618863821
Validation loss = 0.004528322722762823
Validation loss = 0.005511628929525614
Validation loss = 0.004668800625950098
Validation loss = 0.006510738283395767
Validation loss = 0.005395907908678055
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005805645138025284
Validation loss = 0.005768411792814732
Validation loss = 0.005983075592666864
Validation loss = 0.007564552593976259
Validation loss = 0.007718542590737343
Validation loss = 0.0057605477049946785
Validation loss = 0.006910575088113546
Validation loss = 0.010308872908353806
Validation loss = 0.005758897867053747
Validation loss = 0.005826984532177448
Validation loss = 0.005883651319891214
Validation loss = 0.008389133960008621
Validation loss = 0.006223801989108324
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005809397902339697
Validation loss = 0.0071097505278885365
Validation loss = 0.0045552621595561504
Validation loss = 0.005342823453247547
Validation loss = 0.007421316113322973
Validation loss = 0.006048061419278383
Validation loss = 0.005326130427420139
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008474844507873058
Validation loss = 0.0068901642225682735
Validation loss = 0.006771568208932877
Validation loss = 0.00896297488361597
Validation loss = 0.0076569304801523685
Validation loss = 0.007175781298428774
Validation loss = 0.005089245270937681
Validation loss = 0.007369741331785917
Validation loss = 0.0082024447619915
Validation loss = 0.006195404101163149
Validation loss = 0.00724977720528841
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000759 |
| Iteration     | 29        |
| MaximumReturn | -0.000534 |
| MinimumReturn | -0.00104  |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006122501567006111
Validation loss = 0.005318716634064913
Validation loss = 0.006046676076948643
Validation loss = 0.00858468096703291
Validation loss = 0.005854531656950712
Validation loss = 0.005545837339013815
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006603475660085678
Validation loss = 0.005931914318352938
Validation loss = 0.005414964165538549
Validation loss = 0.004787284880876541
Validation loss = 0.004582078196108341
Validation loss = 0.005543440114706755
Validation loss = 0.007826452143490314
Validation loss = 0.005473660305142403
Validation loss = 0.005373526830226183
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005505549721419811
Validation loss = 0.005220590624958277
Validation loss = 0.0062790303491055965
Validation loss = 0.007903610356152058
Validation loss = 0.005831974558532238
Validation loss = 0.0066795735619962215
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00678960420191288
Validation loss = 0.005541473627090454
Validation loss = 0.0057165236212313175
Validation loss = 0.004943988751620054
Validation loss = 0.0068977647460997105
Validation loss = 0.005585025064647198
Validation loss = 0.00680201593786478
Validation loss = 0.004891859367489815
Validation loss = 0.006227427162230015
Validation loss = 0.006582261528819799
Validation loss = 0.007257223129272461
Validation loss = 0.005661178845912218
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007041881792247295
Validation loss = 0.006823703180998564
Validation loss = 0.006945540197193623
Validation loss = 0.005784737411886454
Validation loss = 0.013270890340209007
Validation loss = 0.0077569917775690556
Validation loss = 0.006728597451001406
Validation loss = 0.0062345098704099655
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.581    |
| Iteration     | 30        |
| MaximumReturn | -0.000563 |
| MinimumReturn | -11.4     |
| TotalSamples  | 53312     |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015605699270963669
Validation loss = 0.006739574018865824
Validation loss = 0.007665442768484354
Validation loss = 0.007007643114775419
Validation loss = 0.007531159557402134
Validation loss = 0.008756638504564762
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0161020178347826
Validation loss = 0.006063101347535849
Validation loss = 0.0066399360075592995
Validation loss = 0.005557284690439701
Validation loss = 0.0055932230316102505
Validation loss = 0.006013659760355949
Validation loss = 0.005376115906983614
Validation loss = 0.005408537108451128
Validation loss = 0.0050000520423054695
Validation loss = 0.00660609221085906
Validation loss = 0.00540209049358964
Validation loss = 0.004772367887198925
Validation loss = 0.0056973909959197044
Validation loss = 0.004892706871032715
Validation loss = 0.0047251502983272076
Validation loss = 0.0051011089235544205
Validation loss = 0.007139893714338541
Validation loss = 0.006671882700175047
Validation loss = 0.00627331854775548
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012286648154258728
Validation loss = 0.009115138091146946
Validation loss = 0.005702461116015911
Validation loss = 0.004968822933733463
Validation loss = 0.007452911697328091
Validation loss = 0.0066072214394807816
Validation loss = 0.006526305805891752
Validation loss = 0.007622421253472567
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013827688060700893
Validation loss = 0.006235299166291952
Validation loss = 0.005915956571698189
Validation loss = 0.005952585954219103
Validation loss = 0.006509627681225538
Validation loss = 0.006872280966490507
Validation loss = 0.008840780705213547
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010595573112368584
Validation loss = 0.006842997390776873
Validation loss = 0.010844316333532333
Validation loss = 0.006283024325966835
Validation loss = 0.008587068878114223
Validation loss = 0.007361996453255415
Validation loss = 0.006460923235863447
Validation loss = 0.007721655070781708
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00255  |
| Iteration     | 31        |
| MaximumReturn | -0.000576 |
| MinimumReturn | -0.017    |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008838389068841934
Validation loss = 0.007783058565109968
Validation loss = 0.008213973604142666
Validation loss = 0.00851467065513134
Validation loss = 0.006952131167054176
Validation loss = 0.009749204851686954
Validation loss = 0.007836484350264072
Validation loss = 0.008349006064236164
Validation loss = 0.010859781876206398
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006703815422952175
Validation loss = 0.0061990804970264435
Validation loss = 0.006220950745046139
Validation loss = 0.0060090068727731705
Validation loss = 0.00680560851469636
Validation loss = 0.005802275612950325
Validation loss = 0.0057896641083061695
Validation loss = 0.0051076943054795265
Validation loss = 0.005627969745546579
Validation loss = 0.0069831195287406445
Validation loss = 0.007441812660545111
Validation loss = 0.005176919512450695
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008672199212014675
Validation loss = 0.006518059875816107
Validation loss = 0.0064582061022520065
Validation loss = 0.006066199392080307
Validation loss = 0.007572385016828775
Validation loss = 0.005723729729652405
Validation loss = 0.005947967525571585
Validation loss = 0.005510684568434954
Validation loss = 0.007019112352281809
Validation loss = 0.004772007465362549
Validation loss = 0.005260375794023275
Validation loss = 0.006393910385668278
Validation loss = 0.007737432140856981
Validation loss = 0.004926835652440786
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006829533725976944
Validation loss = 0.008226202800869942
Validation loss = 0.007177709601819515
Validation loss = 0.006255703512579203
Validation loss = 0.007546654436737299
Validation loss = 0.005918125156313181
Validation loss = 0.007669175509363413
Validation loss = 0.006976005621254444
Validation loss = 0.01010128390043974
Validation loss = 0.006783149670809507
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0074031176045536995
Validation loss = 0.007304225582629442
Validation loss = 0.006105286534875631
Validation loss = 0.008473755791783333
Validation loss = 0.009449069388210773
Validation loss = 0.005551443435251713
Validation loss = 0.0056297979317605495
Validation loss = 0.008313028141856194
Validation loss = 0.009066533297300339
Validation loss = 0.007632242515683174
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00178  |
| Iteration     | 32        |
| MaximumReturn | -0.000591 |
| MinimumReturn | -0.0113   |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011118084192276001
Validation loss = 0.007244647480547428
Validation loss = 0.008219234645366669
Validation loss = 0.00698041170835495
Validation loss = 0.007975505664944649
Validation loss = 0.006724346429109573
Validation loss = 0.00652877613902092
Validation loss = 0.006667363457381725
Validation loss = 0.006974082440137863
Validation loss = 0.006099178455770016
Validation loss = 0.007928358390927315
Validation loss = 0.007273213472217321
Validation loss = 0.006478076335042715
Validation loss = 0.00816587544977665
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008436341769993305
Validation loss = 0.004872129298746586
Validation loss = 0.004403699655085802
Validation loss = 0.005138732027262449
Validation loss = 0.006232492160052061
Validation loss = 0.005272470414638519
Validation loss = 0.008119859732687473
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006017256993800402
Validation loss = 0.006594828329980373
Validation loss = 0.008624420501291752
Validation loss = 0.005119407083839178
Validation loss = 0.006921867839992046
Validation loss = 0.005145994480699301
Validation loss = 0.006067859474569559
Validation loss = 0.0064882249571383
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0070044659078121185
Validation loss = 0.0064659700728952885
Validation loss = 0.006044601555913687
Validation loss = 0.006396338809281588
Validation loss = 0.008202454075217247
Validation loss = 0.006347052752971649
Validation loss = 0.008801392279565334
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008988751098513603
Validation loss = 0.006681705359369516
Validation loss = 0.00717900600284338
Validation loss = 0.006187777034938335
Validation loss = 0.009485473856329918
Validation loss = 0.007635463960468769
Validation loss = 0.006655654404312372
Validation loss = 0.007315858732908964
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00437 |
| Iteration     | 33       |
| MaximumReturn | -0.0005  |
| MinimumReturn | -0.0398  |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008951069787144661
Validation loss = 0.00578009570017457
Validation loss = 0.006881782319396734
Validation loss = 0.006297033745795488
Validation loss = 0.005373215768486261
Validation loss = 0.006332142278552055
Validation loss = 0.006454552989453077
Validation loss = 0.007438544183969498
Validation loss = 0.006236289162188768
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0066544851288199425
Validation loss = 0.004155619069933891
Validation loss = 0.0054739536717534065
Validation loss = 0.004740363452583551
Validation loss = 0.005428076256066561
Validation loss = 0.006471762899309397
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005206076428294182
Validation loss = 0.006969400681555271
Validation loss = 0.006009113974869251
Validation loss = 0.006433042697608471
Validation loss = 0.00541478069499135
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006986601278185844
Validation loss = 0.005156594328582287
Validation loss = 0.004915937315672636
Validation loss = 0.005888685584068298
Validation loss = 0.00852349866181612
Validation loss = 0.0062114340253174305
Validation loss = 0.007535320241004229
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007135793101042509
Validation loss = 0.006411089561879635
Validation loss = 0.00665234075859189
Validation loss = 0.006062590517103672
Validation loss = 0.006360889412462711
Validation loss = 0.005482158623635769
Validation loss = 0.007493787910789251
Validation loss = 0.005865301005542278
Validation loss = 0.007993869483470917
Validation loss = 0.007247364614158869
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00181  |
| Iteration     | 34        |
| MaximumReturn | -0.000593 |
| MinimumReturn | -0.0104   |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00871965941041708
Validation loss = 0.006308760028332472
Validation loss = 0.006346945650875568
Validation loss = 0.005045793950557709
Validation loss = 0.004704797174781561
Validation loss = 0.004842978902161121
Validation loss = 0.004400656558573246
Validation loss = 0.008474500849843025
Validation loss = 0.004562061745673418
Validation loss = 0.007739802822470665
Validation loss = 0.006940816063433886
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0062126037664711475
Validation loss = 0.004248773213475943
Validation loss = 0.009071914479136467
Validation loss = 0.0058820839039981365
Validation loss = 0.005909004248678684
Validation loss = 0.004778203088790178
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006671398412436247
Validation loss = 0.0058454349637031555
Validation loss = 0.005691410508006811
Validation loss = 0.005307772196829319
Validation loss = 0.00502001540735364
Validation loss = 0.0054461536929011345
Validation loss = 0.006265720818191767
Validation loss = 0.007543456740677357
Validation loss = 0.005674913991242647
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006122357677668333
Validation loss = 0.006148274522274733
Validation loss = 0.00549230445176363
Validation loss = 0.005766873247921467
Validation loss = 0.0053032939322292805
Validation loss = 0.006095521617680788
Validation loss = 0.0059789614751935005
Validation loss = 0.005922655574977398
Validation loss = 0.005934970919042826
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006635528989136219
Validation loss = 0.005105301737785339
Validation loss = 0.005914455279707909
Validation loss = 0.008538658730685711
Validation loss = 0.005965462885797024
Validation loss = 0.006149178370833397
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00216  |
| Iteration     | 35        |
| MaximumReturn | -0.000603 |
| MinimumReturn | -0.014    |
| TotalSamples  | 61642     |
-----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00725801894441247
Validation loss = 0.011759880930185318
Validation loss = 0.004567284602671862
Validation loss = 0.004097359254956245
Validation loss = 0.004958436358720064
Validation loss = 0.005034586414694786
Validation loss = 0.005589651875197887
Validation loss = 0.004474894143640995
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0050239781849086285
Validation loss = 0.004075498320162296
Validation loss = 0.0041212039068341255
Validation loss = 0.004881173837929964
Validation loss = 0.0050193252973258495
Validation loss = 0.004417514428496361
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005907826125621796
Validation loss = 0.005059238988906145
Validation loss = 0.0058069913648068905
Validation loss = 0.006821171380579472
Validation loss = 0.00497761694714427
Validation loss = 0.005948980338871479
Validation loss = 0.005322123412042856
Validation loss = 0.0052668629214167595
Validation loss = 0.005149149335920811
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005626085679978132
Validation loss = 0.006703649181872606
Validation loss = 0.007268982473760843
Validation loss = 0.007526285480707884
Validation loss = 0.006992802955210209
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006097209174185991
Validation loss = 0.005424744915217161
Validation loss = 0.006784937810152769
Validation loss = 0.005463188048452139
Validation loss = 0.005876312498003244
Validation loss = 0.010084807872772217
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00386  |
| Iteration     | 36        |
| MaximumReturn | -0.000565 |
| MinimumReturn | -0.0137   |
| TotalSamples  | 63308     |
-----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006095482502132654
Validation loss = 0.006680930033326149
Validation loss = 0.008072626776993275
Validation loss = 0.005639936309307814
Validation loss = 0.005792131181806326
Validation loss = 0.006630974356085062
Validation loss = 0.004837654996663332
Validation loss = 0.00580564746633172
Validation loss = 0.006729379296302795
Validation loss = 0.005266816355288029
Validation loss = 0.005494917742908001
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010097483173012733
Validation loss = 0.0049957772716879845
Validation loss = 0.006140603218227625
Validation loss = 0.0054091280326247215
Validation loss = 0.004239861387759447
Validation loss = 0.004968663677573204
Validation loss = 0.006589351687580347
Validation loss = 0.005138549488037825
Validation loss = 0.004811475053429604
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00686531700193882
Validation loss = 0.00666268402710557
Validation loss = 0.0064669037237763405
Validation loss = 0.007283297833055258
Validation loss = 0.005370339844375849
Validation loss = 0.005394445266574621
Validation loss = 0.004923316650092602
Validation loss = 0.004841573070734739
Validation loss = 0.00566073739901185
Validation loss = 0.005585104692727327
Validation loss = 0.00870340596884489
Validation loss = 0.005254122894257307
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005964605137705803
Validation loss = 0.005847627762705088
Validation loss = 0.006524841301143169
Validation loss = 0.00743161141872406
Validation loss = 0.006992415990680456
Validation loss = 0.009193469770252705
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00800328515470028
Validation loss = 0.005908117163926363
Validation loss = 0.004793616943061352
Validation loss = 0.005956050008535385
Validation loss = 0.005831470713019371
Validation loss = 0.008308191783726215
Validation loss = 0.007013499736785889
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00422 |
| Iteration     | 37       |
| MaximumReturn | -0.00048 |
| MinimumReturn | -0.0196  |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004629281349480152
Validation loss = 0.009081065654754639
Validation loss = 0.004508259706199169
Validation loss = 0.006290102377533913
Validation loss = 0.005954954773187637
Validation loss = 0.005056856665760279
Validation loss = 0.00491760391741991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004390275571495295
Validation loss = 0.004672760609537363
Validation loss = 0.006005975883454084
Validation loss = 0.004081105813384056
Validation loss = 0.005098129156976938
Validation loss = 0.004101489670574665
Validation loss = 0.005776338279247284
Validation loss = 0.0043605356477200985
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009106186218559742
Validation loss = 0.0056624156422913074
Validation loss = 0.0051727984100580215
Validation loss = 0.004724530503153801
Validation loss = 0.0049481382593512535
Validation loss = 0.0050009614787995815
Validation loss = 0.005642878822982311
Validation loss = 0.005439619068056345
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007659027352929115
Validation loss = 0.006071225740015507
Validation loss = 0.007875807583332062
Validation loss = 0.006516310386359692
Validation loss = 0.006235645152628422
Validation loss = 0.0058349184691905975
Validation loss = 0.005693637765944004
Validation loss = 0.006582621019333601
Validation loss = 0.006360355764627457
Validation loss = 0.006161779630929232
Validation loss = 0.006519069895148277
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005542761646211147
Validation loss = 0.007182785775512457
Validation loss = 0.006149091757833958
Validation loss = 0.004995108116418123
Validation loss = 0.008885689079761505
Validation loss = 0.006264215800911188
Validation loss = 0.005193640012294054
Validation loss = 0.0064843157306313515
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0053   |
| Iteration     | 38        |
| MaximumReturn | -0.000511 |
| MinimumReturn | -0.0188   |
| TotalSamples  | 66640     |
-----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005196170415729284
Validation loss = 0.006325209513306618
Validation loss = 0.004098090808838606
Validation loss = 0.0038869711570441723
Validation loss = 0.006643627304583788
Validation loss = 0.004728797357529402
Validation loss = 0.0044565871357917786
Validation loss = 0.004794442094862461
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006280247587710619
Validation loss = 0.004465083125978708
Validation loss = 0.005072867963463068
Validation loss = 0.008551359176635742
Validation loss = 0.0035223085433244705
Validation loss = 0.0052075511775910854
Validation loss = 0.0038219676353037357
Validation loss = 0.003947196062654257
Validation loss = 0.0060111042112112045
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00535934790968895
Validation loss = 0.006029872689396143
Validation loss = 0.006082936190068722
Validation loss = 0.005620166659355164
Validation loss = 0.007683726027607918
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00624113017693162
Validation loss = 0.005908959079533815
Validation loss = 0.005174411926418543
Validation loss = 0.006130106747150421
Validation loss = 0.006449654698371887
Validation loss = 0.006151226814836264
Validation loss = 0.006465964950621128
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005094984080642462
Validation loss = 0.005656696856021881
Validation loss = 0.005668899975717068
Validation loss = 0.005657036788761616
Validation loss = 0.007384210824966431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00648 |
| Iteration     | 39       |
| MaximumReturn | -0.00065 |
| MinimumReturn | -0.0285  |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008504882454872131
Validation loss = 0.005109407473355532
Validation loss = 0.004643578547984362
Validation loss = 0.004953456111252308
Validation loss = 0.006324245594441891
Validation loss = 0.007075319066643715
Validation loss = 0.004612382967025042
Validation loss = 0.004898882005363703
Validation loss = 0.004248937126249075
Validation loss = 0.006377744022756815
Validation loss = 0.005802469793707132
Validation loss = 0.006249521858990192
Validation loss = 0.004195850342512131
Validation loss = 0.00440698117017746
Validation loss = 0.0056490786373615265
Validation loss = 0.005252129398286343
Validation loss = 0.00562190031632781
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007089476566761732
Validation loss = 0.004392279777675867
Validation loss = 0.004653564188629389
Validation loss = 0.003937210887670517
Validation loss = 0.005568031221628189
Validation loss = 0.005922886077314615
Validation loss = 0.006633075885474682
Validation loss = 0.004187269601970911
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004061487969011068
Validation loss = 0.005148925352841616
Validation loss = 0.005329193081706762
Validation loss = 0.005150223150849342
Validation loss = 0.004839510191231966
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008364816196262836
Validation loss = 0.006138799246400595
Validation loss = 0.005147272255271673
Validation loss = 0.00630830368027091
Validation loss = 0.006235652137547731
Validation loss = 0.005266472697257996
Validation loss = 0.006666847504675388
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007937540300190449
Validation loss = 0.006494101602584124
Validation loss = 0.0054106866009533405
Validation loss = 0.005724634509533644
Validation loss = 0.007240960840135813
Validation loss = 0.004774540197104216
Validation loss = 0.005947877652943134
Validation loss = 0.005533100105822086
Validation loss = 0.005043179262429476
Validation loss = 0.0047566150315105915
Validation loss = 0.005573625676333904
Validation loss = 0.005600305274128914
Validation loss = 0.006182338576763868
Validation loss = 0.004686886444687843
Validation loss = 0.007431851699948311
Validation loss = 0.005507282912731171
Validation loss = 0.010594027116894722
Validation loss = 0.005654101725667715
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00203  |
| Iteration     | 40        |
| MaximumReturn | -0.000649 |
| MinimumReturn | -0.0118   |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008332640863955021
Validation loss = 0.006867366377264261
Validation loss = 0.003798477118834853
Validation loss = 0.004676069598644972
Validation loss = 0.00458567263558507
Validation loss = 0.004843677394092083
Validation loss = 0.004644401837140322
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0037014628760516644
Validation loss = 0.005797747056931257
Validation loss = 0.0050931149162352085
Validation loss = 0.004077192861586809
Validation loss = 0.0038980257231742144
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006041199900209904
Validation loss = 0.004387565888464451
Validation loss = 0.004482606891542673
Validation loss = 0.005370765924453735
Validation loss = 0.00523196067661047
Validation loss = 0.005068770609796047
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006753180176019669
Validation loss = 0.0075281416065990925
Validation loss = 0.006159449927508831
Validation loss = 0.006556349340826273
Validation loss = 0.0053541939705610275
Validation loss = 0.007406479213386774
Validation loss = 0.007106315344572067
Validation loss = 0.007040534634143114
Validation loss = 0.006586293689906597
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006044455338269472
Validation loss = 0.005748831667006016
Validation loss = 0.006202744320034981
Validation loss = 0.005272149108350277
Validation loss = 0.005647597834467888
Validation loss = 0.006203390657901764
Validation loss = 0.00847285334020853
Validation loss = 0.0065141054801642895
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000808 |
| Iteration     | 41        |
| MaximumReturn | -0.000536 |
| MinimumReturn | -0.00103  |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004851494450122118
Validation loss = 0.005564706400036812
Validation loss = 0.005388980731368065
Validation loss = 0.004361819475889206
Validation loss = 0.004888102877885103
Validation loss = 0.00550870830193162
Validation loss = 0.0052456627599895
Validation loss = 0.005681367125362158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00432857358828187
Validation loss = 0.005458681844174862
Validation loss = 0.004002328962087631
Validation loss = 0.003795296885073185
Validation loss = 0.00445021316409111
Validation loss = 0.006051525939255953
Validation loss = 0.003591956803575158
Validation loss = 0.007819070480763912
Validation loss = 0.005576906260102987
Validation loss = 0.005706556141376495
Validation loss = 0.004278047941625118
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005249784328043461
Validation loss = 0.004796172957867384
Validation loss = 0.004613087512552738
Validation loss = 0.004076910205185413
Validation loss = 0.004398665856570005
Validation loss = 0.007696530781686306
Validation loss = 0.006825600750744343
Validation loss = 0.00356084480881691
Validation loss = 0.0051281447522342205
Validation loss = 0.0051034740172326565
Validation loss = 0.00653586070984602
Validation loss = 0.0049706874415278435
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006279493682086468
Validation loss = 0.005342824850231409
Validation loss = 0.007297162432223558
Validation loss = 0.00591636635363102
Validation loss = 0.004818911198526621
Validation loss = 0.005933977197855711
Validation loss = 0.004897358361631632
Validation loss = 0.006538179703056812
Validation loss = 0.005892508197575808
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0053338780999183655
Validation loss = 0.0056490846909582615
Validation loss = 0.005430798977613449
Validation loss = 0.00523156626150012
Validation loss = 0.005840185564011335
Validation loss = 0.0054504843428730965
Validation loss = 0.006157103925943375
Validation loss = 0.0056345765478909016
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00112  |
| Iteration     | 42        |
| MaximumReturn | -0.000648 |
| MinimumReturn | -0.00403  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005026752129197121
Validation loss = 0.004739406518638134
Validation loss = 0.004104264080524445
Validation loss = 0.005718836095184088
Validation loss = 0.004683484323322773
Validation loss = 0.004617591854184866
Validation loss = 0.004411209374666214
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006291328463703394
Validation loss = 0.010723229497671127
Validation loss = 0.005696549545973539
Validation loss = 0.003633473999798298
Validation loss = 0.0060655707493424416
Validation loss = 0.0056804483756423
Validation loss = 0.0056001730263233185
Validation loss = 0.005739693529903889
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005091659724712372
Validation loss = 0.007150582037866116
Validation loss = 0.006585958879441023
Validation loss = 0.006784770637750626
Validation loss = 0.009113151580095291
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00690216151997447
Validation loss = 0.005070088431239128
Validation loss = 0.005674400366842747
Validation loss = 0.0067137754522264
Validation loss = 0.005202007479965687
Validation loss = 0.005615361966192722
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005427717696875334
Validation loss = 0.005706013645976782
Validation loss = 0.010324493981897831
Validation loss = 0.005782982334494591
Validation loss = 0.005357783753424883
Validation loss = 0.005361088551580906
Validation loss = 0.005832665134221315
Validation loss = 0.005372967571020126
Validation loss = 0.007238355930894613
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 43        |
| MaximumReturn | -0.000525 |
| MinimumReturn | -0.00422  |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006806966383010149
Validation loss = 0.004202073905616999
Validation loss = 0.006127471104264259
Validation loss = 0.004458709619939327
Validation loss = 0.004483639728277922
Validation loss = 0.005187495611608028
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00485785398632288
Validation loss = 0.004069042392075062
Validation loss = 0.00442466838285327
Validation loss = 0.0036792673636227846
Validation loss = 0.005868448410183191
Validation loss = 0.003549599554389715
Validation loss = 0.0031473704148083925
Validation loss = 0.004968675319105387
Validation loss = 0.003738680388778448
Validation loss = 0.00630313903093338
Validation loss = 0.009737230837345123
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011085428297519684
Validation loss = 0.005316972266882658
Validation loss = 0.0077138133347034454
Validation loss = 0.006344743072986603
Validation loss = 0.005209599621593952
Validation loss = 0.006488030776381493
Validation loss = 0.005700947716832161
Validation loss = 0.008744487538933754
Validation loss = 0.006697732489556074
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006381179206073284
Validation loss = 0.00543617270886898
Validation loss = 0.006300641689449549
Validation loss = 0.005561656318604946
Validation loss = 0.007699066773056984
Validation loss = 0.007838559336960316
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00644996203482151
Validation loss = 0.005428065545856953
Validation loss = 0.005518951453268528
Validation loss = 0.005591732449829578
Validation loss = 0.005637540016323328
Validation loss = 0.005342018324881792
Validation loss = 0.006779943127185106
Validation loss = 0.004815310705453157
Validation loss = 0.005961916409432888
Validation loss = 0.007260204758495092
Validation loss = 0.005739860236644745
Validation loss = 0.004998220596462488
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00145  |
| Iteration     | 44        |
| MaximumReturn | -0.000566 |
| MinimumReturn | -0.00555  |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004018716514110565
Validation loss = 0.004409496206790209
Validation loss = 0.007136182393878698
Validation loss = 0.004873102996498346
Validation loss = 0.00670581916347146
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005145492032170296
Validation loss = 0.004374913405627012
Validation loss = 0.005326050333678722
Validation loss = 0.004727369640022516
Validation loss = 0.004904998932033777
Validation loss = 0.004601455759257078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005395807791501284
Validation loss = 0.004959379322826862
Validation loss = 0.005057854112237692
Validation loss = 0.005894076079130173
Validation loss = 0.006288572214543819
Validation loss = 0.005739251151680946
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007156079635024071
Validation loss = 0.0078054084442555904
Validation loss = 0.006486901082098484
Validation loss = 0.006812047213315964
Validation loss = 0.005928225815296173
Validation loss = 0.006084900349378586
Validation loss = 0.005791178438812494
Validation loss = 0.0053729526698589325
Validation loss = 0.005259548779577017
Validation loss = 0.005189727991819382
Validation loss = 0.00581571226939559
Validation loss = 0.005311896093189716
Validation loss = 0.005997090600430965
Validation loss = 0.005999465472996235
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006199351511895657
Validation loss = 0.008472654968500137
Validation loss = 0.005419131834059954
Validation loss = 0.004958976060152054
Validation loss = 0.005286272149533033
Validation loss = 0.004977039527148008
Validation loss = 0.00666496017947793
Validation loss = 0.00583439227193594
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00145  |
| Iteration     | 45        |
| MaximumReturn | -0.000589 |
| MinimumReturn | -0.00668  |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004317089449614286
Validation loss = 0.003529860870912671
Validation loss = 0.005047121085226536
Validation loss = 0.004182355012744665
Validation loss = 0.004613372962921858
Validation loss = 0.004687792155891657
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00490267900750041
Validation loss = 0.0038801711052656174
Validation loss = 0.004548780620098114
Validation loss = 0.003957943059504032
Validation loss = 0.004549768753349781
Validation loss = 0.004213050939142704
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006752877030521631
Validation loss = 0.005641953554004431
Validation loss = 0.005321917124092579
Validation loss = 0.010730046778917313
Validation loss = 0.0049101426266133785
Validation loss = 0.006642287131398916
Validation loss = 0.004948265850543976
Validation loss = 0.004369175527244806
Validation loss = 0.006474602501839399
Validation loss = 0.004611262120306492
Validation loss = 0.003931078594177961
Validation loss = 0.0036288476549088955
Validation loss = 0.0042546712793409824
Validation loss = 0.003814050694927573
Validation loss = 0.004099993966519833
Validation loss = 0.0036607449874281883
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005409665405750275
Validation loss = 0.007485251408070326
Validation loss = 0.005430903751403093
Validation loss = 0.006464444566518068
Validation loss = 0.007850304245948792
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005426798015832901
Validation loss = 0.008069507777690887
Validation loss = 0.005791880190372467
Validation loss = 0.005256264470517635
Validation loss = 0.007675913628190756
Validation loss = 0.0052575222216546535
Validation loss = 0.005155828781425953
Validation loss = 0.005864785052835941
Validation loss = 0.005033496301621199
Validation loss = 0.0061218151822686195
Validation loss = 0.006094626151025295
Validation loss = 0.005154267884790897
Validation loss = 0.00688535813242197
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00162  |
| Iteration     | 46        |
| MaximumReturn | -0.000524 |
| MinimumReturn | -0.00615  |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004368850030004978
Validation loss = 0.005676900502294302
Validation loss = 0.005170946009457111
Validation loss = 0.004369910340756178
Validation loss = 0.006510971579700708
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00349728437140584
Validation loss = 0.0031365747563540936
Validation loss = 0.0037243571132421494
Validation loss = 0.004108920693397522
Validation loss = 0.005276013165712357
Validation loss = 0.0039452859200537205
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006532350089401007
Validation loss = 0.00665672030299902
Validation loss = 0.005216695833951235
Validation loss = 0.005819088779389858
Validation loss = 0.004408568609505892
Validation loss = 0.005105584394186735
Validation loss = 0.005092618055641651
Validation loss = 0.004650871269404888
Validation loss = 0.004200478084385395
Validation loss = 0.005343601573258638
Validation loss = 0.003738385159522295
Validation loss = 0.008550492115318775
Validation loss = 0.0040150778368115425
Validation loss = 0.004433218389749527
Validation loss = 0.004870546981692314
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006400581449270248
Validation loss = 0.00577482208609581
Validation loss = 0.006295216269791126
Validation loss = 0.005678222514688969
Validation loss = 0.005645156372338533
Validation loss = 0.005738849751651287
Validation loss = 0.0054437885992228985
Validation loss = 0.005996826104819775
Validation loss = 0.00546750333160162
Validation loss = 0.006002382375299931
Validation loss = 0.0046533988788723946
Validation loss = 0.0057692015543580055
Validation loss = 0.004670288413763046
Validation loss = 0.00571648171171546
Validation loss = 0.005492324475198984
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005154943559318781
Validation loss = 0.005294847767800093
Validation loss = 0.004834887571632862
Validation loss = 0.0060486202128231525
Validation loss = 0.009298448450863361
Validation loss = 0.007646199315786362
Validation loss = 0.008095191791653633
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00343  |
| Iteration     | 47        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.0126   |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007138101849704981
Validation loss = 0.0038813608698546886
Validation loss = 0.003550719004124403
Validation loss = 0.004226116929203272
Validation loss = 0.004100644029676914
Validation loss = 0.005554890260100365
Validation loss = 0.005263921804726124
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014185081236064434
Validation loss = 0.005406327545642853
Validation loss = 0.003540544304996729
Validation loss = 0.005530783906579018
Validation loss = 0.010756340809166431
Validation loss = 0.003951797727495432
Validation loss = 0.0034216928761452436
Validation loss = 0.0033503402955830097
Validation loss = 0.0035423901863396168
Validation loss = 0.0038237213157117367
Validation loss = 0.004354394506663084
Validation loss = 0.0038857038598507643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003723829984664917
Validation loss = 0.007021625991910696
Validation loss = 0.005453717894852161
Validation loss = 0.004818425048142672
Validation loss = 0.004996875766664743
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005408514756709337
Validation loss = 0.0056989132426679134
Validation loss = 0.0063910773023962975
Validation loss = 0.005342555232346058
Validation loss = 0.00657601747661829
Validation loss = 0.0057158442214131355
Validation loss = 0.006310544908046722
Validation loss = 0.004676277749240398
Validation loss = 0.005329343024641275
Validation loss = 0.004938256926834583
Validation loss = 0.005639698356389999
Validation loss = 0.004975269082933664
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007356618531048298
Validation loss = 0.004735881462693214
Validation loss = 0.004862294066697359
Validation loss = 0.005970983766019344
Validation loss = 0.0063813189044594765
Validation loss = 0.004796830005943775
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00298 |
| Iteration     | 48       |
| MaximumReturn | -0.00068 |
| MinimumReturn | -0.0161  |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004296338185667992
Validation loss = 0.004818449262529612
Validation loss = 0.005683710798621178
Validation loss = 0.003973405342549086
Validation loss = 0.00608538743108511
Validation loss = 0.004732209723442793
Validation loss = 0.0036537162959575653
Validation loss = 0.00429642666131258
Validation loss = 0.004579510074108839
Validation loss = 0.0035170577466487885
Validation loss = 0.004006013739854097
Validation loss = 0.0059707812033593655
Validation loss = 0.004079645499587059
Validation loss = 0.0047265877947211266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004069941584020853
Validation loss = 0.004896822385489941
Validation loss = 0.004648946225643158
Validation loss = 0.0032814047299325466
Validation loss = 0.0031496798619627953
Validation loss = 0.006546973250806332
Validation loss = 0.0032179933041334152
Validation loss = 0.0043618143536150455
Validation loss = 0.0033451332710683346
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0062533216550946236
Validation loss = 0.005060157272964716
Validation loss = 0.004234627354890108
Validation loss = 0.006539028603583574
Validation loss = 0.004566160496324301
Validation loss = 0.003552027279511094
Validation loss = 0.004490784835070372
Validation loss = 0.007305084727704525
Validation loss = 0.004864961374551058
Validation loss = 0.005515864118933678
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004965655505657196
Validation loss = 0.0052367690950632095
Validation loss = 0.005916869267821312
Validation loss = 0.005000158678740263
Validation loss = 0.005470352713018656
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005457374267280102
Validation loss = 0.005236076656728983
Validation loss = 0.004846871830523014
Validation loss = 0.011595968157052994
Validation loss = 0.005322770681232214
Validation loss = 0.006084030959755182
Validation loss = 0.007697527762502432
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0023   |
| Iteration     | 49        |
| MaximumReturn | -0.000662 |
| MinimumReturn | -0.0114   |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0053410823456943035
Validation loss = 0.003901857417076826
Validation loss = 0.0066849165596067905
Validation loss = 0.006977589335292578
Validation loss = 0.004152283538132906
Validation loss = 0.0046521141193807125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004896769765764475
Validation loss = 0.0038282086607068777
Validation loss = 0.004585806746035814
Validation loss = 0.004428192973136902
Validation loss = 0.003998277243226767
Validation loss = 0.004346623085439205
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004504380282014608
Validation loss = 0.005323408171534538
Validation loss = 0.00618680939078331
Validation loss = 0.006684189196676016
Validation loss = 0.006037931423634291
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005293110851198435
Validation loss = 0.004835556261241436
Validation loss = 0.0045440844260156155
Validation loss = 0.0048315622843801975
Validation loss = 0.005035779904574156
Validation loss = 0.007604979909956455
Validation loss = 0.004439033102244139
Validation loss = 0.006901944056153297
Validation loss = 0.004715648014098406
Validation loss = 0.006296485662460327
Validation loss = 0.004957993980497122
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005217620171606541
Validation loss = 0.005055224057286978
Validation loss = 0.005156270228326321
Validation loss = 0.004601531662046909
Validation loss = 0.006393383722752333
Validation loss = 0.006368187256157398
Validation loss = 0.007172484416514635
Validation loss = 0.005938399583101273
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00397  |
| Iteration     | 50        |
| MaximumReturn | -0.000738 |
| MinimumReturn | -0.0226   |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037538453470915556
Validation loss = 0.0031193019822239876
Validation loss = 0.004102593287825584
Validation loss = 0.0035331419203430414
Validation loss = 0.003682770300656557
Validation loss = 0.004067509900778532
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0036598346196115017
Validation loss = 0.004470772575587034
Validation loss = 0.0036296930629760027
Validation loss = 0.0053047845140099525
Validation loss = 0.00332056125625968
Validation loss = 0.003635405097156763
Validation loss = 0.004097390454262495
Validation loss = 0.01283934898674488
Validation loss = 0.0030633253045380116
Validation loss = 0.005095741245895624
Validation loss = 0.003714792663231492
Validation loss = 0.004155048634856939
Validation loss = 0.0031558964401483536
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006171524059027433
Validation loss = 0.0037720557302236557
Validation loss = 0.004897043574601412
Validation loss = 0.00529320677742362
Validation loss = 0.003928760997951031
Validation loss = 0.004534498322755098
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00480429083108902
Validation loss = 0.005235324613749981
Validation loss = 0.006370099261403084
Validation loss = 0.0049120294861495495
Validation loss = 0.0054938411340117455
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006456050090491772
Validation loss = 0.005614593159407377
Validation loss = 0.005439735949039459
Validation loss = 0.005789828486740589
Validation loss = 0.006855517625808716
Validation loss = 0.0055871871300041676
Validation loss = 0.006187015678733587
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0015   |
| Iteration     | 51        |
| MaximumReturn | -0.000623 |
| MinimumReturn | -0.00836  |
| TotalSamples  | 88298     |
-----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004807460121810436
Validation loss = 0.004075073171406984
Validation loss = 0.0037577187176793814
Validation loss = 0.0037248500157147646
Validation loss = 0.005432896316051483
Validation loss = 0.0037003906909376383
Validation loss = 0.00432501919567585
Validation loss = 0.005530375521630049
Validation loss = 0.0031495040748268366
Validation loss = 0.003906934522092342
Validation loss = 0.004351137205958366
Validation loss = 0.005806338973343372
Validation loss = 0.0040438007563352585
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00562108401209116
Validation loss = 0.008754530921578407
Validation loss = 0.004027761984616518
Validation loss = 0.006285807117819786
Validation loss = 0.0038009618874639273
Validation loss = 0.004839401226490736
Validation loss = 0.003802638268098235
Validation loss = 0.003993574995547533
Validation loss = 0.0037577778566628695
Validation loss = 0.005298522301018238
Validation loss = 0.0040761870332062244
Validation loss = 0.004232902079820633
Validation loss = 0.003899527946487069
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004845953546464443
Validation loss = 0.0037874223198741674
Validation loss = 0.0034327006433159113
Validation loss = 0.0039651948027312756
Validation loss = 0.0038626508321613073
Validation loss = 0.00328657403588295
Validation loss = 0.004284403752535582
Validation loss = 0.003908153157681227
Validation loss = 0.004609891679137945
Validation loss = 0.003589772619307041
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004466641694307327
Validation loss = 0.005424536298960447
Validation loss = 0.005213884171098471
Validation loss = 0.0070712510496377945
Validation loss = 0.007328041363507509
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005604775622487068
Validation loss = 0.005090883933007717
Validation loss = 0.005009735934436321
Validation loss = 0.00619078753516078
Validation loss = 0.00525963818654418
Validation loss = 0.005681753158569336
Validation loss = 0.0051792156882584095
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00187  |
| Iteration     | 52        |
| MaximumReturn | -0.000672 |
| MinimumReturn | -0.0139   |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003860750235617161
Validation loss = 0.004162967670708895
Validation loss = 0.003908931743353605
Validation loss = 0.003726291935890913
Validation loss = 0.0035743338521569967
Validation loss = 0.0037234604824334383
Validation loss = 0.00476671801880002
Validation loss = 0.0031444234773516655
Validation loss = 0.0038332061376422644
Validation loss = 0.004649691749364138
Validation loss = 0.00479225954040885
Validation loss = 0.00468938983976841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0034901052713394165
Validation loss = 0.005617005284875631
Validation loss = 0.004538039211183786
Validation loss = 0.00467031542211771
Validation loss = 0.004533193539828062
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0034340480342507362
Validation loss = 0.004329581279307604
Validation loss = 0.0043013086542487144
Validation loss = 0.0036712870933115482
Validation loss = 0.0038696564733982086
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006105218548327684
Validation loss = 0.0047548613511025906
Validation loss = 0.005624838173389435
Validation loss = 0.005021503660827875
Validation loss = 0.006009586155414581
Validation loss = 0.005106644704937935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006126657593995333
Validation loss = 0.006356775760650635
Validation loss = 0.005274205468595028
Validation loss = 0.005245138891041279
Validation loss = 0.007126734592020512
Validation loss = 0.00610891729593277
Validation loss = 0.005305892787873745
Validation loss = 0.0059558856301009655
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00249  |
| Iteration     | 53        |
| MaximumReturn | -0.000578 |
| MinimumReturn | -0.0196   |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004849372431635857
Validation loss = 0.004765971563756466
Validation loss = 0.0037448753137141466
Validation loss = 0.004112966358661652
Validation loss = 0.0055521694011986256
Validation loss = 0.004531410522758961
Validation loss = 0.005830373149365187
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0033558057621121407
Validation loss = 0.003643719246610999
Validation loss = 0.0030591958202421665
Validation loss = 0.004519380163401365
Validation loss = 0.0041108704172074795
Validation loss = 0.004746780730783939
Validation loss = 0.004057800862938166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0038507324643433094
Validation loss = 0.004282983485609293
Validation loss = 0.005282061640173197
Validation loss = 0.0034190716687589884
Validation loss = 0.003607407445088029
Validation loss = 0.00598866818472743
Validation loss = 0.005510975141078234
Validation loss = 0.005548452027142048
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005202903412282467
Validation loss = 0.006345974747091532
Validation loss = 0.005134034436196089
Validation loss = 0.0050483690574765205
Validation loss = 0.005357225425541401
Validation loss = 0.004587574861943722
Validation loss = 0.005233357660472393
Validation loss = 0.005194265861064196
Validation loss = 0.0053010666742920876
Validation loss = 0.005308480467647314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005945094861090183
Validation loss = 0.005032050423324108
Validation loss = 0.0054616727866232395
Validation loss = 0.006764361634850502
Validation loss = 0.005310960114002228
Validation loss = 0.0052394806407392025
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00112  |
| Iteration     | 54        |
| MaximumReturn | -0.000621 |
| MinimumReturn | -0.00802  |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004223201423883438
Validation loss = 0.0031114667654037476
Validation loss = 0.004297763574868441
Validation loss = 0.003323820885270834
Validation loss = 0.0034428166691213846
Validation loss = 0.003729514079168439
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004213741980493069
Validation loss = 0.004252159036695957
Validation loss = 0.0035142386332154274
Validation loss = 0.004572962410748005
Validation loss = 0.0037343313451856375
Validation loss = 0.0037771067582070827
Validation loss = 0.005390514619648457
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004774283617734909
Validation loss = 0.004533516708761454
Validation loss = 0.0038554351776838303
Validation loss = 0.003902432043105364
Validation loss = 0.0051516322419047356
Validation loss = 0.005129196215420961
Validation loss = 0.003635160392150283
Validation loss = 0.00469925394281745
Validation loss = 0.0031727508176118135
Validation loss = 0.003428997937589884
Validation loss = 0.004057325888425112
Validation loss = 0.0038253995589911938
Validation loss = 0.0032962351106107235
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005325358361005783
Validation loss = 0.005667889025062323
Validation loss = 0.004355497192591429
Validation loss = 0.003979317378252745
Validation loss = 0.0051541137509047985
Validation loss = 0.004600346554070711
Validation loss = 0.005198930390179157
Validation loss = 0.005531566217541695
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004966231994330883
Validation loss = 0.005660608876496553
Validation loss = 0.006292968522757292
Validation loss = 0.004670211113989353
Validation loss = 0.0053254361264407635
Validation loss = 0.006198440678417683
Validation loss = 0.00480146799236536
Validation loss = 0.005707934964448214
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00182  |
| Iteration     | 55        |
| MaximumReturn | -0.000578 |
| MinimumReturn | -0.013    |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004361175931990147
Validation loss = 0.0036501081194728613
Validation loss = 0.0037201899103820324
Validation loss = 0.0037726096343249083
Validation loss = 0.004969882778823376
Validation loss = 0.005061828065663576
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005582415498793125
Validation loss = 0.004239958245307207
Validation loss = 0.0047762272879481316
Validation loss = 0.005915914662182331
Validation loss = 0.0032824173104017973
Validation loss = 0.0047286455519497395
Validation loss = 0.004801414906978607
Validation loss = 0.006902812514454126
Validation loss = 0.003818501252681017
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00315373414196074
Validation loss = 0.003663986921310425
Validation loss = 0.0043232133612036705
Validation loss = 0.003984278999269009
Validation loss = 0.007870253175497055
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0051447246223688126
Validation loss = 0.005134199745953083
Validation loss = 0.005655484739691019
Validation loss = 0.004413875751197338
Validation loss = 0.004978317301720381
Validation loss = 0.004570483695715666
Validation loss = 0.006155015435069799
Validation loss = 0.004406409338116646
Validation loss = 0.005026473663747311
Validation loss = 0.004479069262742996
Validation loss = 0.005496789235621691
Validation loss = 0.0039907717145979404
Validation loss = 0.007245135027915239
Validation loss = 0.004750323016196489
Validation loss = 0.005209202412515879
Validation loss = 0.005692864302545786
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011790454387664795
Validation loss = 0.005317979957908392
Validation loss = 0.004951279144734144
Validation loss = 0.005732853431254625
Validation loss = 0.005176545586436987
Validation loss = 0.005540749058127403
Validation loss = 0.005943470634520054
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0027   |
| Iteration     | 56        |
| MaximumReturn | -0.000597 |
| MinimumReturn | -0.0143   |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031885330099612474
Validation loss = 0.0037876300048083067
Validation loss = 0.004132408183068037
Validation loss = 0.005398031789809465
Validation loss = 0.004915639292448759
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0033873755019158125
Validation loss = 0.003448350355029106
Validation loss = 0.004193487111479044
Validation loss = 0.004042453598231077
Validation loss = 0.003980875015258789
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006294595077633858
Validation loss = 0.004293337929993868
Validation loss = 0.003810266265645623
Validation loss = 0.004204136319458485
Validation loss = 0.004365270491689444
Validation loss = 0.00429602200165391
Validation loss = 0.0035898489877581596
Validation loss = 0.0027242200449109077
Validation loss = 0.003652055049315095
Validation loss = 0.004556294064968824
Validation loss = 0.0037380156572908163
Validation loss = 0.004224754404276609
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004338942002505064
Validation loss = 0.005322251934558153
Validation loss = 0.004786120727658272
Validation loss = 0.004342005122452974
Validation loss = 0.005053072702139616
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00544084282591939
Validation loss = 0.006496043410152197
Validation loss = 0.004953470546752214
Validation loss = 0.007008127402514219
Validation loss = 0.0053209238685667515
Validation loss = 0.005078612361103296
Validation loss = 0.007177362218499184
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0019   |
| Iteration     | 57        |
| MaximumReturn | -0.000588 |
| MinimumReturn | -0.0104   |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005048162303864956
Validation loss = 0.004226874094456434
Validation loss = 0.003954499028623104
Validation loss = 0.0037258535157889128
Validation loss = 0.003810412483289838
Validation loss = 0.004516431596130133
Validation loss = 0.0037244318518787622
Validation loss = 0.0047583794221282005
Validation loss = 0.003920095507055521
Validation loss = 0.0040793465450406075
Validation loss = 0.003786844899877906
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0044338093139231205
Validation loss = 0.004125815350562334
Validation loss = 0.005068422760814428
Validation loss = 0.003092237515375018
Validation loss = 0.003087888238951564
Validation loss = 0.007633216679096222
Validation loss = 0.004034401848912239
Validation loss = 0.003292881418019533
Validation loss = 0.004242924507707357
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007109487429261208
Validation loss = 0.0040679569356143475
Validation loss = 0.003173292614519596
Validation loss = 0.003223368199542165
Validation loss = 0.0044149900786578655
Validation loss = 0.0033294912427663803
Validation loss = 0.0034836309496313334
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004425318446010351
Validation loss = 0.0042157103307545185
Validation loss = 0.0053964718244969845
Validation loss = 0.004701597616076469
Validation loss = 0.00574212521314621
Validation loss = 0.0053887865506112576
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007376461755484343
Validation loss = 0.004880542401224375
Validation loss = 0.005515489261597395
Validation loss = 0.004568683449178934
Validation loss = 0.004876364953815937
Validation loss = 0.006541824899613857
Validation loss = 0.004862903617322445
Validation loss = 0.00575943011790514
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0013   |
| Iteration     | 58        |
| MaximumReturn | -0.000637 |
| MinimumReturn | -0.00758  |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004146599676460028
Validation loss = 0.0033667602110654116
Validation loss = 0.0038622678257524967
Validation loss = 0.005044728983193636
Validation loss = 0.003319709561765194
Validation loss = 0.004452106077224016
Validation loss = 0.0044549317099153996
Validation loss = 0.00453575886785984
Validation loss = 0.0035587104503065348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0038579944521188736
Validation loss = 0.0037763139698654413
Validation loss = 0.00463250232860446
Validation loss = 0.0035790649708360434
Validation loss = 0.003903706092387438
Validation loss = 0.003792410483583808
Validation loss = 0.0036286108661442995
Validation loss = 0.003989989869296551
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0037890842650085688
Validation loss = 0.0038659735582768917
Validation loss = 0.0037804252933710814
Validation loss = 0.004507775418460369
Validation loss = 0.0035423252265900373
Validation loss = 0.0036175844725221395
Validation loss = 0.004251927603036165
Validation loss = 0.003151289187371731
Validation loss = 0.003596767783164978
Validation loss = 0.0036219123285263777
Validation loss = 0.002882573287934065
Validation loss = 0.0051568602211773396
Validation loss = 0.003132985904812813
Validation loss = 0.0035848948173224926
Validation loss = 0.0029723928309977055
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004190091509371996
Validation loss = 0.0048709227703511715
Validation loss = 0.00421163672581315
Validation loss = 0.0063155340030789375
Validation loss = 0.006228501908481121
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006109562702476978
Validation loss = 0.004481134936213493
Validation loss = 0.004338228143751621
Validation loss = 0.004470977932214737
Validation loss = 0.004151481203734875
Validation loss = 0.005614962428808212
Validation loss = 0.004797397181391716
Validation loss = 0.007901662960648537
Validation loss = 0.00506655452772975
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0061   |
| Iteration     | 59        |
| MaximumReturn | -0.000888 |
| MinimumReturn | -0.0247   |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00452983845025301
Validation loss = 0.004014129750430584
Validation loss = 0.0035115669015794992
Validation loss = 0.0033162578474730253
Validation loss = 0.005779200699180365
Validation loss = 0.0035589493345469236
Validation loss = 0.006424913182854652
Validation loss = 0.004519111011177301
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0035513401962816715
Validation loss = 0.0038876391481608152
Validation loss = 0.003756715916097164
Validation loss = 0.004627150483429432
Validation loss = 0.0033541640732437372
Validation loss = 0.003697749460116029
Validation loss = 0.0037306940648704767
Validation loss = 0.0036234213039278984
Validation loss = 0.003352368250489235
Validation loss = 0.004593216348439455
Validation loss = 0.00396482041105628
Validation loss = 0.0041423668153584
Validation loss = 0.0036206713411957026
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027188602834939957
Validation loss = 0.0033663406502455473
Validation loss = 0.005146294832229614
Validation loss = 0.008218351751565933
Validation loss = 0.0032214925158768892
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004345358815044165
Validation loss = 0.005033452529460192
Validation loss = 0.005063266027718782
Validation loss = 0.00492871692404151
Validation loss = 0.006010929122567177
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004660042934119701
Validation loss = 0.00799270998686552
Validation loss = 0.0049184043891727924
Validation loss = 0.0050406623631715775
Validation loss = 0.005829086527228355
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00116  |
| Iteration     | 60        |
| MaximumReturn | -0.000718 |
| MinimumReturn | -0.00817  |
| TotalSamples  | 103292    |
-----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009643220342695713
Validation loss = 0.005002835765480995
Validation loss = 0.004815808031708002
Validation loss = 0.004178135190159082
Validation loss = 0.003561401506885886
Validation loss = 0.00482769263908267
Validation loss = 0.004244519397616386
Validation loss = 0.003774512093514204
Validation loss = 0.004041252192109823
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003908256534487009
Validation loss = 0.004731801338493824
Validation loss = 0.006933822762221098
Validation loss = 0.0038899241480976343
Validation loss = 0.0033019923139363527
Validation loss = 0.004525564145296812
Validation loss = 0.004033065401017666
Validation loss = 0.004449507687240839
Validation loss = 0.00386799406260252
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004260648973286152
Validation loss = 0.004824215080589056
Validation loss = 0.005646853242069483
Validation loss = 0.005227407440543175
Validation loss = 0.0028957545291632414
Validation loss = 0.003213115967810154
Validation loss = 0.004747038707137108
Validation loss = 0.002629178809002042
Validation loss = 0.0072693414986133575
Validation loss = 0.0036607016809284687
Validation loss = 0.003100987756624818
Validation loss = 0.0033556062262505293
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005839604418724775
Validation loss = 0.004446609411388636
Validation loss = 0.007792147807776928
Validation loss = 0.004663622938096523
Validation loss = 0.005801210179924965
Validation loss = 0.005396728869527578
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005243239458650351
Validation loss = 0.005758948624134064
Validation loss = 0.005444410257041454
Validation loss = 0.005609947256743908
Validation loss = 0.004494217690080404
Validation loss = 0.0045254346914589405
Validation loss = 0.005186484195291996
Validation loss = 0.005733224097639322
Validation loss = 0.006116976961493492
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00311  |
| Iteration     | 61        |
| MaximumReturn | -0.000541 |
| MinimumReturn | -0.0179   |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031583192758262157
Validation loss = 0.004043621942400932
Validation loss = 0.003143765963613987
Validation loss = 0.003649388672783971
Validation loss = 0.00507280882447958
Validation loss = 0.004847722128033638
Validation loss = 0.005070856772363186
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0034134341403841972
Validation loss = 0.003333211876451969
Validation loss = 0.0038579495158046484
Validation loss = 0.0030368370935320854
Validation loss = 0.004114401061087847
Validation loss = 0.0037947052624076605
Validation loss = 0.003423035377636552
Validation loss = 0.0036179774906486273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005969109013676643
Validation loss = 0.002553746569901705
Validation loss = 0.003758158767595887
Validation loss = 0.006582397501915693
Validation loss = 0.0039308033883571625
Validation loss = 0.0030596342403441668
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00396548630669713
Validation loss = 0.004081356339156628
Validation loss = 0.004104075953364372
Validation loss = 0.0070460159331560135
Validation loss = 0.005417903419584036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004648031201213598
Validation loss = 0.004623654764145613
Validation loss = 0.004352746065706015
Validation loss = 0.006037183105945587
Validation loss = 0.005211785901337862
Validation loss = 0.005691987928003073
Validation loss = 0.005204886198043823
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00298  |
| Iteration     | 62        |
| MaximumReturn | -0.000525 |
| MinimumReturn | -0.0228   |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037259107921272516
Validation loss = 0.003452427452430129
Validation loss = 0.004132288973778486
Validation loss = 0.003267040243372321
Validation loss = 0.003197837620973587
Validation loss = 0.005280111916363239
Validation loss = 0.0031999198254197836
Validation loss = 0.0051073855720460415
Validation loss = 0.0042077722027897835
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0034957488533109426
Validation loss = 0.008544831536710262
Validation loss = 0.004075265489518642
Validation loss = 0.004006654489785433
Validation loss = 0.004438317846506834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026786907110363245
Validation loss = 0.0037883827462792397
Validation loss = 0.003727653995156288
Validation loss = 0.0027049905620515347
Validation loss = 0.0026637990958988667
Validation loss = 0.002445241203531623
Validation loss = 0.0035800705663859844
Validation loss = 0.0024711040314286947
Validation loss = 0.004406441934406757
Validation loss = 0.002650023903697729
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0064755757339298725
Validation loss = 0.0046242233365774155
Validation loss = 0.005901667755097151
Validation loss = 0.005294179543852806
Validation loss = 0.0038576568476855755
Validation loss = 0.004300614818930626
Validation loss = 0.004953966476023197
Validation loss = 0.004322495311498642
Validation loss = 0.004048663657158613
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004988397005945444
Validation loss = 0.005501796957105398
Validation loss = 0.005335533060133457
Validation loss = 0.005610750522464514
Validation loss = 0.004575490485876799
Validation loss = 0.004920552019029856
Validation loss = 0.004927561152726412
Validation loss = 0.004339946899563074
Validation loss = 0.004410122986882925
Validation loss = 0.005509157665073872
Validation loss = 0.005712255369871855
Validation loss = 0.005581320263445377
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000805 |
| Iteration     | 63        |
| MaximumReturn | -0.000603 |
| MinimumReturn | -0.00101  |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003396732499822974
Validation loss = 0.0036487432662397623
Validation loss = 0.005033975932747126
Validation loss = 0.003671410260722041
Validation loss = 0.0030546565540134907
Validation loss = 0.003600016701966524
Validation loss = 0.0037111651618033648
Validation loss = 0.004543726332485676
Validation loss = 0.003800728591158986
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003552147652953863
Validation loss = 0.003031959757208824
Validation loss = 0.0032697361893951893
Validation loss = 0.0031951237469911575
Validation loss = 0.004162031691521406
Validation loss = 0.003409055294468999
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0028459508903324604
Validation loss = 0.0034247960429638624
Validation loss = 0.004408208653330803
Validation loss = 0.005193411372601986
Validation loss = 0.0047075157053768635
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005623956676572561
Validation loss = 0.004441904369741678
Validation loss = 0.004958202596753836
Validation loss = 0.004740610718727112
Validation loss = 0.005164411384612322
Validation loss = 0.005717211402952671
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004975906573235989
Validation loss = 0.0044927578419446945
Validation loss = 0.004908057861030102
Validation loss = 0.004666933324187994
Validation loss = 0.004023255780339241
Validation loss = 0.007461694534868002
Validation loss = 0.004299608059227467
Validation loss = 0.004116904456168413
Validation loss = 0.006508525926619768
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00124  |
| Iteration     | 64        |
| MaximumReturn | -0.000585 |
| MinimumReturn | -0.00851  |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00391271011903882
Validation loss = 0.004189504776149988
Validation loss = 0.004033182747662067
Validation loss = 0.004776559770107269
Validation loss = 0.003241420490667224
Validation loss = 0.00302328122779727
Validation loss = 0.0042610811069607735
Validation loss = 0.004442133475095034
Validation loss = 0.004253910854458809
Validation loss = 0.004616600926965475
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003651908366009593
Validation loss = 0.003891657805070281
Validation loss = 0.004035864025354385
Validation loss = 0.003538603661581874
Validation loss = 0.005095561500638723
Validation loss = 0.002807844430208206
Validation loss = 0.0032714062836021185
Validation loss = 0.003332956228405237
Validation loss = 0.002683532191440463
Validation loss = 0.0038301607128232718
Validation loss = 0.0034335812088102102
Validation loss = 0.0034699938260018826
Validation loss = 0.002596500562503934
Validation loss = 0.0038679440040141344
Validation loss = 0.0032543926499783993
Validation loss = 0.0038780244067311287
Validation loss = 0.003505026688799262
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004732018336653709
Validation loss = 0.0029246665071696043
Validation loss = 0.002864100970327854
Validation loss = 0.0030182686168700457
Validation loss = 0.003449806245043874
Validation loss = 0.0023531951010227203
Validation loss = 0.00413063308224082
Validation loss = 0.004122818820178509
Validation loss = 0.0032494578044861555
Validation loss = 0.0032195288222283125
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005551653914153576
Validation loss = 0.004600198473781347
Validation loss = 0.006618913263082504
Validation loss = 0.004548883531242609
Validation loss = 0.004079086240381002
Validation loss = 0.004060440696775913
Validation loss = 0.004364959895610809
Validation loss = 0.005363018251955509
Validation loss = 0.004170138854533434
Validation loss = 0.005499648861587048
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004887031391263008
Validation loss = 0.00448567746207118
Validation loss = 0.00490457471460104
Validation loss = 0.005999953020364046
Validation loss = 0.004241755232214928
Validation loss = 0.004812668543308973
Validation loss = 0.00449259951710701
Validation loss = 0.005263091530650854
Validation loss = 0.004847928415983915
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0015   |
| Iteration     | 65        |
| MaximumReturn | -0.000629 |
| MinimumReturn | -0.017    |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0035093817859888077
Validation loss = 0.003945263102650642
Validation loss = 0.004103460814803839
Validation loss = 0.0038070185109972954
Validation loss = 0.003181488486006856
Validation loss = 0.003644371870905161
Validation loss = 0.004021091852337122
Validation loss = 0.004972696304321289
Validation loss = 0.0035516151692718267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003938648849725723
Validation loss = 0.003517051227390766
Validation loss = 0.0046058231964707375
Validation loss = 0.003341149538755417
Validation loss = 0.003213894320651889
Validation loss = 0.0036721581127494574
Validation loss = 0.0066365464590489864
Validation loss = 0.008537251502275467
Validation loss = 0.00400526775047183
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004657026845961809
Validation loss = 0.0025164985563606024
Validation loss = 0.0032782142516225576
Validation loss = 0.0029063671827316284
Validation loss = 0.0027667859103530645
Validation loss = 0.003661731956526637
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004106762818992138
Validation loss = 0.004546258132904768
Validation loss = 0.004752068780362606
Validation loss = 0.008090144954621792
Validation loss = 0.005989262368530035
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004522598814219236
Validation loss = 0.00582268787547946
Validation loss = 0.004687839187681675
Validation loss = 0.0059019397012889385
Validation loss = 0.0048932828940451145
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00126  |
| Iteration     | 66        |
| MaximumReturn | -0.000612 |
| MinimumReturn | -0.0111   |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0043671224266290665
Validation loss = 0.004787545185536146
Validation loss = 0.003343495074659586
Validation loss = 0.004634586628526449
Validation loss = 0.0031903772614896297
Validation loss = 0.0034422087483108044
Validation loss = 0.003367239609360695
Validation loss = 0.0038953423500061035
Validation loss = 0.004087889101356268
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0037102403584867716
Validation loss = 0.003356002736836672
Validation loss = 0.006854318547993898
Validation loss = 0.0035879998467862606
Validation loss = 0.003647558158263564
Validation loss = 0.004181004129350185
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003395098028704524
Validation loss = 0.0025914835277944803
Validation loss = 0.0031026676297187805
Validation loss = 0.003374530002474785
Validation loss = 0.0037421335000544786
Validation loss = 0.0028771087527275085
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004286978393793106
Validation loss = 0.005667701829224825
Validation loss = 0.003954282496124506
Validation loss = 0.006445934530347586
Validation loss = 0.004411578644067049
Validation loss = 0.00532061280682683
Validation loss = 0.004922250751405954
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0043189008720219135
Validation loss = 0.00405374588444829
Validation loss = 0.006078026257455349
Validation loss = 0.005104780662804842
Validation loss = 0.0061438558623194695
Validation loss = 0.005526307504624128
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00611  |
| Iteration     | 67        |
| MaximumReturn | -0.000652 |
| MinimumReturn | -0.0149   |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005408432800322771
Validation loss = 0.0054955980740487576
Validation loss = 0.003088318044319749
Validation loss = 0.00568713154643774
Validation loss = 0.0029774904251098633
Validation loss = 0.0032228981144726276
Validation loss = 0.0032406370155513287
Validation loss = 0.0028916297014802694
Validation loss = 0.004442016128450632
Validation loss = 0.0032526494469493628
Validation loss = 0.004144476260989904
Validation loss = 0.0026024659164249897
Validation loss = 0.0033737439662218094
Validation loss = 0.0032365641091018915
Validation loss = 0.0049076019786298275
Validation loss = 0.0032818163745105267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003440241329371929
Validation loss = 0.003479765960946679
Validation loss = 0.003317666705697775
Validation loss = 0.003941745962947607
Validation loss = 0.003108011791482568
Validation loss = 0.004049221519380808
Validation loss = 0.003420311026275158
Validation loss = 0.0036723685916513205
Validation loss = 0.003290544031187892
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027711940929293633
Validation loss = 0.003951054997742176
Validation loss = 0.0029978593811392784
Validation loss = 0.004603314213454723
Validation loss = 0.0027655165176838636
Validation loss = 0.0036561714950948954
Validation loss = 0.0027760216034948826
Validation loss = 0.0023097586818039417
Validation loss = 0.004682330414652824
Validation loss = 0.0027018303517252207
Validation loss = 0.0029603978618979454
Validation loss = 0.002877869876101613
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005008836276829243
Validation loss = 0.004636144265532494
Validation loss = 0.0036144822370260954
Validation loss = 0.006835993379354477
Validation loss = 0.00380647461861372
Validation loss = 0.004464114550501108
Validation loss = 0.006446163170039654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0056916652247309685
Validation loss = 0.0039681242778897285
Validation loss = 0.006207409780472517
Validation loss = 0.004642929416149855
Validation loss = 0.0038417463656514883
Validation loss = 0.005971983540803194
Validation loss = 0.0041093723848462105
Validation loss = 0.005233216565102339
Validation loss = 0.006360108032822609
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00159  |
| Iteration     | 68        |
| MaximumReturn | -0.000598 |
| MinimumReturn | -0.0113   |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0026812045834958553
Validation loss = 0.0030215424485504627
Validation loss = 0.003880987176671624
Validation loss = 0.004620593972504139
Validation loss = 0.005173087120056152
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003992028068751097
Validation loss = 0.003598682815209031
Validation loss = 0.002985286759212613
Validation loss = 0.003349737264215946
Validation loss = 0.0037720142863690853
Validation loss = 0.0033047820907086134
Validation loss = 0.002988579450175166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004173656925559044
Validation loss = 0.004418447148054838
Validation loss = 0.002805163851007819
Validation loss = 0.002387518994510174
Validation loss = 0.004049223847687244
Validation loss = 0.003999401815235615
Validation loss = 0.005323493387550116
Validation loss = 0.003510709386318922
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003979669418185949
Validation loss = 0.004305324517190456
Validation loss = 0.0034906642977148294
Validation loss = 0.004388802219182253
Validation loss = 0.004274303559213877
Validation loss = 0.004151705652475357
Validation loss = 0.003942541778087616
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004123083781450987
Validation loss = 0.005113665945827961
Validation loss = 0.004753752611577511
Validation loss = 0.003992851357907057
Validation loss = 0.0045831394381821156
Validation loss = 0.004897315055131912
Validation loss = 0.003831970738247037
Validation loss = 0.005864827428013086
Validation loss = 0.003938554786145687
Validation loss = 0.004087882116436958
Validation loss = 0.004871547222137451
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00175  |
| Iteration     | 69        |
| MaximumReturn | -0.000636 |
| MinimumReturn | -0.00934  |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004460526164621115
Validation loss = 0.0036891379859298468
Validation loss = 0.0023328778333961964
Validation loss = 0.0028339463751763105
Validation loss = 0.0038327353540807962
Validation loss = 0.003095790510997176
Validation loss = 0.0029646209441125393
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0036841260734945536
Validation loss = 0.004031280986964703
Validation loss = 0.0028880112804472446
Validation loss = 0.003650314873084426
Validation loss = 0.004891876131296158
Validation loss = 0.0032109736930578947
Validation loss = 0.003192052012309432
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003510307287797332
Validation loss = 0.002289659809321165
Validation loss = 0.0033992447424679995
Validation loss = 0.002976269694045186
Validation loss = 0.0033228187821805477
Validation loss = 0.003250897629186511
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00392038794234395
Validation loss = 0.003676717635244131
Validation loss = 0.0038230670616030693
Validation loss = 0.005566103383898735
Validation loss = 0.0034866498317569494
Validation loss = 0.004239036235958338
Validation loss = 0.004640481900423765
Validation loss = 0.00486935768276453
Validation loss = 0.003676094813272357
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003952759318053722
Validation loss = 0.006557015236467123
Validation loss = 0.004491662606596947
Validation loss = 0.0048010400496423244
Validation loss = 0.004154076799750328
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00587  |
| Iteration     | 70        |
| MaximumReturn | -0.000881 |
| MinimumReturn | -0.0146   |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0032969338353723288
Validation loss = 0.00330551341176033
Validation loss = 0.00451265275478363
Validation loss = 0.002778355497866869
Validation loss = 0.0037755598314106464
Validation loss = 0.003037389600649476
Validation loss = 0.0035611139610409737
Validation loss = 0.0034169869031757116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0034157796762883663
Validation loss = 0.0038622007705271244
Validation loss = 0.003378241555765271
Validation loss = 0.003981288988143206
Validation loss = 0.0026777484454214573
Validation loss = 0.003118834225460887
Validation loss = 0.004999878816306591
Validation loss = 0.003955849912017584
Validation loss = 0.0026583850849419832
Validation loss = 0.0029050472658127546
Validation loss = 0.002972327172756195
Validation loss = 0.003272860776633024
Validation loss = 0.002948255743831396
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0033310542348772287
Validation loss = 0.0020836712792515755
Validation loss = 0.002010348252952099
Validation loss = 0.0026800595223903656
Validation loss = 0.002942712279036641
Validation loss = 0.0026283564511686563
Validation loss = 0.002416759030893445
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004437948577105999
Validation loss = 0.0036144854966551065
Validation loss = 0.003587889252230525
Validation loss = 0.004817792680114508
Validation loss = 0.00455933902412653
Validation loss = 0.0038252819795161486
Validation loss = 0.0037178322672843933
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005783155560493469
Validation loss = 0.004279149230569601
Validation loss = 0.004843072034418583
Validation loss = 0.0036484943702816963
Validation loss = 0.004181466996669769
Validation loss = 0.0050300112925469875
Validation loss = 0.006712073460221291
Validation loss = 0.004979017656296492
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00877 |
| Iteration     | 71       |
| MaximumReturn | -0.00087 |
| MinimumReturn | -0.0211  |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008531298488378525
Validation loss = 0.0038279842119663954
Validation loss = 0.003233949653804302
Validation loss = 0.003603193210437894
Validation loss = 0.0031166968401521444
Validation loss = 0.0029010900761932135
Validation loss = 0.0037085223011672497
Validation loss = 0.002676438307389617
Validation loss = 0.005379644222557545
Validation loss = 0.007316432893276215
Validation loss = 0.0027100995648652315
Validation loss = 0.002869833493605256
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003809492103755474
Validation loss = 0.004645481240004301
Validation loss = 0.003790452843531966
Validation loss = 0.004881348926573992
Validation loss = 0.002657498000189662
Validation loss = 0.002993851201608777
Validation loss = 0.003426457056775689
Validation loss = 0.00274450471624732
Validation loss = 0.003971202298998833
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00432570418342948
Validation loss = 0.004493150394409895
Validation loss = 0.0022064028307795525
Validation loss = 0.003211897797882557
Validation loss = 0.0026062580291181803
Validation loss = 0.0021881985012441874
Validation loss = 0.005594614427536726
Validation loss = 0.003152014920488
Validation loss = 0.00312714371830225
Validation loss = 0.0026396920438855886
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004022012464702129
Validation loss = 0.005625379737466574
Validation loss = 0.00406777486205101
Validation loss = 0.004207684192806482
Validation loss = 0.004383891820907593
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00465121865272522
Validation loss = 0.004402974620461464
Validation loss = 0.005183790344744921
Validation loss = 0.0036560939624905586
Validation loss = 0.004503547213971615
Validation loss = 0.004180942662060261
Validation loss = 0.004287137649953365
Validation loss = 0.004425971768796444
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00087  |
| Iteration     | 72        |
| MaximumReturn | -0.000652 |
| MinimumReturn | -0.00112  |
| TotalSamples  | 123284    |
-----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0038631840143352747
Validation loss = 0.0031825555488467216
Validation loss = 0.003884741570800543
Validation loss = 0.0031501431949436665
Validation loss = 0.0030599392484873533
Validation loss = 0.0042763687670230865
Validation loss = 0.003282683901488781
Validation loss = 0.002989698899909854
Validation loss = 0.0033133907709270716
Validation loss = 0.003653101157397032
Validation loss = 0.0034734068904072046
Validation loss = 0.0033281089272350073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004547594580799341
Validation loss = 0.0031957770697772503
Validation loss = 0.0038983190897852182
Validation loss = 0.003156827064231038
Validation loss = 0.0036283486988395452
Validation loss = 0.002922138199210167
Validation loss = 0.0030778206419199705
Validation loss = 0.003935948945581913
Validation loss = 0.0037428492214530706
Validation loss = 0.0027336694765836
Validation loss = 0.003774858545511961
Validation loss = 0.0027758085634559393
Validation loss = 0.0033019704278558493
Validation loss = 0.0034795745741575956
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0025737741962075233
Validation loss = 0.0032479246146976948
Validation loss = 0.0027481657452881336
Validation loss = 0.0031012403778731823
Validation loss = 0.0029641289729624987
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0036804331466555595
Validation loss = 0.005391755606979132
Validation loss = 0.004436028655618429
Validation loss = 0.004705261439085007
Validation loss = 0.003716960083693266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00661515025421977
Validation loss = 0.005551151465624571
Validation loss = 0.0043019820004701614
Validation loss = 0.004267104901373386
Validation loss = 0.003610781393945217
Validation loss = 0.004485746379941702
Validation loss = 0.004125668201595545
Validation loss = 0.004285041242837906
Validation loss = 0.004980846773833036
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00572  |
| Iteration     | 73        |
| MaximumReturn | -0.000694 |
| MinimumReturn | -0.027    |
| TotalSamples  | 124950    |
-----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0032202217262238264
Validation loss = 0.0028857921715825796
Validation loss = 0.004090509843081236
Validation loss = 0.0027514765970408916
Validation loss = 0.004431860521435738
Validation loss = 0.002554531442001462
Validation loss = 0.0025821770541369915
Validation loss = 0.0039746942929923534
Validation loss = 0.0036851109471172094
Validation loss = 0.003003922291100025
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0033767579589039087
Validation loss = 0.004210460465401411
Validation loss = 0.003371152328327298
Validation loss = 0.0033320572692900896
Validation loss = 0.0028941931668668985
Validation loss = 0.003719860455021262
Validation loss = 0.00265892525203526
Validation loss = 0.004062322434037924
Validation loss = 0.0038780623581260443
Validation loss = 0.0027268731500953436
Validation loss = 0.002796224085614085
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0025318574625998735
Validation loss = 0.0028591712471097708
Validation loss = 0.002352885203436017
Validation loss = 0.003703034482896328
Validation loss = 0.004061281215399504
Validation loss = 0.0021897226106375456
Validation loss = 0.0036867044400423765
Validation loss = 0.0022028223611414433
Validation loss = 0.0021452163346111774
Validation loss = 0.0020757117308676243
Validation loss = 0.0023919676896184683
Validation loss = 0.0039055184461176395
Validation loss = 0.003414016915485263
Validation loss = 0.0034922955092042685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00672739464789629
Validation loss = 0.003753397148102522
Validation loss = 0.0044311825186014175
Validation loss = 0.0037808131892234087
Validation loss = 0.0030803813133388758
Validation loss = 0.007259557954967022
Validation loss = 0.003184942528605461
Validation loss = 0.0036714745219796896
Validation loss = 0.004567204974591732
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0047401380725204945
Validation loss = 0.003924993332475424
Validation loss = 0.0036252890713512897
Validation loss = 0.0041732401587069035
Validation loss = 0.003617027075961232
Validation loss = 0.0038894321769475937
Validation loss = 0.005126633681356907
Validation loss = 0.004144608974456787
Validation loss = 0.0038098637014627457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00176  |
| Iteration     | 74        |
| MaximumReturn | -0.000621 |
| MinimumReturn | -0.0241   |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031979468185454607
Validation loss = 0.0026794804725795984
Validation loss = 0.0033966859336942434
Validation loss = 0.003268610220402479
Validation loss = 0.003100052708759904
Validation loss = 0.0030079151038080454
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002908749273046851
Validation loss = 0.003961001057177782
Validation loss = 0.0036946614272892475
Validation loss = 0.002840884728357196
Validation loss = 0.0032642006408423185
Validation loss = 0.003175272373482585
Validation loss = 0.002601396990939975
Validation loss = 0.00380528811365366
Validation loss = 0.003211911767721176
Validation loss = 0.0038409517146646976
Validation loss = 0.0034312657080590725
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004955298732966185
Validation loss = 0.002443933393806219
Validation loss = 0.004347489681094885
Validation loss = 0.009319698438048363
Validation loss = 0.002595418132841587
Validation loss = 0.0023689819499850273
Validation loss = 0.0021305012051016092
Validation loss = 0.0019014639547094703
Validation loss = 0.003622496034950018
Validation loss = 0.002093731891363859
Validation loss = 0.0021948195062577724
Validation loss = 0.002966441912576556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004917458165436983
Validation loss = 0.0030351541936397552
Validation loss = 0.006109726149588823
Validation loss = 0.0031648187432438135
Validation loss = 0.003961427137255669
Validation loss = 0.0034729531034827232
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004229431040585041
Validation loss = 0.0041413805447518826
Validation loss = 0.0050665936432778835
Validation loss = 0.004002023488283157
Validation loss = 0.004136045929044485
Validation loss = 0.004672075156122446
Validation loss = 0.005014414899051189
Validation loss = 0.0035166984889656305
Validation loss = 0.0038060410879552364
Validation loss = 0.003924613352864981
Validation loss = 0.0039715273305773735
Validation loss = 0.003186109708622098
Validation loss = 0.005219468846917152
Validation loss = 0.0042075347155332565
Validation loss = 0.0036789688747376204
Validation loss = 0.004607506096363068
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00286  |
| Iteration     | 75        |
| MaximumReturn | -0.000575 |
| MinimumReturn | -0.0191   |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005653046071529388
Validation loss = 0.002504638396203518
Validation loss = 0.0025662400294095278
Validation loss = 0.0027392124757170677
Validation loss = 0.004469193052500486
Validation loss = 0.0035270287189632654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003289832966402173
Validation loss = 0.003131263889372349
Validation loss = 0.004364253021776676
Validation loss = 0.0039380318485200405
Validation loss = 0.002554452046751976
Validation loss = 0.003412176389247179
Validation loss = 0.0035926662385463715
Validation loss = 0.003135909792035818
Validation loss = 0.006031175144016743
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002796137472614646
Validation loss = 0.00328766368329525
Validation loss = 0.003199881175532937
Validation loss = 0.002556129591539502
Validation loss = 0.0019807093776762486
Validation loss = 0.0031349260825663805
Validation loss = 0.002372781280428171
Validation loss = 0.0027296431362628937
Validation loss = 0.002025006338953972
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004240653943270445
Validation loss = 0.004141993820667267
Validation loss = 0.0026934018824249506
Validation loss = 0.0036623631604015827
Validation loss = 0.003703357884660363
Validation loss = 0.003628524485975504
Validation loss = 0.003979725297540426
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004369590897113085
Validation loss = 0.004639226943254471
Validation loss = 0.005816748831421137
Validation loss = 0.0049798088148236275
Validation loss = 0.0037330409977585077
Validation loss = 0.004295822698622942
Validation loss = 0.0035571937914937735
Validation loss = 0.0033018714748322964
Validation loss = 0.003399546956643462
Validation loss = 0.00422322005033493
Validation loss = 0.0037983465008437634
Validation loss = 0.003549568820744753
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00479  |
| Iteration     | 76        |
| MaximumReturn | -0.000672 |
| MinimumReturn | -0.0251   |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025742182042449713
Validation loss = 0.002673377515748143
Validation loss = 0.0038780125323683023
Validation loss = 0.0032821623608469963
Validation loss = 0.0036645582877099514
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003147113136947155
Validation loss = 0.0035790405236184597
Validation loss = 0.0029581449925899506
Validation loss = 0.0033748778514564037
Validation loss = 0.0038435980677604675
Validation loss = 0.0027322026435285807
Validation loss = 0.0038342138286679983
Validation loss = 0.003023960394784808
Validation loss = 0.0038198118563741446
Validation loss = 0.004108028952032328
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029277994763106108
Validation loss = 0.002241621958091855
Validation loss = 0.002888814778998494
Validation loss = 0.0025891740806400776
Validation loss = 0.0030115938279777765
Validation loss = 0.002222643932327628
Validation loss = 0.0024098861031234264
Validation loss = 0.0027600955218076706
Validation loss = 0.002454745816066861
Validation loss = 0.0031102315988391638
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0039263879880309105
Validation loss = 0.006250936072319746
Validation loss = 0.002865813672542572
Validation loss = 0.0029845554381608963
Validation loss = 0.0034249452874064445
Validation loss = 0.002685592044144869
Validation loss = 0.004004943650215864
Validation loss = 0.0037448497023433447
Validation loss = 0.0033442142885178328
Validation loss = 0.0035854049492627382
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004153549671173096
Validation loss = 0.005055065732449293
Validation loss = 0.004126980435103178
Validation loss = 0.004243555944412947
Validation loss = 0.005009753629565239
Validation loss = 0.0031756723765283823
Validation loss = 0.004097885452210903
Validation loss = 0.0029968961607664824
Validation loss = 0.0038454041350632906
Validation loss = 0.0030043483711779118
Validation loss = 0.0037089793477207422
Validation loss = 0.0033964812755584717
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00192  |
| Iteration     | 77        |
| MaximumReturn | -0.000677 |
| MinimumReturn | -0.0279   |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003122973721474409
Validation loss = 0.00395458796992898
Validation loss = 0.0032275784760713577
Validation loss = 0.0028350241482257843
Validation loss = 0.00315527874045074
Validation loss = 0.003525789361447096
Validation loss = 0.0028901577461510897
Validation loss = 0.0033552050590515137
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002693487796932459
Validation loss = 0.0026008530985563993
Validation loss = 0.0028081706259399652
Validation loss = 0.003072168678045273
Validation loss = 0.002826619427651167
Validation loss = 0.0023434017784893513
Validation loss = 0.0029163192957639694
Validation loss = 0.0022373488172888756
Validation loss = 0.003684266237542033
Validation loss = 0.0036255158483982086
Validation loss = 0.002876628190279007
Validation loss = 0.0025819502770900726
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004576869774609804
Validation loss = 0.002183878794312477
Validation loss = 0.002571142977103591
Validation loss = 0.0019166722195222974
Validation loss = 0.004027103073894978
Validation loss = 0.0034818577114492655
Validation loss = 0.002836270723491907
Validation loss = 0.00211130827665329
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0033281557261943817
Validation loss = 0.003672562073916197
Validation loss = 0.002894248114898801
Validation loss = 0.0030131363309919834
Validation loss = 0.003820665879175067
Validation loss = 0.005359395407140255
Validation loss = 0.0048012202605605125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0034989637788385153
Validation loss = 0.0031283246353268623
Validation loss = 0.003672050079330802
Validation loss = 0.003767849877476692
Validation loss = 0.004397645592689514
Validation loss = 0.003243320155888796
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00309 |
| Iteration     | 78       |
| MaximumReturn | -0.00056 |
| MinimumReturn | -0.0238  |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00308482744731009
Validation loss = 0.003217978635802865
Validation loss = 0.0029549086466431618
Validation loss = 0.0024679582566022873
Validation loss = 0.0032103362027555704
Validation loss = 0.0031886729411780834
Validation loss = 0.002579763298854232
Validation loss = 0.0042162020690739155
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0023449123837053776
Validation loss = 0.0031806863844394684
Validation loss = 0.0026527089066803455
Validation loss = 0.002870014403015375
Validation loss = 0.0031317865941673517
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027270889841020107
Validation loss = 0.0030211310368031263
Validation loss = 0.0034465959761291742
Validation loss = 0.004975928924977779
Validation loss = 0.002085140673443675
Validation loss = 0.002782105468213558
Validation loss = 0.003086831420660019
Validation loss = 0.002020362764596939
Validation loss = 0.004145600832998753
Validation loss = 0.004179258830845356
Validation loss = 0.0022637448273599148
Validation loss = 0.0019977714400738478
Validation loss = 0.002421894110739231
Validation loss = 0.0032733397092670202
Validation loss = 0.002103454200550914
Validation loss = 0.0023493554908782244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0031849127262830734
Validation loss = 0.0037062729243189096
Validation loss = 0.005856877658516169
Validation loss = 0.003573479363694787
Validation loss = 0.0035129731986671686
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003561947029083967
Validation loss = 0.003325238823890686
Validation loss = 0.003577424678951502
Validation loss = 0.0035110730677843094
Validation loss = 0.0035291328094899654
Validation loss = 0.003301378805190325
Validation loss = 0.003395332954823971
Validation loss = 0.0033993846736848354
Validation loss = 0.004200439900159836
Validation loss = 0.004445905331522226
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00507  |
| Iteration     | 79        |
| MaximumReturn | -0.000543 |
| MinimumReturn | -0.0289   |
| TotalSamples  | 134946    |
-----------------------------
