Logging to experiments/hopper/hopperA01/Mon-31-Oct-2022-11-00-29-AM-CDT_hopper_trpo_iteration_20_seed2431
Print configuration .....
{'env_name': 'hopper', 'random_seeds': [1234, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/hopper_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8826010227203369
Validation loss = 0.27318698167800903
Validation loss = 0.24809518456459045
Validation loss = 0.22002854943275452
Validation loss = 0.22531580924987793
Validation loss = 0.23128929734230042
Validation loss = 0.23827846348285675
Validation loss = 0.23083645105361938
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6992670297622681
Validation loss = 0.2797683775424957
Validation loss = 0.255623459815979
Validation loss = 0.2259131371974945
Validation loss = 0.2248360961675644
Validation loss = 0.22856087982654572
Validation loss = 0.23579955101013184
Validation loss = 0.2502182126045227
Validation loss = 0.24538037180900574
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.9111478328704834
Validation loss = 0.28076767921447754
Validation loss = 0.244516521692276
Validation loss = 0.2317916452884674
Validation loss = 0.22366207838058472
Validation loss = 0.2276555299758911
Validation loss = 0.23295244574546814
Validation loss = 0.23829150199890137
Validation loss = 0.23950549960136414
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7121208906173706
Validation loss = 0.27497875690460205
Validation loss = 0.24194861948490143
Validation loss = 0.2191331684589386
Validation loss = 0.22438696026802063
Validation loss = 0.22226980328559875
Validation loss = 0.23252958059310913
Validation loss = 0.238021582365036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5505161881446838
Validation loss = 0.28131967782974243
Validation loss = 0.24719437956809998
Validation loss = 0.22241312265396118
Validation loss = 0.2314775586128235
Validation loss = 0.22738739848136902
Validation loss = 0.22843137383460999
Validation loss = 0.23242193460464478
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.83e+03 |
| Iteration     | 0         |
| MaximumReturn | -1.24e+03 |
| MinimumReturn | -2.22e+03 |
| TotalSamples  | 8000      |
-----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2603631615638733
Validation loss = 0.2420172393321991
Validation loss = 0.24320243299007416
Validation loss = 0.24449396133422852
Validation loss = 0.24906140565872192
Validation loss = 0.23793116211891174
Validation loss = 0.25425440073013306
Validation loss = 0.23073408007621765
Validation loss = 0.24779903888702393
Validation loss = 0.2376788705587387
Validation loss = 0.23459658026695251
Validation loss = 0.2345018833875656
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2590135335922241
Validation loss = 0.24030840396881104
Validation loss = 0.2408713549375534
Validation loss = 0.24407336115837097
Validation loss = 0.24785827100276947
Validation loss = 0.2502477169036865
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2552798390388489
Validation loss = 0.24795928597450256
Validation loss = 0.2405756711959839
Validation loss = 0.24103863537311554
Validation loss = 0.24257025122642517
Validation loss = 0.2517549991607666
Validation loss = 0.23050206899642944
Validation loss = 0.23256513476371765
Validation loss = 0.23055684566497803
Validation loss = 0.2419574111700058
Validation loss = 0.23864687979221344
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.25631076097488403
Validation loss = 0.24383464455604553
Validation loss = 0.2369348108768463
Validation loss = 0.23305529356002808
Validation loss = 0.2576678991317749
Validation loss = 0.2524130642414093
Validation loss = 0.25224053859710693
Validation loss = 0.2455621063709259
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2567651867866516
Validation loss = 0.24179449677467346
Validation loss = 0.24804425239562988
Validation loss = 0.24515894055366516
Validation loss = 0.2369592785835266
Validation loss = 0.23360779881477356
Validation loss = 0.23817157745361328
Validation loss = 0.24431726336479187
Validation loss = 0.23482106626033783
Validation loss = 0.23436085879802704
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -975      |
| Iteration     | 1         |
| MaximumReturn | -439      |
| MinimumReturn | -1.79e+03 |
| TotalSamples  | 12000     |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.32913199067115784
Validation loss = 0.2967100739479065
Validation loss = 0.2954525053501129
Validation loss = 0.2873633801937103
Validation loss = 0.30838683247566223
Validation loss = 0.2858218252658844
Validation loss = 0.2783829867839813
Validation loss = 0.2752552330493927
Validation loss = 0.2753802239894867
Validation loss = 0.27432486414909363
Validation loss = 0.2757912874221802
Validation loss = 0.2716939151287079
Validation loss = 0.2777155041694641
Validation loss = 0.27152252197265625
Validation loss = 0.27513232827186584
Validation loss = 0.2679159343242645
Validation loss = 0.2686433792114258
Validation loss = 0.2697942554950714
Validation loss = 0.27446091175079346
Validation loss = 0.2664403021335602
Validation loss = 0.2741016745567322
Validation loss = 0.2737565040588379
Validation loss = 0.27542367577552795
Validation loss = 0.26904237270355225
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3150258958339691
Validation loss = 0.3021716773509979
Validation loss = 0.3096590042114258
Validation loss = 0.3148888051509857
Validation loss = 0.2937040328979492
Validation loss = 0.3042493760585785
Validation loss = 0.2948445975780487
Validation loss = 0.28544214367866516
Validation loss = 0.2835954427719116
Validation loss = 0.27904823422431946
Validation loss = 0.28393495082855225
Validation loss = 0.27874264121055603
Validation loss = 0.2833685576915741
Validation loss = 0.27593520283699036
Validation loss = 0.2847244441509247
Validation loss = 0.2794989347457886
Validation loss = 0.2853039801120758
Validation loss = 0.2705398201942444
Validation loss = 0.27443376183509827
Validation loss = 0.2793441712856293
Validation loss = 0.26852431893348694
Validation loss = 0.2705244719982147
Validation loss = 0.26991236209869385
Validation loss = 0.27551206946372986
Validation loss = 0.27140676975250244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.31818994879722595
Validation loss = 0.2937660217285156
Validation loss = 0.2930019497871399
Validation loss = 0.2909816801548004
Validation loss = 0.2869028151035309
Validation loss = 0.2751016914844513
Validation loss = 0.2913747727870941
Validation loss = 0.2820928394794464
Validation loss = 0.2746008038520813
Validation loss = 0.28195324540138245
Validation loss = 0.2756538391113281
Validation loss = 0.2783958315849304
Validation loss = 0.2796807587146759
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3211893141269684
Validation loss = 0.313427209854126
Validation loss = 0.2894867956638336
Validation loss = 0.29878973960876465
Validation loss = 0.2955167293548584
Validation loss = 0.2957089841365814
Validation loss = 0.2906164228916168
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3397468030452728
Validation loss = 0.31165003776550293
Validation loss = 0.29848533868789673
Validation loss = 0.28776276111602783
Validation loss = 0.29671597480773926
Validation loss = 0.2830662429332733
Validation loss = 0.28393492102622986
Validation loss = 0.2863501012325287
Validation loss = 0.2929479777812958
Validation loss = 0.29538536071777344
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.35e+03 |
| Iteration     | 2         |
| MaximumReturn | -872      |
| MinimumReturn | -1.81e+03 |
| TotalSamples  | 16000     |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.262528657913208
Validation loss = 0.25017473101615906
Validation loss = 0.24475061893463135
Validation loss = 0.25130584836006165
Validation loss = 0.24449422955513
Validation loss = 0.24597565829753876
Validation loss = 0.23952986299991608
Validation loss = 0.24349208176136017
Validation loss = 0.2393270879983902
Validation loss = 0.24005869030952454
Validation loss = 0.23699718713760376
Validation loss = 0.25099802017211914
Validation loss = 0.24391406774520874
Validation loss = 0.2351902425289154
Validation loss = 0.23935574293136597
Validation loss = 0.2340826392173767
Validation loss = 0.23893584311008453
Validation loss = 0.23505982756614685
Validation loss = 0.24475160241127014
Validation loss = 0.24202021956443787
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2741999626159668
Validation loss = 0.25455841422080994
Validation loss = 0.2666536271572113
Validation loss = 0.2445547878742218
Validation loss = 0.25067877769470215
Validation loss = 0.2557579278945923
Validation loss = 0.24374982714653015
Validation loss = 0.2548941373825073
Validation loss = 0.24726539850234985
Validation loss = 0.24888992309570312
Validation loss = 0.25182121992111206
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2789088785648346
Validation loss = 0.259063184261322
Validation loss = 0.2586018443107605
Validation loss = 0.24343322217464447
Validation loss = 0.25363463163375854
Validation loss = 0.24890097975730896
Validation loss = 0.2390289306640625
Validation loss = 0.24012747406959534
Validation loss = 0.24599909782409668
Validation loss = 0.24266499280929565
Validation loss = 0.24100492894649506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.27401357889175415
Validation loss = 0.2880256175994873
Validation loss = 0.261826753616333
Validation loss = 0.25337275862693787
Validation loss = 0.26999354362487793
Validation loss = 0.25360533595085144
Validation loss = 0.25697776675224304
Validation loss = 0.25322335958480835
Validation loss = 0.2590523958206177
Validation loss = 0.2595478296279907
Validation loss = 0.252898633480072
Validation loss = 0.25394386053085327
Validation loss = 0.25892549753189087
Validation loss = 0.24916408956050873
Validation loss = 0.2529943585395813
Validation loss = 0.25522899627685547
Validation loss = 0.23822152614593506
Validation loss = 0.24468380212783813
Validation loss = 0.24163421988487244
Validation loss = 0.2366175353527069
Validation loss = 0.23582054674625397
Validation loss = 0.23358173668384552
Validation loss = 0.24788647890090942
Validation loss = 0.23695218563079834
Validation loss = 0.23375403881072998
Validation loss = 0.24581918120384216
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2725057303905487
Validation loss = 0.26287907361984253
Validation loss = 0.26039206981658936
Validation loss = 0.25452277064323425
Validation loss = 0.25554269552230835
Validation loss = 0.26091405749320984
Validation loss = 0.2514829933643341
Validation loss = 0.24834808707237244
Validation loss = 0.2614341974258423
Validation loss = 0.26093000173568726
Validation loss = 0.26447269320487976
Validation loss = 0.25578707456588745
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.4e+03  |
| Iteration     | 3         |
| MaximumReturn | 20.5      |
| MinimumReturn | -2.96e+03 |
| TotalSamples  | 20000     |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.29960566759109497
Validation loss = 0.24934367835521698
Validation loss = 0.24432909488677979
Validation loss = 0.2313363254070282
Validation loss = 0.23294687271118164
Validation loss = 0.23074567317962646
Validation loss = 0.2283644676208496
Validation loss = 0.22772257030010223
Validation loss = 0.23218949139118195
Validation loss = 0.2273826152086258
Validation loss = 0.232523113489151
Validation loss = 0.22593331336975098
Validation loss = 0.22811897099018097
Validation loss = 0.22754152119159698
Validation loss = 0.23018236458301544
Validation loss = 0.22210094332695007
Validation loss = 0.23020105063915253
Validation loss = 0.2243543416261673
Validation loss = 0.22704610228538513
Validation loss = 0.2177627980709076
Validation loss = 0.21571961045265198
Validation loss = 0.21439754962921143
Validation loss = 0.22042079269886017
Validation loss = 0.2199760377407074
Validation loss = 0.21757462620735168
Validation loss = 0.2313547134399414
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.31245389580726624
Validation loss = 0.26010197401046753
Validation loss = 0.24552574753761292
Validation loss = 0.24522092938423157
Validation loss = 0.2417786866426468
Validation loss = 0.2346450388431549
Validation loss = 0.23611727356910706
Validation loss = 0.22974076867103577
Validation loss = 0.2377070188522339
Validation loss = 0.2361600697040558
Validation loss = 0.2513643205165863
Validation loss = 0.2313898354768753
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.30992022156715393
Validation loss = 0.2693386673927307
Validation loss = 0.25176334381103516
Validation loss = 0.23966535925865173
Validation loss = 0.24217092990875244
Validation loss = 0.24351099133491516
Validation loss = 0.23515495657920837
Validation loss = 0.23529252409934998
Validation loss = 0.24115879833698273
Validation loss = 0.23648956418037415
Validation loss = 0.2343929558992386
Validation loss = 0.23436562716960907
Validation loss = 0.23256763815879822
Validation loss = 0.23667068779468536
Validation loss = 0.2360415756702423
Validation loss = 0.23695261776447296
Validation loss = 0.22465912997722626
Validation loss = 0.22940170764923096
Validation loss = 0.22595064342021942
Validation loss = 0.23115667700767517
Validation loss = 0.22577369213104248
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.28635549545288086
Validation loss = 0.25291353464126587
Validation loss = 0.25251656770706177
Validation loss = 0.2429245263338089
Validation loss = 0.2326822280883789
Validation loss = 0.2345443218946457
Validation loss = 0.23146839439868927
Validation loss = 0.23564867675304413
Validation loss = 0.22828927636146545
Validation loss = 0.23311610519886017
Validation loss = 0.22792211174964905
Validation loss = 0.23009851574897766
Validation loss = 0.22782650589942932
Validation loss = 0.2217140942811966
Validation loss = 0.22874537110328674
Validation loss = 0.2277859002351761
Validation loss = 0.23812219500541687
Validation loss = 0.22690698504447937
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.317186564207077
Validation loss = 0.2888800799846649
Validation loss = 0.2617954611778259
Validation loss = 0.2540510892868042
Validation loss = 0.2512153387069702
Validation loss = 0.24718347191810608
Validation loss = 0.2516496777534485
Validation loss = 0.24148881435394287
Validation loss = 0.24926093220710754
Validation loss = 0.24589355289936066
Validation loss = 0.24265006184577942
Validation loss = 0.24860703945159912
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.26e+03 |
| Iteration     | 4         |
| MaximumReturn | -121      |
| MinimumReturn | -2.53e+03 |
| TotalSamples  | 24000     |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2621811628341675
Validation loss = 0.2075425237417221
Validation loss = 0.19140909612178802
Validation loss = 0.18524223566055298
Validation loss = 0.18592016398906708
Validation loss = 0.18323425948619843
Validation loss = 0.18371956050395966
Validation loss = 0.185145303606987
Validation loss = 0.18310211598873138
Validation loss = 0.1793876439332962
Validation loss = 0.18621943891048431
Validation loss = 0.1804984211921692
Validation loss = 0.18656694889068604
Validation loss = 0.17928381264209747
Validation loss = 0.18440836668014526
Validation loss = 0.1882178783416748
Validation loss = 0.17410577833652496
Validation loss = 0.17441900074481964
Validation loss = 0.17529423534870148
Validation loss = 0.1759745478630066
Validation loss = 0.18016350269317627
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2377675324678421
Validation loss = 0.20220208168029785
Validation loss = 0.1982702612876892
Validation loss = 0.1898750513792038
Validation loss = 0.19268590211868286
Validation loss = 0.19478684663772583
Validation loss = 0.1986369490623474
Validation loss = 0.1983015090227127
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.25674310326576233
Validation loss = 0.21461059153079987
Validation loss = 0.20147718489170074
Validation loss = 0.1948920339345932
Validation loss = 0.19629669189453125
Validation loss = 0.19149385392665863
Validation loss = 0.20665383338928223
Validation loss = 0.19295646250247955
Validation loss = 0.19030022621154785
Validation loss = 0.19022460281848907
Validation loss = 0.19496192038059235
Validation loss = 0.18459300696849823
Validation loss = 0.1914926916360855
Validation loss = 0.1858961582183838
Validation loss = 0.1886853128671646
Validation loss = 0.1799803227186203
Validation loss = 0.1815672665834427
Validation loss = 0.18487514555454254
Validation loss = 0.1885017603635788
Validation loss = 0.21698667109012604
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2449609786272049
Validation loss = 0.2151385098695755
Validation loss = 0.19738638401031494
Validation loss = 0.18814164400100708
Validation loss = 0.19132106006145477
Validation loss = 0.18662142753601074
Validation loss = 0.18497584760189056
Validation loss = 0.19075000286102295
Validation loss = 0.1948322057723999
Validation loss = 0.1907762736082077
Validation loss = 0.1876680999994278
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2407122403383255
Validation loss = 0.22587692737579346
Validation loss = 0.21581454575061798
Validation loss = 0.2087498903274536
Validation loss = 0.20530326664447784
Validation loss = 0.2092895656824112
Validation loss = 0.2104964405298233
Validation loss = 0.20462077856063843
Validation loss = 0.20490966737270355
Validation loss = 0.20463740825653076
Validation loss = 0.19827483594417572
Validation loss = 0.19535385072231293
Validation loss = 0.19979143142700195
Validation loss = 0.20684950053691864
Validation loss = 0.19542956352233887
Validation loss = 0.19540750980377197
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -378      |
| Iteration     | 5         |
| MaximumReturn | 997       |
| MinimumReturn | -1.75e+03 |
| TotalSamples  | 28000     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2583100497722626
Validation loss = 0.19200491905212402
Validation loss = 0.18091320991516113
Validation loss = 0.1745285987854004
Validation loss = 0.17591670155525208
Validation loss = 0.17553913593292236
Validation loss = 0.17649276554584503
Validation loss = 0.1739935576915741
Validation loss = 0.1722070425748825
Validation loss = 0.17911814153194427
Validation loss = 0.18831314146518707
Validation loss = 0.17890188097953796
Validation loss = 0.1701338291168213
Validation loss = 0.16851499676704407
Validation loss = 0.16956131160259247
Validation loss = 0.1690964698791504
Validation loss = 0.1800374984741211
Validation loss = 0.1693749725818634
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.24245545268058777
Validation loss = 0.20132865011692047
Validation loss = 0.18913938105106354
Validation loss = 0.19295109808444977
Validation loss = 0.18609686195850372
Validation loss = 0.18703819811344147
Validation loss = 0.18756632506847382
Validation loss = 0.19061888754367828
Validation loss = 0.1818668097257614
Validation loss = 0.18765941262245178
Validation loss = 0.1794588565826416
Validation loss = 0.17975571751594543
Validation loss = 0.177693173289299
Validation loss = 0.1842091977596283
Validation loss = 0.1818130910396576
Validation loss = 0.1771731674671173
Validation loss = 0.1736840456724167
Validation loss = 0.17435322701931
Validation loss = 0.1771300733089447
Validation loss = 0.19142697751522064
Validation loss = 0.17279447615146637
Validation loss = 0.17032870650291443
Validation loss = 0.1731477677822113
Validation loss = 0.17940695583820343
Validation loss = 0.18041081726551056
Validation loss = 0.1744411736726761
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.26426181197166443
Validation loss = 0.19275249540805817
Validation loss = 0.18649980425834656
Validation loss = 0.18030498921871185
Validation loss = 0.1790674775838852
Validation loss = 0.17738661170005798
Validation loss = 0.18197926878929138
Validation loss = 0.19234919548034668
Validation loss = 0.1837475448846817
Validation loss = 0.17268332839012146
Validation loss = 0.17258374392986298
Validation loss = 0.17257697880268097
Validation loss = 0.1755981147289276
Validation loss = 0.182778999209404
Validation loss = 0.18451549112796783
Validation loss = 0.1812840849161148
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.24939487874507904
Validation loss = 0.19432388246059418
Validation loss = 0.18912743031978607
Validation loss = 0.18414807319641113
Validation loss = 0.1868176907300949
Validation loss = 0.18024666607379913
Validation loss = 0.18122725188732147
Validation loss = 0.1885136365890503
Validation loss = 0.18264301121234894
Validation loss = 0.18302559852600098
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.23702433705329895
Validation loss = 0.20293496549129486
Validation loss = 0.19759492576122284
Validation loss = 0.19839191436767578
Validation loss = 0.19565488398075104
Validation loss = 0.20606885850429535
Validation loss = 0.19689767062664032
Validation loss = 0.1895843744277954
Validation loss = 0.1850547045469284
Validation loss = 0.18325182795524597
Validation loss = 0.18953049182891846
Validation loss = 0.20760180056095123
Validation loss = 0.18465076386928558
Validation loss = 0.18689599633216858
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 868      |
| Iteration     | 6        |
| MaximumReturn | 1.6e+03  |
| MinimumReturn | -42.2    |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.19843420386314392
Validation loss = 0.1635434329509735
Validation loss = 0.15528841316699982
Validation loss = 0.15172189474105835
Validation loss = 0.1507302075624466
Validation loss = 0.1536162793636322
Validation loss = 0.15154734253883362
Validation loss = 0.1541942059993744
Validation loss = 0.1775924414396286
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.19678373634815216
Validation loss = 0.16902756690979004
Validation loss = 0.1587347835302353
Validation loss = 0.1533420979976654
Validation loss = 0.1547430157661438
Validation loss = 0.15184791386127472
Validation loss = 0.15243780612945557
Validation loss = 0.15610653162002563
Validation loss = 0.16831183433532715
Validation loss = 0.1546056866645813
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18708336353302002
Validation loss = 0.1618189513683319
Validation loss = 0.1548655927181244
Validation loss = 0.15551942586898804
Validation loss = 0.15537820756435394
Validation loss = 0.15571759641170502
Validation loss = 0.15570534765720367
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.20971068739891052
Validation loss = 0.17216193675994873
Validation loss = 0.16487959027290344
Validation loss = 0.16223932802677155
Validation loss = 0.16386908292770386
Validation loss = 0.15851908922195435
Validation loss = 0.16159629821777344
Validation loss = 0.16557306051254272
Validation loss = 0.15834559500217438
Validation loss = 0.15635277330875397
Validation loss = 0.15655283629894257
Validation loss = 0.15275689959526062
Validation loss = 0.19670377671718597
Validation loss = 0.16301771998405457
Validation loss = 0.15230467915534973
Validation loss = 0.15176987648010254
Validation loss = 0.15201908349990845
Validation loss = 0.15482324361801147
Validation loss = 0.16141244769096375
Validation loss = 0.1612061858177185
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2232242375612259
Validation loss = 0.18345069885253906
Validation loss = 0.17268727719783783
Validation loss = 0.16383683681488037
Validation loss = 0.16335193812847137
Validation loss = 0.16350513696670532
Validation loss = 0.1724991500377655
Validation loss = 0.17354822158813477
Validation loss = 0.16322572529315948
Validation loss = 0.1600339710712433
Validation loss = 0.15887995064258575
Validation loss = 0.16055995225906372
Validation loss = 0.15792831778526306
Validation loss = 0.16435766220092773
Validation loss = 0.16213348507881165
Validation loss = 0.16287532448768616
Validation loss = 0.15342947840690613
Validation loss = 0.15241500735282898
Validation loss = 0.1564210057258606
Validation loss = 0.17722390592098236
Validation loss = 0.16327215731143951
Validation loss = 0.1534789651632309
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 952      |
| Iteration     | 7        |
| MaximumReturn | 1.39e+03 |
| MinimumReturn | 311      |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16801930963993073
Validation loss = 0.16070929169654846
Validation loss = 0.1395786702632904
Validation loss = 0.1411457061767578
Validation loss = 0.1354178935289383
Validation loss = 0.1355879306793213
Validation loss = 0.13849884271621704
Validation loss = 0.13537432253360748
Validation loss = 0.1330350637435913
Validation loss = 0.13583588600158691
Validation loss = 0.13741065561771393
Validation loss = 0.14240996539592743
Validation loss = 0.13308598101139069
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1766359657049179
Validation loss = 0.1473134458065033
Validation loss = 0.14118100702762604
Validation loss = 0.1399143785238266
Validation loss = 0.14185525476932526
Validation loss = 0.13853947818279266
Validation loss = 0.14880961179733276
Validation loss = 0.14760951697826385
Validation loss = 0.13590073585510254
Validation loss = 0.13670749962329865
Validation loss = 0.13493652641773224
Validation loss = 0.15288949012756348
Validation loss = 0.13783937692642212
Validation loss = 0.1390966773033142
Validation loss = 0.13141202926635742
Validation loss = 0.130226269364357
Validation loss = 0.13202115893363953
Validation loss = 0.13361065089702606
Validation loss = 0.14860178530216217
Validation loss = 0.13334064185619354
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18907099962234497
Validation loss = 0.1725742220878601
Validation loss = 0.1458369493484497
Validation loss = 0.14184604585170746
Validation loss = 0.14140437543392181
Validation loss = 0.14051944017410278
Validation loss = 0.1532323956489563
Validation loss = 0.14546488225460052
Validation loss = 0.14012795686721802
Validation loss = 0.13617774844169617
Validation loss = 0.13890331983566284
Validation loss = 0.16532140970230103
Validation loss = 0.13496848940849304
Validation loss = 0.13442155718803406
Validation loss = 0.13410913944244385
Validation loss = 0.13291087746620178
Validation loss = 0.13666671514511108
Validation loss = 0.14831340312957764
Validation loss = 0.14045728743076324
Validation loss = 0.13238584995269775
Validation loss = 0.13167548179626465
Validation loss = 0.13181206583976746
Validation loss = 0.13933350145816803
Validation loss = 0.14153942465782166
Validation loss = 0.1494845598936081
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17977041006088257
Validation loss = 0.14973598718643188
Validation loss = 0.1396215558052063
Validation loss = 0.13526415824890137
Validation loss = 0.1373130828142166
Validation loss = 0.13641750812530518
Validation loss = 0.15050514042377472
Validation loss = 0.13999253511428833
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17253467440605164
Validation loss = 0.15413573384284973
Validation loss = 0.14654912054538727
Validation loss = 0.14090487360954285
Validation loss = 0.1428775191307068
Validation loss = 0.1405394971370697
Validation loss = 0.13999825716018677
Validation loss = 0.13930320739746094
Validation loss = 0.1374795138835907
Validation loss = 0.14070527255535126
Validation loss = 0.151104137301445
Validation loss = 0.1459992378950119
Validation loss = 0.13650014996528625
Validation loss = 0.13447882235050201
Validation loss = 0.13595543801784515
Validation loss = 0.1383916139602661
Validation loss = 0.14961524307727814
Validation loss = 0.14717084169387817
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 776      |
| Iteration     | 8        |
| MaximumReturn | 1.92e+03 |
| MinimumReturn | 233      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14596131443977356
Validation loss = 0.12807521224021912
Validation loss = 0.12118841707706451
Validation loss = 0.11889095604419708
Validation loss = 0.11877577006816864
Validation loss = 0.12140729278326035
Validation loss = 0.11861952394247055
Validation loss = 0.14388605952262878
Validation loss = 0.12045518308877945
Validation loss = 0.11517821252346039
Validation loss = 0.1146184653043747
Validation loss = 0.11576578766107559
Validation loss = 0.11782541126012802
Validation loss = 0.12160854041576385
Validation loss = 0.12359727919101715
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15550962090492249
Validation loss = 0.12538930773735046
Validation loss = 0.12163496017456055
Validation loss = 0.12670744955539703
Validation loss = 0.1233636885881424
Validation loss = 0.11910638958215714
Validation loss = 0.11721599102020264
Validation loss = 0.11870147287845612
Validation loss = 0.12267382442951202
Validation loss = 0.1479988396167755
Validation loss = 0.11922658979892731
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15313127636909485
Validation loss = 0.12378963083028793
Validation loss = 0.11883632093667984
Validation loss = 0.11995826661586761
Validation loss = 0.12381742894649506
Validation loss = 0.12019578367471695
Validation loss = 0.12008903175592422
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1515224426984787
Validation loss = 0.1295696347951889
Validation loss = 0.12320633232593536
Validation loss = 0.12132662534713745
Validation loss = 0.12090998888015747
Validation loss = 0.12587520480155945
Validation loss = 0.1282995343208313
Validation loss = 0.13977143168449402
Validation loss = 0.11781897395849228
Validation loss = 0.11726365238428116
Validation loss = 0.11803605407476425
Validation loss = 0.11754735559225082
Validation loss = 0.17109286785125732
Validation loss = 0.12101192772388458
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14943024516105652
Validation loss = 0.12944671511650085
Validation loss = 0.12468049675226212
Validation loss = 0.12210366874933243
Validation loss = 0.12105076014995575
Validation loss = 0.1254148930311203
Validation loss = 0.15383557975292206
Validation loss = 0.11834979057312012
Validation loss = 0.11713895946741104
Validation loss = 0.1178102120757103
Validation loss = 0.11909375339746475
Validation loss = 0.12546852231025696
Validation loss = 0.12152256071567535
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 809       |
| Iteration     | 9         |
| MaximumReturn | 1.95e+03  |
| MinimumReturn | -1.39e+03 |
| TotalSamples  | 44000     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11848814785480499
Validation loss = 0.10667610913515091
Validation loss = 0.10080432146787643
Validation loss = 0.0992511585354805
Validation loss = 0.1038275733590126
Validation loss = 0.09979691356420517
Validation loss = 0.10243383049964905
Validation loss = 0.10376536846160889
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12620629370212555
Validation loss = 0.10769229382276535
Validation loss = 0.1044812723994255
Validation loss = 0.10178006440401077
Validation loss = 0.10193424671888351
Validation loss = 0.1020212396979332
Validation loss = 0.10334989428520203
Validation loss = 0.11412450671195984
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15064378082752228
Validation loss = 0.10646674782037735
Validation loss = 0.10724563896656036
Validation loss = 0.10400872677564621
Validation loss = 0.10499542206525803
Validation loss = 0.10405108332633972
Validation loss = 0.10813172161579132
Validation loss = 0.10369608551263809
Validation loss = 0.11089062690734863
Validation loss = 0.1002543643116951
Validation loss = 0.09822149574756622
Validation loss = 0.10096588730812073
Validation loss = 0.10051064193248749
Validation loss = 0.09936845302581787
Validation loss = 0.11592314392328262
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12745507061481476
Validation loss = 0.10737044364213943
Validation loss = 0.10527360439300537
Validation loss = 0.10099849104881287
Validation loss = 0.10001376271247864
Validation loss = 0.09955890476703644
Validation loss = 0.09927099198102951
Validation loss = 0.109876848757267
Validation loss = 0.09933774173259735
Validation loss = 0.09631633013486862
Validation loss = 0.0972411185503006
Validation loss = 0.12033563107252121
Validation loss = 0.0991843193769455
Validation loss = 0.09542746096849442
Validation loss = 0.09421107172966003
Validation loss = 0.09919934719800949
Validation loss = 0.10283368080854416
Validation loss = 0.09401946514844894
Validation loss = 0.09353351593017578
Validation loss = 0.09599029272794724
Validation loss = 0.11014404147863388
Validation loss = 0.0985662192106247
Validation loss = 0.09209667891263962
Validation loss = 0.09190358966588974
Validation loss = 0.09919473528862
Validation loss = 0.09710424393415451
Validation loss = 0.09386077523231506
Validation loss = 0.09263121336698532
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13449431955814362
Validation loss = 0.10860073566436768
Validation loss = 0.10583531111478806
Validation loss = 0.10210645198822021
Validation loss = 0.10365409404039383
Validation loss = 0.1109134629368782
Validation loss = 0.10661529749631882
Validation loss = 0.1016300693154335
Validation loss = 0.10253860801458359
Validation loss = 0.10323695838451385
Validation loss = 0.10494793206453323
Validation loss = 0.11271362006664276
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -997      |
| Iteration     | 10        |
| MaximumReturn | -76.9     |
| MinimumReturn | -1.97e+03 |
| TotalSamples  | 48000     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12652647495269775
Validation loss = 0.09636688232421875
Validation loss = 0.09186425805091858
Validation loss = 0.08897925168275833
Validation loss = 0.09928587824106216
Validation loss = 0.09254392981529236
Validation loss = 0.08900532871484756
Validation loss = 0.0891440212726593
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11741989850997925
Validation loss = 0.09917882829904556
Validation loss = 0.10038652271032333
Validation loss = 0.09273578971624374
Validation loss = 0.09585896879434586
Validation loss = 0.09635922312736511
Validation loss = 0.09543462842702866
Validation loss = 0.09863078594207764
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11368048191070557
Validation loss = 0.09479589015245438
Validation loss = 0.08992870897054672
Validation loss = 0.09157297015190125
Validation loss = 0.09017648547887802
Validation loss = 0.09096208959817886
Validation loss = 0.090140201151371
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11065756529569626
Validation loss = 0.09255701303482056
Validation loss = 0.08503621816635132
Validation loss = 0.08296819776296616
Validation loss = 0.08440059423446655
Validation loss = 0.08433741331100464
Validation loss = 0.08575563877820969
Validation loss = 0.10168381780385971
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11718448251485825
Validation loss = 0.09400814026594162
Validation loss = 0.09093260765075684
Validation loss = 0.09100860357284546
Validation loss = 0.09383616596460342
Validation loss = 0.1100243330001831
Validation loss = 0.09522601217031479
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 44.9     |
| Iteration     | 11       |
| MaximumReturn | 1.05e+03 |
| MinimumReturn | -512     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10000494122505188
Validation loss = 0.08887560665607452
Validation loss = 0.08395281434059143
Validation loss = 0.08118975162506104
Validation loss = 0.08109898120164871
Validation loss = 0.08049175143241882
Validation loss = 0.08690248429775238
Validation loss = 0.0799906924366951
Validation loss = 0.07934419810771942
Validation loss = 0.08248011767864227
Validation loss = 0.07984799891710281
Validation loss = 0.09129247069358826
Validation loss = 0.07793031632900238
Validation loss = 0.07661231607198715
Validation loss = 0.08196526020765305
Validation loss = 0.08092363178730011
Validation loss = 0.07779265940189362
Validation loss = 0.07605300098657608
Validation loss = 0.08228778839111328
Validation loss = 0.0852845162153244
Validation loss = 0.07492765784263611
Validation loss = 0.07419507950544357
Validation loss = 0.07478266954421997
Validation loss = 0.08599589765071869
Validation loss = 0.07638467103242874
Validation loss = 0.07280322909355164
Validation loss = 0.0727277472615242
Validation loss = 0.07295673340559006
Validation loss = 0.0863751620054245
Validation loss = 0.07398401200771332
Validation loss = 0.07131847739219666
Validation loss = 0.07334036380052567
Validation loss = 0.078159399330616
Validation loss = 0.08653886616230011
Validation loss = 0.07200811803340912
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11676037311553955
Validation loss = 0.0942472517490387
Validation loss = 0.08613591641187668
Validation loss = 0.0835263729095459
Validation loss = 0.08615387976169586
Validation loss = 0.14541569352149963
Validation loss = 0.08301278948783875
Validation loss = 0.08214407414197922
Validation loss = 0.08281657099723816
Validation loss = 0.08347974717617035
Validation loss = 0.0829419419169426
Validation loss = 0.08265937119722366
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10816482454538345
Validation loss = 0.08385885506868362
Validation loss = 0.0794772207736969
Validation loss = 0.08074529469013214
Validation loss = 0.08065709471702576
Validation loss = 0.08517151325941086
Validation loss = 0.07855837047100067
Validation loss = 0.07918817549943924
Validation loss = 0.09466876089572906
Validation loss = 0.08013325184583664
Validation loss = 0.07648804783821106
Validation loss = 0.07864587008953094
Validation loss = 0.08160947263240814
Validation loss = 0.08137350529432297
Validation loss = 0.07590068131685257
Validation loss = 0.07656243443489075
Validation loss = 0.07800190150737762
Validation loss = 0.0986422449350357
Validation loss = 0.07820501923561096
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09881014376878738
Validation loss = 0.0850549191236496
Validation loss = 0.08001307398080826
Validation loss = 0.07487668842077255
Validation loss = 0.07458782196044922
Validation loss = 0.07767478376626968
Validation loss = 0.08727717399597168
Validation loss = 0.07536041736602783
Validation loss = 0.07501742988824844
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09587915241718292
Validation loss = 0.08655397593975067
Validation loss = 0.08453042805194855
Validation loss = 0.08235698193311691
Validation loss = 0.08785724639892578
Validation loss = 0.08240314573049545
Validation loss = 0.09001071751117706
Validation loss = 0.08483824133872986
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 676      |
| Iteration     | 12       |
| MaximumReturn | 1.37e+03 |
| MinimumReturn | -100     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07816682755947113
Validation loss = 0.07398208230733871
Validation loss = 0.06702904403209686
Validation loss = 0.067349873483181
Validation loss = 0.07174905389547348
Validation loss = 0.07376732677221298
Validation loss = 0.06866712868213654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09907560050487518
Validation loss = 0.07996521145105362
Validation loss = 0.07537935674190521
Validation loss = 0.07309621572494507
Validation loss = 0.08077564090490341
Validation loss = 0.07751613110303879
Validation loss = 0.07369358092546463
Validation loss = 0.07534204423427582
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08016292750835419
Validation loss = 0.07223282754421234
Validation loss = 0.07353097200393677
Validation loss = 0.07120261341333389
Validation loss = 0.07643964141607285
Validation loss = 0.07063467055559158
Validation loss = 0.06832312792539597
Validation loss = 0.0709238350391388
Validation loss = 0.07421007007360458
Validation loss = 0.06850645691156387
Validation loss = 0.06684724986553192
Validation loss = 0.06849393993616104
Validation loss = 0.07448994368314743
Validation loss = 0.06900099664926529
Validation loss = 0.06698530167341232
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0847598984837532
Validation loss = 0.07567797601222992
Validation loss = 0.07030902057886124
Validation loss = 0.06996176391839981
Validation loss = 0.07085464149713516
Validation loss = 0.0746106430888176
Validation loss = 0.06840775161981583
Validation loss = 0.06777095049619675
Validation loss = 0.07194022834300995
Validation loss = 0.07489822059869766
Validation loss = 0.06577876210212708
Validation loss = 0.06858187913894653
Validation loss = 0.06862198561429977
Validation loss = 0.07443440705537796
Validation loss = 0.06604070216417313
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09369396418333054
Validation loss = 0.07816653698682785
Validation loss = 0.07494097203016281
Validation loss = 0.07392921298742294
Validation loss = 0.07876072824001312
Validation loss = 0.08537133038043976
Validation loss = 0.07547809183597565
Validation loss = 0.07506909221410751
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 745       |
| Iteration     | 13        |
| MaximumReturn | 1.98e+03  |
| MinimumReturn | -1.03e+03 |
| TotalSamples  | 60000     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08217346668243408
Validation loss = 0.06970496475696564
Validation loss = 0.06498748064041138
Validation loss = 0.06511152535676956
Validation loss = 0.06791576743125916
Validation loss = 0.07001089304685593
Validation loss = 0.06503359228372574
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09249000996351242
Validation loss = 0.08223652839660645
Validation loss = 0.07220480591058731
Validation loss = 0.07148686796426773
Validation loss = 0.07287359237670898
Validation loss = 0.07699968665838242
Validation loss = 0.07258453965187073
Validation loss = 0.07375244051218033
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07487975060939789
Validation loss = 0.06676925718784332
Validation loss = 0.06535159796476364
Validation loss = 0.0673694908618927
Validation loss = 0.0754639208316803
Validation loss = 0.0656842514872551
Validation loss = 0.0660485252737999
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07669411599636078
Validation loss = 0.06690247356891632
Validation loss = 0.06631907820701599
Validation loss = 0.06863836199045181
Validation loss = 0.07616028189659119
Validation loss = 0.06956986337900162
Validation loss = 0.0699409767985344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08436525613069534
Validation loss = 0.07570770382881165
Validation loss = 0.07444509118795395
Validation loss = 0.0734686627984047
Validation loss = 0.07654707133769989
Validation loss = 0.07115019857883453
Validation loss = 0.0704566091299057
Validation loss = 0.08557344228029251
Validation loss = 0.07120020687580109
Validation loss = 0.07061399519443512
Validation loss = 0.07431576400995255
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.7e+03  |
| Iteration     | 14       |
| MaximumReturn | 2.29e+03 |
| MinimumReturn | 1e+03    |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0743081271648407
Validation loss = 0.06394539773464203
Validation loss = 0.06095748767256737
Validation loss = 0.06760929524898529
Validation loss = 0.0620986670255661
Validation loss = 0.0616571381688118
Validation loss = 0.06318238377571106
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08561266958713531
Validation loss = 0.07003490626811981
Validation loss = 0.06722861528396606
Validation loss = 0.07595541328191757
Validation loss = 0.07510313391685486
Validation loss = 0.06741908192634583
Validation loss = 0.06518618762493134
Validation loss = 0.06604954600334167
Validation loss = 0.07485558092594147
Validation loss = 0.06579775363206863
Validation loss = 0.06468260288238525
Validation loss = 0.07479673624038696
Validation loss = 0.06515519320964813
Validation loss = 0.06319242715835571
Validation loss = 0.07290937006473541
Validation loss = 0.06611226499080658
Validation loss = 0.07613971084356308
Validation loss = 0.06610631942749023
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07454837113618851
Validation loss = 0.06371471285820007
Validation loss = 0.061380304396152496
Validation loss = 0.06260447204113007
Validation loss = 0.06707494705915451
Validation loss = 0.0628054291009903
Validation loss = 0.06217287853360176
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0782952532172203
Validation loss = 0.0637335404753685
Validation loss = 0.061565715819597244
Validation loss = 0.06236952170729637
Validation loss = 0.07210391759872437
Validation loss = 0.06160629168152809
Validation loss = 0.06028789281845093
Validation loss = 0.06225802004337311
Validation loss = 0.06500901281833649
Validation loss = 0.06321203708648682
Validation loss = 0.06379982829093933
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08781994879245758
Validation loss = 0.07053492963314056
Validation loss = 0.06817588210105896
Validation loss = 0.06627833843231201
Validation loss = 0.07528059184551239
Validation loss = 0.0695338100194931
Validation loss = 0.06548471003770828
Validation loss = 0.06563397496938705
Validation loss = 0.068341925740242
Validation loss = 0.06680811941623688
Validation loss = 0.0661369264125824
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 519       |
| Iteration     | 15        |
| MaximumReturn | 2e+03     |
| MinimumReturn | -1.57e+03 |
| TotalSamples  | 68000     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07191149145364761
Validation loss = 0.05903360992670059
Validation loss = 0.05519565939903259
Validation loss = 0.05853693559765816
Validation loss = 0.059646740555763245
Validation loss = 0.06599785387516022
Validation loss = 0.05934048444032669
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07381464540958405
Validation loss = 0.06111504137516022
Validation loss = 0.057996317744255066
Validation loss = 0.06649771332740784
Validation loss = 0.0596068874001503
Validation loss = 0.057920556515455246
Validation loss = 0.06236758828163147
Validation loss = 0.06311512738466263
Validation loss = 0.05775349587202072
Validation loss = 0.057693906128406525
Validation loss = 0.06609170883893967
Validation loss = 0.061465319246053696
Validation loss = 0.05708086118102074
Validation loss = 0.06020958349108696
Validation loss = 0.05886344984173775
Validation loss = 0.05588683485984802
Validation loss = 0.07161122560501099
Validation loss = 0.054978061467409134
Validation loss = 0.05532527714967728
Validation loss = 0.056986626237630844
Validation loss = 0.06156514585018158
Validation loss = 0.05640092119574547
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09683556109666824
Validation loss = 0.060164015740156174
Validation loss = 0.05728208273649216
Validation loss = 0.059862565249204636
Validation loss = 0.05871046334505081
Validation loss = 0.06198463588953018
Validation loss = 0.05708665773272514
Validation loss = 0.06245608255267143
Validation loss = 0.05565411224961281
Validation loss = 0.05570606887340546
Validation loss = 0.06167345121502876
Validation loss = 0.05905389040708542
Validation loss = 0.058237217366695404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07251531630754471
Validation loss = 0.057798609137535095
Validation loss = 0.057464420795440674
Validation loss = 0.057954974472522736
Validation loss = 0.06541309505701065
Validation loss = 0.05984828621149063
Validation loss = 0.057015739381313324
Validation loss = 0.05787176638841629
Validation loss = 0.0641779825091362
Validation loss = 0.055206622928380966
Validation loss = 0.05645434930920601
Validation loss = 0.058335550129413605
Validation loss = 0.06682771444320679
Validation loss = 0.05652724206447601
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06871920824050903
Validation loss = 0.06792158633470535
Validation loss = 0.06137203425168991
Validation loss = 0.05904769152402878
Validation loss = 0.0629606768488884
Validation loss = 0.059005316346883774
Validation loss = 0.06249411776661873
Validation loss = 0.061709340661764145
Validation loss = 0.05873427540063858
Validation loss = 0.06947725266218185
Validation loss = 0.06712336093187332
Validation loss = 0.056902799755334854
Validation loss = 0.056816019117832184
Validation loss = 0.06390878558158875
Validation loss = 0.05964292585849762
Validation loss = 0.05745052918791771
Validation loss = 0.06527203321456909
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 708       |
| Iteration     | 16        |
| MaximumReturn | 2.66e+03  |
| MinimumReturn | -1.45e+03 |
| TotalSamples  | 72000     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07083000987768173
Validation loss = 0.06136196106672287
Validation loss = 0.05603767931461334
Validation loss = 0.05551022291183472
Validation loss = 0.061899617314338684
Validation loss = 0.05698617920279503
Validation loss = 0.05503252521157265
Validation loss = 0.06345328688621521
Validation loss = 0.05502862110733986
Validation loss = 0.054412245750427246
Validation loss = 0.06073065474629402
Validation loss = 0.05630534142255783
Validation loss = 0.054376233369112015
Validation loss = 0.055121198296546936
Validation loss = 0.05984039977192879
Validation loss = 0.05280233547091484
Validation loss = 0.05242085084319115
Validation loss = 0.05780680850148201
Validation loss = 0.08396823704242706
Validation loss = 0.052815262228250504
Validation loss = 0.053496308624744415
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0674322322010994
Validation loss = 0.05756489560008049
Validation loss = 0.05643143877387047
Validation loss = 0.054768409579992294
Validation loss = 0.057314224541187286
Validation loss = 0.05950954183936119
Validation loss = 0.0579812116920948
Validation loss = 0.05484164133667946
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06679221987724304
Validation loss = 0.057093195617198944
Validation loss = 0.056096699088811874
Validation loss = 0.06044444069266319
Validation loss = 0.05623387545347214
Validation loss = 0.05934051051735878
Validation loss = 0.06396045535802841
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06155596300959587
Validation loss = 0.056067876517772675
Validation loss = 0.0559370182454586
Validation loss = 0.062457408756017685
Validation loss = 0.0601569227874279
Validation loss = 0.05412762984633446
Validation loss = 0.06214506924152374
Validation loss = 0.05734207108616829
Validation loss = 0.05742080882191658
Validation loss = 0.05407794937491417
Validation loss = 0.05496811866760254
Validation loss = 0.05660267174243927
Validation loss = 0.05221223086118698
Validation loss = 0.05355684459209442
Validation loss = 0.05588463321328163
Validation loss = 0.05459818243980408
Validation loss = 0.05297721177339554
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07345947623252869
Validation loss = 0.05867937579751015
Validation loss = 0.05634172633290291
Validation loss = 0.057440903037786484
Validation loss = 0.05699717998504639
Validation loss = 0.06113085523247719
Validation loss = 0.05758777633309364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 1.41e+03  |
| Iteration     | 17        |
| MaximumReturn | 2.65e+03  |
| MinimumReturn | -2.57e+03 |
| TotalSamples  | 76000     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06688612699508667
Validation loss = 0.05509571731090546
Validation loss = 0.05186564475297928
Validation loss = 0.052952997386455536
Validation loss = 0.05849030241370201
Validation loss = 0.05203937366604805
Validation loss = 0.051681891083717346
Validation loss = 0.05601053312420845
Validation loss = 0.05409613996744156
Validation loss = 0.05024328455328941
Validation loss = 0.0530209057033062
Validation loss = 0.056060440838336945
Validation loss = 0.04886269196867943
Validation loss = 0.05019613355398178
Validation loss = 0.058985766023397446
Validation loss = 0.052172962576150894
Validation loss = 0.049859434366226196
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06628638505935669
Validation loss = 0.05524977296590805
Validation loss = 0.055343855172395706
Validation loss = 0.06189733371138573
Validation loss = 0.054294705390930176
Validation loss = 0.053660981357097626
Validation loss = 0.05910811200737953
Validation loss = 0.054370053112506866
Validation loss = 0.051750507205724716
Validation loss = 0.05484743416309357
Validation loss = 0.05767710506916046
Validation loss = 0.05348588153719902
Validation loss = 0.052840836346149445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06389588117599487
Validation loss = 0.0542534701526165
Validation loss = 0.05387670919299126
Validation loss = 0.058692581951618195
Validation loss = 0.05944165959954262
Validation loss = 0.0541902594268322
Validation loss = 0.05425524711608887
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05551297590136528
Validation loss = 0.0549439899623394
Validation loss = 0.059420157223939896
Validation loss = 0.05212166905403137
Validation loss = 0.05457238107919693
Validation loss = 0.06325056403875351
Validation loss = 0.05266626924276352
Validation loss = 0.05129703879356384
Validation loss = 0.06042300537228584
Validation loss = 0.053664859384298325
Validation loss = 0.05151013284921646
Validation loss = 0.05072048678994179
Validation loss = 0.06332587450742722
Validation loss = 0.05067170783877373
Validation loss = 0.051791127771139145
Validation loss = 0.056106097996234894
Validation loss = 0.059213362634181976
Validation loss = 0.04971708729863167
Validation loss = 0.05662137642502785
Validation loss = 0.059600215405225754
Validation loss = 0.04977714642882347
Validation loss = 0.05874839425086975
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.061052609235048294
Validation loss = 0.058981411159038544
Validation loss = 0.05711435526609421
Validation loss = 0.05470274016261101
Validation loss = 0.058406464755535126
Validation loss = 0.0585261695086956
Validation loss = 0.05451171100139618
Validation loss = 0.054101817309856415
Validation loss = 0.06605526804924011
Validation loss = 0.052882056683301926
Validation loss = 0.06106925010681152
Validation loss = 0.0548299103975296
Validation loss = 0.05449000746011734
Validation loss = 0.05265912786126137
Validation loss = 0.05778391659259796
Validation loss = 0.0642649307847023
Validation loss = 0.050881560891866684
Validation loss = 0.052086565643548965
Validation loss = 0.054206810891628265
Validation loss = 0.06402164697647095
Validation loss = 0.051636386662721634
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.29e+03 |
| Iteration     | 18       |
| MaximumReturn | 2.45e+03 |
| MinimumReturn | -903     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05874428153038025
Validation loss = 0.05078590661287308
Validation loss = 0.04910973832011223
Validation loss = 0.052250124514102936
Validation loss = 0.04924873262643814
Validation loss = 0.05277333781123161
Validation loss = 0.04863167926669121
Validation loss = 0.05789691209793091
Validation loss = 0.05498788505792618
Validation loss = 0.046745166182518005
Validation loss = 0.04674951359629631
Validation loss = 0.05126311630010605
Validation loss = 0.05044085904955864
Validation loss = 0.045993901789188385
Validation loss = 0.048215173184871674
Validation loss = 0.050740502774715424
Validation loss = 0.046324603259563446
Validation loss = 0.04621835798025131
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0590481236577034
Validation loss = 0.05032126232981682
Validation loss = 0.04978669062256813
Validation loss = 0.05030401423573494
Validation loss = 0.05475127696990967
Validation loss = 0.04890350252389908
Validation loss = 0.05193667486310005
Validation loss = 0.0645238384604454
Validation loss = 0.049613721668720245
Validation loss = 0.04829332232475281
Validation loss = 0.05446655675768852
Validation loss = 0.047907955944538116
Validation loss = 0.047246210277080536
Validation loss = 0.05092684552073479
Validation loss = 0.04915308207273483
Validation loss = 0.0527925118803978
Validation loss = 0.05241663008928299
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07490485161542892
Validation loss = 0.05497732013463974
Validation loss = 0.05268002673983574
Validation loss = 0.052117861807346344
Validation loss = 0.0569671168923378
Validation loss = 0.051435839384794235
Validation loss = 0.05501669645309448
Validation loss = 0.05328911542892456
Validation loss = 0.058700453490018845
Validation loss = 0.050859253853559494
Validation loss = 0.05456441640853882
Validation loss = 0.054687827825546265
Validation loss = 0.04942212253808975
Validation loss = 0.05723195523023605
Validation loss = 0.05797401815652847
Validation loss = 0.04920036345720291
Validation loss = 0.04961343854665756
Validation loss = 0.060009002685546875
Validation loss = 0.04922328144311905
Validation loss = 0.04941462725400925
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06239895895123482
Validation loss = 0.04849696159362793
Validation loss = 0.04810958355665207
Validation loss = 0.05459577962756157
Validation loss = 0.048429954797029495
Validation loss = 0.04670611768960953
Validation loss = 0.05242500454187393
Validation loss = 0.05254683643579483
Validation loss = 0.0474499836564064
Validation loss = 0.04953709989786148
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.056167490780353546
Validation loss = 0.04988845810294151
Validation loss = 0.05251949280500412
Validation loss = 0.05068995803594589
Validation loss = 0.048912759870290756
Validation loss = 0.056089796125888824
Validation loss = 0.048782575875520706
Validation loss = 0.049077123403549194
Validation loss = 0.06026679277420044
Validation loss = 0.04962896928191185
Validation loss = 0.047571707516908646
Validation loss = 0.049168966710567474
Validation loss = 0.05723494291305542
Validation loss = 0.04735767841339111
Validation loss = 0.04863009229302406
Validation loss = 0.05173264816403389
Validation loss = 0.055118363350629807
Validation loss = 0.04732215404510498
Validation loss = 0.049180034548044205
Validation loss = 0.04655333608388901
Validation loss = 0.048839468508958817
Validation loss = 0.05020471289753914
Validation loss = 0.04547593742609024
Validation loss = 0.04934925585985184
Validation loss = 0.047398049384355545
Validation loss = 0.04542277380824089
Validation loss = 0.045531824231147766
Validation loss = 0.0537683479487896
Validation loss = 0.04701235145330429
Validation loss = 0.04531288146972656
Validation loss = 0.04782278463244438
Validation loss = 0.04650498554110527
Validation loss = 0.05499458312988281
Validation loss = 0.044177837669849396
Validation loss = 0.04439384490251541
Validation loss = 0.04638335853815079
Validation loss = 0.045237310230731964
Validation loss = 0.04674653336405754
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.71e+03 |
| Iteration     | 19       |
| MaximumReturn | 2.77e+03 |
| MinimumReturn | -1e+03   |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05488549917936325
Validation loss = 0.04796382412314415
Validation loss = 0.05216481536626816
Validation loss = 0.05036696419119835
Validation loss = 0.04519697278738022
Validation loss = 0.053771745413541794
Validation loss = 0.04413219541311264
Validation loss = 0.04399080574512482
Validation loss = 0.04548167437314987
Validation loss = 0.05000748485326767
Validation loss = 0.04397546499967575
Validation loss = 0.042712047696113586
Validation loss = 0.054194048047065735
Validation loss = 0.044416673481464386
Validation loss = 0.043872468173503876
Validation loss = 0.04437282681465149
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.054647911339998245
Validation loss = 0.04789279028773308
Validation loss = 0.045990437269210815
Validation loss = 0.05178026854991913
Validation loss = 0.048396650701761246
Validation loss = 0.04544087499380112
Validation loss = 0.04687618464231491
Validation loss = 0.05333869904279709
Validation loss = 0.04463418945670128
Validation loss = 0.04664570465683937
Validation loss = 0.047687429934740067
Validation loss = 0.04633540287613869
Validation loss = 0.04447431117296219
Validation loss = 0.04708690941333771
Validation loss = 0.04973930865526199
Validation loss = 0.043296150863170624
Validation loss = 0.04397699236869812
Validation loss = 0.04779892787337303
Validation loss = 0.044032853096723557
Validation loss = 0.04323260858654976
Validation loss = 0.049560632556676865
Validation loss = 0.05297445133328438
Validation loss = 0.04340207204222679
Validation loss = 0.04484771937131882
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06282073259353638
Validation loss = 0.04937252029776573
Validation loss = 0.046676069498062134
Validation loss = 0.050283581018447876
Validation loss = 0.05752021446824074
Validation loss = 0.04701859876513481
Validation loss = 0.04611750692129135
Validation loss = 0.04905417561531067
Validation loss = 0.048277467489242554
Validation loss = 0.04588152840733528
Validation loss = 0.05561999976634979
Validation loss = 0.0491647906601429
Validation loss = 0.0463603250682354
Validation loss = 0.04599612578749657
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.060815051198005676
Validation loss = 0.050986431539058685
Validation loss = 0.04535556584596634
Validation loss = 0.0463811457157135
Validation loss = 0.05283079668879509
Validation loss = 0.04591131582856178
Validation loss = 0.04391489550471306
Validation loss = 0.05055798962712288
Validation loss = 0.047013409435749054
Validation loss = 0.046148378401994705
Validation loss = 0.050842154771089554
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05837935954332352
Validation loss = 0.04288562014698982
Validation loss = 0.042657990008592606
Validation loss = 0.044964615255594254
Validation loss = 0.0431676059961319
Validation loss = 0.06549607217311859
Validation loss = 0.04145335406064987
Validation loss = 0.04216068238019943
Validation loss = 0.04331152141094208
Validation loss = 0.04712492227554321
Validation loss = 0.0423152893781662
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.65e+03 |
| Iteration     | 20       |
| MaximumReturn | 2.19e+03 |
| MinimumReturn | 1.02e+03 |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0562921017408371
Validation loss = 0.04381633177399635
Validation loss = 0.04426657781004906
Validation loss = 0.04314018413424492
Validation loss = 0.0574675053358078
Validation loss = 0.04338167607784271
Validation loss = 0.04268547520041466
Validation loss = 0.061179790645837784
Validation loss = 0.04338259622454643
Validation loss = 0.042587582021951675
Validation loss = 0.04722388833761215
Validation loss = 0.046019650995731354
Validation loss = 0.0422540009021759
Validation loss = 0.04325298219919205
Validation loss = 0.05099407210946083
Validation loss = 0.04258527606725693
Validation loss = 0.04236847162246704
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06056192144751549
Validation loss = 0.04440617561340332
Validation loss = 0.04437294229865074
Validation loss = 0.0474611297249794
Validation loss = 0.047684844583272934
Validation loss = 0.04320511594414711
Validation loss = 0.043245695531368256
Validation loss = 0.045707058161497116
Validation loss = 0.04624021053314209
Validation loss = 0.04600653797388077
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.056713446974754333
Validation loss = 0.048161447048187256
Validation loss = 0.04519355669617653
Validation loss = 0.0499519445002079
Validation loss = 0.048361942172050476
Validation loss = 0.04638083279132843
Validation loss = 0.048242416232824326
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05484849587082863
Validation loss = 0.04579208791255951
Validation loss = 0.04498043283820152
Validation loss = 0.04629744589328766
Validation loss = 0.049779120832681656
Validation loss = 0.045935217291116714
Validation loss = 0.044715505093336105
Validation loss = 0.0573045052587986
Validation loss = 0.04409365355968475
Validation loss = 0.043099697679281235
Validation loss = 0.05249488726258278
Validation loss = 0.045152828097343445
Validation loss = 0.04378556087613106
Validation loss = 0.05629223957657814
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.046630535274744034
Validation loss = 0.04282987490296364
Validation loss = 0.04199035465717316
Validation loss = 0.04904749616980553
Validation loss = 0.03983153775334358
Validation loss = 0.04011404141783714
Validation loss = 0.04364221915602684
Validation loss = 0.04005210101604462
Validation loss = 0.04071991518139839
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.11e+03 |
| Iteration     | 21       |
| MaximumReturn | 2.49e+03 |
| MinimumReturn | -327     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05668994039297104
Validation loss = 0.041645634919404984
Validation loss = 0.04037981480360031
Validation loss = 0.04275710508227348
Validation loss = 0.045349474996328354
Validation loss = 0.04548928141593933
Validation loss = 0.04713144525885582
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06615165621042252
Validation loss = 0.04477446898818016
Validation loss = 0.04029792547225952
Validation loss = 0.04197224602103233
Validation loss = 0.04510817304253578
Validation loss = 0.040731120854616165
Validation loss = 0.0482487827539444
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06526634097099304
Validation loss = 0.046620868146419525
Validation loss = 0.04543810710310936
Validation loss = 0.04653249680995941
Validation loss = 0.04425041005015373
Validation loss = 0.05524206534028053
Validation loss = 0.04376288503408432
Validation loss = 0.04579440504312515
Validation loss = 0.04563813656568527
Validation loss = 0.04386115074157715
Validation loss = 0.06419609487056732
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0591517798602581
Validation loss = 0.04457322508096695
Validation loss = 0.04252936691045761
Validation loss = 0.04299246147274971
Validation loss = 0.04579341411590576
Validation loss = 0.04243156313896179
Validation loss = 0.044711679220199585
Validation loss = 0.04471377283334732
Validation loss = 0.04922597482800484
Validation loss = 0.04234696552157402
Validation loss = 0.041306331753730774
Validation loss = 0.04370909556746483
Validation loss = 0.04651380702853203
Validation loss = 0.043503280729055405
Validation loss = 0.04066634178161621
Validation loss = 0.042540743947029114
Validation loss = 0.04718505218625069
Validation loss = 0.041645172983407974
Validation loss = 0.04065575450658798
Validation loss = 0.04353589937090874
Validation loss = 0.044264841824769974
Validation loss = 0.03987347334623337
Validation loss = 0.041071951389312744
Validation loss = 0.0446893684566021
Validation loss = 0.040905099362134933
Validation loss = 0.0402417927980423
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05738773196935654
Validation loss = 0.03916597366333008
Validation loss = 0.038912083953619
Validation loss = 0.04095280542969704
Validation loss = 0.03996395692229271
Validation loss = 0.03800138086080551
Validation loss = 0.04136437922716141
Validation loss = 0.04240495339035988
Validation loss = 0.04141411930322647
Validation loss = 0.04364732652902603
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.18e+03 |
| Iteration     | 22       |
| MaximumReturn | 2.37e+03 |
| MinimumReturn | 1.93e+03 |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0491272509098053
Validation loss = 0.03928093984723091
Validation loss = 0.040159787982702255
Validation loss = 0.0444885790348053
Validation loss = 0.0392766110599041
Validation loss = 0.040104445070028305
Validation loss = 0.04635832831263542
Validation loss = 0.043294280767440796
Validation loss = 0.03855946660041809
Validation loss = 0.048339586704969406
Validation loss = 0.03982464224100113
Validation loss = 0.03930450975894928
Validation loss = 0.04018963873386383
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.056706752628088
Validation loss = 0.040667299181222916
Validation loss = 0.042113304138183594
Validation loss = 0.04146542027592659
Validation loss = 0.04042891785502434
Validation loss = 0.05195111036300659
Validation loss = 0.04007585346698761
Validation loss = 0.04086240753531456
Validation loss = 0.041142188012599945
Validation loss = 0.0508807897567749
Validation loss = 0.03917769715189934
Validation loss = 0.039772484451532364
Validation loss = 0.04274570569396019
Validation loss = 0.04166070744395256
Validation loss = 0.038246527314186096
Validation loss = 0.04689301177859306
Validation loss = 0.0424056351184845
Validation loss = 0.03781532123684883
Validation loss = 0.0396418459713459
Validation loss = 0.03907124698162079
Validation loss = 0.042828064411878586
Validation loss = 0.03903704136610031
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06410194933414459
Validation loss = 0.04446719214320183
Validation loss = 0.042705733329057693
Validation loss = 0.04601411148905754
Validation loss = 0.047782257199287415
Validation loss = 0.0421656034886837
Validation loss = 0.045834094285964966
Validation loss = 0.04222775995731354
Validation loss = 0.04214552044868469
Validation loss = 0.06022268533706665
Validation loss = 0.04239450767636299
Validation loss = 0.04163505136966705
Validation loss = 0.05119625851511955
Validation loss = 0.041043899953365326
Validation loss = 0.04298674687743187
Validation loss = 0.04605253040790558
Validation loss = 0.04886791110038757
Validation loss = 0.040883708745241165
Validation loss = 0.040580082684755325
Validation loss = 0.04902328923344612
Validation loss = 0.04035042226314545
Validation loss = 0.041272226721048355
Validation loss = 0.0462934672832489
Validation loss = 0.04181418940424919
Validation loss = 0.04689866304397583
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0525682233273983
Validation loss = 0.04219064489006996
Validation loss = 0.03988083824515343
Validation loss = 0.04154455289244652
Validation loss = 0.04231402277946472
Validation loss = 0.03969978913664818
Validation loss = 0.03828424587845802
Validation loss = 0.04752590134739876
Validation loss = 0.0395316518843174
Validation loss = 0.044385772198438644
Validation loss = 0.044608648866415024
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04964997246861458
Validation loss = 0.03778879716992378
Validation loss = 0.03699653223156929
Validation loss = 0.04578274488449097
Validation loss = 0.03839908167719841
Validation loss = 0.03685999661684036
Validation loss = 0.05227186158299446
Validation loss = 0.03733917698264122
Validation loss = 0.03816811367869377
Validation loss = 0.04758184775710106
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.99e+03 |
| Iteration     | 23       |
| MaximumReturn | 2.75e+03 |
| MinimumReturn | 1.42e+03 |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05149967595934868
Validation loss = 0.03876602277159691
Validation loss = 0.03925381600856781
Validation loss = 0.04076169431209564
Validation loss = 0.04134054109454155
Validation loss = 0.037661779671907425
Validation loss = 0.04473567381501198
Validation loss = 0.040036048740148544
Validation loss = 0.038311246782541275
Validation loss = 0.0386001355946064
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.049596503376960754
Validation loss = 0.03824348375201225
Validation loss = 0.04093443974852562
Validation loss = 0.04077400639653206
Validation loss = 0.037688855081796646
Validation loss = 0.038151152431964874
Validation loss = 0.03928956016898155
Validation loss = 0.037132471799850464
Validation loss = 0.04654313623905182
Validation loss = 0.03803669288754463
Validation loss = 0.040573522448539734
Validation loss = 0.03932229056954384
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06232530623674393
Validation loss = 0.04092435911297798
Validation loss = 0.039087917655706406
Validation loss = 0.05087125301361084
Validation loss = 0.0396236889064312
Validation loss = 0.039703670889139175
Validation loss = 0.04772278666496277
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04402569308876991
Validation loss = 0.03804629668593407
Validation loss = 0.05617397278547287
Validation loss = 0.03889092057943344
Validation loss = 0.03885935619473457
Validation loss = 0.03839803859591484
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05225756764411926
Validation loss = 0.036227867007255554
Validation loss = 0.03615393489599228
Validation loss = 0.04201824218034744
Validation loss = 0.037154849618673325
Validation loss = 0.03640705719590187
Validation loss = 0.03884395956993103
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.87e+03 |
| Iteration     | 24       |
| MaximumReturn | 2.36e+03 |
| MinimumReturn | 1.2e+03  |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05382666364312172
Validation loss = 0.03751451149582863
Validation loss = 0.03813521936535835
Validation loss = 0.03859489783644676
Validation loss = 0.03726748377084732
Validation loss = 0.03758898749947548
Validation loss = 0.0403197780251503
Validation loss = 0.0368555411696434
Validation loss = 0.03714132308959961
Validation loss = 0.0413593091070652
Validation loss = 0.03592321649193764
Validation loss = 0.039330366998910904
Validation loss = 0.03606099262833595
Validation loss = 0.03679957613348961
Validation loss = 0.04696672782301903
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05049353465437889
Validation loss = 0.0361577607691288
Validation loss = 0.03782045841217041
Validation loss = 0.03902851790189743
Validation loss = 0.036085523664951324
Validation loss = 0.03782857581973076
Validation loss = 0.03684515878558159
Validation loss = 0.03507465869188309
Validation loss = 0.036726780235767365
Validation loss = 0.036928292363882065
Validation loss = 0.03423396870493889
Validation loss = 0.03665497153997421
Validation loss = 0.05393821373581886
Validation loss = 0.03397130221128464
Validation loss = 0.03516511619091034
Validation loss = 0.04533114656805992
Validation loss = 0.0350632444024086
Validation loss = 0.03471313416957855
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05785675719380379
Validation loss = 0.03845665231347084
Validation loss = 0.040720947086811066
Validation loss = 0.04625149816274643
Validation loss = 0.038925688713788986
Validation loss = 0.03793514892458916
Validation loss = 0.04176817089319229
Validation loss = 0.038856521248817444
Validation loss = 0.03834990784525871
Validation loss = 0.03952382132411003
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0526205450296402
Validation loss = 0.03705219179391861
Validation loss = 0.03698979690670967
Validation loss = 0.04065024480223656
Validation loss = 0.03722080588340759
Validation loss = 0.03826621547341347
Validation loss = 0.03854461759328842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05443963780999184
Validation loss = 0.03526588901877403
Validation loss = 0.03674260526895523
Validation loss = 0.03961779549717903
Validation loss = 0.03569452464580536
Validation loss = 0.03561472147703171
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.8e+03  |
| Iteration     | 25       |
| MaximumReturn | 2.18e+03 |
| MinimumReturn | 959      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04581779986619949
Validation loss = 0.03476456552743912
Validation loss = 0.03412424772977829
Validation loss = 0.03899116441607475
Validation loss = 0.03450562059879303
Validation loss = 0.0343603752553463
Validation loss = 0.033070605248212814
Validation loss = 0.042914338409900665
Validation loss = 0.0341540165245533
Validation loss = 0.03497280925512314
Validation loss = 0.03506605327129364
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04274741932749748
Validation loss = 0.0335937961935997
Validation loss = 0.034105949103832245
Validation loss = 0.03536388650536537
Validation loss = 0.035426829010248184
Validation loss = 0.033468399196863174
Validation loss = 0.03717821463942528
Validation loss = 0.04974294453859329
Validation loss = 0.033631838858127594
Validation loss = 0.03346620127558708
Validation loss = 0.050312869250774384
Validation loss = 0.032481152564287186
Validation loss = 0.03290729969739914
Validation loss = 0.03529258444905281
Validation loss = 0.03309791907668114
Validation loss = 0.0338001511991024
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04744984209537506
Validation loss = 0.036641769111156464
Validation loss = 0.03847859054803848
Validation loss = 0.04510979726910591
Validation loss = 0.035160474479198456
Validation loss = 0.03652079403400421
Validation loss = 0.03973791003227234
Validation loss = 0.03520258888602257
Validation loss = 0.03651811182498932
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05002058297395706
Validation loss = 0.03568894416093826
Validation loss = 0.03462851420044899
Validation loss = 0.041935067623853683
Validation loss = 0.034863729029893875
Validation loss = 0.034206412732601166
Validation loss = 0.03801159933209419
Validation loss = 0.04046029970049858
Validation loss = 0.03439764678478241
Validation loss = 0.03625347092747688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04487581178545952
Validation loss = 0.033816903829574585
Validation loss = 0.03316117078065872
Validation loss = 0.04351866617798805
Validation loss = 0.03493095189332962
Validation loss = 0.03277502954006195
Validation loss = 0.034786492586135864
Validation loss = 0.03795652464032173
Validation loss = 0.03348754346370697
Validation loss = 0.03160254284739494
Validation loss = 0.035114314407110214
Validation loss = 0.03251621127128601
Validation loss = 0.03171757236123085
Validation loss = 0.032341863960027695
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.78e+03 |
| Iteration     | 26       |
| MaximumReturn | 2.63e+03 |
| MinimumReturn | -98.9    |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04901845008134842
Validation loss = 0.03328525647521019
Validation loss = 0.03218809515237808
Validation loss = 0.03555091843008995
Validation loss = 0.034865740686655045
Validation loss = 0.03319566324353218
Validation loss = 0.03315036743879318
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.040124740451574326
Validation loss = 0.03263348713517189
Validation loss = 0.0333840437233448
Validation loss = 0.03303142264485359
Validation loss = 0.03152598813176155
Validation loss = 0.034676652401685715
Validation loss = 0.032840531319379807
Validation loss = 0.031883690506219864
Validation loss = 0.0381757915019989
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0560380294919014
Validation loss = 0.034859176725149155
Validation loss = 0.035889726132154465
Validation loss = 0.039249811321496964
Validation loss = 0.03616635128855705
Validation loss = 0.03431687504053116
Validation loss = 0.04168606176972389
Validation loss = 0.03670233488082886
Validation loss = 0.0355052724480629
Validation loss = 0.03759826347231865
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.048298776149749756
Validation loss = 0.034152112901210785
Validation loss = 0.03432851657271385
Validation loss = 0.038754142820835114
Validation loss = 0.03403358533978462
Validation loss = 0.03367822989821434
Validation loss = 0.03592611104249954
Validation loss = 0.03314734250307083
Validation loss = 0.0337807834148407
Validation loss = 0.03594401106238365
Validation loss = 0.03315494582056999
Validation loss = 0.03741282597184181
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.049242932349443436
Validation loss = 0.032477010041475296
Validation loss = 0.031841229647397995
Validation loss = 0.034132469445466995
Validation loss = 0.0314728282392025
Validation loss = 0.03315228968858719
Validation loss = 0.03331425040960312
Validation loss = 0.032854046672582626
Validation loss = 0.03859188035130501
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.46e+03 |
| Iteration     | 27       |
| MaximumReturn | 2.11e+03 |
| MinimumReturn | 992      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.037924401462078094
Validation loss = 0.03186284750699997
Validation loss = 0.031313683837652206
Validation loss = 0.04066988453269005
Validation loss = 0.03247395530343056
Validation loss = 0.0326169952750206
Validation loss = 0.034336697310209274
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.041979555040597916
Validation loss = 0.03194480761885643
Validation loss = 0.03094572201371193
Validation loss = 0.0339486338198185
Validation loss = 0.0336681604385376
Validation loss = 0.030801400542259216
Validation loss = 0.03303493186831474
Validation loss = 0.03014897182583809
Validation loss = 0.030609169974923134
Validation loss = 0.03519516810774803
Validation loss = 0.030054008588194847
Validation loss = 0.03142950311303139
Validation loss = 0.036058586090803146
Validation loss = 0.029686765745282173
Validation loss = 0.02972177229821682
Validation loss = 0.03231680393218994
Validation loss = 0.03146234527230263
Validation loss = 0.03235718235373497
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04141363501548767
Validation loss = 0.03435472771525383
Validation loss = 0.035050202161073685
Validation loss = 0.036458779126405716
Validation loss = 0.0339704193174839
Validation loss = 0.033774200826883316
Validation loss = 0.034273356199264526
Validation loss = 0.03310013562440872
Validation loss = 0.03660396486520767
Validation loss = 0.03339646756649017
Validation loss = 0.032914333045482635
Validation loss = 0.035453472286462784
Validation loss = 0.03520521894097328
Validation loss = 0.03543562442064285
Validation loss = 0.03372011333703995
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.040311552584171295
Validation loss = 0.03213448449969292
Validation loss = 0.03338087350130081
Validation loss = 0.032503411173820496
Validation loss = 0.039944104850292206
Validation loss = 0.03207317739725113
Validation loss = 0.03175165131688118
Validation loss = 0.0388987734913826
Validation loss = 0.031211374327540398
Validation loss = 0.031546127051115036
Validation loss = 0.036706648766994476
Validation loss = 0.03105926886200905
Validation loss = 0.03402804955840111
Validation loss = 0.032578304409980774
Validation loss = 0.03068922832608223
Validation loss = 0.038921669125556946
Validation loss = 0.03045652061700821
Validation loss = 0.030842425301671028
Validation loss = 0.036640893667936325
Validation loss = 0.03067990206182003
Validation loss = 0.031114084646105766
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03836525231599808
Validation loss = 0.029685217887163162
Validation loss = 0.030862562358379364
Validation loss = 0.03345770016312599
Validation loss = 0.03064853325486183
Validation loss = 0.03830023482441902
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.78e+03 |
| Iteration     | 28       |
| MaximumReturn | 2.41e+03 |
| MinimumReturn | 1.08e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04060687497258186
Validation loss = 0.031912222504615784
Validation loss = 0.03084481507539749
Validation loss = 0.03616255521774292
Validation loss = 0.029765112325549126
Validation loss = 0.03214221075177193
Validation loss = 0.03110525757074356
Validation loss = 0.03188048303127289
Validation loss = 0.029997626319527626
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.035705406218767166
Validation loss = 0.02894187532365322
Validation loss = 0.029254881665110588
Validation loss = 0.030436841771006584
Validation loss = 0.03391251340508461
Validation loss = 0.02967321313917637
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04851881414651871
Validation loss = 0.03170578554272652
Validation loss = 0.032862477004528046
Validation loss = 0.035691261291503906
Validation loss = 0.03080529347062111
Validation loss = 0.03234317526221275
Validation loss = 0.032651763409376144
Validation loss = 0.031152265146374702
Validation loss = 0.03770316764712334
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03922082856297493
Validation loss = 0.029878875240683556
Validation loss = 0.032363489270210266
Validation loss = 0.03341219946742058
Validation loss = 0.03154157102108002
Validation loss = 0.030331121757626534
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03921112045645714
Validation loss = 0.0291920006275177
Validation loss = 0.029777271673083305
Validation loss = 0.02892696112394333
Validation loss = 0.038451772183179855
Validation loss = 0.02817564457654953
Validation loss = 0.03006751276552677
Validation loss = 0.028618426993489265
Validation loss = 0.033351797610521317
Validation loss = 0.029797298833727837
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.41e+03 |
| Iteration     | 29       |
| MaximumReturn | 2.33e+03 |
| MinimumReturn | -320     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.038777463138103485
Validation loss = 0.029107313603162766
Validation loss = 0.03374345600605011
Validation loss = 0.029373323544859886
Validation loss = 0.030904244631528854
Validation loss = 0.03209130838513374
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03682655841112137
Validation loss = 0.02954413741827011
Validation loss = 0.030825911089777946
Validation loss = 0.029630208387970924
Validation loss = 0.029127102345228195
Validation loss = 0.03314211964607239
Validation loss = 0.02986372821033001
Validation loss = 0.02816120535135269
Validation loss = 0.03165985643863678
Validation loss = 0.032221727073192596
Validation loss = 0.028759542852640152
Validation loss = 0.0376826710999012
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.045971669256687164
Validation loss = 0.03251410275697708
Validation loss = 0.0327894352376461
Validation loss = 0.03701683506369591
Validation loss = 0.03114600107073784
Validation loss = 0.030276568606495857
Validation loss = 0.031600575894117355
Validation loss = 0.03288741782307625
Validation loss = 0.03291755169630051
Validation loss = 0.03871696814894676
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04525049775838852
Validation loss = 0.029996734112501144
Validation loss = 0.03043649159371853
Validation loss = 0.031216850504279137
Validation loss = 0.02918064594268799
Validation loss = 0.03480365127325058
Validation loss = 0.02816076949238777
Validation loss = 0.029735280200839043
Validation loss = 0.03381088003516197
Validation loss = 0.028520671650767326
Validation loss = 0.028914306312799454
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03873191028833389
Validation loss = 0.02862216904759407
Validation loss = 0.02807137556374073
Validation loss = 0.027927439659833908
Validation loss = 0.030275288969278336
Validation loss = 0.027796581387519836
Validation loss = 0.03494108468294144
Validation loss = 0.027899406850337982
Validation loss = 0.034924302250146866
Validation loss = 0.027267199009656906
Validation loss = 0.026414068415760994
Validation loss = 0.03056108020246029
Validation loss = 0.026883456856012344
Validation loss = 0.03631516173481941
Validation loss = 0.026244424283504486
Validation loss = 0.028142400085926056
Validation loss = 0.026687178760766983
Validation loss = 0.02807161584496498
Validation loss = 0.0281510092318058
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.01e+03 |
| Iteration     | 30       |
| MaximumReturn | 2.25e+03 |
| MinimumReturn | 1.54e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03691587224602699
Validation loss = 0.029943257570266724
Validation loss = 0.0284566693007946
Validation loss = 0.0311429463326931
Validation loss = 0.02799924463033676
Validation loss = 0.029333429411053658
Validation loss = 0.031242288649082184
Validation loss = 0.028750067576766014
Validation loss = 0.03344696760177612
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.037412092089653015
Validation loss = 0.027911745011806488
Validation loss = 0.030321214348077774
Validation loss = 0.03000051900744438
Validation loss = 0.027192410081624985
Validation loss = 0.028519276529550552
Validation loss = 0.03244743496179581
Validation loss = 0.026715325191617012
Validation loss = 0.03119037300348282
Validation loss = 0.027927037328481674
Validation loss = 0.027197619900107384
Validation loss = 0.030260128900408745
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03707814961671829
Validation loss = 0.029844988137483597
Validation loss = 0.03135816380381584
Validation loss = 0.029914487153291702
Validation loss = 0.03662288933992386
Validation loss = 0.029548432677984238
Validation loss = 0.032407283782958984
Validation loss = 0.02968486025929451
Validation loss = 0.030520770698785782
Validation loss = 0.03302597254514694
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.038224682211875916
Validation loss = 0.028329622000455856
Validation loss = 0.029494397342205048
Validation loss = 0.028650615364313126
Validation loss = 0.027865905314683914
Validation loss = 0.031846653670072556
Validation loss = 0.027873435989022255
Validation loss = 0.029659811407327652
Validation loss = 0.02886142209172249
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.035252176225185394
Validation loss = 0.027707090601325035
Validation loss = 0.026054348796606064
Validation loss = 0.0358714684844017
Validation loss = 0.029264945536851883
Validation loss = 0.025293931365013123
Validation loss = 0.03458032384514809
Validation loss = 0.026511535048484802
Validation loss = 0.026640720665454865
Validation loss = 0.02622140198945999
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.54e+03 |
| Iteration     | 31       |
| MaximumReturn | 2.45e+03 |
| MinimumReturn | -142     |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04530010744929314
Validation loss = 0.026701711118221283
Validation loss = 0.027434086427092552
Validation loss = 0.028416959568858147
Validation loss = 0.028152156621217728
Validation loss = 0.028730982914566994
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03695465251803398
Validation loss = 0.028114255517721176
Validation loss = 0.02828763797879219
Validation loss = 0.0271637961268425
Validation loss = 0.0272347554564476
Validation loss = 0.02915501780807972
Validation loss = 0.025949347764253616
Validation loss = 0.027137812227010727
Validation loss = 0.029276009649038315
Validation loss = 0.025879019871354103
Validation loss = 0.029442112892866135
Validation loss = 0.027436133474111557
Validation loss = 0.02712499536573887
Validation loss = 0.026896728202700615
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03625769540667534
Validation loss = 0.02966555953025818
Validation loss = 0.02836785651743412
Validation loss = 0.02893761917948723
Validation loss = 0.027960695326328278
Validation loss = 0.03043336234986782
Validation loss = 0.028440719470381737
Validation loss = 0.030689571052789688
Validation loss = 0.029554083943367004
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0327126570045948
Validation loss = 0.026748059317469597
Validation loss = 0.027661975473165512
Validation loss = 0.029066018760204315
Validation loss = 0.027918800711631775
Validation loss = 0.02783273160457611
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03192635998129845
Validation loss = 0.025645680725574493
Validation loss = 0.025287197902798653
Validation loss = 0.026403799653053284
Validation loss = 0.02627922035753727
Validation loss = 0.03125517815351486
Validation loss = 0.024626536294817924
Validation loss = 0.025978896766901016
Validation loss = 0.03185100480914116
Validation loss = 0.024609828367829323
Validation loss = 0.02604036033153534
Validation loss = 0.02994750812649727
Validation loss = 0.024809231981635094
Validation loss = 0.025336328893899918
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.36e+03 |
| Iteration     | 32       |
| MaximumReturn | 2.57e+03 |
| MinimumReturn | -253     |
| TotalSamples  | 136000   |
----------------------------
