Logging to experiments/half_cheetah/test-exp-dir-2/test-exp2_seed4321
Print configuration .....
{'max_val_data': 100000, 'dynamics': {'kfac_params': {'damping': 0.001, 'cov_ema_decay': 0.99, 'momentum': 0.9, 'learning_rate': 0.1, 'kl_clip': 0.0001}, 'intrinsic_reward_only': False, 'enable_particle_ensemble': True, 'external_reward_evaluation_interval': 5, 'mode': 'random', 'batch_size': 1000, 'ensemble_model_count': 5, 'particles': 5, 'activation': 'relu', 'n_layers': 4, 'val': True, 'ensemble': True, 'hidden_size': 1000, 'intrinsic_reward_coeff': 1.0, 'learning_rate': 0.001, 'epochs': 200, 'ita': 1.0, 'pre_training': {'policy_itr': 20, 'mode': 'intrinsic_reward', 'itr': 0}, 'model': 'nn', 'obs_var': 1.0}, 'random_seeds': [4321, 2314, 2341, 3421], 'max_train_data': 200000, 'env_horizon': 1000, 'num_path_random': 6, 'discard_ratio': 0.0, 'start_onpol_iter': 0, 'num_path_onpol': 6, 'onpol_iters': 33, 'env_name': 'half_cheetah', 'trpo': {'gae': 0.95, 'batch_size': 50000, 'iterations': 40, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'algo': 'trpo', 'policy': {'init_logstd': 0.0, 'reinitialize_every_itr': False, 'activation': 'tanh', 'network_shape': [32, 32]}, 'trpo_ext_reward': {'gae': 0.95, 'batch_size': 50000, 'iterations': 20, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'save_variables': False, 'restore_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5107637643814087
Validation loss = 0.15970544517040253
Validation loss = 0.10387889295816422
Validation loss = 0.08876177668571472
Validation loss = 0.08171554654836655
Validation loss = 0.07949967682361603
Validation loss = 0.08579397946596146
Validation loss = 0.07718385756015778
Validation loss = 0.08557715266942978
Validation loss = 0.07369892299175262
Validation loss = 0.0728893131017685
Validation loss = 0.09228740632534027
Validation loss = 0.07361086457967758
Validation loss = 0.0745411068201065
Validation loss = 0.07505189627408981
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4818311929702759
Validation loss = 0.15975725650787354
Validation loss = 0.1021277904510498
Validation loss = 0.09313847124576569
Validation loss = 0.08422822505235672
Validation loss = 0.08297006785869598
Validation loss = 0.08086243271827698
Validation loss = 0.0865817591547966
Validation loss = 0.0765712782740593
Validation loss = 0.07930044829845428
Validation loss = 0.07488730549812317
Validation loss = 0.07521327584981918
Validation loss = 0.10478528589010239
Validation loss = 0.07284938544034958
Validation loss = 0.07277309894561768
Validation loss = 0.0867486298084259
Validation loss = 0.08103714883327484
Validation loss = 0.07086478918790817
Validation loss = 0.06913043558597565
Validation loss = 0.06891968846321106
Validation loss = 0.07746118307113647
Validation loss = 0.06938076764345169
Validation loss = 0.0675668865442276
Validation loss = 0.0695149302482605
Validation loss = 0.06919331848621368
Validation loss = 0.07215134799480438
Validation loss = 0.06870162487030029
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4541664719581604
Validation loss = 0.17099148035049438
Validation loss = 0.10978841781616211
Validation loss = 0.09368157386779785
Validation loss = 0.0854366272687912
Validation loss = 0.08166711032390594
Validation loss = 0.0834464356303215
Validation loss = 0.08401800692081451
Validation loss = 0.10684438794851303
Validation loss = 0.08003592491149902
Validation loss = 0.07924851775169373
Validation loss = 0.07172305881977081
Validation loss = 0.07233482599258423
Validation loss = 0.07766935229301453
Validation loss = 0.07645314931869507
Validation loss = 0.07213135063648224
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.44219639897346497
Validation loss = 0.1640128344297409
Validation loss = 0.10364080965518951
Validation loss = 0.08855313807725906
Validation loss = 0.08624545484781265
Validation loss = 0.07992544025182724
Validation loss = 0.09182173013687134
Validation loss = 0.08036049455404282
Validation loss = 0.0740228146314621
Validation loss = 0.07494711875915527
Validation loss = 0.08069903403520584
Validation loss = 0.08108198642730713
Validation loss = 0.07735496014356613
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3335687518119812
Validation loss = 0.17256462574005127
Validation loss = 0.10512526333332062
Validation loss = 0.08998435735702515
Validation loss = 0.08942630887031555
Validation loss = 0.08087402582168579
Validation loss = 0.07749330997467041
Validation loss = 0.09130135923624039
Validation loss = 0.07913036644458771
Validation loss = 0.08848616480827332
Validation loss = 0.0768033042550087
Validation loss = 0.07223746180534363
Validation loss = 0.07125206291675568
Validation loss = 0.07515691965818405
Validation loss = 0.06979164481163025
Validation loss = 0.07174016535282135
Validation loss = 0.07013518363237381
Validation loss = 0.07087381184101105
Validation loss = 0.07443908601999283
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.4    |
| Iteration     | 0        |
| MaximumReturn | 44.1     |
| MinimumReturn | -90.5    |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12394176423549652
Validation loss = 0.07739342749118805
Validation loss = 0.07537556439638138
Validation loss = 0.06676182895898819
Validation loss = 0.06495222449302673
Validation loss = 0.06596201658248901
Validation loss = 0.06359332799911499
Validation loss = 0.06167125701904297
Validation loss = 0.06300405412912369
Validation loss = 0.058524034917354584
Validation loss = 0.06046485900878906
Validation loss = 0.060032185167074203
Validation loss = 0.059545766562223434
Validation loss = 0.06227412819862366
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11863968521356583
Validation loss = 0.0734918937087059
Validation loss = 0.07036823034286499
Validation loss = 0.06439715623855591
Validation loss = 0.06714096665382385
Validation loss = 0.060791853815317154
Validation loss = 0.06022082269191742
Validation loss = 0.09286852180957794
Validation loss = 0.05863778293132782
Validation loss = 0.06003710627555847
Validation loss = 0.06680317968130112
Validation loss = 0.056485239416360855
Validation loss = 0.05866605043411255
Validation loss = 0.05817276984453201
Validation loss = 0.05718227103352547
Validation loss = 0.05830978602170944
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12519291043281555
Validation loss = 0.08105171471834183
Validation loss = 0.06950322538614273
Validation loss = 0.06956874579191208
Validation loss = 0.0651891902089119
Validation loss = 0.0664144977927208
Validation loss = 0.06373541802167892
Validation loss = 0.06415045261383057
Validation loss = 0.06857778131961823
Validation loss = 0.06947357207536697
Validation loss = 0.0631246417760849
Validation loss = 0.06066522002220154
Validation loss = 0.057599492371082306
Validation loss = 0.05954422801733017
Validation loss = 0.0581984743475914
Validation loss = 0.056291162967681885
Validation loss = 0.06071771681308746
Validation loss = 0.055632151663303375
Validation loss = 0.05955541506409645
Validation loss = 0.055392682552337646
Validation loss = 0.05682053044438362
Validation loss = 0.05605664476752281
Validation loss = 0.05538182705640793
Validation loss = 0.06785515695810318
Validation loss = 0.05626898258924484
Validation loss = 0.05578622221946716
Validation loss = 0.05423305183649063
Validation loss = 0.05425919592380524
Validation loss = 0.05325343832373619
Validation loss = 0.05660013109445572
Validation loss = 0.05700080096721649
Validation loss = 0.05504361167550087
Validation loss = 0.053551215678453445
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12008943408727646
Validation loss = 0.07422307133674622
Validation loss = 0.07204893231391907
Validation loss = 0.07018665969371796
Validation loss = 0.06254266947507858
Validation loss = 0.06284808367490768
Validation loss = 0.06314397603273392
Validation loss = 0.06486383825540543
Validation loss = 0.06883067637681961
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1163160502910614
Validation loss = 0.07557424157857895
Validation loss = 0.07000267505645752
Validation loss = 0.07592037320137024
Validation loss = 0.06285074353218079
Validation loss = 0.06761818379163742
Validation loss = 0.06545814871788025
Validation loss = 0.05864974856376648
Validation loss = 0.060107968747615814
Validation loss = 0.06183146685361862
Validation loss = 0.060708288103342056
Validation loss = 0.06181471049785614
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 453      |
| Iteration     | 1        |
| MaximumReturn | 577      |
| MinimumReturn | 225      |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05697670206427574
Validation loss = 0.047286853194236755
Validation loss = 0.04419022426009178
Validation loss = 0.04526035487651825
Validation loss = 0.04622951149940491
Validation loss = 0.04288576543331146
Validation loss = 0.043714869767427444
Validation loss = 0.04056026414036751
Validation loss = 0.04066741093993187
Validation loss = 0.03923773393034935
Validation loss = 0.039029646664857864
Validation loss = 0.03886151313781738
Validation loss = 0.037507470697164536
Validation loss = 0.03851223364472389
Validation loss = 0.03717692568898201
Validation loss = 0.03614424914121628
Validation loss = 0.03641647472977638
Validation loss = 0.04025454819202423
Validation loss = 0.03580212965607643
Validation loss = 0.03751467540860176
Validation loss = 0.03590228036046028
Validation loss = 0.03558504208922386
Validation loss = 0.036627307534217834
Validation loss = 0.03637916222214699
Validation loss = 0.0338730551302433
Validation loss = 0.040988851338624954
Validation loss = 0.036360934376716614
Validation loss = 0.03720890358090401
Validation loss = 0.03639259934425354
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.060322489589452744
Validation loss = 0.04595149680972099
Validation loss = 0.04268376901745796
Validation loss = 0.042485203593969345
Validation loss = 0.03995261341333389
Validation loss = 0.03824375942349434
Validation loss = 0.03863970562815666
Validation loss = 0.040710583329200745
Validation loss = 0.038135893642902374
Validation loss = 0.039152707904577255
Validation loss = 0.03821742162108421
Validation loss = 0.04049478843808174
Validation loss = 0.0408281609416008
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.061725541949272156
Validation loss = 0.0435941182076931
Validation loss = 0.04550822079181671
Validation loss = 0.042494311928749084
Validation loss = 0.043038297444581985
Validation loss = 0.03930380567908287
Validation loss = 0.039612073451280594
Validation loss = 0.041710395365953445
Validation loss = 0.03999510407447815
Validation loss = 0.03933821618556976
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.061260923743247986
Validation loss = 0.04860226437449455
Validation loss = 0.046973902732133865
Validation loss = 0.043912772089242935
Validation loss = 0.043199971318244934
Validation loss = 0.041244637221097946
Validation loss = 0.04327957704663277
Validation loss = 0.043025463819503784
Validation loss = 0.04102695360779762
Validation loss = 0.040041182190179825
Validation loss = 0.03853486478328705
Validation loss = 0.03985424339771271
Validation loss = 0.03739958629012108
Validation loss = 0.0388018898665905
Validation loss = 0.03587786853313446
Validation loss = 0.03818376362323761
Validation loss = 0.038057345896959305
Validation loss = 0.037603531032800674
Validation loss = 0.04383296146988869
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06207275390625
Validation loss = 0.04652763530611992
Validation loss = 0.04541192948818207
Validation loss = 0.04306390881538391
Validation loss = 0.049694206565618515
Validation loss = 0.049917567521333694
Validation loss = 0.040968991816043854
Validation loss = 0.0446903295814991
Validation loss = 0.04037030413746834
Validation loss = 0.042367834597826004
Validation loss = 0.03902206942439079
Validation loss = 0.03789328411221504
Validation loss = 0.0370124913752079
Validation loss = 0.03918152675032616
Validation loss = 0.037154678255319595
Validation loss = 0.041408199816942215
Validation loss = 0.035972900688648224
Validation loss = 0.03732868656516075
Validation loss = 0.036083538085222244
Validation loss = 0.03615821525454521
Validation loss = 0.03670527786016464
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 443      |
| Iteration     | 2        |
| MaximumReturn | 504      |
| MinimumReturn | 406      |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04405888915061951
Validation loss = 0.031206881627440453
Validation loss = 0.030416809022426605
Validation loss = 0.030535001307725906
Validation loss = 0.02757316082715988
Validation loss = 0.026516003534197807
Validation loss = 0.030890006572008133
Validation loss = 0.028329579159617424
Validation loss = 0.028894010931253433
Validation loss = 0.02618023380637169
Validation loss = 0.028377991169691086
Validation loss = 0.027868643403053284
Validation loss = 0.02710290253162384
Validation loss = 0.02671751007437706
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04264882951974869
Validation loss = 0.032771218568086624
Validation loss = 0.03092252090573311
Validation loss = 0.038797251880168915
Validation loss = 0.02995641715824604
Validation loss = 0.0289856418967247
Validation loss = 0.02863539382815361
Validation loss = 0.030339263379573822
Validation loss = 0.026743685826659203
Validation loss = 0.029116785153746605
Validation loss = 0.027289170771837234
Validation loss = 0.027599280700087547
Validation loss = 0.02835151180624962
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04386312887072563
Validation loss = 0.032785382121801376
Validation loss = 0.03245494142174721
Validation loss = 0.03352433443069458
Validation loss = 0.03400147706270218
Validation loss = 0.03045760467648506
Validation loss = 0.030222319066524506
Validation loss = 0.02928624488413334
Validation loss = 0.02795279026031494
Validation loss = 0.02839352749288082
Validation loss = 0.03104306571185589
Validation loss = 0.03080020286142826
Validation loss = 0.02781156450510025
Validation loss = 0.03174653649330139
Validation loss = 0.030069300904870033
Validation loss = 0.02858794294297695
Validation loss = 0.028059493750333786
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04375412315130234
Validation loss = 0.032281674444675446
Validation loss = 0.030955715104937553
Validation loss = 0.029613085091114044
Validation loss = 0.02854405716061592
Validation loss = 0.03121884912252426
Validation loss = 0.029344454407691956
Validation loss = 0.02717646211385727
Validation loss = 0.028398342430591583
Validation loss = 0.026886465027928352
Validation loss = 0.027453463524580002
Validation loss = 0.029418479651212692
Validation loss = 0.029272612184286118
Validation loss = 0.032905131578445435
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04482162371277809
Validation loss = 0.03314682096242905
Validation loss = 0.030901163816452026
Validation loss = 0.03126274049282074
Validation loss = 0.030550163239240646
Validation loss = 0.028938908129930496
Validation loss = 0.02783907577395439
Validation loss = 0.028153859078884125
Validation loss = 0.02594982460141182
Validation loss = 0.027383804321289062
Validation loss = 0.027285784482955933
Validation loss = 0.02631361410021782
Validation loss = 0.02736944705247879
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 600      |
| Iteration     | 3        |
| MaximumReturn | 653      |
| MinimumReturn | 495      |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03129849582910538
Validation loss = 0.0236229095607996
Validation loss = 0.023565003648400307
Validation loss = 0.024376586079597473
Validation loss = 0.02303047478199005
Validation loss = 0.021210486069321632
Validation loss = 0.021201182156801224
Validation loss = 0.021829640492796898
Validation loss = 0.024415085092186928
Validation loss = 0.022059261798858643
Validation loss = 0.021566083654761314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028714075684547424
Validation loss = 0.025068718940019608
Validation loss = 0.02712755836546421
Validation loss = 0.024454433470964432
Validation loss = 0.02287408895790577
Validation loss = 0.022987913340330124
Validation loss = 0.022778363898396492
Validation loss = 0.02182784304022789
Validation loss = 0.024941015988588333
Validation loss = 0.021227775141596794
Validation loss = 0.02182323858141899
Validation loss = 0.02233993075788021
Validation loss = 0.021728014573454857
Validation loss = 0.023629851639270782
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031469643115997314
Validation loss = 0.02393418923020363
Validation loss = 0.025373782962560654
Validation loss = 0.022848475724458694
Validation loss = 0.022621402516961098
Validation loss = 0.023963067680597305
Validation loss = 0.026432117447257042
Validation loss = 0.022707561030983925
Validation loss = 0.021791836246848106
Validation loss = 0.028579875826835632
Validation loss = 0.022741738706827164
Validation loss = 0.02341679111123085
Validation loss = 0.021606730297207832
Validation loss = 0.022782668471336365
Validation loss = 0.021934006363153458
Validation loss = 0.022330481559038162
Validation loss = 0.0209750235080719
Validation loss = 0.02199382707476616
Validation loss = 0.021569106727838516
Validation loss = 0.02211056277155876
Validation loss = 0.020538385957479477
Validation loss = 0.020432356745004654
Validation loss = 0.02175011672079563
Validation loss = 0.021195121109485626
Validation loss = 0.020118076354265213
Validation loss = 0.020685037598013878
Validation loss = 0.020633231848478317
Validation loss = 0.021110977977514267
Validation loss = 0.02140616811811924
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030672749504446983
Validation loss = 0.02462194673717022
Validation loss = 0.023450708016753197
Validation loss = 0.023004112765192986
Validation loss = 0.02222578413784504
Validation loss = 0.022348303347826004
Validation loss = 0.022736530750989914
Validation loss = 0.02190367877483368
Validation loss = 0.02180745266377926
Validation loss = 0.021916721016168594
Validation loss = 0.021477101370692253
Validation loss = 0.021869976073503494
Validation loss = 0.023282673209905624
Validation loss = 0.022046878933906555
Validation loss = 0.021915588527917862
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03198770433664322
Validation loss = 0.02395138517022133
Validation loss = 0.024093851447105408
Validation loss = 0.023252198472619057
Validation loss = 0.022929664701223373
Validation loss = 0.023520436137914658
Validation loss = 0.02225547656416893
Validation loss = 0.02163541316986084
Validation loss = 0.02131793089210987
Validation loss = 0.022574668750166893
Validation loss = 0.021968424320220947
Validation loss = 0.02157418802380562
Validation loss = 0.025745635852217674
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 678      |
| Iteration     | 4        |
| MaximumReturn | 744      |
| MinimumReturn | 592      |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026206471025943756
Validation loss = 0.01960725337266922
Validation loss = 0.01931036449968815
Validation loss = 0.019620491191744804
Validation loss = 0.019396111369132996
Validation loss = 0.01859516091644764
Validation loss = 0.020212212577462196
Validation loss = 0.01912197656929493
Validation loss = 0.019445544108748436
Validation loss = 0.019085286185145378
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023581339046359062
Validation loss = 0.020037537440657616
Validation loss = 0.02114836685359478
Validation loss = 0.02191082388162613
Validation loss = 0.019010664895176888
Validation loss = 0.02095481939613819
Validation loss = 0.02005474641919136
Validation loss = 0.019325386732816696
Validation loss = 0.018329577520489693
Validation loss = 0.02053036354482174
Validation loss = 0.020059281960129738
Validation loss = 0.019043659791350365
Validation loss = 0.0191179309040308
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024357931688427925
Validation loss = 0.02186928130686283
Validation loss = 0.019173158332705498
Validation loss = 0.018350934609770775
Validation loss = 0.018363188952207565
Validation loss = 0.021744385361671448
Validation loss = 0.018686702474951744
Validation loss = 0.01848912611603737
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024343132972717285
Validation loss = 0.019555887207388878
Validation loss = 0.019961141049861908
Validation loss = 0.022579336538910866
Validation loss = 0.019086748361587524
Validation loss = 0.020260192453861237
Validation loss = 0.019829953089356422
Validation loss = 0.018147345632314682
Validation loss = 0.01831013336777687
Validation loss = 0.017011277377605438
Validation loss = 0.01926489546895027
Validation loss = 0.017495209351181984
Validation loss = 0.018588198348879814
Validation loss = 0.017881501466035843
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022880973294377327
Validation loss = 0.020015668123960495
Validation loss = 0.019452480599284172
Validation loss = 0.019646750763058662
Validation loss = 0.01872030831873417
Validation loss = 0.019564321264624596
Validation loss = 0.018910953775048256
Validation loss = 0.02334129624068737
Validation loss = 0.018323050811886787
Validation loss = 0.020470531657338142
Validation loss = 0.01875532791018486
Validation loss = 0.019386332482099533
Validation loss = 0.01868514157831669
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 884      |
| Iteration     | 5        |
| MaximumReturn | 966      |
| MinimumReturn | 840      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021257299929857254
Validation loss = 0.01684591732919216
Validation loss = 0.01840459369122982
Validation loss = 0.01744440570473671
Validation loss = 0.016010655090212822
Validation loss = 0.01656693033874035
Validation loss = 0.017610250040888786
Validation loss = 0.016503585502505302
Validation loss = 0.016107330098748207
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023979155346751213
Validation loss = 0.017310524359345436
Validation loss = 0.017993178218603134
Validation loss = 0.01763933338224888
Validation loss = 0.016913900151848793
Validation loss = 0.017416207119822502
Validation loss = 0.01616203971207142
Validation loss = 0.01778254099190235
Validation loss = 0.016142308712005615
Validation loss = 0.01768667623400688
Validation loss = 0.01753588393330574
Validation loss = 0.017156019806861877
Validation loss = 0.01605408266186714
Validation loss = 0.017140774056315422
Validation loss = 0.016790268942713737
Validation loss = 0.017081649973988533
Validation loss = 0.01797536574304104
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01974554918706417
Validation loss = 0.016454333439469337
Validation loss = 0.017442522570490837
Validation loss = 0.016494276002049446
Validation loss = 0.016818806529045105
Validation loss = 0.016343161463737488
Validation loss = 0.016900869086384773
Validation loss = 0.017500553280115128
Validation loss = 0.017382562160491943
Validation loss = 0.017237979918718338
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027774253860116005
Validation loss = 0.01661522313952446
Validation loss = 0.01613377407193184
Validation loss = 0.01878264546394348
Validation loss = 0.019041862338781357
Validation loss = 0.017370982095599174
Validation loss = 0.01531769149005413
Validation loss = 0.01683022454380989
Validation loss = 0.01621076464653015
Validation loss = 0.016054948791861534
Validation loss = 0.016381219029426575
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020867781713604927
Validation loss = 0.017193814739584923
Validation loss = 0.016491008922457695
Validation loss = 0.01725536584854126
Validation loss = 0.01800754852592945
Validation loss = 0.016745347529649734
Validation loss = 0.016627011820673943
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 984      |
| Iteration     | 6        |
| MaximumReturn | 1.13e+03 |
| MinimumReturn | 736      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018663471564650536
Validation loss = 0.01682504266500473
Validation loss = 0.015868540853261948
Validation loss = 0.017189186066389084
Validation loss = 0.016420261934399605
Validation loss = 0.015753552317619324
Validation loss = 0.015245448797941208
Validation loss = 0.01587732881307602
Validation loss = 0.018691178411245346
Validation loss = 0.01607048511505127
Validation loss = 0.014973495155572891
Validation loss = 0.015580590814352036
Validation loss = 0.015578153543174267
Validation loss = 0.01594729535281658
Validation loss = 0.01429103035479784
Validation loss = 0.01665917970240116
Validation loss = 0.013998307287693024
Validation loss = 0.01522570289671421
Validation loss = 0.015584541484713554
Validation loss = 0.01444236095994711
Validation loss = 0.014865366742014885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01972942054271698
Validation loss = 0.017072854563593864
Validation loss = 0.016986405476927757
Validation loss = 0.015236961655318737
Validation loss = 0.01450556330382824
Validation loss = 0.014710823073983192
Validation loss = 0.015723291784524918
Validation loss = 0.014631889760494232
Validation loss = 0.015679221600294113
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01796955056488514
Validation loss = 0.01596789062023163
Validation loss = 0.015200100839138031
Validation loss = 0.016318319365382195
Validation loss = 0.016388539224863052
Validation loss = 0.015039365738630295
Validation loss = 0.015941554680466652
Validation loss = 0.014821295626461506
Validation loss = 0.01473912037909031
Validation loss = 0.0152962701395154
Validation loss = 0.015445608645677567
Validation loss = 0.014841577038168907
Validation loss = 0.01450994610786438
Validation loss = 0.015621215105056763
Validation loss = 0.014439456164836884
Validation loss = 0.01588117703795433
Validation loss = 0.01764606311917305
Validation loss = 0.013894328847527504
Validation loss = 0.014010515064001083
Validation loss = 0.01375311054289341
Validation loss = 0.014831604436039925
Validation loss = 0.01366187259554863
Validation loss = 0.015180890448391438
Validation loss = 0.015396968461573124
Validation loss = 0.014633027836680412
Validation loss = 0.01364860124886036
Validation loss = 0.01350763626396656
Validation loss = 0.013347398489713669
Validation loss = 0.014962151646614075
Validation loss = 0.015083239413797855
Validation loss = 0.014907662756741047
Validation loss = 0.014136075973510742
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017942646518349648
Validation loss = 0.015559906139969826
Validation loss = 0.01715574786067009
Validation loss = 0.01522408239543438
Validation loss = 0.014559386298060417
Validation loss = 0.016155196353793144
Validation loss = 0.01492314599454403
Validation loss = 0.014491155743598938
Validation loss = 0.015820374712347984
Validation loss = 0.014563674107193947
Validation loss = 0.015806924551725388
Validation loss = 0.015042209066450596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020528439432382584
Validation loss = 0.017210185527801514
Validation loss = 0.015445428900420666
Validation loss = 0.016639526933431625
Validation loss = 0.015903625637292862
Validation loss = 0.015718327835202217
Validation loss = 0.015159940347075462
Validation loss = 0.015558832325041294
Validation loss = 0.015066547319293022
Validation loss = 0.017090559005737305
Validation loss = 0.01625608094036579
Validation loss = 0.01581193134188652
Validation loss = 0.014795059338212013
Validation loss = 0.015136517584323883
Validation loss = 0.015204884111881256
Validation loss = 0.015218078158795834
Validation loss = 0.01495285052806139
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 891      |
| Iteration     | 7        |
| MaximumReturn | 1.2e+03  |
| MinimumReturn | -475     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7793439030647278
Validation loss = 0.8050119280815125
Validation loss = 0.884982705116272
Validation loss = 0.7806040048599243
Validation loss = 0.8073304295539856
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.2949304580688477
Validation loss = 1.3434813022613525
Validation loss = 1.2556297779083252
Validation loss = 1.1808667182922363
Validation loss = 1.2397969961166382
Validation loss = 1.34842848777771
Validation loss = 1.273167371749878
Validation loss = 1.1514568328857422
Validation loss = 1.3145387172698975
Validation loss = 1.1540290117263794
Validation loss = 1.3184683322906494
Validation loss = 1.2341971397399902
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.956116259098053
Validation loss = 1.044311761856079
Validation loss = 0.9992197751998901
Validation loss = 1.1469112634658813
Validation loss = 1.137181043624878
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7941553592681885
Validation loss = 0.9085772037506104
Validation loss = 0.8538244962692261
Validation loss = 0.9601248502731323
Validation loss = 0.9049407243728638
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.120261549949646
Validation loss = 1.165924072265625
Validation loss = 1.1544063091278076
Validation loss = 1.100683331489563
Validation loss = 1.2135920524597168
Validation loss = 1.1662847995758057
Validation loss = 1.2057133913040161
Validation loss = 1.2572665214538574
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.14e+03 |
| Iteration     | 8        |
| MaximumReturn | 1.18e+03 |
| MinimumReturn | 1.11e+03 |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6525270938873291
Validation loss = 0.7269164323806763
Validation loss = 0.7649164199829102
Validation loss = 0.7778883576393127
Validation loss = 0.7517732977867126
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.9917661547660828
Validation loss = 1.0719125270843506
Validation loss = 1.2190148830413818
Validation loss = 1.2777118682861328
Validation loss = 1.1143063306808472
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.9185699224472046
Validation loss = 0.9769114255905151
Validation loss = 0.9763908386230469
Validation loss = 0.9514137506484985
Validation loss = 0.9803169369697571
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.8158186078071594
Validation loss = 0.7763581871986389
Validation loss = 0.8423202633857727
Validation loss = 0.8275333642959595
Validation loss = 0.876245379447937
Validation loss = 0.9042216539382935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.0777668952941895
Validation loss = 0.9254509806632996
Validation loss = 1.0644645690917969
Validation loss = 1.0725244283676147
Validation loss = 1.0113780498504639
Validation loss = 1.014158010482788
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 904      |
| Iteration     | 9        |
| MaximumReturn | 1.23e+03 |
| MinimumReturn | -365     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02735978178679943
Validation loss = 0.016937952488660812
Validation loss = 0.014947501011192799
Validation loss = 0.015906652435660362
Validation loss = 0.01427975669503212
Validation loss = 0.013422607444226742
Validation loss = 0.013839786872267723
Validation loss = 0.014565806835889816
Validation loss = 0.013249238952994347
Validation loss = 0.012977629899978638
Validation loss = 0.014183887280523777
Validation loss = 0.016036121174693108
Validation loss = 0.013744479045271873
Validation loss = 0.014245610684156418
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03023868054151535
Validation loss = 0.01758222095668316
Validation loss = 0.01731736771762371
Validation loss = 0.014615897089242935
Validation loss = 0.014342657290399075
Validation loss = 0.013773233629763126
Validation loss = 0.01387733593583107
Validation loss = 0.015009579248726368
Validation loss = 0.014383670873939991
Validation loss = 0.014642488211393356
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03028803877532482
Validation loss = 0.015734924003481865
Validation loss = 0.01624259166419506
Validation loss = 0.014224079437553883
Validation loss = 0.014612844213843346
Validation loss = 0.01386001706123352
Validation loss = 0.013464588671922684
Validation loss = 0.013915956020355225
Validation loss = 0.014329582452774048
Validation loss = 0.013223386369645596
Validation loss = 0.013640382327139378
Validation loss = 0.014438045211136341
Validation loss = 0.012549728155136108
Validation loss = 0.014044132083654404
Validation loss = 0.01317662838846445
Validation loss = 0.01259582582861185
Validation loss = 0.012874875217676163
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028311794623732567
Validation loss = 0.018614718690514565
Validation loss = 0.01579945534467697
Validation loss = 0.014675598591566086
Validation loss = 0.014495164155960083
Validation loss = 0.015187564305961132
Validation loss = 0.01475723460316658
Validation loss = 0.015162337571382523
Validation loss = 0.015191805548965931
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03277982771396637
Validation loss = 0.01821395941078663
Validation loss = 0.015444421209394932
Validation loss = 0.016377829015254974
Validation loss = 0.014623223803937435
Validation loss = 0.015017655678093433
Validation loss = 0.01384344045072794
Validation loss = 0.014201518148183823
Validation loss = 0.013957572169601917
Validation loss = 0.015114056877791882
Validation loss = 0.013400985859334469
Validation loss = 0.01397352572530508
Validation loss = 0.013813809491693974
Validation loss = 0.013328209519386292
Validation loss = 0.013807815499603748
Validation loss = 0.013325122185051441
Validation loss = 0.013321847654879093
Validation loss = 0.013573971576988697
Validation loss = 0.014656619168817997
Validation loss = 0.013501252979040146
Validation loss = 0.014178895391523838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.11e+03 |
| Iteration     | 10       |
| MaximumReturn | 1.15e+03 |
| MinimumReturn | 1.02e+03 |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014400054700672626
Validation loss = 0.01368016004562378
Validation loss = 0.012889992445707321
Validation loss = 0.013483583927154541
Validation loss = 0.013179331086575985
Validation loss = 0.013465247116982937
Validation loss = 0.013118579052388668
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01415761187672615
Validation loss = 0.013171722181141376
Validation loss = 0.014884118922054768
Validation loss = 0.012643811292946339
Validation loss = 0.013251102529466152
Validation loss = 0.014196782372891903
Validation loss = 0.012819220311939716
Validation loss = 0.012545985169708729
Validation loss = 0.013800484128296375
Validation loss = 0.012491832487285137
Validation loss = 0.01250948291271925
Validation loss = 0.011985178105533123
Validation loss = 0.01327887549996376
Validation loss = 0.012530256994068623
Validation loss = 0.012639782391488552
Validation loss = 0.0129978246986866
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015819691121578217
Validation loss = 0.012976132333278656
Validation loss = 0.013444490730762482
Validation loss = 0.01281751412898302
Validation loss = 0.012433484196662903
Validation loss = 0.013354090042412281
Validation loss = 0.013996776193380356
Validation loss = 0.011859687976539135
Validation loss = 0.01267832051962614
Validation loss = 0.012009571306407452
Validation loss = 0.01232206728309393
Validation loss = 0.012426408939063549
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01514766737818718
Validation loss = 0.014166228473186493
Validation loss = 0.012779687531292439
Validation loss = 0.013709690421819687
Validation loss = 0.012943943031132221
Validation loss = 0.014597681351006031
Validation loss = 0.013533306308090687
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014544923789799213
Validation loss = 0.012794450856745243
Validation loss = 0.01317523792386055
Validation loss = 0.012786522507667542
Validation loss = 0.013152729719877243
Validation loss = 0.012555642984807491
Validation loss = 0.012533601373434067
Validation loss = 0.013231095857918262
Validation loss = 0.012162129394710064
Validation loss = 0.012415044009685516
Validation loss = 0.011995941400527954
Validation loss = 0.012847661972045898
Validation loss = 0.012254808098077774
Validation loss = 0.012939505279064178
Validation loss = 0.011701341718435287
Validation loss = 0.012016628868877888
Validation loss = 0.012337066233158112
Validation loss = 0.011951838620007038
Validation loss = 0.011719434522092342
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.13e+03 |
| Iteration     | 11       |
| MaximumReturn | 1.17e+03 |
| MinimumReturn | 1.08e+03 |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013925239443778992
Validation loss = 0.012331601232290268
Validation loss = 0.013074779883027077
Validation loss = 0.012468799017369747
Validation loss = 0.012522607110440731
Validation loss = 0.012831811793148518
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013159518130123615
Validation loss = 0.012537283822894096
Validation loss = 0.013068068772554398
Validation loss = 0.012078359723091125
Validation loss = 0.011625153943896294
Validation loss = 0.01282447949051857
Validation loss = 0.012763125821948051
Validation loss = 0.012061654590070248
Validation loss = 0.011604616418480873
Validation loss = 0.011806407012045383
Validation loss = 0.011668909341096878
Validation loss = 0.013086887076497078
Validation loss = 0.01431613601744175
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012836015783250332
Validation loss = 0.012259598821401596
Validation loss = 0.013197790831327438
Validation loss = 0.01203073188662529
Validation loss = 0.012008683755993843
Validation loss = 0.012179283425211906
Validation loss = 0.012284135445952415
Validation loss = 0.01188605185598135
Validation loss = 0.012751216068863869
Validation loss = 0.012400771491229534
Validation loss = 0.011046478524804115
Validation loss = 0.011671782471239567
Validation loss = 0.011655888520181179
Validation loss = 0.012310798279941082
Validation loss = 0.011482472531497478
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013912731781601906
Validation loss = 0.0130667919293046
Validation loss = 0.01281503401696682
Validation loss = 0.013485884293913841
Validation loss = 0.012522975914180279
Validation loss = 0.012342112138867378
Validation loss = 0.012112811207771301
Validation loss = 0.012393340468406677
Validation loss = 0.012313402257859707
Validation loss = 0.012677793391048908
Validation loss = 0.012643709778785706
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012884681113064289
Validation loss = 0.012471066787838936
Validation loss = 0.011666529811918736
Validation loss = 0.011108161881566048
Validation loss = 0.012383663095533848
Validation loss = 0.011518328450620174
Validation loss = 0.011629064567387104
Validation loss = 0.011873558163642883
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 908      |
| Iteration     | 12       |
| MaximumReturn | 1.29e+03 |
| MinimumReturn | -205     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07168623059988022
Validation loss = 0.050691958516836166
Validation loss = 0.05938682332634926
Validation loss = 0.05413718894124031
Validation loss = 0.055210232734680176
Validation loss = 0.06414584815502167
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03876516595482826
Validation loss = 0.03750808164477348
Validation loss = 0.04392895847558975
Validation loss = 0.04138510301709175
Validation loss = 0.0340336449444294
Validation loss = 0.032812606543302536
Validation loss = 0.03101802058517933
Validation loss = 0.03348482400178909
Validation loss = 0.032195329666137695
Validation loss = 0.030519938096404076
Validation loss = 0.03549494221806526
Validation loss = 0.0484636053442955
Validation loss = 0.044068675488233566
Validation loss = 0.03476199880242348
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03441108390688896
Validation loss = 0.05647319182753563
Validation loss = 0.034190453588962555
Validation loss = 0.046748023480176926
Validation loss = 0.04812546446919441
Validation loss = 0.03256360813975334
Validation loss = 0.03949427604675293
Validation loss = 0.03151412680745125
Validation loss = 0.029259776696562767
Validation loss = 0.03290640190243721
Validation loss = 0.047200415283441544
Validation loss = 0.0349576510488987
Validation loss = 0.03258207440376282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.061608485877513885
Validation loss = 0.04138273745775223
Validation loss = 0.0450468435883522
Validation loss = 0.05307522043585777
Validation loss = 0.04104190319776535
Validation loss = 0.05384358763694763
Validation loss = 0.044909898191690445
Validation loss = 0.043205104768276215
Validation loss = 0.06162136420607567
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.054401326924562454
Validation loss = 0.04119419306516647
Validation loss = 0.03557968512177467
Validation loss = 0.04755222797393799
Validation loss = 0.045820560306310654
Validation loss = 0.0410570465028286
Validation loss = 0.04195171222090721
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.26e+03 |
| Iteration     | 13       |
| MaximumReturn | 1.36e+03 |
| MinimumReturn | 1.19e+03 |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0376640148460865
Validation loss = 0.06120682135224342
Validation loss = 0.034254658967256546
Validation loss = 0.06728121638298035
Validation loss = 0.06770873814821243
Validation loss = 0.0513397678732872
Validation loss = 0.05198169872164726
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03151581063866615
Validation loss = 0.030716216191649437
Validation loss = 0.030239148065447807
Validation loss = 0.031560879200696945
Validation loss = 0.03726526349782944
Validation loss = 0.035334981977939606
Validation loss = 0.03793241083621979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03859303891658783
Validation loss = 0.03897872194647789
Validation loss = 0.03896882385015488
Validation loss = 0.028886087238788605
Validation loss = 0.029931044206023216
Validation loss = 0.03239817917346954
Validation loss = 0.02754240483045578
Validation loss = 0.034086622297763824
Validation loss = 0.03829255327582359
Validation loss = 0.03951532393693924
Validation loss = 0.04349049553275108
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07993236929178238
Validation loss = 0.053349558264017105
Validation loss = 0.044587064534425735
Validation loss = 0.039155278354883194
Validation loss = 0.04204229265451431
Validation loss = 0.05319981276988983
Validation loss = 0.05492217093706131
Validation loss = 0.04144205525517464
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04705299809575081
Validation loss = 0.035763222724199295
Validation loss = 0.04918826371431351
Validation loss = 0.05872521921992302
Validation loss = 0.04211344197392464
Validation loss = 0.05684169381856918
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.18e+03 |
| Iteration     | 14       |
| MaximumReturn | 1.21e+03 |
| MinimumReturn | 1.11e+03 |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07141422480344772
Validation loss = 0.06112907826900482
Validation loss = 0.047281425446271896
Validation loss = 0.05873873829841614
Validation loss = 0.05305018275976181
Validation loss = 0.04633039981126785
Validation loss = 0.058830879628658295
Validation loss = 0.04178690165281296
Validation loss = 0.04361598193645477
Validation loss = 0.06430213898420334
Validation loss = 0.05667022988200188
Validation loss = 0.04337361827492714
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04872417077422142
Validation loss = 0.027811167761683464
Validation loss = 0.03289442136883736
Validation loss = 0.03739948570728302
Validation loss = 0.03185904771089554
Validation loss = 0.05302409455180168
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04057617485523224
Validation loss = 0.03809976577758789
Validation loss = 0.03712451830506325
Validation loss = 0.034560248255729675
Validation loss = 0.038783960044384
Validation loss = 0.0315718874335289
Validation loss = 0.028069011867046356
Validation loss = 0.04188711941242218
Validation loss = 0.03732680901885033
Validation loss = 0.04033678397536278
Validation loss = 0.03751853480935097
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04379640519618988
Validation loss = 0.03850182145833969
Validation loss = 0.041700102388858795
Validation loss = 0.04405777156352997
Validation loss = 0.05559907481074333
Validation loss = 0.03627489134669304
Validation loss = 0.06239357218146324
Validation loss = 0.041798774152994156
Validation loss = 0.038449328392744064
Validation loss = 0.040418192744255066
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05376944690942764
Validation loss = 0.040769290179014206
Validation loss = 0.03563942015171051
Validation loss = 0.035395547747612
Validation loss = 0.04037782922387123
Validation loss = 0.0526970736682415
Validation loss = 0.03672397509217262
Validation loss = 0.0503525473177433
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.09e+03 |
| Iteration     | 15       |
| MaximumReturn | 1.17e+03 |
| MinimumReturn | 927      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03604292869567871
Validation loss = 0.025537768378853798
Validation loss = 0.04689463600516319
Validation loss = 0.05961820110678673
Validation loss = 0.06600726395845413
Validation loss = 0.04630967974662781
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025120900943875313
Validation loss = 0.02943805791437626
Validation loss = 0.023254022002220154
Validation loss = 0.02624071016907692
Validation loss = 0.030199654400348663
Validation loss = 0.03442022576928139
Validation loss = 0.032998956739902496
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04381989687681198
Validation loss = 0.030046379193663597
Validation loss = 0.030660780146718025
Validation loss = 0.03092247247695923
Validation loss = 0.03658819943666458
Validation loss = 0.0392138734459877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.031347356736660004
Validation loss = 0.035720039159059525
Validation loss = 0.04210762679576874
Validation loss = 0.045128293335437775
Validation loss = 0.036126185208559036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03272612765431404
Validation loss = 0.0321948416531086
Validation loss = 0.04473783075809479
Validation loss = 0.040475811809301376
Validation loss = 0.05675770714879036
Validation loss = 0.0567079558968544
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 902      |
| Iteration     | 16       |
| MaximumReturn | 1.24e+03 |
| MinimumReturn | -338     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016319969668984413
Validation loss = 0.01312574464827776
Validation loss = 0.011936493217945099
Validation loss = 0.013639431446790695
Validation loss = 0.012208182364702225
Validation loss = 0.011399535462260246
Validation loss = 0.011923897080123425
Validation loss = 0.012477032840251923
Validation loss = 0.01177594531327486
Validation loss = 0.011981223709881306
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015337694436311722
Validation loss = 0.012381551787257195
Validation loss = 0.012220685370266438
Validation loss = 0.011530016548931599
Validation loss = 0.01269449107348919
Validation loss = 0.012001486495137215
Validation loss = 0.012288572266697884
Validation loss = 0.011876752600073814
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015581182204186916
Validation loss = 0.013101070187985897
Validation loss = 0.013259653002023697
Validation loss = 0.012046919204294682
Validation loss = 0.011532691307365894
Validation loss = 0.012583117000758648
Validation loss = 0.011984072625637054
Validation loss = 0.011423371732234955
Validation loss = 0.011464753188192844
Validation loss = 0.011714966967701912
Validation loss = 0.012700039893388748
Validation loss = 0.011822313070297241
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017798583954572678
Validation loss = 0.014346633106470108
Validation loss = 0.012812569737434387
Validation loss = 0.012899475172162056
Validation loss = 0.012140396051108837
Validation loss = 0.012481559999287128
Validation loss = 0.011551868170499802
Validation loss = 0.011383363977074623
Validation loss = 0.012692820280790329
Validation loss = 0.011547788046300411
Validation loss = 0.012172568589448929
Validation loss = 0.011633189395070076
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01769905909895897
Validation loss = 0.013106814585626125
Validation loss = 0.012247058562934399
Validation loss = 0.012112692929804325
Validation loss = 0.012371032498776913
Validation loss = 0.0118581373244524
Validation loss = 0.011766648851335049
Validation loss = 0.011034565046429634
Validation loss = 0.011132869869470596
Validation loss = 0.011134409345686436
Validation loss = 0.011495023034512997
Validation loss = 0.010889274068176746
Validation loss = 0.011739738285541534
Validation loss = 0.011357777751982212
Validation loss = 0.011222640983760357
Validation loss = 0.010832026600837708
Validation loss = 0.011544497683644295
Validation loss = 0.010732466354966164
Validation loss = 0.011977151967585087
Validation loss = 0.011260765604674816
Validation loss = 0.012123707681894302
Validation loss = 0.010691172443330288
Validation loss = 0.01094305794686079
Validation loss = 0.011467970907688141
Validation loss = 0.010621335357427597
Validation loss = 0.011120352894067764
Validation loss = 0.012073409743607044
Validation loss = 0.01054355688393116
Validation loss = 0.010748704895377159
Validation loss = 0.01066864375025034
Validation loss = 0.010649179108440876
Validation loss = 0.01055340375751257
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.21e+03 |
| Iteration     | 17       |
| MaximumReturn | 1.27e+03 |
| MinimumReturn | 1.13e+03 |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012296034023165703
Validation loss = 0.01169842854142189
Validation loss = 0.011526251211762428
Validation loss = 0.011483192443847656
Validation loss = 0.011048710905015469
Validation loss = 0.011122217401862144
Validation loss = 0.011278851889073849
Validation loss = 0.011093979701399803
Validation loss = 0.010937631130218506
Validation loss = 0.011398222297430038
Validation loss = 0.011749408207833767
Validation loss = 0.01106330007314682
Validation loss = 0.01048874482512474
Validation loss = 0.011152482591569424
Validation loss = 0.01105497032403946
Validation loss = 0.010496855713427067
Validation loss = 0.010918962769210339
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013064498081803322
Validation loss = 0.011010466143488884
Validation loss = 0.011693636886775494
Validation loss = 0.011406649835407734
Validation loss = 0.0120273157954216
Validation loss = 0.011662755161523819
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012178890407085419
Validation loss = 0.011349414475262165
Validation loss = 0.010848813690245152
Validation loss = 0.011411695741117
Validation loss = 0.011660432443022728
Validation loss = 0.010968185029923916
Validation loss = 0.01104389876127243
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01241109985858202
Validation loss = 0.01097120251506567
Validation loss = 0.011012638919055462
Validation loss = 0.010875774547457695
Validation loss = 0.011170798912644386
Validation loss = 0.011094820685684681
Validation loss = 0.01079182792454958
Validation loss = 0.010820101015269756
Validation loss = 0.010735333897173405
Validation loss = 0.010740885511040688
Validation loss = 0.011192068457603455
Validation loss = 0.010868697427213192
Validation loss = 0.011105246841907501
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011102616786956787
Validation loss = 0.01156583707779646
Validation loss = 0.010202123783528805
Validation loss = 0.010390181094408035
Validation loss = 0.011176091618835926
Validation loss = 0.010232540778815746
Validation loss = 0.010986719280481339
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.18e+03 |
| Iteration     | 18       |
| MaximumReturn | 1.26e+03 |
| MinimumReturn | 1.14e+03 |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010962851345539093
Validation loss = 0.010204330086708069
Validation loss = 0.010415047407150269
Validation loss = 0.010040397755801678
Validation loss = 0.011176841333508492
Validation loss = 0.010095035657286644
Validation loss = 0.010118692182004452
Validation loss = 0.010954267345368862
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011520657688379288
Validation loss = 0.011444744653999805
Validation loss = 0.01061624102294445
Validation loss = 0.010786626487970352
Validation loss = 0.011120781302452087
Validation loss = 0.011076146736741066
Validation loss = 0.01207219623029232
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011528687551617622
Validation loss = 0.0107352826744318
Validation loss = 0.010983829386532307
Validation loss = 0.01079312339425087
Validation loss = 0.01155257411301136
Validation loss = 0.010050018317997456
Validation loss = 0.010042333975434303
Validation loss = 0.011101571843028069
Validation loss = 0.010435184463858604
Validation loss = 0.011239638552069664
Validation loss = 0.00978097878396511
Validation loss = 0.010978760197758675
Validation loss = 0.011015400290489197
Validation loss = 0.01030493900179863
Validation loss = 0.011219844222068787
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011021582409739494
Validation loss = 0.010603273287415504
Validation loss = 0.010487864725291729
Validation loss = 0.010323414579033852
Validation loss = 0.010765365324914455
Validation loss = 0.010419394820928574
Validation loss = 0.009956668131053448
Validation loss = 0.010056333616375923
Validation loss = 0.010415117256343365
Validation loss = 0.009728814475238323
Validation loss = 0.010416075587272644
Validation loss = 0.011118206195533276
Validation loss = 0.010152881033718586
Validation loss = 0.010121340863406658
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011080056428909302
Validation loss = 0.009863220155239105
Validation loss = 0.010430720634758472
Validation loss = 0.01068213302642107
Validation loss = 0.010319890454411507
Validation loss = 0.010338844731450081
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.22e+03 |
| Iteration     | 19       |
| MaximumReturn | 1.28e+03 |
| MinimumReturn | 1.16e+03 |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011717264540493488
Validation loss = 0.010642074048519135
Validation loss = 0.010685396380722523
Validation loss = 0.010642907582223415
Validation loss = 0.010035661980509758
Validation loss = 0.01053956151008606
Validation loss = 0.009779499843716621
Validation loss = 0.009827177040278912
Validation loss = 0.010403135791420937
Validation loss = 0.01089739240705967
Validation loss = 0.009886414743959904
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011346535757184029
Validation loss = 0.010492382571101189
Validation loss = 0.010297729633748531
Validation loss = 0.010108794085681438
Validation loss = 0.010684977285563946
Validation loss = 0.010476295836269855
Validation loss = 0.010014201514422894
Validation loss = 0.009992657229304314
Validation loss = 0.010367353446781635
Validation loss = 0.010650445707142353
Validation loss = 0.009995607659220695
Validation loss = 0.009793765842914581
Validation loss = 0.009704886004328728
Validation loss = 0.009497048333287239
Validation loss = 0.00991853792220354
Validation loss = 0.00966958049684763
Validation loss = 0.010501137934625149
Validation loss = 0.009969942271709442
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010195203125476837
Validation loss = 0.010777286253869534
Validation loss = 0.009816321544349194
Validation loss = 0.009981529787182808
Validation loss = 0.010034007951617241
Validation loss = 0.009971486404538155
Validation loss = 0.009672063402831554
Validation loss = 0.00955281499773264
Validation loss = 0.01068173348903656
Validation loss = 0.010015876963734627
Validation loss = 0.010535629466176033
Validation loss = 0.010076958686113358
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010119554586708546
Validation loss = 0.010246681049466133
Validation loss = 0.010305245406925678
Validation loss = 0.010333762504160404
Validation loss = 0.009700278751552105
Validation loss = 0.009269598871469498
Validation loss = 0.010055439546704292
Validation loss = 0.009992371313273907
Validation loss = 0.010358110070228577
Validation loss = 0.009693828411400318
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010092305019497871
Validation loss = 0.009745942428708076
Validation loss = 0.009907176718115807
Validation loss = 0.009469924494624138
Validation loss = 0.009932979941368103
Validation loss = 0.010346947237849236
Validation loss = 0.009530455805361271
Validation loss = 0.010207540355622768
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.21e+03 |
| Iteration     | 20       |
| MaximumReturn | 1.31e+03 |
| MinimumReturn | 1.16e+03 |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010990697890520096
Validation loss = 0.009935648180544376
Validation loss = 0.010375996120274067
Validation loss = 0.00973686296492815
Validation loss = 0.009893548674881458
Validation loss = 0.010001291520893574
Validation loss = 0.009584484621882439
Validation loss = 0.010752907022833824
Validation loss = 0.009717130102217197
Validation loss = 0.009685179218649864
Validation loss = 0.009936348535120487
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009873982518911362
Validation loss = 0.009343803860247135
Validation loss = 0.009439133107662201
Validation loss = 0.009693671949207783
Validation loss = 0.00945192202925682
Validation loss = 0.010047812014818192
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01026241760700941
Validation loss = 0.010396253317594528
Validation loss = 0.010005938820540905
Validation loss = 0.009793543256819248
Validation loss = 0.010375051759183407
Validation loss = 0.01064345333725214
Validation loss = 0.009623263031244278
Validation loss = 0.009783434681594372
Validation loss = 0.010358646512031555
Validation loss = 0.009926741011440754
Validation loss = 0.01044102106243372
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010715126059949398
Validation loss = 0.009197795763611794
Validation loss = 0.009713816456496716
Validation loss = 0.009354864247143269
Validation loss = 0.009366016834974289
Validation loss = 0.009192862547934055
Validation loss = 0.01075842883437872
Validation loss = 0.01018096599727869
Validation loss = 0.009347829967737198
Validation loss = 0.008890403434634209
Validation loss = 0.009598549455404282
Validation loss = 0.009564551524817944
Validation loss = 0.00961076095700264
Validation loss = 0.009867039509117603
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010129991918802261
Validation loss = 0.0092838816344738
Validation loss = 0.01090728398412466
Validation loss = 0.009515102952718735
Validation loss = 0.00901853758841753
Validation loss = 0.009856819175183773
Validation loss = 0.009684148244559765
Validation loss = 0.00918448343873024
Validation loss = 0.009321705438196659
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.18e+03 |
| Iteration     | 21       |
| MaximumReturn | 1.28e+03 |
| MinimumReturn | 1.07e+03 |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009691843762993813
Validation loss = 0.009529071860015392
Validation loss = 0.010807810351252556
Validation loss = 0.008986048400402069
Validation loss = 0.009293311275541782
Validation loss = 0.00988565944135189
Validation loss = 0.00977519154548645
Validation loss = 0.009278185665607452
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010182429105043411
Validation loss = 0.00972201582044363
Validation loss = 0.009543481282889843
Validation loss = 0.009483215399086475
Validation loss = 0.009799008257687092
Validation loss = 0.009120665490627289
Validation loss = 0.00881167035549879
Validation loss = 0.009172603487968445
Validation loss = 0.009355226531624794
Validation loss = 0.009679846465587616
Validation loss = 0.009075007401406765
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009774756617844105
Validation loss = 0.009163817390799522
Validation loss = 0.009795582853257656
Validation loss = 0.009768185205757618
Validation loss = 0.009971046820282936
Validation loss = 0.009290896356105804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009963979944586754
Validation loss = 0.009334678761661053
Validation loss = 0.009792610071599483
Validation loss = 0.008605421520769596
Validation loss = 0.009011726826429367
Validation loss = 0.009273046627640724
Validation loss = 0.009259316138923168
Validation loss = 0.010418510064482689
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00999064464122057
Validation loss = 0.009591314010322094
Validation loss = 0.009371473453938961
Validation loss = 0.010450056754052639
Validation loss = 0.008980545215308666
Validation loss = 0.008901037275791168
Validation loss = 0.009107250720262527
Validation loss = 0.008756422437727451
Validation loss = 0.009434572421014309
Validation loss = 0.009565085172653198
Validation loss = 0.008886783383786678
Validation loss = 0.00868525542318821
Validation loss = 0.009402397088706493
Validation loss = 0.008817961439490318
Validation loss = 0.008475788868963718
Validation loss = 0.00887251552194357
Validation loss = 0.008992842398583889
Validation loss = 0.00931537989526987
Validation loss = 0.009062052704393864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.2e+03  |
| Iteration     | 22       |
| MaximumReturn | 1.25e+03 |
| MinimumReturn | 1.14e+03 |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010042828507721424
Validation loss = 0.009156949818134308
Validation loss = 0.009409662336111069
Validation loss = 0.009183921851217747
Validation loss = 0.009167511947453022
Validation loss = 0.009228497743606567
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009642756544053555
Validation loss = 0.009388801641762257
Validation loss = 0.009138324297964573
Validation loss = 0.009238269180059433
Validation loss = 0.008948869071900845
Validation loss = 0.009458101354539394
Validation loss = 0.009202643297612667
Validation loss = 0.008649206720292568
Validation loss = 0.008946035988628864
Validation loss = 0.008836542256176472
Validation loss = 0.009402614086866379
Validation loss = 0.009115711785852909
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011390428058803082
Validation loss = 0.009462007321417332
Validation loss = 0.008657151833176613
Validation loss = 0.009141587652266026
Validation loss = 0.009095861576497555
Validation loss = 0.008901280350983143
Validation loss = 0.009422135539352894
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009176001884043217
Validation loss = 0.009511907584965229
Validation loss = 0.009363779798150063
Validation loss = 0.009211801923811436
Validation loss = 0.00907484907656908
Validation loss = 0.009616218507289886
Validation loss = 0.0096043786033988
Validation loss = 0.008974169380962849
Validation loss = 0.009196555241942406
Validation loss = 0.009186581708490849
Validation loss = 0.00862827803939581
Validation loss = 0.009210399352014065
Validation loss = 0.008827362209558487
Validation loss = 0.008747783489525318
Validation loss = 0.008422710932791233
Validation loss = 0.009733819402754307
Validation loss = 0.00881171878427267
Validation loss = 0.008224660530686378
Validation loss = 0.008962835185229778
Validation loss = 0.008557888679206371
Validation loss = 0.009482518769800663
Validation loss = 0.008568703196942806
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009772575460374355
Validation loss = 0.010419015772640705
Validation loss = 0.008600634522736073
Validation loss = 0.008516799658536911
Validation loss = 0.008639817126095295
Validation loss = 0.008764178492128849
Validation loss = 0.00889971200376749
Validation loss = 0.009366155602037907
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.14e+03 |
| Iteration     | 23       |
| MaximumReturn | 1.25e+03 |
| MinimumReturn | 968      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010366643778979778
Validation loss = 0.008798851631581783
Validation loss = 0.009232645854353905
Validation loss = 0.008870381861925125
Validation loss = 0.008844506926834583
Validation loss = 0.009019081480801105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009019652381539345
Validation loss = 0.008988473564386368
Validation loss = 0.008917279541492462
Validation loss = 0.008578062057495117
Validation loss = 0.00909588485956192
Validation loss = 0.00839388370513916
Validation loss = 0.008556797169148922
Validation loss = 0.008429843001067638
Validation loss = 0.008275859989225864
Validation loss = 0.009021238423883915
Validation loss = 0.00835193507373333
Validation loss = 0.008678051643073559
Validation loss = 0.008803335949778557
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009646880440413952
Validation loss = 0.009826884604990482
Validation loss = 0.009072578512132168
Validation loss = 0.009335008449852467
Validation loss = 0.009150125086307526
Validation loss = 0.008731118403375149
Validation loss = 0.008789397776126862
Validation loss = 0.009206385351717472
Validation loss = 0.009074189700186253
Validation loss = 0.009228439070284367
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00956681277602911
Validation loss = 0.008802425116300583
Validation loss = 0.00830340851098299
Validation loss = 0.008642343804240227
Validation loss = 0.008544381707906723
Validation loss = 0.008373016491532326
Validation loss = 0.00893856305629015
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008865801617503166
Validation loss = 0.008417600765824318
Validation loss = 0.008741425350308418
Validation loss = 0.00832649227231741
Validation loss = 0.008419130928814411
Validation loss = 0.008886408060789108
Validation loss = 0.008252238854765892
Validation loss = 0.009038747288286686
Validation loss = 0.008063329383730888
Validation loss = 0.008237849920988083
Validation loss = 0.007918544113636017
Validation loss = 0.008590581826865673
Validation loss = 0.00949111394584179
Validation loss = 0.007690424099564552
Validation loss = 0.008420982398092747
Validation loss = 0.007975084707140923
Validation loss = 0.007809496484696865
Validation loss = 0.00841463916003704
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 939      |
| Iteration     | 24       |
| MaximumReturn | 1.29e+03 |
| MinimumReturn | -551     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009814154356718063
Validation loss = 0.008786536753177643
Validation loss = 0.009279059246182442
Validation loss = 0.008647340349853039
Validation loss = 0.008401240222156048
Validation loss = 0.008723167702555656
Validation loss = 0.008462025783956051
Validation loss = 0.008487801067531109
Validation loss = 0.008510877378284931
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009432593360543251
Validation loss = 0.008333852514624596
Validation loss = 0.008749800734221935
Validation loss = 0.008275061845779419
Validation loss = 0.008111213333904743
Validation loss = 0.008358646184206009
Validation loss = 0.008407946676015854
Validation loss = 0.00821309071034193
Validation loss = 0.007851112633943558
Validation loss = 0.007895146496593952
Validation loss = 0.00836359616369009
Validation loss = 0.008001706562936306
Validation loss = 0.00906810350716114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00971091166138649
Validation loss = 0.008489700965583324
Validation loss = 0.008684387430548668
Validation loss = 0.008605093695223331
Validation loss = 0.008101127110421658
Validation loss = 0.008828782476484776
Validation loss = 0.00842649582773447
Validation loss = 0.008584287017583847
Validation loss = 0.008512229658663273
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009505552239716053
Validation loss = 0.008211283944547176
Validation loss = 0.008751950226724148
Validation loss = 0.008809713646769524
Validation loss = 0.00841160211712122
Validation loss = 0.0077573079615831375
Validation loss = 0.008461778983473778
Validation loss = 0.008546482771635056
Validation loss = 0.008234416134655476
Validation loss = 0.008389914408326149
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008237365633249283
Validation loss = 0.008025238290429115
Validation loss = 0.008029956370592117
Validation loss = 0.008037831634283066
Validation loss = 0.007934912107884884
Validation loss = 0.008570828475058079
Validation loss = 0.007629898842424154
Validation loss = 0.008260147646069527
Validation loss = 0.007749989628791809
Validation loss = 0.008137173019349575
Validation loss = 0.007711222395300865
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.23e+03 |
| Iteration     | 25       |
| MaximumReturn | 1.33e+03 |
| MinimumReturn | 1.16e+03 |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009185214526951313
Validation loss = 0.008538038469851017
Validation loss = 0.00859286729246378
Validation loss = 0.00841100700199604
Validation loss = 0.008775587193667889
Validation loss = 0.008384966291487217
Validation loss = 0.008429454639554024
Validation loss = 0.008141718804836273
Validation loss = 0.00844220444560051
Validation loss = 0.008134972304105759
Validation loss = 0.00797779206186533
Validation loss = 0.008922972716391087
Validation loss = 0.008053915575146675
Validation loss = 0.008408292196691036
Validation loss = 0.008073227480053902
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008354374207556248
Validation loss = 0.008319467306137085
Validation loss = 0.008355112746357918
Validation loss = 0.00743780517950654
Validation loss = 0.008513566106557846
Validation loss = 0.007772756740450859
Validation loss = 0.008358277380466461
Validation loss = 0.007877912372350693
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009036348201334476
Validation loss = 0.008358112536370754
Validation loss = 0.008090057410299778
Validation loss = 0.008461958728730679
Validation loss = 0.008465880528092384
Validation loss = 0.0084840664640069
Validation loss = 0.008381081745028496
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008469386957585812
Validation loss = 0.008341653272509575
Validation loss = 0.008638919331133366
Validation loss = 0.008121649734675884
Validation loss = 0.008009539917111397
Validation loss = 0.008231665007770061
Validation loss = 0.007680034730583429
Validation loss = 0.008236479014158249
Validation loss = 0.007899194024503231
Validation loss = 0.007853553630411625
Validation loss = 0.008184127509593964
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008511640131473541
Validation loss = 0.008692492730915546
Validation loss = 0.007890228182077408
Validation loss = 0.007566791959106922
Validation loss = 0.008113511838018894
Validation loss = 0.007950146682560444
Validation loss = 0.007947579026222229
Validation loss = 0.0075464495457708836
Validation loss = 0.007885647937655449
Validation loss = 0.007842952385544777
Validation loss = 0.008269025944173336
Validation loss = 0.00730945123359561
Validation loss = 0.007293956819921732
Validation loss = 0.00801600981503725
Validation loss = 0.008256543427705765
Validation loss = 0.00759805366396904
Validation loss = 0.007851365953683853
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.21e+03 |
| Iteration     | 26       |
| MaximumReturn | 1.27e+03 |
| MinimumReturn | 1.1e+03  |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009417561814188957
Validation loss = 0.008112451061606407
Validation loss = 0.00851679127663374
Validation loss = 0.00782041810452938
Validation loss = 0.00826453510671854
Validation loss = 0.00807425007224083
Validation loss = 0.008415290154516697
Validation loss = 0.007957923226058483
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008077442646026611
Validation loss = 0.007875175215303898
Validation loss = 0.007625414989888668
Validation loss = 0.007426831405609846
Validation loss = 0.00759249459952116
Validation loss = 0.007552034221589565
Validation loss = 0.00805956032127142
Validation loss = 0.007479920517653227
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008485628291964531
Validation loss = 0.008673434145748615
Validation loss = 0.008354582823812962
Validation loss = 0.008020428009331226
Validation loss = 0.00827188789844513
Validation loss = 0.008097619749605656
Validation loss = 0.008371168747544289
Validation loss = 0.007827839814126492
Validation loss = 0.008296542800962925
Validation loss = 0.008028306998312473
Validation loss = 0.008213098160922527
Validation loss = 0.008066317066550255
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0081521887332201
Validation loss = 0.007712967693805695
Validation loss = 0.007757131475955248
Validation loss = 0.007803001906722784
Validation loss = 0.007887627929449081
Validation loss = 0.008147827349603176
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007722742389887571
Validation loss = 0.007518726866692305
Validation loss = 0.0075971935875713825
Validation loss = 0.007314303424209356
Validation loss = 0.007491060998290777
Validation loss = 0.0078025758266448975
Validation loss = 0.00752184959128499
Validation loss = 0.007081747520714998
Validation loss = 0.007647110614925623
Validation loss = 0.0076596699655056
Validation loss = 0.007376167923212051
Validation loss = 0.007403329014778137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.23e+03 |
| Iteration     | 27       |
| MaximumReturn | 1.3e+03  |
| MinimumReturn | 1.16e+03 |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008190931752324104
Validation loss = 0.007835570722818375
Validation loss = 0.007859684526920319
Validation loss = 0.008046029135584831
Validation loss = 0.007694805506616831
Validation loss = 0.007903539575636387
Validation loss = 0.007854214869439602
Validation loss = 0.008368240669369698
Validation loss = 0.007793486583977938
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007869304157793522
Validation loss = 0.008016016334295273
Validation loss = 0.007811821065843105
Validation loss = 0.0075722867622971535
Validation loss = 0.008314963430166245
Validation loss = 0.007839923724532127
Validation loss = 0.0074676526710391045
Validation loss = 0.008038832806050777
Validation loss = 0.007579554803669453
Validation loss = 0.007543391548097134
Validation loss = 0.007871715351939201
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008739929646253586
Validation loss = 0.00798124447464943
Validation loss = 0.00797840766608715
Validation loss = 0.007902724668383598
Validation loss = 0.008211350999772549
Validation loss = 0.007691309787333012
Validation loss = 0.008316164836287498
Validation loss = 0.007560094352811575
Validation loss = 0.007890709675848484
Validation loss = 0.008004036732017994
Validation loss = 0.007569102570414543
Validation loss = 0.008135583251714706
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008193896152079105
Validation loss = 0.007673348765820265
Validation loss = 0.007795342709869146
Validation loss = 0.007803508546203375
Validation loss = 0.007337840273976326
Validation loss = 0.007481545675545931
Validation loss = 0.0075751314871013165
Validation loss = 0.007669080514460802
Validation loss = 0.0072560696862638
Validation loss = 0.007566164713352919
Validation loss = 0.007139140274375677
Validation loss = 0.007647230289876461
Validation loss = 0.007529481779783964
Validation loss = 0.007314582355320454
Validation loss = 0.007599562406539917
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007626178674399853
Validation loss = 0.007161222863942385
Validation loss = 0.007264275569468737
Validation loss = 0.0070420666597783566
Validation loss = 0.006989356130361557
Validation loss = 0.00724573340266943
Validation loss = 0.007321199402213097
Validation loss = 0.007127308286726475
Validation loss = 0.0072400071658194065
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.22e+03 |
| Iteration     | 28       |
| MaximumReturn | 1.26e+03 |
| MinimumReturn | 1.18e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007990213111042976
Validation loss = 0.0073961555026471615
Validation loss = 0.007578082848340273
Validation loss = 0.007478443905711174
Validation loss = 0.007504598703235388
Validation loss = 0.008575175888836384
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007915130816400051
Validation loss = 0.007493099197745323
Validation loss = 0.007389552425593138
Validation loss = 0.007327163126319647
Validation loss = 0.007502460852265358
Validation loss = 0.007693655323237181
Validation loss = 0.008359242230653763
Validation loss = 0.007451067212969065
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008366168476641178
Validation loss = 0.008681955747306347
Validation loss = 0.007481624837964773
Validation loss = 0.007968593388795853
Validation loss = 0.007546362932771444
Validation loss = 0.008517073467373848
Validation loss = 0.007868243381381035
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0079251229763031
Validation loss = 0.00797506608068943
Validation loss = 0.007347227074205875
Validation loss = 0.007374437991529703
Validation loss = 0.0070508988574147224
Validation loss = 0.0075441086664795876
Validation loss = 0.0072927712462842464
Validation loss = 0.0075564091093838215
Validation loss = 0.007356582675129175
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008165697567164898
Validation loss = 0.007032579276710749
Validation loss = 0.007281080819666386
Validation loss = 0.0072027225978672504
Validation loss = 0.007084351498633623
Validation loss = 0.007002100348472595
Validation loss = 0.006966122426092625
Validation loss = 0.007041811477392912
Validation loss = 0.006766836624592543
Validation loss = 0.007854442112147808
Validation loss = 0.0072772870771586895
Validation loss = 0.006879741325974464
Validation loss = 0.0069333831779658794
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.01e+03 |
| Iteration     | 29       |
| MaximumReturn | 1.31e+03 |
| MinimumReturn | -234     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008441124111413956
Validation loss = 0.007890503853559494
Validation loss = 0.007954144850373268
Validation loss = 0.007568406872451305
Validation loss = 0.008002731949090958
Validation loss = 0.007381753530353308
Validation loss = 0.007687403820455074
Validation loss = 0.007870450615882874
Validation loss = 0.007614900823682547
Validation loss = 0.007362304255366325
Validation loss = 0.0072076451033353806
Validation loss = 0.007132007274776697
Validation loss = 0.0071910591796040535
Validation loss = 0.007814709097146988
Validation loss = 0.007198082748800516
Validation loss = 0.007589280139654875
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007720000576227903
Validation loss = 0.007316610310226679
Validation loss = 0.007118815556168556
Validation loss = 0.007321349810808897
Validation loss = 0.007582687307149172
Validation loss = 0.007464337628334761
Validation loss = 0.0072745634242892265
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008232388645410538
Validation loss = 0.007418656721711159
Validation loss = 0.007479785941541195
Validation loss = 0.007270881440490484
Validation loss = 0.0073308092541992664
Validation loss = 0.007672184146940708
Validation loss = 0.0075609395280480385
Validation loss = 0.007864173501729965
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007420154754072428
Validation loss = 0.007177665363997221
Validation loss = 0.0071944259107112885
Validation loss = 0.007461204659193754
Validation loss = 0.007224604021757841
Validation loss = 0.007078805472701788
Validation loss = 0.0074531384743750095
Validation loss = 0.006967483088374138
Validation loss = 0.006889501586556435
Validation loss = 0.0074883415363729
Validation loss = 0.007382196374237537
Validation loss = 0.006990727968513966
Validation loss = 0.006923577282577753
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007335727568715811
Validation loss = 0.007219994440674782
Validation loss = 0.006886257324367762
Validation loss = 0.0068612005561590195
Validation loss = 0.006622491870075464
Validation loss = 0.00718424329534173
Validation loss = 0.006899010855704546
Validation loss = 0.006838429253548384
Validation loss = 0.006805411074310541
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.27e+03 |
| Iteration     | 30       |
| MaximumReturn | 1.38e+03 |
| MinimumReturn | 1.18e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007350026164203882
Validation loss = 0.007281750440597534
Validation loss = 0.007313716225326061
Validation loss = 0.007188761606812477
Validation loss = 0.007117815315723419
Validation loss = 0.007072998210787773
Validation loss = 0.006981812417507172
Validation loss = 0.007005970925092697
Validation loss = 0.00710594467818737
Validation loss = 0.007361066527664661
Validation loss = 0.007082539144903421
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007380676921457052
Validation loss = 0.006826725788414478
Validation loss = 0.007176863495260477
Validation loss = 0.007567805703729391
Validation loss = 0.007284233812242746
Validation loss = 0.006820418406277895
Validation loss = 0.007625686936080456
Validation loss = 0.007017740048468113
Validation loss = 0.007394198793917894
Validation loss = 0.007008104119449854
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00728557538241148
Validation loss = 0.007173401303589344
Validation loss = 0.007694480009377003
Validation loss = 0.007244572043418884
Validation loss = 0.007468201220035553
Validation loss = 0.007452879101037979
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006776209454983473
Validation loss = 0.0068006357178092
Validation loss = 0.006920235697180033
Validation loss = 0.007242988795042038
Validation loss = 0.007040994707494974
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006973158568143845
Validation loss = 0.007010774686932564
Validation loss = 0.006765330210328102
Validation loss = 0.007129584439098835
Validation loss = 0.007138695567846298
Validation loss = 0.006827271543443203
Validation loss = 0.00681725237518549
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.27e+03 |
| Iteration     | 31       |
| MaximumReturn | 1.35e+03 |
| MinimumReturn | 1.19e+03 |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007014880422502756
Validation loss = 0.007790381088852882
Validation loss = 0.007066548801958561
Validation loss = 0.007164053153246641
Validation loss = 0.007255904376506805
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006984953302890062
Validation loss = 0.006830860860645771
Validation loss = 0.006845233030617237
Validation loss = 0.007072312757372856
Validation loss = 0.007179154548794031
Validation loss = 0.00697605824097991
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007606766652315855
Validation loss = 0.006933615542948246
Validation loss = 0.007028650492429733
Validation loss = 0.007088836748152971
Validation loss = 0.007218442391604185
Validation loss = 0.0074639227241277695
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0071638040244579315
Validation loss = 0.0072528645396232605
Validation loss = 0.00732403015717864
Validation loss = 0.006614773999899626
Validation loss = 0.006639224011451006
Validation loss = 0.00683228624984622
Validation loss = 0.0070918891578912735
Validation loss = 0.006692938972264528
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007797066122293472
Validation loss = 0.007378406822681427
Validation loss = 0.006657479330897331
Validation loss = 0.0069749848917126656
Validation loss = 0.0066366600804030895
Validation loss = 0.0071822465397417545
Validation loss = 0.006640907377004623
Validation loss = 0.006805171258747578
Validation loss = 0.006685839965939522
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.3e+03  |
| Iteration     | 32       |
| MaximumReturn | 1.37e+03 |
| MinimumReturn | 1.25e+03 |
| TotalSamples  | 136000   |
----------------------------
