Logging to experiments/hopper/hopperO01/Tue-01-Nov-2022-09-35-15-AM-CDT_hopper_trpo_iteration_20_seed2231
Print configuration .....
{'env_name': 'hopper', 'random_seeds': [1234, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/hopper_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7408145666122437
Validation loss = 0.6664491891860962
Validation loss = 0.6440469622612
Validation loss = 0.6387633085250854
Validation loss = 0.6454180479049683
Validation loss = 0.6627216935157776
Validation loss = 0.6819713711738586
Validation loss = 0.7032150626182556
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7624582648277283
Validation loss = 0.6616639494895935
Validation loss = 0.6448132991790771
Validation loss = 0.6474334001541138
Validation loss = 0.648910641670227
Validation loss = 0.6685144901275635
Validation loss = 0.7005207538604736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.8466442823410034
Validation loss = 0.6592116355895996
Validation loss = 0.6439088582992554
Validation loss = 0.6531398296356201
Validation loss = 0.6748665571212769
Validation loss = 0.6892938613891602
Validation loss = 0.7029845118522644
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7764089107513428
Validation loss = 0.6580828428268433
Validation loss = 0.6434460878372192
Validation loss = 0.6429228782653809
Validation loss = 0.647943377494812
Validation loss = 0.6731744408607483
Validation loss = 0.6906161308288574
Validation loss = 0.7213891744613647
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7762713432312012
Validation loss = 0.6657808423042297
Validation loss = 0.6434358954429626
Validation loss = 0.662895917892456
Validation loss = 0.6548464894294739
Validation loss = 0.6622240543365479
Validation loss = 0.6887747049331665
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.49e+03 |
| Iteration     | 0         |
| MaximumReturn | -2.12e+03 |
| MinimumReturn | -3.01e+03 |
| TotalSamples  | 8000      |
-----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.723163366317749
Validation loss = 0.6735103130340576
Validation loss = 0.6783409714698792
Validation loss = 0.6985156536102295
Validation loss = 0.7293322086334229
Validation loss = 0.7625948190689087
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7209588885307312
Validation loss = 0.6660293340682983
Validation loss = 0.6685126423835754
Validation loss = 0.688001275062561
Validation loss = 0.7181283831596375
Validation loss = 0.7377820014953613
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7157089710235596
Validation loss = 0.6687643527984619
Validation loss = 0.6692237854003906
Validation loss = 0.6840341687202454
Validation loss = 0.7082041501998901
Validation loss = 0.7551345825195312
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7250440120697021
Validation loss = 0.6768389940261841
Validation loss = 0.677573561668396
Validation loss = 0.6989340782165527
Validation loss = 0.7342370748519897
Validation loss = 0.7607218027114868
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7177974581718445
Validation loss = 0.6657332181930542
Validation loss = 0.6704177856445312
Validation loss = 0.6851581931114197
Validation loss = 0.7000538110733032
Validation loss = 0.7429523468017578
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.36e+03 |
| Iteration     | 1         |
| MaximumReturn | -2.33e+03 |
| MinimumReturn | -2.38e+03 |
| TotalSamples  | 12000     |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6790063381195068
Validation loss = 0.7040868401527405
Validation loss = 0.7294385433197021
Validation loss = 0.7711758017539978
Validation loss = 0.7849294543266296
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6811330914497375
Validation loss = 0.7103841304779053
Validation loss = 0.7372019290924072
Validation loss = 0.7637923359870911
Validation loss = 0.767859160900116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6834828853607178
Validation loss = 0.704669713973999
Validation loss = 0.7123997807502747
Validation loss = 0.7401064038276672
Validation loss = 0.762749433517456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.687181293964386
Validation loss = 0.7113878726959229
Validation loss = 0.743534505367279
Validation loss = 0.7700764536857605
Validation loss = 0.797462522983551
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6848356127738953
Validation loss = 0.6939082145690918
Validation loss = 0.7231913208961487
Validation loss = 0.7356433868408203
Validation loss = 0.7666841149330139
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.26e+03 |
| Iteration     | 2         |
| MaximumReturn | -2.15e+03 |
| MinimumReturn | -2.34e+03 |
| TotalSamples  | 16000     |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7288354635238647
Validation loss = 0.7449633479118347
Validation loss = 0.7672644257545471
Validation loss = 0.7873141765594482
Validation loss = 0.7967730760574341
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7188863754272461
Validation loss = 0.7294294834136963
Validation loss = 0.7499828934669495
Validation loss = 0.7689478397369385
Validation loss = 0.7733440399169922
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7274303436279297
Validation loss = 0.7251043915748596
Validation loss = 0.7476471662521362
Validation loss = 0.7629522085189819
Validation loss = 0.7776836156845093
Validation loss = 0.7957887053489685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7285325527191162
Validation loss = 0.7344977855682373
Validation loss = 0.7606157064437866
Validation loss = 0.7803007364273071
Validation loss = 0.775610625743866
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7106849551200867
Validation loss = 0.7266647219657898
Validation loss = 0.7496018409729004
Validation loss = 0.7629810571670532
Validation loss = 0.7761556506156921
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.65e+03 |
| Iteration     | 3         |
| MaximumReturn | -2.42e+03 |
| MinimumReturn | -2.77e+03 |
| TotalSamples  | 20000     |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.706562340259552
Validation loss = 0.7687992453575134
Validation loss = 0.779544472694397
Validation loss = 0.7911283373832703
Validation loss = 0.8175902366638184
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7148211002349854
Validation loss = 0.7542961239814758
Validation loss = 0.7761725783348083
Validation loss = 0.7878113985061646
Validation loss = 0.8022525906562805
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.705962598323822
Validation loss = 0.7587897181510925
Validation loss = 0.7840479612350464
Validation loss = 0.7905392646789551
Validation loss = 0.8021413683891296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7084981203079224
Validation loss = 0.7628801465034485
Validation loss = 0.7738574743270874
Validation loss = 0.7941004037857056
Validation loss = 0.7996825575828552
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6938918828964233
Validation loss = 0.7548825740814209
Validation loss = 0.7659748196601868
Validation loss = 0.7777273654937744
Validation loss = 0.7956419587135315
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.73e+03 |
| Iteration     | 4         |
| MaximumReturn | -2.71e+03 |
| MinimumReturn | -2.75e+03 |
| TotalSamples  | 24000     |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7223175168037415
Validation loss = 0.7711343169212341
Validation loss = 0.7809590697288513
Validation loss = 0.7936291694641113
Validation loss = 0.7967695593833923
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7441975474357605
Validation loss = 0.7783636450767517
Validation loss = 0.7838944792747498
Validation loss = 0.8020848631858826
Validation loss = 0.8083952069282532
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7459318041801453
Validation loss = 0.7750073075294495
Validation loss = 0.789237916469574
Validation loss = 0.7975113987922668
Validation loss = 0.8050665259361267
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7436773777008057
Validation loss = 0.7777101993560791
Validation loss = 0.8009470105171204
Validation loss = 0.7959641814231873
Validation loss = 0.8042715191841125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7256562113761902
Validation loss = 0.7684121131896973
Validation loss = 0.7787135243415833
Validation loss = 0.7914395332336426
Validation loss = 0.7963190078735352
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.42e+03 |
| Iteration     | 5         |
| MaximumReturn | -3.35e+03 |
| MinimumReturn | -3.46e+03 |
| TotalSamples  | 28000     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6908274292945862
Validation loss = 0.7513958215713501
Validation loss = 0.7791873812675476
Validation loss = 0.800092875957489
Validation loss = 0.8136324882507324
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6801314353942871
Validation loss = 0.7704801559448242
Validation loss = 0.7855193018913269
Validation loss = 0.7983034253120422
Validation loss = 0.8159785270690918
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6785834431648254
Validation loss = 0.7593384981155396
Validation loss = 0.7853783965110779
Validation loss = 0.796880841255188
Validation loss = 0.8162899017333984
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.689299464225769
Validation loss = 0.7582860589027405
Validation loss = 0.785913348197937
Validation loss = 0.8052765727043152
Validation loss = 0.8124672770500183
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6701931357383728
Validation loss = 0.7457088232040405
Validation loss = 0.772499680519104
Validation loss = 0.7884373068809509
Validation loss = 0.8040494322776794
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.69e+03 |
| Iteration     | 6         |
| MaximumReturn | -1.5e+03  |
| MinimumReturn | -2.09e+03 |
| TotalSamples  | 32000     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7581642270088196
Validation loss = 0.7825723886489868
Validation loss = 0.7980843782424927
Validation loss = 0.8160033226013184
Validation loss = 0.8275119662284851
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7597865462303162
Validation loss = 0.7940207123756409
Validation loss = 0.8133794069290161
Validation loss = 0.8182026147842407
Validation loss = 0.8237076997756958
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7634361386299133
Validation loss = 0.7802635431289673
Validation loss = 0.8052599430084229
Validation loss = 0.8220410346984863
Validation loss = 0.8342971801757812
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7626259922981262
Validation loss = 0.7950028777122498
Validation loss = 0.8113762736320496
Validation loss = 0.8156365156173706
Validation loss = 0.8204851150512695
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7513319253921509
Validation loss = 0.7815102338790894
Validation loss = 0.7971716523170471
Validation loss = 0.8100180625915527
Validation loss = 0.8229236602783203
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.31e+03 |
| Iteration     | 7         |
| MaximumReturn | -2.26e+03 |
| MinimumReturn | -2.35e+03 |
| TotalSamples  | 36000     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7994904518127441
Validation loss = 0.7982288599014282
Validation loss = 0.8119552135467529
Validation loss = 0.8120291829109192
Validation loss = 0.8169295787811279
Validation loss = 0.8231781721115112
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.807357668876648
Validation loss = 0.8072679042816162
Validation loss = 0.8144932389259338
Validation loss = 0.8276705145835876
Validation loss = 0.8274237513542175
Validation loss = 0.8337839841842651
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.797247052192688
Validation loss = 0.7958294749259949
Validation loss = 0.8070855736732483
Validation loss = 0.8104267120361328
Validation loss = 0.8183457851409912
Validation loss = 0.8252549767494202
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7917816638946533
Validation loss = 0.8078601360321045
Validation loss = 0.8168944120407104
Validation loss = 0.8221959471702576
Validation loss = 0.8245198726654053
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7924301028251648
Validation loss = 0.8038017749786377
Validation loss = 0.8142426609992981
Validation loss = 0.8169273734092712
Validation loss = 0.8271670937538147
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.7e+03  |
| Iteration     | 8         |
| MaximumReturn | -1.61e+03 |
| MinimumReturn | -1.88e+03 |
| TotalSamples  | 40000     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8027488589286804
Validation loss = 0.8020501136779785
Validation loss = 0.8034555315971375
Validation loss = 0.8137179613113403
Validation loss = 0.8162781596183777
Validation loss = 0.8248767852783203
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.8023399114608765
Validation loss = 0.8162616491317749
Validation loss = 0.8146333694458008
Validation loss = 0.8233880996704102
Validation loss = 0.8192558288574219
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.8022961616516113
Validation loss = 0.8234589695930481
Validation loss = 0.805846095085144
Validation loss = 0.8157964944839478
Validation loss = 0.8182072639465332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.8015877604484558
Validation loss = 0.8072291612625122
Validation loss = 0.8045153617858887
Validation loss = 0.8178920745849609
Validation loss = 0.8198333978652954
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.8050944209098816
Validation loss = 0.8013616800308228
Validation loss = 0.8043692708015442
Validation loss = 0.8144614100456238
Validation loss = 0.8209349513053894
Validation loss = 0.8235675692558289
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.33e+03 |
| Iteration     | 9         |
| MaximumReturn | -2.24e+03 |
| MinimumReturn | -2.38e+03 |
| TotalSamples  | 44000     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8022573590278625
Validation loss = 0.7922771573066711
Validation loss = 0.7990288138389587
Validation loss = 0.8173220157623291
Validation loss = 0.8048757314682007
Validation loss = 0.8090455532073975
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7975350022315979
Validation loss = 0.7967367172241211
Validation loss = 0.8027514815330505
Validation loss = 0.8062105774879456
Validation loss = 0.8208297491073608
Validation loss = 0.8121103048324585
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7889789342880249
Validation loss = 0.790061891078949
Validation loss = 0.7904873490333557
Validation loss = 0.7992992997169495
Validation loss = 0.8058383464813232
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7884029746055603
Validation loss = 0.7930675745010376
Validation loss = 0.7956733107566833
Validation loss = 0.8052127957344055
Validation loss = 0.813035249710083
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7942137122154236
Validation loss = 0.7982072234153748
Validation loss = 0.7976243495941162
Validation loss = 0.8073899745941162
Validation loss = 0.8057769536972046
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.04e+03 |
| Iteration     | 10        |
| MaximumReturn | -2e+03    |
| MinimumReturn | -2.12e+03 |
| TotalSamples  | 48000     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7788341641426086
Validation loss = 0.7743501663208008
Validation loss = 0.7800378203392029
Validation loss = 0.7891682982444763
Validation loss = 0.786076545715332
Validation loss = 0.7930493354797363
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7931919097900391
Validation loss = 0.781745970249176
Validation loss = 0.7853627800941467
Validation loss = 0.7879279255867004
Validation loss = 0.7984464168548584
Validation loss = 0.8025354743003845
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7828877568244934
Validation loss = 0.7630360722541809
Validation loss = 0.7734807133674622
Validation loss = 0.7792772650718689
Validation loss = 0.7838740944862366
Validation loss = 0.7855178713798523
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7927780747413635
Validation loss = 0.7786429524421692
Validation loss = 0.7808713316917419
Validation loss = 0.7855455279350281
Validation loss = 0.7882978320121765
Validation loss = 0.7953095436096191
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7775080800056458
Validation loss = 0.7717564702033997
Validation loss = 0.7791905999183655
Validation loss = 0.7872456908226013
Validation loss = 0.7889429926872253
Validation loss = 0.8016689419746399
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.1e+03  |
| Iteration     | 11        |
| MaximumReturn | -2.05e+03 |
| MinimumReturn | -2.16e+03 |
| TotalSamples  | 52000     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7527385354042053
Validation loss = 0.7443474531173706
Validation loss = 0.7571973204612732
Validation loss = 0.764642059803009
Validation loss = 0.766987681388855
Validation loss = 0.7682608962059021
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7663236856460571
Validation loss = 0.7678959369659424
Validation loss = 0.7648814916610718
Validation loss = 0.7729231715202332
Validation loss = 0.7733291983604431
Validation loss = 0.7806570529937744
Validation loss = 0.776278555393219
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7610095739364624
Validation loss = 0.7539145946502686
Validation loss = 0.7650189995765686
Validation loss = 0.7671962380409241
Validation loss = 0.7683741450309753
Validation loss = 0.7692442536354065
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7550687193870544
Validation loss = 0.7512862682342529
Validation loss = 0.765089750289917
Validation loss = 0.7661488056182861
Validation loss = 0.7731419205665588
Validation loss = 0.7724125981330872
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7620080709457397
Validation loss = 0.7594675421714783
Validation loss = 0.7650383710861206
Validation loss = 0.7696548104286194
Validation loss = 0.7692858576774597
Validation loss = 0.7799416780471802
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.1e+03  |
| Iteration     | 12        |
| MaximumReturn | -2.07e+03 |
| MinimumReturn | -2.19e+03 |
| TotalSamples  | 56000     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7647984623908997
Validation loss = 0.7421969175338745
Validation loss = 0.742484450340271
Validation loss = 0.7446879744529724
Validation loss = 0.7522358894348145
Validation loss = 0.7489594221115112
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7508776783943176
Validation loss = 0.7443090081214905
Validation loss = 0.7505167126655579
Validation loss = 0.7528418302536011
Validation loss = 0.7519506812095642
Validation loss = 0.7549905776977539
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.74009770154953
Validation loss = 0.7368936538696289
Validation loss = 0.7458249926567078
Validation loss = 0.7460559606552124
Validation loss = 0.7486624717712402
Validation loss = 0.7542728781700134
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7440615892410278
Validation loss = 0.7383431196212769
Validation loss = 0.7411628365516663
Validation loss = 0.7463135123252869
Validation loss = 0.7477113008499146
Validation loss = 0.7531422972679138
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7552126049995422
Validation loss = 0.7432833313941956
Validation loss = 0.7542161345481873
Validation loss = 0.7555772662162781
Validation loss = 0.757649302482605
Validation loss = 0.757656455039978
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.05e+03 |
| Iteration     | 13        |
| MaximumReturn | -1.91e+03 |
| MinimumReturn | -2.25e+03 |
| TotalSamples  | 60000     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7370033860206604
Validation loss = 0.7264840602874756
Validation loss = 0.7387763261795044
Validation loss = 0.7376471757888794
Validation loss = 0.7407583594322205
Validation loss = 0.7413358092308044
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7517954111099243
Validation loss = 0.7378707528114319
Validation loss = 0.7377074956893921
Validation loss = 0.7405071258544922
Validation loss = 0.739598274230957
Validation loss = 0.7471098303794861
Validation loss = 0.7408499121665955
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7349568009376526
Validation loss = 0.7279435992240906
Validation loss = 0.735783576965332
Validation loss = 0.7335923910140991
Validation loss = 0.7395336031913757
Validation loss = 0.7412855625152588
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7307084798812866
Validation loss = 0.7310711145401001
Validation loss = 0.7391417622566223
Validation loss = 0.737445056438446
Validation loss = 0.7391555905342102
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7318769097328186
Validation loss = 0.7376531958580017
Validation loss = 0.7412779927253723
Validation loss = 0.741804301738739
Validation loss = 0.7474873065948486
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.16e+03 |
| Iteration     | 14        |
| MaximumReturn | -2.1e+03  |
| MinimumReturn | -2.19e+03 |
| TotalSamples  | 64000     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.719318151473999
Validation loss = 0.7246463894844055
Validation loss = 0.7268670797348022
Validation loss = 0.7343953251838684
Validation loss = 0.7308544516563416
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7274145483970642
Validation loss = 0.7225661277770996
Validation loss = 0.7256190776824951
Validation loss = 0.727870762348175
Validation loss = 0.7338366508483887
Validation loss = 0.7270886898040771
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.727565348148346
Validation loss = 0.7186452746391296
Validation loss = 0.7232550382614136
Validation loss = 0.7252628207206726
Validation loss = 0.7283214330673218
Validation loss = 0.7306851744651794
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.72281813621521
Validation loss = 0.7207159996032715
Validation loss = 0.7216973304748535
Validation loss = 0.7276214957237244
Validation loss = 0.7323484420776367
Validation loss = 0.7265404462814331
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7228263020515442
Validation loss = 0.7244695425033569
Validation loss = 0.7258418798446655
Validation loss = 0.7337943315505981
Validation loss = 0.7375986576080322
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.25e+03 |
| Iteration     | 15        |
| MaximumReturn | -2.15e+03 |
| MinimumReturn | -2.32e+03 |
| TotalSamples  | 68000     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7193601727485657
Validation loss = 0.7119677066802979
Validation loss = 0.7153357267379761
Validation loss = 0.7229372262954712
Validation loss = 0.724556028842926
Validation loss = 0.7202908992767334
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7210508584976196
Validation loss = 0.7132619023323059
Validation loss = 0.726878821849823
Validation loss = 0.7210779786109924
Validation loss = 0.7265992164611816
Validation loss = 0.7214102745056152
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7182942628860474
Validation loss = 0.716002345085144
Validation loss = 0.7182643413543701
Validation loss = 0.7183313965797424
Validation loss = 0.7185785174369812
Validation loss = 0.7236427068710327
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7272594571113586
Validation loss = 0.7103673219680786
Validation loss = 0.7197126746177673
Validation loss = 0.7225033640861511
Validation loss = 0.7196006178855896
Validation loss = 0.7211970686912537
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7171527147293091
Validation loss = 0.7214938998222351
Validation loss = 0.7164173722267151
Validation loss = 0.7217487096786499
Validation loss = 0.7242740988731384
Validation loss = 0.7277997136116028
Validation loss = 0.730827271938324
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.2e+03  |
| Iteration     | 16        |
| MaximumReturn | -2.09e+03 |
| MinimumReturn | -2.36e+03 |
| TotalSamples  | 72000     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7094769477844238
Validation loss = 0.7044135332107544
Validation loss = 0.7088995575904846
Validation loss = 0.714349091053009
Validation loss = 0.7098020315170288
Validation loss = 0.7113195061683655
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7113957405090332
Validation loss = 0.7049827575683594
Validation loss = 0.7092823386192322
Validation loss = 0.7125608325004578
Validation loss = 0.7124786972999573
Validation loss = 0.7119409441947937
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7123194932937622
Validation loss = 0.7034986615180969
Validation loss = 0.7147629261016846
Validation loss = 0.7158895134925842
Validation loss = 0.7120057344436646
Validation loss = 0.7139780521392822
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7095370292663574
Validation loss = 0.7073518633842468
Validation loss = 0.7050052881240845
Validation loss = 0.7139137387275696
Validation loss = 0.7139571309089661
Validation loss = 0.7136749029159546
Validation loss = 0.7161400318145752
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7121536135673523
Validation loss = 0.7064675092697144
Validation loss = 0.7161407470703125
Validation loss = 0.7167251110076904
Validation loss = 0.7214959263801575
Validation loss = 0.7187986373901367
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.2e+03  |
| Iteration     | 17        |
| MaximumReturn | -2.17e+03 |
| MinimumReturn | -2.25e+03 |
| TotalSamples  | 76000     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7032933831214905
Validation loss = 0.6985084414482117
Validation loss = 0.7023618221282959
Validation loss = 0.7029934525489807
Validation loss = 0.7040581107139587
Validation loss = 0.7089681625366211
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7038697600364685
Validation loss = 0.6988574862480164
Validation loss = 0.6979677081108093
Validation loss = 0.7077334523200989
Validation loss = 0.709221363067627
Validation loss = 0.7054572701454163
Validation loss = 0.706540048122406
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7056092619895935
Validation loss = 0.6968569755554199
Validation loss = 0.6998980045318604
Validation loss = 0.6999585032463074
Validation loss = 0.7117884159088135
Validation loss = 0.7102677822113037
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7009937167167664
Validation loss = 0.7005151510238647
Validation loss = 0.7023987770080566
Validation loss = 0.7050336599349976
Validation loss = 0.7027090191841125
Validation loss = 0.7030514478683472
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7085485458374023
Validation loss = 0.7004071474075317
Validation loss = 0.7051386833190918
Validation loss = 0.7119951248168945
Validation loss = 0.7094693779945374
Validation loss = 0.7126542925834656
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.94e+03 |
| Iteration     | 18        |
| MaximumReturn | -1.53e+03 |
| MinimumReturn | -2.26e+03 |
| TotalSamples  | 80000     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6947752833366394
Validation loss = 0.695097804069519
Validation loss = 0.6987324953079224
Validation loss = 0.7054481506347656
Validation loss = 0.7040290832519531
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6991695165634155
Validation loss = 0.694290280342102
Validation loss = 0.6995624303817749
Validation loss = 0.7007367014884949
Validation loss = 0.702659010887146
Validation loss = 0.7021840214729309
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6990891695022583
Validation loss = 0.6995566487312317
Validation loss = 0.7040879130363464
Validation loss = 0.7074364423751831
Validation loss = 0.7050415277481079
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7028626203536987
Validation loss = 0.69316166639328
Validation loss = 0.6997156143188477
Validation loss = 0.705618143081665
Validation loss = 0.7099413275718689
Validation loss = 0.7043982148170471
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7012995481491089
Validation loss = 0.7040040493011475
Validation loss = 0.7057653665542603
Validation loss = 0.7099172472953796
Validation loss = 0.7090582847595215
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.74e+03 |
| Iteration     | 19        |
| MaximumReturn | -1.47e+03 |
| MinimumReturn | -1.96e+03 |
| TotalSamples  | 84000     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6982654929161072
Validation loss = 0.6953614950180054
Validation loss = 0.6978785395622253
Validation loss = 0.6979423761367798
Validation loss = 0.6989991664886475
Validation loss = 0.696575939655304
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6949083209037781
Validation loss = 0.6902344226837158
Validation loss = 0.6958765983581543
Validation loss = 0.6964278817176819
Validation loss = 0.695526659488678
Validation loss = 0.7068819403648376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6976494193077087
Validation loss = 0.6937431693077087
Validation loss = 0.6999098062515259
Validation loss = 0.7083427906036377
Validation loss = 0.7048230767250061
Validation loss = 0.7062487006187439
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6954677700996399
Validation loss = 0.6916213631629944
Validation loss = 0.6984643936157227
Validation loss = 0.6996922492980957
Validation loss = 0.7046928405761719
Validation loss = 0.7007582783699036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6992250084877014
Validation loss = 0.697007954120636
Validation loss = 0.7044737935066223
Validation loss = 0.7048758864402771
Validation loss = 0.7114791870117188
Validation loss = 0.7097305059432983
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.04e+03 |
| Iteration     | 20        |
| MaximumReturn | -1.88e+03 |
| MinimumReturn | -2.18e+03 |
| TotalSamples  | 88000     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6889472603797913
Validation loss = 0.6885313987731934
Validation loss = 0.693276584148407
Validation loss = 0.6968165040016174
Validation loss = 0.696657121181488
Validation loss = 0.6963905692100525
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6915205121040344
Validation loss = 0.6884650588035583
Validation loss = 0.6906607747077942
Validation loss = 0.6924470067024231
Validation loss = 0.6932963728904724
Validation loss = 0.6959841251373291
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6978029608726501
Validation loss = 0.6918111443519592
Validation loss = 0.6994255781173706
Validation loss = 0.6974355578422546
Validation loss = 0.6984845995903015
Validation loss = 0.698896586894989
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6897014379501343
Validation loss = 0.6915967464447021
Validation loss = 0.6968827843666077
Validation loss = 0.7011244297027588
Validation loss = 0.7002333402633667
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6965212821960449
Validation loss = 0.6951232552528381
Validation loss = 0.6988462805747986
Validation loss = 0.7011528015136719
Validation loss = 0.7012412548065186
Validation loss = 0.7022766470909119
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.55e+03 |
| Iteration     | 21        |
| MaximumReturn | -1.4e+03  |
| MinimumReturn | -1.85e+03 |
| TotalSamples  | 92000     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.692866861820221
Validation loss = 0.6839864253997803
Validation loss = 0.6884169578552246
Validation loss = 0.6930767893791199
Validation loss = 0.6908393502235413
Validation loss = 0.691621720790863
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6935986280441284
Validation loss = 0.6833106875419617
Validation loss = 0.6899915933609009
Validation loss = 0.6926261782646179
Validation loss = 0.6901663541793823
Validation loss = 0.6906331777572632
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6937265992164612
Validation loss = 0.6910872459411621
Validation loss = 0.6920278072357178
Validation loss = 0.6929183006286621
Validation loss = 0.6956859827041626
Validation loss = 0.6968926191329956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6927340030670166
Validation loss = 0.6823753118515015
Validation loss = 0.6903075575828552
Validation loss = 0.6904386281967163
Validation loss = 0.6922556757926941
Validation loss = 0.6939888000488281
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.690477192401886
Validation loss = 0.6941763162612915
Validation loss = 0.6963154077529907
Validation loss = 0.6972993016242981
Validation loss = 0.7013090252876282
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.56e+03 |
| Iteration     | 22        |
| MaximumReturn | -1.41e+03 |
| MinimumReturn | -1.74e+03 |
| TotalSamples  | 96000     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6855721473693848
Validation loss = 0.682295024394989
Validation loss = 0.6896597743034363
Validation loss = 0.6913496851921082
Validation loss = 0.6937644481658936
Validation loss = 0.6869872212409973
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6853888630867004
Validation loss = 0.6847417950630188
Validation loss = 0.688591480255127
Validation loss = 0.6904521584510803
Validation loss = 0.6901991963386536
Validation loss = 0.6890761852264404
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6903403401374817
Validation loss = 0.6872795224189758
Validation loss = 0.6899874210357666
Validation loss = 0.6912775039672852
Validation loss = 0.6940705180168152
Validation loss = 0.6974775195121765
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6831655502319336
Validation loss = 0.6837080121040344
Validation loss = 0.6880246996879578
Validation loss = 0.6861281394958496
Validation loss = 0.6893044114112854
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.684531033039093
Validation loss = 0.6952065825462341
Validation loss = 0.6939971446990967
Validation loss = 0.698951780796051
Validation loss = 0.7001600861549377
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.35e+03 |
| Iteration     | 23        |
| MaximumReturn | -1e+03    |
| MinimumReturn | -1.66e+03 |
| TotalSamples  | 100000    |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6899097561836243
Validation loss = 0.6837878227233887
Validation loss = 0.6894954442977905
Validation loss = 0.689017653465271
Validation loss = 0.6895835399627686
Validation loss = 0.6897553205490112
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6907394528388977
Validation loss = 0.6816743612289429
Validation loss = 0.6888471245765686
Validation loss = 0.6903620958328247
Validation loss = 0.6920384168624878
Validation loss = 0.693453311920166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6883726716041565
Validation loss = 0.6908631324768066
Validation loss = 0.6909828186035156
Validation loss = 0.6977201700210571
Validation loss = 0.6953790783882141
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.681627631187439
Validation loss = 0.6829540133476257
Validation loss = 0.6944518089294434
Validation loss = 0.692470908164978
Validation loss = 0.6930909752845764
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6958891153335571
Validation loss = 0.695559561252594
Validation loss = 0.7004448175430298
Validation loss = 0.6981980800628662
Validation loss = 0.6973105669021606
Validation loss = 0.6970760822296143
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.88e+03 |
| Iteration     | 24        |
| MaximumReturn | -1.25e+03 |
| MinimumReturn | -2.07e+03 |
| TotalSamples  | 104000    |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6905375719070435
Validation loss = 0.6847761869430542
Validation loss = 0.6889427900314331
Validation loss = 0.6909754872322083
Validation loss = 0.6903046369552612
Validation loss = 0.6939563751220703
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6995029449462891
Validation loss = 0.6849053502082825
Validation loss = 0.6890150904655457
Validation loss = 0.6899784803390503
Validation loss = 0.6880765557289124
Validation loss = 0.6898928284645081
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6940028667449951
Validation loss = 0.6909417510032654
Validation loss = 0.6947651505470276
Validation loss = 0.6934633851051331
Validation loss = 0.6972419023513794
Validation loss = 0.6930080652236938
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.689251184463501
Validation loss = 0.6885756254196167
Validation loss = 0.687847375869751
Validation loss = 0.6928888559341431
Validation loss = 0.6915675401687622
Validation loss = 0.6862647533416748
Validation loss = 0.6915878057479858
Validation loss = 0.6889666318893433
Validation loss = 0.6892917156219482
Validation loss = 0.6902588605880737
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7003768086433411
Validation loss = 0.6958510279655457
Validation loss = 0.693009078502655
Validation loss = 0.6972326040267944
Validation loss = 0.6972519159317017
Validation loss = 0.6981956958770752
Validation loss = 0.7013585567474365
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.83e+03 |
| Iteration     | 25        |
| MaximumReturn | -802      |
| MinimumReturn | -2.11e+03 |
| TotalSamples  | 108000    |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6823447346687317
Validation loss = 0.6800430417060852
Validation loss = 0.6839034557342529
Validation loss = 0.6874091029167175
Validation loss = 0.684776782989502
Validation loss = 0.68766188621521
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6827578544616699
Validation loss = 0.6772362589836121
Validation loss = 0.6811534762382507
Validation loss = 0.6835787892341614
Validation loss = 0.6844303607940674
Validation loss = 0.6814319491386414
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6867904663085938
Validation loss = 0.6808745265007019
Validation loss = 0.685638964176178
Validation loss = 0.6861758232116699
Validation loss = 0.6873160600662231
Validation loss = 0.6851593255996704
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6859657168388367
Validation loss = 0.6759873032569885
Validation loss = 0.6817071437835693
Validation loss = 0.6833457350730896
Validation loss = 0.6817225813865662
Validation loss = 0.6827272772789001
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6867969036102295
Validation loss = 0.6838080286979675
Validation loss = 0.6893547177314758
Validation loss = 0.6898312568664551
Validation loss = 0.6926016211509705
Validation loss = 0.6936874985694885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.61e+03 |
| Iteration     | 26        |
| MaximumReturn | -372      |
| MinimumReturn | -2.36e+03 |
| TotalSamples  | 112000    |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7150791883468628
Validation loss = 0.709489643573761
Validation loss = 0.7181101441383362
Validation loss = 0.7160993218421936
Validation loss = 0.7140384316444397
Validation loss = 0.714042603969574
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7363954186439514
Validation loss = 0.7282260060310364
Validation loss = 0.7312195897102356
Validation loss = 0.7322433590888977
Validation loss = 0.7283440828323364
Validation loss = 0.7419480085372925
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7423783540725708
Validation loss = 0.7428189516067505
Validation loss = 0.7456915974617004
Validation loss = 0.7481271028518677
Validation loss = 0.7433415651321411
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7158904671669006
Validation loss = 0.7098568677902222
Validation loss = 0.7084922790527344
Validation loss = 0.7122055888175964
Validation loss = 0.7165856957435608
Validation loss = 0.714140772819519
Validation loss = 0.7120381593704224
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7159014940261841
Validation loss = 0.7131363749504089
Validation loss = 0.7141224145889282
Validation loss = 0.7119488716125488
Validation loss = 0.7137206196784973
Validation loss = 0.7218558192253113
Validation loss = 0.7179039120674133
Validation loss = 0.7153592705726624
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.58e+03 |
| Iteration     | 27        |
| MaximumReturn | -780      |
| MinimumReturn | -1.95e+03 |
| TotalSamples  | 116000    |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7100914120674133
Validation loss = 0.7127827405929565
Validation loss = 0.7161420583724976
Validation loss = 0.7177202105522156
Validation loss = 0.7175196409225464
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7292370200157166
Validation loss = 0.7333953380584717
Validation loss = 0.7359793186187744
Validation loss = 0.7358877658843994
Validation loss = 0.7279471158981323
Validation loss = 0.7317283153533936
Validation loss = 0.7341887950897217
Validation loss = 0.7341777086257935
Validation loss = 0.7304201722145081
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7357173562049866
Validation loss = 0.7384606599807739
Validation loss = 0.7590320706367493
Validation loss = 0.7433913946151733
Validation loss = 0.7414328455924988
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7223981022834778
Validation loss = 0.7087445855140686
Validation loss = 0.7107048630714417
Validation loss = 0.7171899080276489
Validation loss = 0.7124983072280884
Validation loss = 0.7161419987678528
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7157682180404663
Validation loss = 0.7141547799110413
Validation loss = 0.7123088240623474
Validation loss = 0.7188470959663391
Validation loss = 0.7168498635292053
Validation loss = 0.7175706624984741
Validation loss = 0.7154977917671204
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.61e+03 |
| Iteration     | 28        |
| MaximumReturn | -1.17e+03 |
| MinimumReturn | -2.55e+03 |
| TotalSamples  | 120000    |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6870310306549072
Validation loss = 0.676527738571167
Validation loss = 0.6850212216377258
Validation loss = 0.6891940832138062
Validation loss = 0.6868197321891785
Validation loss = 0.6894367933273315
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6835184693336487
Validation loss = 0.6791555285453796
Validation loss = 0.6836194396018982
Validation loss = 0.6885818839073181
Validation loss = 0.6897928714752197
Validation loss = 0.6903925538063049
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6901229619979858
Validation loss = 0.6843146681785583
Validation loss = 0.6886951327323914
Validation loss = 0.6912250518798828
Validation loss = 0.6925191283226013
Validation loss = 0.6933843493461609
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6865504384040833
Validation loss = 0.6805911660194397
Validation loss = 0.68458491563797
Validation loss = 0.6874942183494568
Validation loss = 0.6877979636192322
Validation loss = 0.688838541507721
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6910534501075745
Validation loss = 0.6896133422851562
Validation loss = 0.6932085156440735
Validation loss = 0.6932966113090515
Validation loss = 0.6937690377235413
Validation loss = 0.696862518787384
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.11e+03 |
| Iteration     | 29        |
| MaximumReturn | -1.47e+03 |
| MinimumReturn | -2.71e+03 |
| TotalSamples  | 124000    |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6795887351036072
Validation loss = 0.6842728853225708
Validation loss = 0.6867451071739197
Validation loss = 0.6867252588272095
Validation loss = 0.681891679763794
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6857471466064453
Validation loss = 0.6849801540374756
Validation loss = 0.6834635734558105
Validation loss = 0.6853203177452087
Validation loss = 0.6834354996681213
Validation loss = 0.6871861815452576
Validation loss = 0.6869239807128906
Validation loss = 0.6857904195785522
Validation loss = 0.6865061521530151
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6916496157646179
Validation loss = 0.6854949593544006
Validation loss = 0.687793493270874
Validation loss = 0.6915841102600098
Validation loss = 0.6917585134506226
Validation loss = 0.6913245916366577
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6901796460151672
Validation loss = 0.6816530823707581
Validation loss = 0.6838629245758057
Validation loss = 0.6844723224639893
Validation loss = 0.686896026134491
Validation loss = 0.6884375810623169
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6913543939590454
Validation loss = 0.6890279054641724
Validation loss = 0.6899500489234924
Validation loss = 0.6929613947868347
Validation loss = 0.693039059638977
Validation loss = 0.6944039463996887
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.62e+03 |
| Iteration     | 30        |
| MaximumReturn | -1.14e+03 |
| MinimumReturn | -2.54e+03 |
| TotalSamples  | 128000    |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.687627911567688
Validation loss = 0.6828467845916748
Validation loss = 0.686939537525177
Validation loss = 0.6924264430999756
Validation loss = 0.6899024844169617
Validation loss = 0.6899174451828003
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6859967708587646
Validation loss = 0.6868873238563538
Validation loss = 0.6879520416259766
Validation loss = 0.6949033737182617
Validation loss = 0.688668966293335
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.687414288520813
Validation loss = 0.6846217513084412
Validation loss = 0.6925888061523438
Validation loss = 0.693170428276062
Validation loss = 0.6927998065948486
Validation loss = 0.6924777030944824
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6953005790710449
Validation loss = 0.6820571422576904
Validation loss = 0.6862098574638367
Validation loss = 0.6876913905143738
Validation loss = 0.6909613609313965
Validation loss = 0.6916184425354004
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6899476051330566
Validation loss = 0.6930161714553833
Validation loss = 0.6972205638885498
Validation loss = 0.6952702403068542
Validation loss = 0.695513129234314
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.76e+03 |
| Iteration     | 31        |
| MaximumReturn | -1.53e+03 |
| MinimumReturn | -2.06e+03 |
| TotalSamples  | 132000    |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6837766170501709
Validation loss = 0.6836733222007751
Validation loss = 0.6919229030609131
Validation loss = 0.6934843063354492
Validation loss = 0.6954302191734314
Validation loss = 0.6936728358268738
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6851489543914795
Validation loss = 0.6852220892906189
Validation loss = 0.6890640258789062
Validation loss = 0.691945493221283
Validation loss = 0.6939516663551331
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.690254271030426
Validation loss = 0.685763418674469
Validation loss = 0.6912715435028076
Validation loss = 0.6935911178588867
Validation loss = 0.6956591010093689
Validation loss = 0.6950005292892456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6862456202507019
Validation loss = 0.6863791942596436
Validation loss = 0.6888348460197449
Validation loss = 0.6930181980133057
Validation loss = 0.689904510974884
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6930055618286133
Validation loss = 0.6903704404830933
Validation loss = 0.6961978077888489
Validation loss = 0.6970952749252319
Validation loss = 0.7029075026512146
Validation loss = 0.7008528113365173
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.37e+03 |
| Iteration     | 32        |
| MaximumReturn | -302      |
| MinimumReturn | -2.17e+03 |
| TotalSamples  | 136000    |
-----------------------------
