Logging to experiments/hopper/oct31/w350e03_Durl_seed2431
Print configuration .....
{'env_name': 'hopper', 'random_seeds': [1234, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/hopper_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8349924683570862
Validation loss = 0.2653907537460327
Validation loss = 0.2378508448600769
Validation loss = 0.1986679583787918
Validation loss = 0.19228725135326385
Validation loss = 0.19369706511497498
Validation loss = 0.19821007549762726
Validation loss = 0.19508087635040283
Validation loss = 0.2043289840221405
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7699138522148132
Validation loss = 0.2656407356262207
Validation loss = 0.22799569368362427
Validation loss = 0.19762569665908813
Validation loss = 0.1978190541267395
Validation loss = 0.1907348334789276
Validation loss = 0.21075156331062317
Validation loss = 0.19515147805213928
Validation loss = 0.19601938128471375
Validation loss = 0.20052799582481384
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6928378343582153
Validation loss = 0.2659696936607361
Validation loss = 0.2396318018436432
Validation loss = 0.20326831936836243
Validation loss = 0.1944427192211151
Validation loss = 0.19081948697566986
Validation loss = 0.19565756618976593
Validation loss = 0.1936737596988678
Validation loss = 0.20056802034378052
Validation loss = 0.20295894145965576
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.871453046798706
Validation loss = 0.25893139839172363
Validation loss = 0.2292579561471939
Validation loss = 0.19660943746566772
Validation loss = 0.19421161711215973
Validation loss = 0.19182130694389343
Validation loss = 0.1929129958152771
Validation loss = 0.19930502772331238
Validation loss = 0.19609659910202026
Validation loss = 0.19793203473091125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6967587471008301
Validation loss = 0.2618391513824463
Validation loss = 0.2369890660047531
Validation loss = 0.20173436403274536
Validation loss = 0.19109001755714417
Validation loss = 0.19855472445487976
Validation loss = 0.19346165657043457
Validation loss = 0.19732843339443207
Validation loss = 0.20545053482055664
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 420
average number of affinization = 60.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 464
average number of affinization = 110.5
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 416
average number of affinization = 144.44444444444446
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 440
average number of affinization = 174.0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 441
average number of affinization = 198.27272727272728
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 416
average number of affinization = 216.41666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.2e+03  |
| Iteration     | 0         |
| MaximumReturn | -1.9e+03  |
| MinimumReturn | -2.29e+03 |
| TotalSamples  | 8000      |
-----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18760834634304047
Validation loss = 0.1373598575592041
Validation loss = 0.1399882584810257
Validation loss = 0.1367187201976776
Validation loss = 0.13020464777946472
Validation loss = 0.13296011090278625
Validation loss = 0.13423752784729004
Validation loss = 0.1359490007162094
Validation loss = 0.1339639574289322
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17957258224487305
Validation loss = 0.13233555853366852
Validation loss = 0.1314840167760849
Validation loss = 0.128944531083107
Validation loss = 0.12960246205329895
Validation loss = 0.1301700919866562
Validation loss = 0.12994274497032166
Validation loss = 0.1337628960609436
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1719963550567627
Validation loss = 0.14386823773384094
Validation loss = 0.13487614691257477
Validation loss = 0.1314171850681305
Validation loss = 0.13541379570960999
Validation loss = 0.1368681639432907
Validation loss = 0.1298488825559616
Validation loss = 0.13215112686157227
Validation loss = 0.13488852977752686
Validation loss = 0.1311435103416443
Validation loss = 0.1312432885169983
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18001841008663177
Validation loss = 0.1325482428073883
Validation loss = 0.13265304267406464
Validation loss = 0.13472066819667816
Validation loss = 0.14216485619544983
Validation loss = 0.13523545861244202
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17685475945472717
Validation loss = 0.1376885026693344
Validation loss = 0.1350586861371994
Validation loss = 0.13228511810302734
Validation loss = 0.13978199660778046
Validation loss = 0.13543075323104858
Validation loss = 0.13717500865459442
Validation loss = 0.13542652130126953
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 605
average number of affinization = 246.30769230769232
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 611
average number of affinization = 272.35714285714283
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 641
average number of affinization = 296.93333333333334
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 621
average number of affinization = 317.1875
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 617
average number of affinization = 334.8235294117647
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 646
average number of affinization = 352.1111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.49e+03 |
| Iteration     | 1         |
| MaximumReturn | -2.45e+03 |
| MinimumReturn | -2.51e+03 |
| TotalSamples  | 12000     |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16417931020259857
Validation loss = 0.14344538748264313
Validation loss = 0.1453884094953537
Validation loss = 0.14152364432811737
Validation loss = 0.13368719816207886
Validation loss = 0.13159362971782684
Validation loss = 0.1370266228914261
Validation loss = 0.13324421644210815
Validation loss = 0.1315375566482544
Validation loss = 0.13314269483089447
Validation loss = 0.13287560641765594
Validation loss = 0.13585412502288818
Validation loss = 0.12881648540496826
Validation loss = 0.1309128999710083
Validation loss = 0.1398955136537552
Validation loss = 0.13257791101932526
Validation loss = 0.13735701143741608
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1552845984697342
Validation loss = 0.13480247557163239
Validation loss = 0.13352012634277344
Validation loss = 0.14066068828105927
Validation loss = 0.12671856582164764
Validation loss = 0.13540326058864594
Validation loss = 0.13350673019886017
Validation loss = 0.1286991536617279
Validation loss = 0.1343277543783188
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1700441688299179
Validation loss = 0.1467314511537552
Validation loss = 0.13896967470645905
Validation loss = 0.13591544330120087
Validation loss = 0.1393754631280899
Validation loss = 0.130278080701828
Validation loss = 0.13192985951900482
Validation loss = 0.14092767238616943
Validation loss = 0.13636307418346405
Validation loss = 0.13038907945156097
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1689687967300415
Validation loss = 0.15427254140377045
Validation loss = 0.13972656428813934
Validation loss = 0.13429316878318787
Validation loss = 0.13859644532203674
Validation loss = 0.1354009211063385
Validation loss = 0.14171917736530304
Validation loss = 0.14733290672302246
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.16578781604766846
Validation loss = 0.14305683970451355
Validation loss = 0.13435222208499908
Validation loss = 0.13324275612831116
Validation loss = 0.13797785341739655
Validation loss = 0.13784484565258026
Validation loss = 0.13695405423641205
Validation loss = 0.13989335298538208
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 454
average number of affinization = 357.4736842105263
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 504
average number of affinization = 364.8
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 437
average number of affinization = 368.23809523809524
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 496
average number of affinization = 374.04545454545456
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 430
average number of affinization = 376.4782608695652
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 511
average number of affinization = 382.0833333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.57e+03 |
| Iteration     | 2         |
| MaximumReturn | -1.34e+03 |
| MinimumReturn | -1.95e+03 |
| TotalSamples  | 16000     |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.24145792424678802
Validation loss = 0.19018538296222687
Validation loss = 0.1863827407360077
Validation loss = 0.1853470504283905
Validation loss = 0.19854459166526794
Validation loss = 0.17682746052742004
Validation loss = 0.1818712055683136
Validation loss = 0.1783638298511505
Validation loss = 0.1851268857717514
Validation loss = 0.18714946508407593
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2400679588317871
Validation loss = 0.18975535035133362
Validation loss = 0.18537944555282593
Validation loss = 0.18116755783557892
Validation loss = 0.17769058048725128
Validation loss = 0.17379583418369293
Validation loss = 0.17354539036750793
Validation loss = 0.16936135292053223
Validation loss = 0.17373782396316528
Validation loss = 0.172215074300766
Validation loss = 0.17748871445655823
Validation loss = 0.17176923155784607
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.23133185505867004
Validation loss = 0.19790175557136536
Validation loss = 0.18841952085494995
Validation loss = 0.1844777762889862
Validation loss = 0.18294598162174225
Validation loss = 0.17380976676940918
Validation loss = 0.17847484350204468
Validation loss = 0.1835533082485199
Validation loss = 0.17329376935958862
Validation loss = 0.17804953455924988
Validation loss = 0.18557336926460266
Validation loss = 0.17726776003837585
Validation loss = 0.18767966330051422
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.22862137854099274
Validation loss = 0.1970398873090744
Validation loss = 0.18718357384204865
Validation loss = 0.18397486209869385
Validation loss = 0.1765182614326477
Validation loss = 0.18492653965950012
Validation loss = 0.17794737219810486
Validation loss = 0.1770646870136261
Validation loss = 0.18019193410873413
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2323182225227356
Validation loss = 0.20577913522720337
Validation loss = 0.18481513857841492
Validation loss = 0.1871614158153534
Validation loss = 0.19000019133090973
Validation loss = 0.17757916450500488
Validation loss = 0.1904580295085907
Validation loss = 0.18216833472251892
Validation loss = 0.18377405405044556
Validation loss = 0.18403394520282745
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 436
average number of affinization = 384.24
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 472
average number of affinization = 387.61538461538464
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 376
average number of affinization = 387.18518518518516
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 407
average number of affinization = 387.89285714285717
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 441
average number of affinization = 389.7241379310345
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 443
average number of affinization = 391.5
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.51e+03 |
| Iteration     | 3         |
| MaximumReturn | -1.05e+03 |
| MinimumReturn | -2.15e+03 |
| TotalSamples  | 20000     |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.19452083110809326
Validation loss = 0.16332033276557922
Validation loss = 0.160440593957901
Validation loss = 0.15307359397411346
Validation loss = 0.1512104868888855
Validation loss = 0.15344390273094177
Validation loss = 0.14877663552761078
Validation loss = 0.14818719029426575
Validation loss = 0.14886096119880676
Validation loss = 0.145187109708786
Validation loss = 0.1432954967021942
Validation loss = 0.15560035407543182
Validation loss = 0.15273192524909973
Validation loss = 0.14817050099372864
Validation loss = 0.14490309357643127
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.19580607116222382
Validation loss = 0.1655823290348053
Validation loss = 0.15471121668815613
Validation loss = 0.15407441556453705
Validation loss = 0.15069523453712463
Validation loss = 0.14836525917053223
Validation loss = 0.14575107395648956
Validation loss = 0.14900270104408264
Validation loss = 0.14325761795043945
Validation loss = 0.1444266140460968
Validation loss = 0.14596132934093475
Validation loss = 0.1463157832622528
Validation loss = 0.14337441325187683
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.19682765007019043
Validation loss = 0.15930013358592987
Validation loss = 0.1519383192062378
Validation loss = 0.1488339751958847
Validation loss = 0.1524837613105774
Validation loss = 0.14700786769390106
Validation loss = 0.1459573656320572
Validation loss = 0.1503797471523285
Validation loss = 0.14779363572597504
Validation loss = 0.14186808466911316
Validation loss = 0.1524442881345749
Validation loss = 0.14476457238197327
Validation loss = 0.14653131365776062
Validation loss = 0.14673137664794922
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.19480451941490173
Validation loss = 0.16505703330039978
Validation loss = 0.1526918113231659
Validation loss = 0.15630558133125305
Validation loss = 0.15416833758354187
Validation loss = 0.14465877413749695
Validation loss = 0.15100842714309692
Validation loss = 0.15087224543094635
Validation loss = 0.14840582013130188
Validation loss = 0.14747923612594604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1771363466978073
Validation loss = 0.1642157882452011
Validation loss = 0.151937797665596
Validation loss = 0.15724484622478485
Validation loss = 0.14909668266773224
Validation loss = 0.15486791729927063
Validation loss = 0.14865443110466003
Validation loss = 0.15323615074157715
Validation loss = 0.15406832098960876
Validation loss = 0.14451605081558228
Validation loss = 0.14380775392055511
Validation loss = 0.14709602296352386
Validation loss = 0.14941346645355225
Validation loss = 0.14428472518920898
Validation loss = 0.1461138278245926
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 421
average number of affinization = 392.4516129032258
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 426
average number of affinization = 393.5
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 482
average number of affinization = 396.1818181818182
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 444
average number of affinization = 397.5882352941176
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 473
average number of affinization = 399.74285714285713
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 431
average number of affinization = 400.6111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.52e+03 |
| Iteration     | 4         |
| MaximumReturn | -1.17e+03 |
| MinimumReturn | -1.74e+03 |
| TotalSamples  | 24000     |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16919200122356415
Validation loss = 0.15234960615634918
Validation loss = 0.14188043773174286
Validation loss = 0.1368289738893509
Validation loss = 0.1398487091064453
Validation loss = 0.13497640192508698
Validation loss = 0.13452258706092834
Validation loss = 0.13513609766960144
Validation loss = 0.13407476246356964
Validation loss = 0.1345423310995102
Validation loss = 0.13406942784786224
Validation loss = 0.13088013231754303
Validation loss = 0.1342579573392868
Validation loss = 0.13624468445777893
Validation loss = 0.13647250831127167
Validation loss = 0.12990915775299072
Validation loss = 0.13399077951908112
Validation loss = 0.13904090225696564
Validation loss = 0.12544657289981842
Validation loss = 0.12813647091388702
Validation loss = 0.1259010136127472
Validation loss = 0.12695033848285675
Validation loss = 0.12751486897468567
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17651991546154022
Validation loss = 0.152793288230896
Validation loss = 0.13457909226417542
Validation loss = 0.13176226615905762
Validation loss = 0.1344519406557083
Validation loss = 0.13259203732013702
Validation loss = 0.13207051157951355
Validation loss = 0.13035617768764496
Validation loss = 0.13123202323913574
Validation loss = 0.13835780322551727
Validation loss = 0.1325191706418991
Validation loss = 0.13257957994937897
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16423283517360687
Validation loss = 0.14504121243953705
Validation loss = 0.13476480543613434
Validation loss = 0.13585488498210907
Validation loss = 0.13703182339668274
Validation loss = 0.1289203017950058
Validation loss = 0.13610944151878357
Validation loss = 0.13509368896484375
Validation loss = 0.13063101470470428
Validation loss = 0.12945257127285004
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17732985317707062
Validation loss = 0.15295900404453278
Validation loss = 0.14237354695796967
Validation loss = 0.14302417635917664
Validation loss = 0.13464747369289398
Validation loss = 0.13693472743034363
Validation loss = 0.13625948131084442
Validation loss = 0.1365760713815689
Validation loss = 0.13683800399303436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1595107465982437
Validation loss = 0.15162724256515503
Validation loss = 0.13966399431228638
Validation loss = 0.13360260426998138
Validation loss = 0.12946529686450958
Validation loss = 0.12954597175121307
Validation loss = 0.1261632740497589
Validation loss = 0.12789858877658844
Validation loss = 0.1294374018907547
Validation loss = 0.13337759673595428
Validation loss = 0.12399274110794067
Validation loss = 0.12401046603918076
Validation loss = 0.12313556671142578
Validation loss = 0.12334100157022476
Validation loss = 0.13125358521938324
Validation loss = 0.12992282211780548
Validation loss = 0.12327572703361511
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 519
average number of affinization = 403.81081081081084
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 552
average number of affinization = 407.7105263157895
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 546
average number of affinization = 411.2564102564103
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 557
average number of affinization = 414.9
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 448
average number of affinization = 415.7073170731707
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 543
average number of affinization = 418.73809523809524
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.71e+03 |
| Iteration     | 5         |
| MaximumReturn | -871      |
| MinimumReturn | -2.22e+03 |
| TotalSamples  | 28000     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15817232429981232
Validation loss = 0.13615921139717102
Validation loss = 0.13479216396808624
Validation loss = 0.1358165442943573
Validation loss = 0.12836529314517975
Validation loss = 0.13244177401065826
Validation loss = 0.13132663071155548
Validation loss = 0.1276058852672577
Validation loss = 0.12811698019504547
Validation loss = 0.1369716078042984
Validation loss = 0.125602126121521
Validation loss = 0.12730227410793304
Validation loss = 0.13148069381713867
Validation loss = 0.1357845515012741
Validation loss = 0.13188540935516357
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1580437570810318
Validation loss = 0.13497622311115265
Validation loss = 0.1313919872045517
Validation loss = 0.12922164797782898
Validation loss = 0.12980936467647552
Validation loss = 0.13167908787727356
Validation loss = 0.12657536566257477
Validation loss = 0.13190488517284393
Validation loss = 0.12754584848880768
Validation loss = 0.13940951228141785
Validation loss = 0.1330982893705368
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14999575912952423
Validation loss = 0.1388891637325287
Validation loss = 0.13711924850940704
Validation loss = 0.13065513968467712
Validation loss = 0.12866857647895813
Validation loss = 0.13216839730739594
Validation loss = 0.12876929342746735
Validation loss = 0.1313728541135788
Validation loss = 0.13259153068065643
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1494569480419159
Validation loss = 0.14709170162677765
Validation loss = 0.1426398754119873
Validation loss = 0.13605186343193054
Validation loss = 0.13376159965991974
Validation loss = 0.13487812876701355
Validation loss = 0.13566352427005768
Validation loss = 0.13789606094360352
Validation loss = 0.14460334181785583
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15466371178627014
Validation loss = 0.13619177043437958
Validation loss = 0.13045810163021088
Validation loss = 0.1270129233598709
Validation loss = 0.12283851951360703
Validation loss = 0.12300432473421097
Validation loss = 0.12620607018470764
Validation loss = 0.13993771374225616
Validation loss = 0.12500374019145966
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 418
average number of affinization = 418.72093023255815
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 418
average number of affinization = 418.70454545454544
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 422
average number of affinization = 418.77777777777777
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 417
average number of affinization = 418.7391304347826
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 412
average number of affinization = 418.59574468085106
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 449
average number of affinization = 419.2291666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -578      |
| Iteration     | 6         |
| MaximumReturn | -220      |
| MinimumReturn | -1.12e+03 |
| TotalSamples  | 32000     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14511238038539886
Validation loss = 0.12290137261152267
Validation loss = 0.11990679800510406
Validation loss = 0.11920642107725143
Validation loss = 0.12017261981964111
Validation loss = 0.12285298109054565
Validation loss = 0.12353122234344482
Validation loss = 0.12001782655715942
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1378767192363739
Validation loss = 0.13610966503620148
Validation loss = 0.12646785378456116
Validation loss = 0.12409369647502899
Validation loss = 0.12228633463382721
Validation loss = 0.1354321837425232
Validation loss = 0.12179604917764664
Validation loss = 0.11944065988063812
Validation loss = 0.12065331637859344
Validation loss = 0.12590736150741577
Validation loss = 0.1257551610469818
Validation loss = 0.12033089250326157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14616012573242188
Validation loss = 0.13082006573677063
Validation loss = 0.12446936964988708
Validation loss = 0.12374734878540039
Validation loss = 0.12312815338373184
Validation loss = 0.12339702248573303
Validation loss = 0.12405649572610855
Validation loss = 0.12795311212539673
Validation loss = 0.12129004299640656
Validation loss = 0.12141790986061096
Validation loss = 0.12294743955135345
Validation loss = 0.12387661635875702
Validation loss = 0.11744457483291626
Validation loss = 0.11784979701042175
Validation loss = 0.12350983917713165
Validation loss = 0.12059527635574341
Validation loss = 0.11587555706501007
Validation loss = 0.11747954040765762
Validation loss = 0.1397639662027359
Validation loss = 0.12057606875896454
Validation loss = 0.11529356241226196
Validation loss = 0.11587800830602646
Validation loss = 0.11249376833438873
Validation loss = 0.11779434978961945
Validation loss = 0.12067070603370667
Validation loss = 0.13800525665283203
Validation loss = 0.11268436163663864
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14520414173603058
Validation loss = 0.1263485848903656
Validation loss = 0.12910982966423035
Validation loss = 0.12449976801872253
Validation loss = 0.12888658046722412
Validation loss = 0.1285877823829651
Validation loss = 0.12499874085187912
Validation loss = 0.12343177944421768
Validation loss = 0.12633097171783447
Validation loss = 0.13676825165748596
Validation loss = 0.1346243917942047
Validation loss = 0.12486357986927032
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13559752702713013
Validation loss = 0.12648609280586243
Validation loss = 0.11724647879600525
Validation loss = 0.11692673712968826
Validation loss = 0.11555951833724976
Validation loss = 0.11711631715297699
Validation loss = 0.11832345277070999
Validation loss = 0.12000104784965515
Validation loss = 0.11806994676589966
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 400
average number of affinization = 418.83673469387753
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 388
average number of affinization = 418.22
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 432
average number of affinization = 418.4901960784314
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 354
average number of affinization = 417.25
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 388
average number of affinization = 416.6981132075472
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 400
average number of affinization = 416.3888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -443      |
| Iteration     | 7         |
| MaximumReturn | 198       |
| MinimumReturn | -1.26e+03 |
| TotalSamples  | 36000     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1468570977449417
Validation loss = 0.13300973176956177
Validation loss = 0.12390550225973129
Validation loss = 0.11721505224704742
Validation loss = 0.11871986836194992
Validation loss = 0.11643021553754807
Validation loss = 0.11786205321550369
Validation loss = 0.12577128410339355
Validation loss = 0.12626680731773376
Validation loss = 0.11912189424037933
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1356383115053177
Validation loss = 0.12811428308486938
Validation loss = 0.12271147221326828
Validation loss = 0.12358590960502625
Validation loss = 0.1187242791056633
Validation loss = 0.12335094809532166
Validation loss = 0.12390854954719543
Validation loss = 0.13680177927017212
Validation loss = 0.11733266711235046
Validation loss = 0.11520648002624512
Validation loss = 0.11690037697553635
Validation loss = 0.11797811090946198
Validation loss = 0.11939921975135803
Validation loss = 0.12116643041372299
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12514649331569672
Validation loss = 0.11647757887840271
Validation loss = 0.11525462567806244
Validation loss = 0.11392053961753845
Validation loss = 0.11741798371076584
Validation loss = 0.1144300028681755
Validation loss = 0.11697615683078766
Validation loss = 0.11642476171255112
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14621557295322418
Validation loss = 0.13289208710193634
Validation loss = 0.12622907757759094
Validation loss = 0.12351003289222717
Validation loss = 0.1239309310913086
Validation loss = 0.1279824823141098
Validation loss = 0.13473297655582428
Validation loss = 0.12065645307302475
Validation loss = 0.12436599284410477
Validation loss = 0.12365077435970306
Validation loss = 0.1313174068927765
Validation loss = 0.13137808442115784
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14226621389389038
Validation loss = 0.13179577887058258
Validation loss = 0.11898806691169739
Validation loss = 0.12034349888563156
Validation loss = 0.11766090989112854
Validation loss = 0.11668668687343597
Validation loss = 0.12023190408945084
Validation loss = 0.12344631552696228
Validation loss = 0.12182995676994324
Validation loss = 0.11611531674861908
Validation loss = 0.11708708852529526
Validation loss = 0.11570920050144196
Validation loss = 0.11884261667728424
Validation loss = 0.11691483855247498
Validation loss = 0.1267421543598175
Validation loss = 0.11632299423217773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 396
average number of affinization = 416.0181818181818
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 370
average number of affinization = 415.19642857142856
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 366
average number of affinization = 414.3333333333333
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 377
average number of affinization = 413.6896551724138
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 396
average number of affinization = 413.3898305084746
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 367
average number of affinization = 412.6166666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -580     |
| Iteration     | 8        |
| MaximumReturn | 264      |
| MinimumReturn | -1.3e+03 |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1223255842924118
Validation loss = 0.11741703748703003
Validation loss = 0.11357202380895615
Validation loss = 0.11126531660556793
Validation loss = 0.117978535592556
Validation loss = 0.1190243735909462
Validation loss = 0.11092283576726913
Validation loss = 0.10712562501430511
Validation loss = 0.10997549444437027
Validation loss = 0.10798308998346329
Validation loss = 0.11190829426050186
Validation loss = 0.11767026036977768
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.133035808801651
Validation loss = 0.11873183399438858
Validation loss = 0.1116117388010025
Validation loss = 0.11254096031188965
Validation loss = 0.11043435335159302
Validation loss = 0.11060458421707153
Validation loss = 0.11593396961688995
Validation loss = 0.11750897020101547
Validation loss = 0.10910630226135254
Validation loss = 0.11090048402547836
Validation loss = 0.10877826064825058
Validation loss = 0.10931146144866943
Validation loss = 0.10970717668533325
Validation loss = 0.11045973002910614
Validation loss = 0.11494524776935577
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11889185011386871
Validation loss = 0.11127235740423203
Validation loss = 0.11405064165592194
Validation loss = 0.10896603018045425
Validation loss = 0.11036056280136108
Validation loss = 0.10739395767450333
Validation loss = 0.11479027569293976
Validation loss = 0.12353750318288803
Validation loss = 0.11020338535308838
Validation loss = 0.10471022129058838
Validation loss = 0.10812325775623322
Validation loss = 0.10461945831775665
Validation loss = 0.10721094906330109
Validation loss = 0.1071896180510521
Validation loss = 0.11924667656421661
Validation loss = 0.1055506244301796
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.13486550748348236
Validation loss = 0.12109885364770889
Validation loss = 0.11811816692352295
Validation loss = 0.11710144579410553
Validation loss = 0.1174250990152359
Validation loss = 0.11544211953878403
Validation loss = 0.11344343423843384
Validation loss = 0.11945321410894394
Validation loss = 0.12053970992565155
Validation loss = 0.11275164037942886
Validation loss = 0.11403889954090118
Validation loss = 0.11335001140832901
Validation loss = 0.11736080795526505
Validation loss = 0.11427734792232513
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11812227964401245
Validation loss = 0.11198687553405762
Validation loss = 0.11155389249324799
Validation loss = 0.11578235775232315
Validation loss = 0.11254856735467911
Validation loss = 0.1109917014837265
Validation loss = 0.11675573885440826
Validation loss = 0.11914154142141342
Validation loss = 0.1082831397652626
Validation loss = 0.10741370916366577
Validation loss = 0.10755707323551178
Validation loss = 0.11080817133188248
Validation loss = 0.10696538537740707
Validation loss = 0.10992149263620377
Validation loss = 0.11716486513614655
Validation loss = 0.11121239513158798
Validation loss = 0.1031746044754982
Validation loss = 0.10267578065395355
Validation loss = 0.10719694942235947
Validation loss = 0.10855458676815033
Validation loss = 0.10977419465780258
Validation loss = 0.1076754480600357
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 396
average number of affinization = 412.344262295082
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 400
average number of affinization = 412.14516129032256
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 414
average number of affinization = 412.1746031746032
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 384
average number of affinization = 411.734375
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 395
average number of affinization = 411.4769230769231
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 415
average number of affinization = 411.530303030303
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 228      |
| Iteration     | 9        |
| MaximumReturn | 704      |
| MinimumReturn | -172     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12269356101751328
Validation loss = 0.11259359866380692
Validation loss = 0.10260701179504395
Validation loss = 0.1008698046207428
Validation loss = 0.09992408752441406
Validation loss = 0.10182664543390274
Validation loss = 0.09955095499753952
Validation loss = 0.10043317824602127
Validation loss = 0.10607218742370605
Validation loss = 0.09804856777191162
Validation loss = 0.09829313308000565
Validation loss = 0.09891416877508163
Validation loss = 0.09895941615104675
Validation loss = 0.11170043796300888
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12430992722511292
Validation loss = 0.11082518845796585
Validation loss = 0.1034349873661995
Validation loss = 0.09911175072193146
Validation loss = 0.10555942356586456
Validation loss = 0.10271637886762619
Validation loss = 0.09858283400535583
Validation loss = 0.0991622731089592
Validation loss = 0.10228756070137024
Validation loss = 0.10917746275663376
Validation loss = 0.0997011661529541
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11238735169172287
Validation loss = 0.10260851681232452
Validation loss = 0.10053615272045135
Validation loss = 0.09887497872114182
Validation loss = 0.0986102744936943
Validation loss = 0.09767811000347137
Validation loss = 0.0983891561627388
Validation loss = 0.1044287160038948
Validation loss = 0.09577793627977371
Validation loss = 0.09462794661521912
Validation loss = 0.09884436428546906
Validation loss = 0.10866548866033554
Validation loss = 0.09776794910430908
Validation loss = 0.09586190432310104
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12122371047735214
Validation loss = 0.11195997893810272
Validation loss = 0.10946200042963028
Validation loss = 0.1035444438457489
Validation loss = 0.10392241179943085
Validation loss = 0.10857279598712921
Validation loss = 0.11290360242128372
Validation loss = 0.10247878730297089
Validation loss = 0.1002800241112709
Validation loss = 0.09956044703722
Validation loss = 0.10268256813287735
Validation loss = 0.11144597083330154
Validation loss = 0.1062241867184639
Validation loss = 0.10000736266374588
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11459207534790039
Validation loss = 0.10416055470705032
Validation loss = 0.10382511466741562
Validation loss = 0.09920790046453476
Validation loss = 0.09738736599683762
Validation loss = 0.09766954928636551
Validation loss = 0.09905264526605606
Validation loss = 0.09850407391786575
Validation loss = 0.09988324344158173
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 427
average number of affinization = 411.76119402985074
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 412
average number of affinization = 411.7647058823529
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 390
average number of affinization = 411.4492753623188
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 400
average number of affinization = 411.2857142857143
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 443
average number of affinization = 411.7323943661972
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 398
average number of affinization = 411.5416666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -83.4     |
| Iteration     | 10        |
| MaximumReturn | 1.74e+03  |
| MinimumReturn | -1.25e+03 |
| TotalSamples  | 48000     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12042129039764404
Validation loss = 0.09415175765752792
Validation loss = 0.09195414185523987
Validation loss = 0.08738064765930176
Validation loss = 0.0896272137761116
Validation loss = 0.08990100026130676
Validation loss = 0.08801912516355515
Validation loss = 0.09014206379652023
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1130458191037178
Validation loss = 0.09767308086156845
Validation loss = 0.09124860912561417
Validation loss = 0.09222070127725601
Validation loss = 0.09192707389593124
Validation loss = 0.09369125217199326
Validation loss = 0.0883973017334938
Validation loss = 0.09775611013174057
Validation loss = 0.09037992358207703
Validation loss = 0.09057927131652832
Validation loss = 0.0868472158908844
Validation loss = 0.08766397088766098
Validation loss = 0.08945444971323013
Validation loss = 0.08585455268621445
Validation loss = 0.0970216616988182
Validation loss = 0.08650489896535873
Validation loss = 0.08674819022417068
Validation loss = 0.08513367176055908
Validation loss = 0.0922083929181099
Validation loss = 0.08994847536087036
Validation loss = 0.08263739198446274
Validation loss = 0.08233500272035599
Validation loss = 0.08347424119710922
Validation loss = 0.09781187027692795
Validation loss = 0.08290141820907593
Validation loss = 0.08158009499311447
Validation loss = 0.08234434574842453
Validation loss = 0.08675482124090195
Validation loss = 0.08901294320821762
Validation loss = 0.08604074269533157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10391626507043839
Validation loss = 0.09331396967172623
Validation loss = 0.08913349360227585
Validation loss = 0.08731120079755783
Validation loss = 0.08584245294332504
Validation loss = 0.0867011621594429
Validation loss = 0.10585471242666245
Validation loss = 0.08538591116666794
Validation loss = 0.08518513292074203
Validation loss = 0.08298342674970627
Validation loss = 0.09229648858308792
Validation loss = 0.08253264427185059
Validation loss = 0.09118896722793579
Validation loss = 0.09284603595733643
Validation loss = 0.09780222177505493
Validation loss = 0.08337648957967758
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12090752273797989
Validation loss = 0.09780015796422958
Validation loss = 0.09304683655500412
Validation loss = 0.0923401340842247
Validation loss = 0.09175682067871094
Validation loss = 0.08944257348775864
Validation loss = 0.10255765914916992
Validation loss = 0.08952047675848007
Validation loss = 0.0875951424241066
Validation loss = 0.08800148963928223
Validation loss = 0.09446505457162857
Validation loss = 0.08936303108930588
Validation loss = 0.08737435936927795
Validation loss = 0.09107059985399246
Validation loss = 0.10167369991540909
Validation loss = 0.10553453117609024
Validation loss = 0.08711627870798111
Validation loss = 0.08655547350645065
Validation loss = 0.09141593426465988
Validation loss = 0.0878080502152443
Validation loss = 0.09123728424310684
Validation loss = 0.08356352895498276
Validation loss = 0.08324184268712997
Validation loss = 0.0844789668917656
Validation loss = 0.10181299597024918
Validation loss = 0.0839901939034462
Validation loss = 0.08239036053419113
Validation loss = 0.08443509787321091
Validation loss = 0.09268134087324142
Validation loss = 0.0866939127445221
Validation loss = 0.08136368542909622
Validation loss = 0.08122465759515762
Validation loss = 0.09186216443777084
Validation loss = 0.08693424612283707
Validation loss = 0.08101997524499893
Validation loss = 0.08142900466918945
Validation loss = 0.08588900417089462
Validation loss = 0.09559619426727295
Validation loss = 0.08078699558973312
Validation loss = 0.07904043793678284
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12370238453149796
Validation loss = 0.09538290649652481
Validation loss = 0.08804438263177872
Validation loss = 0.08910258859395981
Validation loss = 0.08792045712471008
Validation loss = 0.08904639631509781
Validation loss = 0.09048229455947876
Validation loss = 0.0902690514922142
Validation loss = 0.08708971738815308
Validation loss = 0.08494438976049423
Validation loss = 0.08263329416513443
Validation loss = 0.09464645385742188
Validation loss = 0.0939922109246254
Validation loss = 0.08405429124832153
Validation loss = 0.08219344168901443
Validation loss = 0.08305705338716507
Validation loss = 0.08716406673192978
Validation loss = 0.08852972835302353
Validation loss = 0.08211704343557358
Validation loss = 0.0803530290722847
Validation loss = 0.08265633136034012
Validation loss = 0.08337248116731644
Validation loss = 0.10300622135400772
Validation loss = 0.08083654195070267
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 407
average number of affinization = 411.47945205479454
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 467
average number of affinization = 412.22972972972974
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 432
average number of affinization = 412.49333333333334
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 434
average number of affinization = 412.7763157894737
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 354
average number of affinization = 412.012987012987
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 447
average number of affinization = 412.46153846153845
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -145      |
| Iteration     | 11        |
| MaximumReturn | 555       |
| MinimumReturn | -1.03e+03 |
| TotalSamples  | 52000     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10260089486837387
Validation loss = 0.08839336782693863
Validation loss = 0.08653794974088669
Validation loss = 0.08233372122049332
Validation loss = 0.08757240325212479
Validation loss = 0.09107610583305359
Validation loss = 0.08199413865804672
Validation loss = 0.08428724855184555
Validation loss = 0.08581255376338959
Validation loss = 0.08815737068653107
Validation loss = 0.08415000140666962
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0956624448299408
Validation loss = 0.08362556993961334
Validation loss = 0.07970045506954193
Validation loss = 0.07959389686584473
Validation loss = 0.08035950362682343
Validation loss = 0.08002186566591263
Validation loss = 0.08803857117891312
Validation loss = 0.07737354189157486
Validation loss = 0.07652126997709274
Validation loss = 0.07942680269479752
Validation loss = 0.08047286421060562
Validation loss = 0.08359791338443756
Validation loss = 0.07915784418582916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09098506718873978
Validation loss = 0.08367379009723663
Validation loss = 0.08033430576324463
Validation loss = 0.08020507544279099
Validation loss = 0.08035902678966522
Validation loss = 0.08271437138319016
Validation loss = 0.08154202252626419
Validation loss = 0.07652530074119568
Validation loss = 0.08209721744060516
Validation loss = 0.07840327173471451
Validation loss = 0.07743877917528152
Validation loss = 0.07783503830432892
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09335554391145706
Validation loss = 0.08542203158140182
Validation loss = 0.07879011332988739
Validation loss = 0.07758547365665436
Validation loss = 0.07842910289764404
Validation loss = 0.08678733557462692
Validation loss = 0.08074916899204254
Validation loss = 0.07772672176361084
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09504028409719467
Validation loss = 0.08055524528026581
Validation loss = 0.07954958081245422
Validation loss = 0.08142415434122086
Validation loss = 0.0790444165468216
Validation loss = 0.0809418335556984
Validation loss = 0.08355262875556946
Validation loss = 0.08576623350381851
Validation loss = 0.07784419506788254
Validation loss = 0.0759463906288147
Validation loss = 0.0769057348370552
Validation loss = 0.09933771193027496
Validation loss = 0.07612244039773941
Validation loss = 0.07596830278635025
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 455
average number of affinization = 413.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 487
average number of affinization = 413.925
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 451
average number of affinization = 414.38271604938274
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 454
average number of affinization = 414.8658536585366
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 449
average number of affinization = 415.27710843373495
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 428
average number of affinization = 415.42857142857144
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 247      |
| Iteration     | 12       |
| MaximumReturn | 485      |
| MinimumReturn | -31.8    |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08123428374528885
Validation loss = 0.0792054757475853
Validation loss = 0.07635412365198135
Validation loss = 0.075454942882061
Validation loss = 0.09286689013242722
Validation loss = 0.07384002953767776
Validation loss = 0.07324191182851791
Validation loss = 0.07524727284908295
Validation loss = 0.07621223479509354
Validation loss = 0.07939749956130981
Validation loss = 0.07530689239501953
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08921856433153152
Validation loss = 0.07534509152173996
Validation loss = 0.07294275611639023
Validation loss = 0.07250439375638962
Validation loss = 0.07399154454469681
Validation loss = 0.08749473839998245
Validation loss = 0.07449883222579956
Validation loss = 0.07057637721300125
Validation loss = 0.07561327517032623
Validation loss = 0.07374817878007889
Validation loss = 0.07029354572296143
Validation loss = 0.08155113458633423
Validation loss = 0.06957785785198212
Validation loss = 0.06911375373601913
Validation loss = 0.07532946765422821
Validation loss = 0.07794473320245743
Validation loss = 0.06869076937437057
Validation loss = 0.06805109232664108
Validation loss = 0.07083849608898163
Validation loss = 0.07291090488433838
Validation loss = 0.07105030119419098
Validation loss = 0.06864284723997116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09822835773229599
Validation loss = 0.0752490684390068
Validation loss = 0.07277942448854446
Validation loss = 0.07123226672410965
Validation loss = 0.07254008948802948
Validation loss = 0.07698065787553787
Validation loss = 0.07136170566082001
Validation loss = 0.07306043058633804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09068350493907928
Validation loss = 0.07590963691473007
Validation loss = 0.07350526750087738
Validation loss = 0.07264218479394913
Validation loss = 0.0718991607427597
Validation loss = 0.07579230517148972
Validation loss = 0.08419211953878403
Validation loss = 0.07260824739933014
Validation loss = 0.07038035988807678
Validation loss = 0.0709063932299614
Validation loss = 0.07656984776258469
Validation loss = 0.07461831718683243
Validation loss = 0.06797514110803604
Validation loss = 0.06964200735092163
Validation loss = 0.07906468212604523
Validation loss = 0.07156796008348465
Validation loss = 0.0683060958981514
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08138936012983322
Validation loss = 0.07654415816068649
Validation loss = 0.07322676479816437
Validation loss = 0.07118195295333862
Validation loss = 0.07321853935718536
Validation loss = 0.07458104193210602
Validation loss = 0.07134011387825012
Validation loss = 0.06967390328645706
Validation loss = 0.09225157648324966
Validation loss = 0.07035940140485764
Validation loss = 0.06761043518781662
Validation loss = 0.06787198036909103
Validation loss = 0.0690964087843895
Validation loss = 0.07202701270580292
Validation loss = 0.07065902650356293
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 429
average number of affinization = 415.5882352941176
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 436
average number of affinization = 415.8255813953488
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 452
average number of affinization = 416.2413793103448
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 404
average number of affinization = 416.10227272727275
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 413
average number of affinization = 416.0674157303371
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 350
average number of affinization = 415.3333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -134      |
| Iteration     | 13        |
| MaximumReturn | 1.33e+03  |
| MinimumReturn | -1.62e+03 |
| TotalSamples  | 60000     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09071175009012222
Validation loss = 0.07569854706525803
Validation loss = 0.0737379789352417
Validation loss = 0.07177874445915222
Validation loss = 0.07609102874994278
Validation loss = 0.0767936259508133
Validation loss = 0.07568289339542389
Validation loss = 0.0715656504034996
Validation loss = 0.07404438406229019
Validation loss = 0.07390134781599045
Validation loss = 0.06929934769868851
Validation loss = 0.06905779987573624
Validation loss = 0.07749143242835999
Validation loss = 0.07394742965698242
Validation loss = 0.06924013048410416
Validation loss = 0.0702916607260704
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08660884201526642
Validation loss = 0.07168661803007126
Validation loss = 0.06757398694753647
Validation loss = 0.07531914860010147
Validation loss = 0.06878182291984558
Validation loss = 0.07053378224372864
Validation loss = 0.07226891815662384
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08305487781763077
Validation loss = 0.07332798838615417
Validation loss = 0.06912172585725784
Validation loss = 0.06812404096126556
Validation loss = 0.07428152859210968
Validation loss = 0.0739058181643486
Validation loss = 0.07123135775327682
Validation loss = 0.06770195811986923
Validation loss = 0.07122419029474258
Validation loss = 0.07133163511753082
Validation loss = 0.07236264646053314
Validation loss = 0.06997990608215332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07563262432813644
Validation loss = 0.07585486024618149
Validation loss = 0.07285884022712708
Validation loss = 0.07026272267103195
Validation loss = 0.07092656940221786
Validation loss = 0.07040659338235855
Validation loss = 0.07880852371454239
Validation loss = 0.06994021683931351
Validation loss = 0.06711266934871674
Validation loss = 0.06998751312494278
Validation loss = 0.07773058116436005
Validation loss = 0.06806785613298416
Validation loss = 0.06636340171098709
Validation loss = 0.07035589218139648
Validation loss = 0.07018058001995087
Validation loss = 0.06867407262325287
Validation loss = 0.06612423062324524
Validation loss = 0.07024762779474258
Validation loss = 0.07104375213384628
Validation loss = 0.0672697052359581
Validation loss = 0.0666315034031868
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08473669737577438
Validation loss = 0.06841627508401871
Validation loss = 0.06956696510314941
Validation loss = 0.07394956797361374
Validation loss = 0.06867118924856186
Validation loss = 0.06913527101278305
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 437
average number of affinization = 415.57142857142856
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 420
average number of affinization = 415.6195652173913
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 466
average number of affinization = 416.16129032258067
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 415
average number of affinization = 416.1489361702128
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 466
average number of affinization = 416.6736842105263
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 436
average number of affinization = 416.875
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 292      |
| Iteration     | 14       |
| MaximumReturn | 726      |
| MinimumReturn | -359     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08238986134529114
Validation loss = 0.07412054389715195
Validation loss = 0.06708794832229614
Validation loss = 0.06895637512207031
Validation loss = 0.07237664610147476
Validation loss = 0.07121974974870682
Validation loss = 0.06657658517360687
Validation loss = 0.06907651573419571
Validation loss = 0.07493579387664795
Validation loss = 0.0715147852897644
Validation loss = 0.0681624487042427
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08030759543180466
Validation loss = 0.06979963183403015
Validation loss = 0.06529508531093597
Validation loss = 0.06781384348869324
Validation loss = 0.07267145067453384
Validation loss = 0.07235148549079895
Validation loss = 0.0718325674533844
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0756654217839241
Validation loss = 0.07187309861183167
Validation loss = 0.07044535130262375
Validation loss = 0.07268791645765305
Validation loss = 0.06712020933628082
Validation loss = 0.06679503619670868
Validation loss = 0.07433074712753296
Validation loss = 0.06530258059501648
Validation loss = 0.06439945101737976
Validation loss = 0.06907685101032257
Validation loss = 0.07702868431806564
Validation loss = 0.06552603840827942
Validation loss = 0.0642671138048172
Validation loss = 0.06407089531421661
Validation loss = 0.06759127974510193
Validation loss = 0.07280529290437698
Validation loss = 0.0638381615281105
Validation loss = 0.06371431052684784
Validation loss = 0.07420796900987625
Validation loss = 0.06382773071527481
Validation loss = 0.06145083159208298
Validation loss = 0.06454238295555115
Validation loss = 0.06617505848407745
Validation loss = 0.0654132217168808
Validation loss = 0.062226057052612305
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09555734694004059
Validation loss = 0.07555605471134186
Validation loss = 0.06588267534971237
Validation loss = 0.06581991165876389
Validation loss = 0.06881968677043915
Validation loss = 0.06983522325754166
Validation loss = 0.06803973764181137
Validation loss = 0.06559944152832031
Validation loss = 0.06509891152381897
Validation loss = 0.06689067929983139
Validation loss = 0.06669600307941437
Validation loss = 0.06914231181144714
Validation loss = 0.07220564037561417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0816757082939148
Validation loss = 0.06884753704071045
Validation loss = 0.0664682686328888
Validation loss = 0.06659188866615295
Validation loss = 0.07131804525852203
Validation loss = 0.06769900768995285
Validation loss = 0.06550218164920807
Validation loss = 0.07598694413900375
Validation loss = 0.06442917883396149
Validation loss = 0.06260661035776138
Validation loss = 0.06412253528833389
Validation loss = 0.06609106063842773
Validation loss = 0.06581701338291168
Validation loss = 0.07160485535860062
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 395
average number of affinization = 416.64948453608247
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 425
average number of affinization = 416.734693877551
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 288
average number of affinization = 415.4343434343434
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 417
average number of affinization = 415.45
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 415
average number of affinization = 415.44554455445547
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 432
average number of affinization = 415.6078431372549
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40.6    |
| Iteration     | 15       |
| MaximumReturn | 294      |
| MinimumReturn | -487     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08270926028490067
Validation loss = 0.06872619688510895
Validation loss = 0.06445884704589844
Validation loss = 0.06519180536270142
Validation loss = 0.06474272906780243
Validation loss = 0.06880052387714386
Validation loss = 0.06299468129873276
Validation loss = 0.06315569579601288
Validation loss = 0.06984367966651917
Validation loss = 0.06344324350357056
Validation loss = 0.061638545244932175
Validation loss = 0.06603033095598221
Validation loss = 0.06578698754310608
Validation loss = 0.060434069484472275
Validation loss = 0.0637015551328659
Validation loss = 0.06942088156938553
Validation loss = 0.061472050845623016
Validation loss = 0.061231691390275955
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07891053706407547
Validation loss = 0.06628928333520889
Validation loss = 0.06294767558574677
Validation loss = 0.06214764714241028
Validation loss = 0.06443888694047928
Validation loss = 0.06321301311254501
Validation loss = 0.07135587930679321
Validation loss = 0.06819101423025131
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07974283397197723
Validation loss = 0.06214817240834236
Validation loss = 0.060401853173971176
Validation loss = 0.06148725375533104
Validation loss = 0.06802885979413986
Validation loss = 0.05821782350540161
Validation loss = 0.07457153499126434
Validation loss = 0.05857349932193756
Validation loss = 0.05962296202778816
Validation loss = 0.060687050223350525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08078031241893768
Validation loss = 0.06348126381635666
Validation loss = 0.06081472337245941
Validation loss = 0.06305929273366928
Validation loss = 0.06248188018798828
Validation loss = 0.06194699555635452
Validation loss = 0.06210191920399666
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07231691479682922
Validation loss = 0.07005109637975693
Validation loss = 0.06384009122848511
Validation loss = 0.061963390558958054
Validation loss = 0.06186685711145401
Validation loss = 0.0622205026447773
Validation loss = 0.07668160647153854
Validation loss = 0.06271905452013016
Validation loss = 0.05978484824299812
Validation loss = 0.059684254229068756
Validation loss = 0.06748416274785995
Validation loss = 0.060131851583719254
Validation loss = 0.059795163571834564
Validation loss = 0.06185282766819
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 430
average number of affinization = 415.747572815534
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 424
average number of affinization = 415.8269230769231
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 516
average number of affinization = 416.7809523809524
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 465
average number of affinization = 417.2358490566038
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 392
average number of affinization = 417.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 456
average number of affinization = 417.3611111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 819      |
| Iteration     | 16       |
| MaximumReturn | 1.9e+03  |
| MinimumReturn | -192     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.071998231112957
Validation loss = 0.06005491688847542
Validation loss = 0.059036001563072205
Validation loss = 0.06777604669332504
Validation loss = 0.05840834602713585
Validation loss = 0.057987332344055176
Validation loss = 0.0599793866276741
Validation loss = 0.06918375194072723
Validation loss = 0.05814950913190842
Validation loss = 0.056595299392938614
Validation loss = 0.06877187639474869
Validation loss = 0.05814919248223305
Validation loss = 0.06056228652596474
Validation loss = 0.05697886273264885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06829918920993805
Validation loss = 0.05991921201348305
Validation loss = 0.06020424887537956
Validation loss = 0.06095847859978676
Validation loss = 0.06184697151184082
Validation loss = 0.06296741962432861
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06766922771930695
Validation loss = 0.059660229831933975
Validation loss = 0.0569494292140007
Validation loss = 0.06018979102373123
Validation loss = 0.05773495137691498
Validation loss = 0.06386560946702957
Validation loss = 0.06426121294498444
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07256291061639786
Validation loss = 0.062117889523506165
Validation loss = 0.059681084007024765
Validation loss = 0.05924166738986969
Validation loss = 0.061005957424640656
Validation loss = 0.058137569576501846
Validation loss = 0.06012054905295372
Validation loss = 0.05862753093242645
Validation loss = 0.05854284018278122
Validation loss = 0.05836063623428345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07324865460395813
Validation loss = 0.05821806937456131
Validation loss = 0.0571678951382637
Validation loss = 0.057979121804237366
Validation loss = 0.06393912434577942
Validation loss = 0.05669402331113815
Validation loss = 0.05664058402180672
Validation loss = 0.056319460272789
Validation loss = 0.0632043182849884
Validation loss = 0.05547264218330383
Validation loss = 0.05503712594509125
Validation loss = 0.05670933425426483
Validation loss = 0.06772676855325699
Validation loss = 0.05605128034949303
Validation loss = 0.05466967821121216
Validation loss = 0.05614148825407028
Validation loss = 0.056919679045677185
Validation loss = 0.07145500183105469
Validation loss = 0.05457192659378052
Validation loss = 0.054024919867515564
Validation loss = 0.0584728866815567
Validation loss = 0.054997190833091736
Validation loss = 0.05648525059223175
Validation loss = 0.05977562442421913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 244
average number of affinization = 415.77064220183485
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 409
average number of affinization = 415.7090909090909
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 425
average number of affinization = 415.7927927927928
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 401
average number of affinization = 415.6607142857143
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 326
average number of affinization = 414.86725663716817
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 452
average number of affinization = 415.1929824561403
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 185       |
| Iteration     | 17        |
| MaximumReturn | 1.45e+03  |
| MinimumReturn | -1.31e+03 |
| TotalSamples  | 76000     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07773084938526154
Validation loss = 0.06066536903381348
Validation loss = 0.05495472997426987
Validation loss = 0.05587935447692871
Validation loss = 0.05832425504922867
Validation loss = 0.06130792573094368
Validation loss = 0.05573619529604912
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07212115824222565
Validation loss = 0.05872488021850586
Validation loss = 0.0587293803691864
Validation loss = 0.05698283016681671
Validation loss = 0.059924475848674774
Validation loss = 0.05756116658449173
Validation loss = 0.07944643497467041
Validation loss = 0.05588347092270851
Validation loss = 0.05670754611492157
Validation loss = 0.06072856858372688
Validation loss = 0.05668874830007553
Validation loss = 0.05526420846581459
Validation loss = 0.059701405465602875
Validation loss = 0.06698288768529892
Validation loss = 0.0558408685028553
Validation loss = 0.05581666901707649
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06794651597738266
Validation loss = 0.056072354316711426
Validation loss = 0.055264636874198914
Validation loss = 0.055983129888772964
Validation loss = 0.06463136523962021
Validation loss = 0.05351667478680611
Validation loss = 0.05687064304947853
Validation loss = 0.05719096586108208
Validation loss = 0.053341977298259735
Validation loss = 0.057368528097867966
Validation loss = 0.05960802733898163
Validation loss = 0.05245403200387955
Validation loss = 0.0522552914917469
Validation loss = 0.06858554482460022
Validation loss = 0.05440821126103401
Validation loss = 0.05177868530154228
Validation loss = 0.05150960758328438
Validation loss = 0.0560261495411396
Validation loss = 0.06301525980234146
Validation loss = 0.05254877731204033
Validation loss = 0.05160444974899292
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06744566559791565
Validation loss = 0.05807638168334961
Validation loss = 0.05549457296729088
Validation loss = 0.06875626742839813
Validation loss = 0.05584664270281792
Validation loss = 0.05457595735788345
Validation loss = 0.059203196316957474
Validation loss = 0.056069254875183105
Validation loss = 0.05340970307588577
Validation loss = 0.061289165169000626
Validation loss = 0.06270924210548401
Validation loss = 0.05362343788146973
Validation loss = 0.05535228177905083
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0677902102470398
Validation loss = 0.053899627178907394
Validation loss = 0.05268629640340805
Validation loss = 0.053008031100034714
Validation loss = 0.060162417590618134
Validation loss = 0.051577772945165634
Validation loss = 0.051922209560871124
Validation loss = 0.057052481919527054
Validation loss = 0.05135945975780487
Validation loss = 0.05515127256512642
Validation loss = 0.05292795971035957
Validation loss = 0.05189703777432442
Validation loss = 0.05257275700569153
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 388
average number of affinization = 414.95652173913044
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 363
average number of affinization = 414.5086206896552
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 426
average number of affinization = 414.6068376068376
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 415
average number of affinization = 414.6101694915254
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 431
average number of affinization = 414.74789915966386
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 449
average number of affinization = 415.03333333333336
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 685      |
| Iteration     | 18       |
| MaximumReturn | 1.14e+03 |
| MinimumReturn | 323      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06304284185171127
Validation loss = 0.0551440604031086
Validation loss = 0.05413951352238655
Validation loss = 0.05370517820119858
Validation loss = 0.06621447205543518
Validation loss = 0.05580176040530205
Validation loss = 0.05263931304216385
Validation loss = 0.05773169919848442
Validation loss = 0.05535588413476944
Validation loss = 0.05237050727009773
Validation loss = 0.05440395325422287
Validation loss = 0.05920403450727463
Validation loss = 0.053212713450193405
Validation loss = 0.05170537158846855
Validation loss = 0.056770604103803635
Validation loss = 0.07234197854995728
Validation loss = 0.05154113098978996
Validation loss = 0.05088088661432266
Validation loss = 0.0576910600066185
Validation loss = 0.05187124013900757
Validation loss = 0.051091231405735016
Validation loss = 0.052783213555812836
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0690869688987732
Validation loss = 0.05785543844103813
Validation loss = 0.05720794200897217
Validation loss = 0.05390457436442375
Validation loss = 0.05532463639974594
Validation loss = 0.055083584040403366
Validation loss = 0.05394751951098442
Validation loss = 0.05453641340136528
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06809206306934357
Validation loss = 0.05229351669549942
Validation loss = 0.05073133856058121
Validation loss = 0.05438380315899849
Validation loss = 0.05100361257791519
Validation loss = 0.06612856686115265
Validation loss = 0.0526544563472271
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06897061318159103
Validation loss = 0.054690051823854446
Validation loss = 0.05139384791254997
Validation loss = 0.05466002970933914
Validation loss = 0.055007435381412506
Validation loss = 0.05461981147527695
Validation loss = 0.05086906999349594
Validation loss = 0.05114084482192993
Validation loss = 0.06223646551370621
Validation loss = 0.05091480538249016
Validation loss = 0.05280439928174019
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06931576877832413
Validation loss = 0.051413871347904205
Validation loss = 0.05540939420461655
Validation loss = 0.05114427208900452
Validation loss = 0.052667587995529175
Validation loss = 0.05084185674786568
Validation loss = 0.05293642356991768
Validation loss = 0.05347267538309097
Validation loss = 0.051294922828674316
Validation loss = 0.06040176749229431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 420
average number of affinization = 415.07438016528926
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 374
average number of affinization = 414.73770491803276
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 458
average number of affinization = 415.0894308943089
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 401
average number of affinization = 414.9758064516129
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 459
average number of affinization = 415.328
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 430
average number of affinization = 415.44444444444446
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 655      |
| Iteration     | 19       |
| MaximumReturn | 1.42e+03 |
| MinimumReturn | 190      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07040289044380188
Validation loss = 0.05171067267656326
Validation loss = 0.050011925399303436
Validation loss = 0.051997750997543335
Validation loss = 0.05708545073866844
Validation loss = 0.05049369856715202
Validation loss = 0.049982503056526184
Validation loss = 0.05437416583299637
Validation loss = 0.052885692566633224
Validation loss = 0.05553581193089485
Validation loss = 0.052260495722293854
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07363101840019226
Validation loss = 0.05338306725025177
Validation loss = 0.053313516080379486
Validation loss = 0.05591312050819397
Validation loss = 0.0584210604429245
Validation loss = 0.05401843413710594
Validation loss = 0.06183087080717087
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05783017352223396
Validation loss = 0.05145137757062912
Validation loss = 0.05211765319108963
Validation loss = 0.052041180431842804
Validation loss = 0.06195130944252014
Validation loss = 0.05013743415474892
Validation loss = 0.049776121973991394
Validation loss = 0.06285586953163147
Validation loss = 0.05248991772532463
Validation loss = 0.04989616572856903
Validation loss = 0.04947797954082489
Validation loss = 0.05513325706124306
Validation loss = 0.049227941781282425
Validation loss = 0.05000001937150955
Validation loss = 0.058798450976610184
Validation loss = 0.049757298082113266
Validation loss = 0.04825521260499954
Validation loss = 0.057767849415540695
Validation loss = 0.049361683428287506
Validation loss = 0.04954427853226662
Validation loss = 0.054285645484924316
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06721857190132141
Validation loss = 0.05325794965028763
Validation loss = 0.05199553817510605
Validation loss = 0.06035386398434639
Validation loss = 0.05203445255756378
Validation loss = 0.052084844559431076
Validation loss = 0.05262548476457596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06951557099819183
Validation loss = 0.05321904644370079
Validation loss = 0.05001600459218025
Validation loss = 0.053689438849687576
Validation loss = 0.050019629299640656
Validation loss = 0.06068067252635956
Validation loss = 0.05136080086231232
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 423
average number of affinization = 415.503937007874
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 419
average number of affinization = 415.53125
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 387
average number of affinization = 415.31007751937983
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 343
average number of affinization = 414.75384615384615
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 363
average number of affinization = 414.3587786259542
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 387
average number of affinization = 414.1515151515151
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.13e+03 |
| Iteration     | 20       |
| MaximumReturn | 1.91e+03 |
| MinimumReturn | 441      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06390687078237534
Validation loss = 0.04889700189232826
Validation loss = 0.04855097457766533
Validation loss = 0.048780206590890884
Validation loss = 0.05594882741570473
Validation loss = 0.04907412454485893
Validation loss = 0.05032423883676529
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07155182212591171
Validation loss = 0.051606129854917526
Validation loss = 0.05094028636813164
Validation loss = 0.05159037187695503
Validation loss = 0.060468509793281555
Validation loss = 0.04911975562572479
Validation loss = 0.04974738135933876
Validation loss = 0.05103987082839012
Validation loss = 0.049996260553598404
Validation loss = 0.06038928031921387
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08295394480228424
Validation loss = 0.05054115876555443
Validation loss = 0.046971533447504044
Validation loss = 0.049272824078798294
Validation loss = 0.049422215670347214
Validation loss = 0.046512652188539505
Validation loss = 0.05346215143799782
Validation loss = 0.04815702512860298
Validation loss = 0.04809918627142906
Validation loss = 0.05524928495287895
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06614988297224045
Validation loss = 0.053550589829683304
Validation loss = 0.04922766983509064
Validation loss = 0.06778716295957565
Validation loss = 0.052144695073366165
Validation loss = 0.04898848384618759
Validation loss = 0.0511464923620224
Validation loss = 0.05781731382012367
Validation loss = 0.04886462911963463
Validation loss = 0.04817492887377739
Validation loss = 0.05315657705068588
Validation loss = 0.05519275739789009
Validation loss = 0.04850597679615021
Validation loss = 0.04847515746951103
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05522402375936508
Validation loss = 0.04795181751251221
Validation loss = 0.0491749607026577
Validation loss = 0.049727026373147964
Validation loss = 0.04800444841384888
Validation loss = 0.048755016177892685
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 457
average number of affinization = 414.4736842105263
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 386
average number of affinization = 414.26119402985074
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 427
average number of affinization = 414.35555555555555
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 416
average number of affinization = 414.36764705882354
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 383
average number of affinization = 414.13868613138686
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 388
average number of affinization = 413.9492753623188
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.18e+03 |
| Iteration     | 21       |
| MaximumReturn | 1.99e+03 |
| MinimumReturn | 53.7     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05700192600488663
Validation loss = 0.05052749067544937
Validation loss = 0.05314670875668526
Validation loss = 0.050197623670101166
Validation loss = 0.0536082424223423
Validation loss = 0.046698521822690964
Validation loss = 0.048592232167720795
Validation loss = 0.050274807959795
Validation loss = 0.04647093638777733
Validation loss = 0.05129466950893402
Validation loss = 0.04824213311076164
Validation loss = 0.05246604606509209
Validation loss = 0.04717806726694107
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06431299448013306
Validation loss = 0.05037527158856392
Validation loss = 0.050416089594364166
Validation loss = 0.04958024621009827
Validation loss = 0.04979562386870384
Validation loss = 0.051342420279979706
Validation loss = 0.048276904970407486
Validation loss = 0.05440262705087662
Validation loss = 0.0496341809630394
Validation loss = 0.04911641404032707
Validation loss = 0.04827607050538063
Validation loss = 0.05021366849541664
Validation loss = 0.04849441722035408
Validation loss = 0.05621851980686188
Validation loss = 0.047320444136857986
Validation loss = 0.051267027854919434
Validation loss = 0.04744813218712807
Validation loss = 0.05221738666296005
Validation loss = 0.05281655117869377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0540156327188015
Validation loss = 0.04596826806664467
Validation loss = 0.04594319686293602
Validation loss = 0.04772469773888588
Validation loss = 0.0458252988755703
Validation loss = 0.04749440774321556
Validation loss = 0.046639155596494675
Validation loss = 0.06012430414557457
Validation loss = 0.04523653909564018
Validation loss = 0.048364125192165375
Validation loss = 0.0467415526509285
Validation loss = 0.050515372306108475
Validation loss = 0.058174531906843185
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06139411777257919
Validation loss = 0.048193883150815964
Validation loss = 0.04744501784443855
Validation loss = 0.052928484976291656
Validation loss = 0.04893907159566879
Validation loss = 0.04742598161101341
Validation loss = 0.04868268594145775
Validation loss = 0.06657480448484421
Validation loss = 0.047081109136343
Validation loss = 0.047232721000909805
Validation loss = 0.04768693074584007
Validation loss = 0.050099730491638184
Validation loss = 0.05496522784233093
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05949797481298447
Validation loss = 0.04722798243165016
Validation loss = 0.04579182341694832
Validation loss = 0.04921006038784981
Validation loss = 0.05528749153017998
Validation loss = 0.04612044245004654
Validation loss = 0.04629896581172943
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 405
average number of affinization = 413.8848920863309
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 458
average number of affinization = 414.2
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 361
average number of affinization = 413.822695035461
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 430
average number of affinization = 413.9366197183099
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 441
average number of affinization = 414.12587412587413
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 231
average number of affinization = 412.8541666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 987      |
| Iteration     | 22       |
| MaximumReturn | 1.62e+03 |
| MinimumReturn | 55.4     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05676029995083809
Validation loss = 0.04834869131445885
Validation loss = 0.051597077399492264
Validation loss = 0.047976721078157425
Validation loss = 0.051910046488046646
Validation loss = 0.04793894290924072
Validation loss = 0.04490780457854271
Validation loss = 0.046747591346502304
Validation loss = 0.07016333192586899
Validation loss = 0.04656907916069031
Validation loss = 0.046962082386016846
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.056552860885858536
Validation loss = 0.04762725159525871
Validation loss = 0.04755140841007233
Validation loss = 0.050082866102457047
Validation loss = 0.04878092184662819
Validation loss = 0.048008669167757034
Validation loss = 0.05464518070220947
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.050502169877290726
Validation loss = 0.04508839175105095
Validation loss = 0.04564444348216057
Validation loss = 0.05903591588139534
Validation loss = 0.04484826326370239
Validation loss = 0.04803432151675224
Validation loss = 0.05410343408584595
Validation loss = 0.04526780918240547
Validation loss = 0.045172885060310364
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0646091029047966
Validation loss = 0.04662685468792915
Validation loss = 0.0477425716817379
Validation loss = 0.05435271933674812
Validation loss = 0.04782770201563835
Validation loss = 0.04674549028277397
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0561775378882885
Validation loss = 0.04763786122202873
Validation loss = 0.048151809722185135
Validation loss = 0.049567628651857376
Validation loss = 0.046157535165548325
Validation loss = 0.059231773018836975
Validation loss = 0.048417285084724426
Validation loss = 0.04567443206906319
Validation loss = 0.04804079607129097
Validation loss = 0.05007012188434601
Validation loss = 0.04920133948326111
Validation loss = 0.044977378100156784
Validation loss = 0.04749200865626335
Validation loss = 0.05162397027015686
Validation loss = 0.04616527631878853
Validation loss = 0.04815495014190674
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 416
average number of affinization = 412.8758620689655
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 269
average number of affinization = 411.8904109589041
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 387
average number of affinization = 411.72108843537416
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 385
average number of affinization = 411.5405405405405
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 372
average number of affinization = 411.2751677852349
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 258
average number of affinization = 410.25333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 640       |
| Iteration     | 23        |
| MaximumReturn | 1.68e+03  |
| MinimumReturn | -1.29e+03 |
| TotalSamples  | 100000    |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.060006894171237946
Validation loss = 0.04586606100201607
Validation loss = 0.045382060110569
Validation loss = 0.04539686068892479
Validation loss = 0.055417485535144806
Validation loss = 0.0433083213865757
Validation loss = 0.04505370557308197
Validation loss = 0.04738212004303932
Validation loss = 0.04666297137737274
Validation loss = 0.044720858335494995
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06924717128276825
Validation loss = 0.045940518379211426
Validation loss = 0.048642586916685104
Validation loss = 0.046943459659814835
Validation loss = 0.06672032177448273
Validation loss = 0.046325474977493286
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05663811042904854
Validation loss = 0.04445601999759674
Validation loss = 0.0455927848815918
Validation loss = 0.050346627831459045
Validation loss = 0.04399465397000313
Validation loss = 0.04413812980055809
Validation loss = 0.04480956494808197
Validation loss = 0.050759896636009216
Validation loss = 0.04390905797481537
Validation loss = 0.04710298404097557
Validation loss = 0.04302427917718887
Validation loss = 0.044122111052274704
Validation loss = 0.0482536219060421
Validation loss = 0.04273921623826027
Validation loss = 0.05157644674181938
Validation loss = 0.042835917323827744
Validation loss = 0.0433335155248642
Validation loss = 0.05126738175749779
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.060660939663648605
Validation loss = 0.04610034450888634
Validation loss = 0.04564741253852844
Validation loss = 0.05383239686489105
Validation loss = 0.045401833951473236
Validation loss = 0.045109838247299194
Validation loss = 0.047133669257164
Validation loss = 0.046379365026950836
Validation loss = 0.047871895134449005
Validation loss = 0.04660749435424805
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06284105777740479
Validation loss = 0.04570511728525162
Validation loss = 0.04597976803779602
Validation loss = 0.04559669643640518
Validation loss = 0.04665127396583557
Validation loss = 0.044655028730630875
Validation loss = 0.04776478931307793
Validation loss = 0.04944869130849838
Validation loss = 0.047100529074668884
Validation loss = 0.05356514826416969
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 265
average number of affinization = 409.2913907284768
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 359
average number of affinization = 408.9605263157895
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 245
average number of affinization = 407.8888888888889
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 278
average number of affinization = 407.04545454545456
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 398
average number of affinization = 406.98709677419356
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 361
average number of affinization = 406.6923076923077
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 107       |
| Iteration     | 24        |
| MaximumReturn | 2.04e+03  |
| MinimumReturn | -1.57e+03 |
| TotalSamples  | 104000    |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04689518362283707
Validation loss = 0.045652031898498535
Validation loss = 0.04562225192785263
Validation loss = 0.04377792775630951
Validation loss = 0.045324262231588364
Validation loss = 0.054495617747306824
Validation loss = 0.04433461278676987
Validation loss = 0.044610340148210526
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04910675436258316
Validation loss = 0.045044898986816406
Validation loss = 0.045457180589437485
Validation loss = 0.046776868402957916
Validation loss = 0.047438204288482666
Validation loss = 0.05142706632614136
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.054251112043857574
Validation loss = 0.04351123422384262
Validation loss = 0.04261939227581024
Validation loss = 0.04338936135172844
Validation loss = 0.04432423412799835
Validation loss = 0.04284414276480675
Validation loss = 0.04596466198563576
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04828464612364769
Validation loss = 0.048831745982170105
Validation loss = 0.04568015784025192
Validation loss = 0.048056505620479584
Validation loss = 0.04718649759888649
Validation loss = 0.047435879707336426
Validation loss = 0.0435858778655529
Validation loss = 0.04437790438532829
Validation loss = 0.05283584073185921
Validation loss = 0.04353398457169533
Validation loss = 0.04484356939792633
Validation loss = 0.047129079699516296
Validation loss = 0.051043637096881866
Validation loss = 0.04396260157227516
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05690564960241318
Validation loss = 0.0447796955704689
Validation loss = 0.04328306391835213
Validation loss = 0.044862255454063416
Validation loss = 0.04460013285279274
Validation loss = 0.048872485756874084
Validation loss = 0.04433560371398926
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 437
average number of affinization = 406.88535031847135
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 455
average number of affinization = 407.1898734177215
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 399
average number of affinization = 407.1383647798742
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 343
average number of affinization = 406.7375
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 383
average number of affinization = 406.5900621118012
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 412
average number of affinization = 406.62345679012344
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 638      |
| Iteration     | 25       |
| MaximumReturn | 1.61e+03 |
| MinimumReturn | -513     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05365344509482384
Validation loss = 0.04106110706925392
Validation loss = 0.042531535029411316
Validation loss = 0.043689459562301636
Validation loss = 0.04675104096531868
Validation loss = 0.04095309227705002
Validation loss = 0.04165732488036156
Validation loss = 0.0415537990629673
Validation loss = 0.04274475574493408
Validation loss = 0.040206193923950195
Validation loss = 0.04156634584069252
Validation loss = 0.04792230576276779
Validation loss = 0.03993837162852287
Validation loss = 0.04577542096376419
Validation loss = 0.04170747101306915
Validation loss = 0.048350222408771515
Validation loss = 0.040792640298604965
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05111566558480263
Validation loss = 0.0430198609828949
Validation loss = 0.04154941812157631
Validation loss = 0.04671359062194824
Validation loss = 0.042293474078178406
Validation loss = 0.04896083101630211
Validation loss = 0.04186907410621643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.048042695969343185
Validation loss = 0.040526971220970154
Validation loss = 0.04094205051660538
Validation loss = 0.0418340228497982
Validation loss = 0.04508135840296745
Validation loss = 0.04004843905568123
Validation loss = 0.03960566595196724
Validation loss = 0.052966222167015076
Validation loss = 0.03906741365790367
Validation loss = 0.0406145378947258
Validation loss = 0.03967530280351639
Validation loss = 0.03918677940964699
Validation loss = 0.0556766539812088
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04388680309057236
Validation loss = 0.0435933880507946
Validation loss = 0.04112271964550018
Validation loss = 0.04461684450507164
Validation loss = 0.04489508643746376
Validation loss = 0.044244442135095596
Validation loss = 0.0411834716796875
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04962918907403946
Validation loss = 0.043034229427576065
Validation loss = 0.040869567543268204
Validation loss = 0.04243890568614006
Validation loss = 0.04435083270072937
Validation loss = 0.041250571608543396
Validation loss = 0.053232673555612564
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 399
average number of affinization = 406.5766871165644
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 293
average number of affinization = 405.8841463414634
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 395
average number of affinization = 405.8181818181818
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 431
average number of affinization = 405.9698795180723
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 419
average number of affinization = 406.0479041916168
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 373
average number of affinization = 405.8511904761905
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 615      |
| Iteration     | 26       |
| MaximumReturn | 1.34e+03 |
| MinimumReturn | -1.4e+03 |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04264770820736885
Validation loss = 0.041605208069086075
Validation loss = 0.04156212881207466
Validation loss = 0.040066540241241455
Validation loss = 0.04687007889151573
Validation loss = 0.04039391502737999
Validation loss = 0.04051763191819191
Validation loss = 0.04568552225828171
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04640864208340645
Validation loss = 0.04075709730386734
Validation loss = 0.04076320305466652
Validation loss = 0.04301553964614868
Validation loss = 0.04053972288966179
Validation loss = 0.05213532969355583
Validation loss = 0.040231771767139435
Validation loss = 0.04193313792347908
Validation loss = 0.043794162571430206
Validation loss = 0.042218390852212906
Validation loss = 0.041021235287189484
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0425398163497448
Validation loss = 0.03840029984712601
Validation loss = 0.040449388325214386
Validation loss = 0.04823713377118111
Validation loss = 0.040464043617248535
Validation loss = 0.03981807455420494
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04630982130765915
Validation loss = 0.03961580991744995
Validation loss = 0.04016236588358879
Validation loss = 0.06641272455453873
Validation loss = 0.03941166400909424
Validation loss = 0.046281661838293076
Validation loss = 0.0394480861723423
Validation loss = 0.03991680592298508
Validation loss = 0.043757639825344086
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.050961706787347794
Validation loss = 0.0396796353161335
Validation loss = 0.04035614803433418
Validation loss = 0.044190552085638046
Validation loss = 0.03879836946725845
Validation loss = 0.0397360622882843
Validation loss = 0.040832050144672394
Validation loss = 0.041670020669698715
Validation loss = 0.040666401386260986
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 375
average number of affinization = 405.6686390532544
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 377
average number of affinization = 405.5
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 404
average number of affinization = 405.49122807017545
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 359
average number of affinization = 405.22093023255815
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 390
average number of affinization = 405.1329479768786
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 443
average number of affinization = 405.35057471264366
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.22e+03 |
| Iteration     | 27       |
| MaximumReturn | 1.65e+03 |
| MinimumReturn | 878      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04120380058884621
Validation loss = 0.039033982902765274
Validation loss = 0.038599610328674316
Validation loss = 0.041251290589571
Validation loss = 0.04128238931298256
Validation loss = 0.03751726448535919
Validation loss = 0.03818152844905853
Validation loss = 0.04074127972126007
Validation loss = 0.03620920330286026
Validation loss = 0.03953024744987488
Validation loss = 0.04200274497270584
Validation loss = 0.037231117486953735
Validation loss = 0.04280291497707367
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04911278560757637
Validation loss = 0.03956121951341629
Validation loss = 0.03903980553150177
Validation loss = 0.04266098886728287
Validation loss = 0.04021235927939415
Validation loss = 0.040020428597927094
Validation loss = 0.03857845440506935
Validation loss = 0.04013049602508545
Validation loss = 0.04574267938733101
Validation loss = 0.03837713226675987
Validation loss = 0.03873806074261665
Validation loss = 0.04957829415798187
Validation loss = 0.038258444517850876
Validation loss = 0.04059601202607155
Validation loss = 0.03721164911985397
Validation loss = 0.038222625851631165
Validation loss = 0.040025971829891205
Validation loss = 0.03864939883351326
Validation loss = 0.04041517153382301
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04314207658171654
Validation loss = 0.03807074949145317
Validation loss = 0.039133816957473755
Validation loss = 0.03822842612862587
Validation loss = 0.04913479834794998
Validation loss = 0.037042997777462006
Validation loss = 0.03687288612127304
Validation loss = 0.040166083723306656
Validation loss = 0.03693828731775284
Validation loss = 0.03753054887056351
Validation loss = 0.04295962676405907
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.047518130391836166
Validation loss = 0.037766747176647186
Validation loss = 0.038934655487537384
Validation loss = 0.04012617468833923
Validation loss = 0.04207489266991615
Validation loss = 0.038713328540325165
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04738941043615341
Validation loss = 0.039191797375679016
Validation loss = 0.038595788180828094
Validation loss = 0.041206542402505875
Validation loss = 0.038378480821847916
Validation loss = 0.041759684681892395
Validation loss = 0.03931787610054016
Validation loss = 0.037884894758462906
Validation loss = 0.04056214168667793
Validation loss = 0.040436115115880966
Validation loss = 0.03867987170815468
Validation loss = 0.03938644751906395
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 373
average number of affinization = 405.1657142857143
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 396
average number of affinization = 405.1136363636364
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 270
average number of affinization = 404.3502824858757
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 378
average number of affinization = 404.2022471910112
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 305
average number of affinization = 403.6480446927374
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 469
average number of affinization = 404.0111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 123       |
| Iteration     | 28        |
| MaximumReturn | 1.32e+03  |
| MinimumReturn | -1.55e+03 |
| TotalSamples  | 120000    |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.050978220999240875
Validation loss = 0.03816695883870125
Validation loss = 0.03896944224834442
Validation loss = 0.039664126932621
Validation loss = 0.03909139335155487
Validation loss = 0.04804942384362221
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04368513450026512
Validation loss = 0.03875769302248955
Validation loss = 0.040825650095939636
Validation loss = 0.03985614329576492
Validation loss = 0.03979136794805527
Validation loss = 0.038920946419239044
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0616835281252861
Validation loss = 0.037792500108480453
Validation loss = 0.03889458626508713
Validation loss = 0.038912709802389145
Validation loss = 0.03777875006198883
Validation loss = 0.03781264275312424
Validation loss = 0.04689999297261238
Validation loss = 0.03717594966292381
Validation loss = 0.03965180367231369
Validation loss = 0.03670388460159302
Validation loss = 0.036290381103754044
Validation loss = 0.04391591623425484
Validation loss = 0.0377873070538044
Validation loss = 0.03772788122296333
Validation loss = 0.03761332854628563
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04725150391459465
Validation loss = 0.03932313621044159
Validation loss = 0.03894833102822304
Validation loss = 0.04012315720319748
Validation loss = 0.04037797078490257
Validation loss = 0.038695741444826126
Validation loss = 0.04018587991595268
Validation loss = 0.039287976920604706
Validation loss = 0.03916621580719948
Validation loss = 0.038247402757406235
Validation loss = 0.039361219853162766
Validation loss = 0.04570071026682854
Validation loss = 0.03927966579794884
Validation loss = 0.04058545082807541
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04639343172311783
Validation loss = 0.04205076023936272
Validation loss = 0.03821363300085068
Validation loss = 0.041076187044382095
Validation loss = 0.04167627543210983
Validation loss = 0.04273258149623871
Validation loss = 0.039822936058044434
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 418
average number of affinization = 404.08839779005524
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 308
average number of affinization = 403.56043956043953
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 375
average number of affinization = 403.40437158469945
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 334
average number of affinization = 403.0271739130435
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 423
average number of affinization = 403.13513513513516
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 363
average number of affinization = 402.9193548387097
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 29       |
| MaximumReturn | 1.83e+03 |
| MinimumReturn | 42.4     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04341156780719757
Validation loss = 0.037805743515491486
Validation loss = 0.03883438929915428
Validation loss = 0.0437769889831543
Validation loss = 0.03893855959177017
Validation loss = 0.039588022977113724
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04746855050325394
Validation loss = 0.03963395580649376
Validation loss = 0.038435693830251694
Validation loss = 0.04989893361926079
Validation loss = 0.037827279418706894
Validation loss = 0.03912254422903061
Validation loss = 0.04059519246220589
Validation loss = 0.03751702234148979
Validation loss = 0.04032682254910469
Validation loss = 0.03737552464008331
Validation loss = 0.040553849190473557
Validation loss = 0.037142544984817505
Validation loss = 0.04086604714393616
Validation loss = 0.038700833916664124
Validation loss = 0.04213238134980202
Validation loss = 0.041597723960876465
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.040409594774246216
Validation loss = 0.03682003915309906
Validation loss = 0.038757119327783585
Validation loss = 0.037787359207868576
Validation loss = 0.03879048302769661
Validation loss = 0.037527166306972504
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04530427232384682
Validation loss = 0.03932741656899452
Validation loss = 0.039012473076581955
Validation loss = 0.040780168026685715
Validation loss = 0.03949446603655815
Validation loss = 0.03726152330636978
Validation loss = 0.0377974733710289
Validation loss = 0.038258083164691925
Validation loss = 0.03823975473642349
Validation loss = 0.04235759377479553
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04976116120815277
Validation loss = 0.0380999818444252
Validation loss = 0.040144868195056915
Validation loss = 0.04211200401186943
Validation loss = 0.03860851749777794
Validation loss = 0.04398288205265999
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 353
average number of affinization = 402.6524064171123
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 391
average number of affinization = 402.5904255319149
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 396
average number of affinization = 402.55555555555554
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 388
average number of affinization = 402.4789473684211
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 349
average number of affinization = 402.19895287958116
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 321
average number of affinization = 401.7760416666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.1e+03  |
| Iteration     | 30       |
| MaximumReturn | 2.33e+03 |
| MinimumReturn | 47.4     |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04138832166790962
Validation loss = 0.036570705473423004
Validation loss = 0.03811948001384735
Validation loss = 0.03790447860956192
Validation loss = 0.04253634810447693
Validation loss = 0.0377262681722641
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.040262605994939804
Validation loss = 0.03633609414100647
Validation loss = 0.038369547575712204
Validation loss = 0.03800041228532791
Validation loss = 0.037492748349905014
Validation loss = 0.04060719907283783
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.042517319321632385
Validation loss = 0.03748275339603424
Validation loss = 0.03713329881429672
Validation loss = 0.03806447982788086
Validation loss = 0.03708065301179886
Validation loss = 0.03636070340871811
Validation loss = 0.047440074384212494
Validation loss = 0.03567042201757431
Validation loss = 0.036896198987960815
Validation loss = 0.036954235285520554
Validation loss = 0.03697797656059265
Validation loss = 0.03711273521184921
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03951381891965866
Validation loss = 0.03721623495221138
Validation loss = 0.038446247577667236
Validation loss = 0.04146561026573181
Validation loss = 0.03775867074728012
Validation loss = 0.038350045680999756
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04882751777768135
Validation loss = 0.04255769029259682
Validation loss = 0.037615448236465454
Validation loss = 0.038370609283447266
Validation loss = 0.03723250702023506
Validation loss = 0.03792488947510719
Validation loss = 0.037134744226932526
Validation loss = 0.04339703172445297
Validation loss = 0.03716673702001572
Validation loss = 0.0420171283185482
Validation loss = 0.03682917356491089
Validation loss = 0.03786884993314743
Validation loss = 0.036764536052942276
Validation loss = 0.03709455206990242
Validation loss = 0.05539891496300697
Validation loss = 0.036111414432525635
Validation loss = 0.03716621920466423
Validation loss = 0.03638812154531479
Validation loss = 0.041405823081731796
Validation loss = 0.035756103694438934
Validation loss = 0.037324368953704834
Validation loss = 0.03744453564286232
Validation loss = 0.036748044192790985
Validation loss = 0.04118030518293381
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 335
average number of affinization = 401.4300518134715
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 369
average number of affinization = 401.2628865979381
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 361
average number of affinization = 401.05641025641023
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 374
average number of affinization = 400.9183673469388
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 327
average number of affinization = 400.5431472081218
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 363
average number of affinization = 400.35353535353534
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.73e+03 |
| Iteration     | 31       |
| MaximumReturn | 2.21e+03 |
| MinimumReturn | 1.43e+03 |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.038427237421274185
Validation loss = 0.03766423463821411
Validation loss = 0.03628193587064743
Validation loss = 0.036455389112234116
Validation loss = 0.037756383419036865
Validation loss = 0.03793211653828621
Validation loss = 0.03665680065751076
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.042940616607666016
Validation loss = 0.03598788380622864
Validation loss = 0.03733394294977188
Validation loss = 0.03909112513065338
Validation loss = 0.03639381006360054
Validation loss = 0.04419764503836632
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.045856375247240067
Validation loss = 0.03517976775765419
Validation loss = 0.0357213132083416
Validation loss = 0.038622356951236725
Validation loss = 0.0363982655107975
Validation loss = 0.03802647069096565
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.040050044655799866
Validation loss = 0.03602404147386551
Validation loss = 0.03604462370276451
Validation loss = 0.035480450838804245
Validation loss = 0.03848987817764282
Validation loss = 0.037033915519714355
Validation loss = 0.035918764770030975
Validation loss = 0.036749742925167084
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04105072095990181
Validation loss = 0.03563612699508667
Validation loss = 0.03678654879331589
Validation loss = 0.03757539764046669
Validation loss = 0.03633024916052818
Validation loss = 0.04072023555636406
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 0.3 is 325
average number of affinization = 399.9748743718593
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 0.3 is 340
average number of affinization = 399.675
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 0.3 is 347
average number of affinization = 399.41293532338307
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 0.3 is 386
average number of affinization = 399.34653465346537
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 0.3 is 326
average number of affinization = 398.98522167487687
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 0.3 is 227
average number of affinization = 398.1421568627451
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.04e+03 |
| Iteration     | 32       |
| MaximumReturn | 2.3e+03  |
| MinimumReturn | 1.57e+03 |
| TotalSamples  | 136000   |
----------------------------
