Logging to experiments/gym_fswimmer/nov4/Sw350e1_seed1231
Printing configuration ...
{'env_name': 'gym_fswimmer',
 'random_seeds': [2312, 1231, 2631, 5543],
 'save_variables': False,
 'model_save_dir': '/tmp/gym_fswimmer_models/',
 'restore_variables': False,
 'start_onpol_iter': 0,
 'onpol_iters': 33,
 'num_path_random': 6,
 'num_path_onpol': 6,
 'env_horizon': 1000,
 'max_train_data': 200000,
 'max_val_data': 100000,
 'discard_ratio': 0.0,
 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20},
              'model': 'nn',
              'ensemble': True,
              'ensemble_model_count': 5,
              'enable_particle_ensemble': True,
              'particles': 5,
              'intrinsic_reward_only': False,
              'external_reward_evaluation_interval': 5,
              'obs_var': 1.0,
              'intrinsic_reward_coeff': 1.0,
              'ita': 1.0,
              'mode': 'random',
              'val': True,
              'n_layers': 4,
              'hidden_size': 1000,
              'activation': 'relu',
              'batch_size': 1000,
              'learning_rate': 0.001,
              'epochs': 200,
              'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}},
 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False},
 'trpo': {'horizon': 200, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95},
 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95},
 'algo': 'trpo'}
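The dump above is a plain Python dict literal, so it can be recovered programmatically from the raw log for reproducibility checks. A minimal sketch; the string below is a deliberately truncated copy of the dump, kept short only for illustration:

```python
import ast

# Truncated copy of the configuration dump above; the real log line carries the full dict.
config_repr = ("{'env_name': 'gym_fswimmer', 'random_seeds': [2312, 1231, 2631, 5543], "
               "'num_path_random': 6, 'env_horizon': 1000, "
               "'dynamics': {'ensemble_model_count': 5, 'epochs': 200}, "
               "'trpo': {'gamma': 0.99, 'gae': 0.95, 'iterations': 20}, 'algo': 'trpo'}")
config = ast.literal_eval(config_repr)

print(config["dynamics"]["ensemble_model_count"])           # 5 dynamics models in the ensemble
print(config["num_path_random"] * config["env_horizon"])    # 6000 random timesteps per run
```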
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Done generating random rollouts.
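Each of the 6 random paths above contributes env_horizon = 1000 timesteps (the counter prints the cumulative total before each path). A minimal sketch of such a rollout, assuming the older gym step() API; constructing the custom gym_fswimmer environment itself is outside the snippet and this is not the repo's code:

```python
import numpy as np

def random_rollout(env, horizon=1000):
    """Roll out up to `horizon` steps with uniformly sampled actions (older gym API assumed)."""
    obs_list, act_list, next_obs_list = [], [], []
    obs = env.reset()
    for _ in range(horizon):
        act = env.action_space.sample()
        next_obs, rew, done, _ = env.step(act)
        obs_list.append(obs)
        act_list.append(act)
        next_obs_list.append(next_obs)
        obs = next_obs
        if done:
            break
    return np.array(obs_list), np.array(act_list), np.array(next_obs_list)
```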
Creating normalization for training data.
Done creating normalization for training data.
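"Creating normalization" most likely means computing per-dimension statistics over the random-rollout data so that dynamics-model inputs and targets can be whitened. The sketch below is an assumed version of that step, not the repo's code:

```python
import numpy as np

def compute_normalization(obs, acts, next_obs):
    """Per-dimension mean/std of observations, actions, and state deltas (assumed scheme)."""
    deltas = next_obs - obs
    eps = 1e-8  # avoid division by zero for constant dimensions
    return {
        "obs":   (obs.mean(axis=0),    obs.std(axis=0) + eps),
        "act":   (acts.mean(axis=0),   acts.std(axis=0) + eps),
        "delta": (deltas.mean(axis=0), deltas.std(axis=0) + eps),
    }
```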
Particle ensemble enabled? True
An ensemble of 5 dynamics models (<class 'model.dynamics.NNDynamicsModel'>) initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3751503825187683
Validation loss = 0.18611079454421997
Validation loss = 0.12454062700271606
Validation loss = 0.10527808219194412
Validation loss = 0.09587299823760986
Validation loss = 0.09260258078575134
Validation loss = 0.09164684265851974
Validation loss = 0.08681808412075043
Validation loss = 0.08292256295681
Validation loss = 0.08630014955997467
Validation loss = 0.08660692721605301
Validation loss = 0.08264131844043732
Validation loss = 0.09007436782121658
Validation loss = 0.08715596795082092
Validation loss = 0.07791785895824432
Validation loss = 0.0849870890378952
Validation loss = 0.0913025513291359
Validation loss = 0.07890144735574722
Validation loss = 0.08048659563064575
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5319804549217224
Validation loss = 0.19062943756580353
Validation loss = 0.13545049726963043
Validation loss = 0.10386621952056885
Validation loss = 0.09881731867790222
Validation loss = 0.0914812758564949
Validation loss = 0.08733318746089935
Validation loss = 0.08829469233751297
Validation loss = 0.08541201055049896
Validation loss = 0.0827028900384903
Validation loss = 0.08245312422513962
Validation loss = 0.08401034772396088
Validation loss = 0.08221793174743652
Validation loss = 0.08909104019403458
Validation loss = 0.08660344779491425
Validation loss = 0.08076128363609314
Validation loss = 0.08506403863430023
Validation loss = 0.08591175079345703
Validation loss = 0.08109914511442184
Validation loss = 0.09812919050455093
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5864412784576416
Validation loss = 0.18139827251434326
Validation loss = 0.1225011944770813
Validation loss = 0.10767079889774323
Validation loss = 0.0937155932188034
Validation loss = 0.08873087167739868
Validation loss = 0.09372878074645996
Validation loss = 0.0859716534614563
Validation loss = 0.0853409543633461
Validation loss = 0.09007255733013153
Validation loss = 0.08745039254426956
Validation loss = 0.08687793463468552
Validation loss = 0.08704951405525208
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3709338903427124
Validation loss = 0.1606450378894806
Validation loss = 0.12677882611751556
Validation loss = 0.10194131731987
Validation loss = 0.09262460470199585
Validation loss = 0.08787326514720917
Validation loss = 0.08959435671567917
Validation loss = 0.08459175378084183
Validation loss = 0.08228521794080734
Validation loss = 0.08448386937379837
Validation loss = 0.08191411197185516
Validation loss = 0.0837038978934288
Validation loss = 0.09152017533779144
Validation loss = 0.08186660706996918
Validation loss = 0.08359183371067047
Validation loss = 0.0811086967587471
Validation loss = 0.08332236111164093
Validation loss = 0.08474883437156677
Validation loss = 0.08475616574287415
Validation loss = 0.08099035918712616
Validation loss = 0.08676023781299591
Validation loss = 0.08793915808200836
Validation loss = 0.09065908938646317
Validation loss = 0.0832349956035614
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3843648433685303
Validation loss = 0.19119060039520264
Validation loss = 0.1347392350435257
Validation loss = 0.10377517342567444
Validation loss = 0.09641297161579132
Validation loss = 0.0981106162071228
Validation loss = 0.09090906381607056
Validation loss = 0.0900924876332283
Validation loss = 0.08590953797101974
Validation loss = 0.08255314081907272
Validation loss = 0.0791291743516922
Validation loss = 0.08320499211549759
Validation loss = 0.08145080506801605
Validation loss = 0.082769013941288
Validation loss = 0.08017060905694962
Done fitting dynamics.
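Each ensemble member logs a different number of validation-loss lines even though the config asks for epochs = 200, which points to some form of validation-based early stopping. A self-contained sketch of that pattern with a patience criterion and toy stand-ins; the actual training/validation functions and stopping rule are not shown in the log:

```python
import numpy as np

def fit_with_early_stopping(train_epoch, validate, max_epochs=200, patience=5):
    """Train until the validation loss has not improved for `patience` epochs."""
    best, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        train_epoch()
        val_loss = validate()
        print("Validation loss =", val_loss)
        if val_loss < best:
            best, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best

# Toy stand-ins so the sketch runs; each ensemble member would get its own pair.
rng = np.random.default_rng(0)
losses = iter(0.4 * np.exp(-0.3 * np.arange(200)) + 0.08 + 0.005 * rng.standard_normal(200))
fit_with_early_stopping(train_epoch=lambda: None, validate=lambda: float(next(losses)))
```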
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
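The trpo block of the config ('gamma': 0.99, 'gae': 0.95) names the standard discount and GAE(lambda) coefficients used when the policy is trained on model rollouts. A minimal, generic GAE implementation for reference, not the repo's code:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.
    `values` has length len(rewards) + 1 (bootstrap value appended at the end)."""
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

# Tiny usage example with made-up numbers.
print(gae_advantages(np.array([1.0, 0.5, -0.2]), np.array([0.3, 0.2, 0.1, 0.0])))
```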
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 30
average number of affinization = 4.285714285714286
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 18
average number of affinization = 6.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 9
average number of affinization = 6.333333333333333
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 29
average number of affinization = 8.6
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 6
average number of affinization = 8.363636363636363
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 20
average number of affinization = 9.333333333333334
Done generating on-policy rollouts.
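The "average number of affinization" printed after each path appears to be a running average over every rollout generated so far, including the six random rollouts (which each contributed 0). A quick check against the counts logged above:

```python
counts = [0, 0, 0, 0, 0, 0,          # 6 random rollouts
          30, 18, 9, 29, 6, 20]      # 6 on-policy rollouts of itr #0

total = 0
for n, c in enumerate(counts, start=1):
    total += c
    print(total / n)
# The last six values are 4.2857..., 6.0, 6.333..., 8.6, 8.3636..., 9.333...,
# matching the lines above; the pattern continues in later iterations
# (e.g. (112 + 148) / 13 = 20.0 for the first path of itr #1).
```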
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -57      |
| Iteration     | 0        |
| MaximumReturn | -50.3    |
| MinimumReturn | -65.8    |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10278299450874329
Validation loss = 0.04112814739346504
Validation loss = 0.03616156429052353
Validation loss = 0.03689016401767731
Validation loss = 0.035948023200035095
Validation loss = 0.03523850440979004
Validation loss = 0.03619711473584175
Validation loss = 0.03376924246549606
Validation loss = 0.03726740553975105
Validation loss = 0.034787680953741074
Validation loss = 0.03343912959098816
Validation loss = 0.03381059691309929
Validation loss = 0.04181930050253868
Validation loss = 0.03573261573910713
Validation loss = 0.03420432657003403
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0975344106554985
Validation loss = 0.0409165695309639
Validation loss = 0.03617791831493378
Validation loss = 0.03503783419728279
Validation loss = 0.03676259517669678
Validation loss = 0.03509533032774925
Validation loss = 0.035914480686187744
Validation loss = 0.03559988737106323
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08764325827360153
Validation loss = 0.04234194755554199
Validation loss = 0.039126768708229065
Validation loss = 0.03792741894721985
Validation loss = 0.03803540766239166
Validation loss = 0.03562013432383537
Validation loss = 0.038318317383527756
Validation loss = 0.035790394991636276
Validation loss = 0.036984506994485855
Validation loss = 0.036369115114212036
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07778163254261017
Validation loss = 0.04050702974200249
Validation loss = 0.03672333061695099
Validation loss = 0.03575893118977547
Validation loss = 0.03514653816819191
Validation loss = 0.03536158800125122
Validation loss = 0.03877135366201401
Validation loss = 0.035609494894742966
Validation loss = 0.03438052907586098
Validation loss = 0.035186026245355606
Validation loss = 0.03427574038505554
Validation loss = 0.03505920618772507
Validation loss = 0.035804346203804016
Validation loss = 0.0325142964720726
Validation loss = 0.03464633971452713
Validation loss = 0.03740733489394188
Validation loss = 0.03835465759038925
Validation loss = 0.03407669439911842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08596588671207428
Validation loss = 0.04064049944281578
Validation loss = 0.036654628813266754
Validation loss = 0.03530552610754967
Validation loss = 0.0352991558611393
Validation loss = 0.03481461480259895
Validation loss = 0.033303823322057724
Validation loss = 0.03414925932884216
Validation loss = 0.03556476905941963
Validation loss = 0.034942928701639175
Validation loss = 0.033445775508880615
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 148
average number of affinization = 20.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 34
average number of affinization = 21.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 110
average number of affinization = 26.933333333333334
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 44
average number of affinization = 28.0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 54
average number of affinization = 29.529411764705884
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 123
average number of affinization = 34.72222222222222
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 14.6     |
| Iteration     | 1        |
| MaximumReturn | 20.7     |
| MinimumReturn | 10.1     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.041704997420310974
Validation loss = 0.02441493608057499
Validation loss = 0.023239606991410255
Validation loss = 0.02386440522968769
Validation loss = 0.02167382836341858
Validation loss = 0.02717488817870617
Validation loss = 0.024960963055491447
Validation loss = 0.021922225132584572
Validation loss = 0.024031953886151314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.036053139716386795
Validation loss = 0.024380220100283623
Validation loss = 0.02512480318546295
Validation loss = 0.025881223380565643
Validation loss = 0.02488309144973755
Validation loss = 0.022971101105213165
Validation loss = 0.026642704382538795
Validation loss = 0.023099800571799278
Validation loss = 0.022475354373455048
Validation loss = 0.024768918752670288
Validation loss = 0.02235591411590576
Validation loss = 0.021092498674988747
Validation loss = 0.023329487070441246
Validation loss = 0.021195242181420326
Validation loss = 0.02303585223853588
Validation loss = 0.023307735100388527
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03559750318527222
Validation loss = 0.024402426555752754
Validation loss = 0.023788636550307274
Validation loss = 0.02470102347433567
Validation loss = 0.023787759244441986
Validation loss = 0.023444034159183502
Validation loss = 0.024948880076408386
Validation loss = 0.023060157895088196
Validation loss = 0.02236764319241047
Validation loss = 0.023427920415997505
Validation loss = 0.025410139933228493
Validation loss = 0.024743275716900826
Validation loss = 0.025593183934688568
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04506199434399605
Validation loss = 0.02381969429552555
Validation loss = 0.0232586320489645
Validation loss = 0.0244769174605608
Validation loss = 0.02330700308084488
Validation loss = 0.02317296527326107
Validation loss = 0.026055149734020233
Validation loss = 0.02356703020632267
Validation loss = 0.022377630695700645
Validation loss = 0.023610100150108337
Validation loss = 0.023771636188030243
Validation loss = 0.023233884945511818
Validation loss = 0.022453436627984047
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04296625033020973
Validation loss = 0.023235438391566277
Validation loss = 0.023592405021190643
Validation loss = 0.025375964120030403
Validation loss = 0.024186760187149048
Validation loss = 0.022676708176732063
Validation loss = 0.022320538759231567
Validation loss = 0.022732339799404144
Validation loss = 0.022589826956391335
Validation loss = 0.024768127128481865
Validation loss = 0.024632999673485756
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 32
average number of affinization = 34.578947368421055
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 109
average number of affinization = 38.3
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 114
average number of affinization = 41.904761904761905
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 84
average number of affinization = 43.81818181818182
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 348
average number of affinization = 57.04347826086956
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 46
average number of affinization = 56.583333333333336
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.25     |
| Iteration     | 2        |
| MaximumReturn | 12.5     |
| MinimumReturn | -9.55    |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03174934163689613
Validation loss = 0.019053447991609573
Validation loss = 0.01886584982275963
Validation loss = 0.019079608842730522
Validation loss = 0.020504256710410118
Validation loss = 0.017964983358979225
Validation loss = 0.019235365092754364
Validation loss = 0.019144579768180847
Validation loss = 0.016957856714725494
Validation loss = 0.019434833899140358
Validation loss = 0.021972205489873886
Validation loss = 0.022051485255360603
Validation loss = 0.01671011745929718
Validation loss = 0.016550974920392036
Validation loss = 0.019923992455005646
Validation loss = 0.020971573889255524
Validation loss = 0.01798410899937153
Validation loss = 0.017062939703464508
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0276492852717638
Validation loss = 0.01706533692777157
Validation loss = 0.01809084042906761
Validation loss = 0.017839357256889343
Validation loss = 0.018610283732414246
Validation loss = 0.01806887425482273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.033127959817647934
Validation loss = 0.019227752462029457
Validation loss = 0.01951565034687519
Validation loss = 0.019999587908387184
Validation loss = 0.019242431968450546
Validation loss = 0.01958720199763775
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.032672375440597534
Validation loss = 0.020266037434339523
Validation loss = 0.020419007167220116
Validation loss = 0.022739578038454056
Validation loss = 0.017330430448055267
Validation loss = 0.017434794455766678
Validation loss = 0.020408904179930687
Validation loss = 0.01781800016760826
Validation loss = 0.018135657534003258
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03080664575099945
Validation loss = 0.020970815792679787
Validation loss = 0.018727436661720276
Validation loss = 0.017993004992604256
Validation loss = 0.01790340431034565
Validation loss = 0.018877794966101646
Validation loss = 0.018242137506604195
Validation loss = 0.017137661576271057
Validation loss = 0.0181618370115757
Validation loss = 0.01737682893872261
Validation loss = 0.02100040577352047
Validation loss = 0.01781737431883812
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 77
average number of affinization = 57.4
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 100
average number of affinization = 59.03846153846154
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 19
average number of affinization = 57.55555555555556
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 95
average number of affinization = 58.892857142857146
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 102
average number of affinization = 60.37931034482759
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 110
average number of affinization = 62.03333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 7.76     |
| Iteration     | 3        |
| MaximumReturn | 11.3     |
| MinimumReturn | 3.64     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020758938044309616
Validation loss = 0.016248878091573715
Validation loss = 0.01505646575242281
Validation loss = 0.017403695732355118
Validation loss = 0.01514127291738987
Validation loss = 0.014655718579888344
Validation loss = 0.014717794954776764
Validation loss = 0.014698529615998268
Validation loss = 0.015863679349422455
Validation loss = 0.01500807236880064
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018111305311322212
Validation loss = 0.014798249118030071
Validation loss = 0.015832247212529182
Validation loss = 0.01592755690217018
Validation loss = 0.015915285795927048
Validation loss = 0.01683611422777176
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020786847919225693
Validation loss = 0.016444077715277672
Validation loss = 0.01467025838792324
Validation loss = 0.01807035319507122
Validation loss = 0.014807252213358879
Validation loss = 0.015178531408309937
Validation loss = 0.014284363016486168
Validation loss = 0.015034687705338001
Validation loss = 0.015792522579431534
Validation loss = 0.01603761874139309
Validation loss = 0.014739896170794964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01772012561559677
Validation loss = 0.015202710404992104
Validation loss = 0.014763588085770607
Validation loss = 0.015065250918269157
Validation loss = 0.020709047093987465
Validation loss = 0.016860630363225937
Validation loss = 0.020341984927654266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0172955971211195
Validation loss = 0.015251504257321358
Validation loss = 0.016772154718637466
Validation loss = 0.01612934283912182
Validation loss = 0.014301672577857971
Validation loss = 0.01741400733590126
Validation loss = 0.01567685976624489
Validation loss = 0.01468302495777607
Validation loss = 0.014634030871093273
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 188
average number of affinization = 66.09677419354838
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 11
average number of affinization = 64.375
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 143
average number of affinization = 66.75757575757575
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 57
average number of affinization = 66.47058823529412
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 12
average number of affinization = 64.91428571428571
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 165
average number of affinization = 67.69444444444444
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 13.2     |
| Iteration     | 4        |
| MaximumReturn | 25.4     |
| MinimumReturn | 6.68     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014244955964386463
Validation loss = 0.01289354171603918
Validation loss = 0.012151272036135197
Validation loss = 0.015994546934962273
Validation loss = 0.015789028257131577
Validation loss = 0.01386890560388565
Validation loss = 0.012877166271209717
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01625980995595455
Validation loss = 0.013245531357824802
Validation loss = 0.012519405223429203
Validation loss = 0.01250942051410675
Validation loss = 0.0125552574172616
Validation loss = 0.014910560101270676
Validation loss = 0.012853498570621014
Validation loss = 0.012507175095379353
Validation loss = 0.01339781191200018
Validation loss = 0.012399987317621708
Validation loss = 0.011908870190382004
Validation loss = 0.014784987084567547
Validation loss = 0.013542472384870052
Validation loss = 0.013946178369224072
Validation loss = 0.014196385629475117
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01400642842054367
Validation loss = 0.014140437357127666
Validation loss = 0.01362472865730524
Validation loss = 0.013020866550505161
Validation loss = 0.013113702647387981
Validation loss = 0.016248097643256187
Validation loss = 0.01261101383715868
Validation loss = 0.012856467626988888
Validation loss = 0.012808635830879211
Validation loss = 0.012918617576360703
Validation loss = 0.014496746473014355
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01508523989468813
Validation loss = 0.01467870268970728
Validation loss = 0.0220282431691885
Validation loss = 0.012302838265895844
Validation loss = 0.014047638513147831
Validation loss = 0.013447903096675873
Validation loss = 0.013457090593874454
Validation loss = 0.012823988683521748
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014647748321294785
Validation loss = 0.012674576602876186
Validation loss = 0.0139614874497056
Validation loss = 0.013789460994303226
Validation loss = 0.012910415418446064
Validation loss = 0.014441839419305325
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 165
average number of affinization = 70.32432432432432
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 127
average number of affinization = 71.8157894736842
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 167
average number of affinization = 74.25641025641026
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 43
average number of affinization = 73.475
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 48
average number of affinization = 72.85365853658537
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 52
average number of affinization = 72.35714285714286
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 19.7     |
| Iteration     | 5        |
| MaximumReturn | 21.8     |
| MinimumReturn | 16.2     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012737979181110859
Validation loss = 0.010154878720641136
Validation loss = 0.013067676685750484
Validation loss = 0.011273081414401531
Validation loss = 0.011221959255635738
Validation loss = 0.012424087151885033
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011025081388652325
Validation loss = 0.010656598024070263
Validation loss = 0.010314149782061577
Validation loss = 0.011472865007817745
Validation loss = 0.012485512532293797
Validation loss = 0.012055651284754276
Validation loss = 0.010676614008843899
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012232257053256035
Validation loss = 0.010626702569425106
Validation loss = 0.011025642976164818
Validation loss = 0.010938398540019989
Validation loss = 0.010539340786635876
Validation loss = 0.009995155967772007
Validation loss = 0.011899517849087715
Validation loss = 0.01065834704786539
Validation loss = 0.0099297184497118
Validation loss = 0.01041025947779417
Validation loss = 0.010154834017157555
Validation loss = 0.010893186554312706
Validation loss = 0.009815127588808537
Validation loss = 0.01097518764436245
Validation loss = 0.010740426369011402
Validation loss = 0.015988487750291824
Validation loss = 0.010753831826150417
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014628632925450802
Validation loss = 0.011430444195866585
Validation loss = 0.010791241191327572
Validation loss = 0.011065812781453133
Validation loss = 0.012217027135193348
Validation loss = 0.010767563246190548
Validation loss = 0.01107906736433506
Validation loss = 0.012992849573493004
Validation loss = 0.013471693731844425
Validation loss = 0.01142833661288023
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01293176505714655
Validation loss = 0.01021680049598217
Validation loss = 0.010556505061686039
Validation loss = 0.0117802107706666
Validation loss = 0.011425478383898735
Validation loss = 0.00983994361013174
Validation loss = 0.010979504324495792
Validation loss = 0.010567696765065193
Validation loss = 0.010721816681325436
Validation loss = 0.011813074350357056
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 187
average number of affinization = 75.02325581395348
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 282
average number of affinization = 79.72727272727273
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 347
average number of affinization = 85.66666666666667
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 291
average number of affinization = 90.1304347826087
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 421
average number of affinization = 97.17021276595744
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 316
average number of affinization = 101.72916666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 15.8     |
| Iteration     | 6        |
| MaximumReturn | 21.6     |
| MinimumReturn | 11.2     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010028590448200703
Validation loss = 0.008054357022047043
Validation loss = 0.00787045806646347
Validation loss = 0.008060605265200138
Validation loss = 0.0076767392456531525
Validation loss = 0.00853172317147255
Validation loss = 0.007978152483701706
Validation loss = 0.007645539939403534
Validation loss = 0.008192898705601692
Validation loss = 0.008049615658819675
Validation loss = 0.00787439290434122
Validation loss = 0.008423355408012867
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00888083316385746
Validation loss = 0.0075103482231497765
Validation loss = 0.008143284358084202
Validation loss = 0.007686279248446226
Validation loss = 0.008430092595517635
Validation loss = 0.007298728451132774
Validation loss = 0.0076660094782710075
Validation loss = 0.007723129820078611
Validation loss = 0.007670814171433449
Validation loss = 0.007703613489866257
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008814029395580292
Validation loss = 0.0073623573407530785
Validation loss = 0.007233099080622196
Validation loss = 0.0071260505355894566
Validation loss = 0.007847950793802738
Validation loss = 0.009212582372128963
Validation loss = 0.007845758460462093
Validation loss = 0.007727271877229214
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01137456577271223
Validation loss = 0.00774860056117177
Validation loss = 0.007643760181963444
Validation loss = 0.008445746265351772
Validation loss = 0.00824614055454731
Validation loss = 0.008824032731354237
Validation loss = 0.008736584335565567
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009394928812980652
Validation loss = 0.007779407314956188
Validation loss = 0.007774787023663521
Validation loss = 0.009238066151738167
Validation loss = 0.008597749285399914
Validation loss = 0.007802772335708141
Validation loss = 0.007944107986986637
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 255
average number of affinization = 104.85714285714286
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 258
average number of affinization = 107.92
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 317
average number of affinization = 112.01960784313725
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 361
average number of affinization = 116.8076923076923
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 263
average number of affinization = 119.56603773584905
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 164
average number of affinization = 120.38888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 27.1     |
| Iteration     | 7        |
| MaximumReturn | 32.9     |
| MinimumReturn | 21.9     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00701161241158843
Validation loss = 0.007205526810139418
Validation loss = 0.006393086165189743
Validation loss = 0.007169860880821943
Validation loss = 0.009854413568973541
Validation loss = 0.007387750782072544
Validation loss = 0.008370832540094852
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007783022243529558
Validation loss = 0.007380129769444466
Validation loss = 0.008100861683487892
Validation loss = 0.0064461552537977695
Validation loss = 0.007139868102967739
Validation loss = 0.007298187352716923
Validation loss = 0.006687650457024574
Validation loss = 0.006556675303727388
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0072017814964056015
Validation loss = 0.008046258240938187
Validation loss = 0.006795298773795366
Validation loss = 0.007499439641833305
Validation loss = 0.006893321871757507
Validation loss = 0.007637889124453068
Validation loss = 0.006408492103219032
Validation loss = 0.006417122203856707
Validation loss = 0.006239352747797966
Validation loss = 0.007823106832802296
Validation loss = 0.00764246191829443
Validation loss = 0.006534999702125788
Validation loss = 0.006416415795683861
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007585578598082066
Validation loss = 0.007986034266650677
Validation loss = 0.0069458698853850365
Validation loss = 0.00680427486076951
Validation loss = 0.007137425243854523
Validation loss = 0.0072319540195167065
Validation loss = 0.00780153926461935
Validation loss = 0.007650821469724178
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0074019827879965305
Validation loss = 0.007456304971128702
Validation loss = 0.007048958912491798
Validation loss = 0.007842127233743668
Validation loss = 0.0069908988662064075
Validation loss = 0.006553107872605324
Validation loss = 0.006823176983743906
Validation loss = 0.006979025900363922
Validation loss = 0.007456229999661446
Validation loss = 0.006744693499058485
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 155
average number of affinization = 121.01818181818182
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 422
average number of affinization = 126.39285714285714
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 304
average number of affinization = 129.50877192982455
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 320
average number of affinization = 132.79310344827587
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 152
average number of affinization = 133.11864406779662
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 221
average number of affinization = 134.58333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 19.8     |
| Iteration     | 8        |
| MaximumReturn | 25.4     |
| MinimumReturn | 13.1     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00775246974080801
Validation loss = 0.006476477719843388
Validation loss = 0.006881414912641048
Validation loss = 0.006343859247863293
Validation loss = 0.006476191338151693
Validation loss = 0.006475798785686493
Validation loss = 0.006154700648039579
Validation loss = 0.007347077131271362
Validation loss = 0.007548829074949026
Validation loss = 0.006001526024192572
Validation loss = 0.006252084858715534
Validation loss = 0.006431113928556442
Validation loss = 0.006207168567925692
Validation loss = 0.007246549241244793
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0069219740107655525
Validation loss = 0.006759564392268658
Validation loss = 0.005765932612121105
Validation loss = 0.005830452777445316
Validation loss = 0.006368809845298529
Validation loss = 0.00575454905629158
Validation loss = 0.005877387709915638
Validation loss = 0.006785220000892878
Validation loss = 0.005678118206560612
Validation loss = 0.006035848055034876
Validation loss = 0.006441310979425907
Validation loss = 0.005898857489228249
Validation loss = 0.0058729881420731544
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006160871125757694
Validation loss = 0.005948541685938835
Validation loss = 0.006211090832948685
Validation loss = 0.005968003533780575
Validation loss = 0.006172065623104572
Validation loss = 0.006602533161640167
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006113884039223194
Validation loss = 0.00625199219211936
Validation loss = 0.005703945644199848
Validation loss = 0.006492094602435827
Validation loss = 0.007179784122854471
Validation loss = 0.006532472558319569
Validation loss = 0.006221200339496136
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0063226232305169106
Validation loss = 0.0062826634384691715
Validation loss = 0.00615069130435586
Validation loss = 0.006143862847238779
Validation loss = 0.005859961733222008
Validation loss = 0.006828424520790577
Validation loss = 0.007566308137029409
Validation loss = 0.005945941433310509
Validation loss = 0.006688982248306274
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 164
average number of affinization = 135.0655737704918
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 155
average number of affinization = 135.38709677419354
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 211
average number of affinization = 136.5873015873016
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 181
average number of affinization = 137.28125
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 255
average number of affinization = 139.09230769230768
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 179
average number of affinization = 139.6969696969697
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 42.8     |
| Iteration     | 9        |
| MaximumReturn | 51.5     |
| MinimumReturn | 35.4     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0067925723269581795
Validation loss = 0.006312087643891573
Validation loss = 0.005854027345776558
Validation loss = 0.0069530620239675045
Validation loss = 0.006368418224155903
Validation loss = 0.006224875338375568
Validation loss = 0.005341480486094952
Validation loss = 0.006784765049815178
Validation loss = 0.00644869776442647
Validation loss = 0.005463809240609407
Validation loss = 0.00675411568954587
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006058037281036377
Validation loss = 0.005758908577263355
Validation loss = 0.006521344184875488
Validation loss = 0.006733198184520006
Validation loss = 0.005908393766731024
Validation loss = 0.006334352772682905
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006492942105978727
Validation loss = 0.006145468447357416
Validation loss = 0.005342793185263872
Validation loss = 0.005236885044723749
Validation loss = 0.0056489259004592896
Validation loss = 0.0062504918314516544
Validation loss = 0.005761222448199987
Validation loss = 0.005523962900042534
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005907302722334862
Validation loss = 0.006028636824339628
Validation loss = 0.0063524022698402405
Validation loss = 0.005810211878269911
Validation loss = 0.005714805796742439
Validation loss = 0.0059601496905088425
Validation loss = 0.005585329607129097
Validation loss = 0.005965905264019966
Validation loss = 0.005747292656451464
Validation loss = 0.005471776705235243
Validation loss = 0.0054847910068929195
Validation loss = 0.005556148011237383
Validation loss = 0.006509900558739901
Validation loss = 0.006231777369976044
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007766182534396648
Validation loss = 0.006399777717888355
Validation loss = 0.006861318834125996
Validation loss = 0.005710871424525976
Validation loss = 0.006467889528721571
Validation loss = 0.005826891865581274
Validation loss = 0.006348285358399153
Validation loss = 0.007868615910410881
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 100
average number of affinization = 139.1044776119403
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 343
average number of affinization = 142.10294117647058
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 341
average number of affinization = 144.9855072463768
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 231
average number of affinization = 146.21428571428572
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 26
average number of affinization = 144.5211267605634
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 100
average number of affinization = 143.90277777777777
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 51.3     |
| Iteration     | 10       |
| MaximumReturn | 66       |
| MinimumReturn | 38.4     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005867220927029848
Validation loss = 0.006005265284329653
Validation loss = 0.005420422647148371
Validation loss = 0.005245972890406847
Validation loss = 0.005482261534780264
Validation loss = 0.006165515165776014
Validation loss = 0.006716564763337374
Validation loss = 0.005588671192526817
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006216485518962145
Validation loss = 0.005615221802145243
Validation loss = 0.006147156003862619
Validation loss = 0.005024955607950687
Validation loss = 0.005537472199648619
Validation loss = 0.005315214395523071
Validation loss = 0.005460006650537252
Validation loss = 0.005388476420193911
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005455842241644859
Validation loss = 0.005583247169852257
Validation loss = 0.005426833406090736
Validation loss = 0.005420101340860128
Validation loss = 0.005268104840070009
Validation loss = 0.004946490284055471
Validation loss = 0.004777373280376196
Validation loss = 0.005059723276644945
Validation loss = 0.005622100550681353
Validation loss = 0.005365775432437658
Validation loss = 0.005753277335315943
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005689164157956839
Validation loss = 0.005508353468030691
Validation loss = 0.005449887365102768
Validation loss = 0.005807532463222742
Validation loss = 0.006545084062963724
Validation loss = 0.005671315360814333
Validation loss = 0.005244899075478315
Validation loss = 0.006766158621758223
Validation loss = 0.005798202008008957
Validation loss = 0.005280770361423492
Validation loss = 0.005971705075353384
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006123378407210112
Validation loss = 0.007908770814538002
Validation loss = 0.005627238657325506
Validation loss = 0.006048800889402628
Validation loss = 0.00630984827876091
Validation loss = 0.006660275161266327
Validation loss = 0.005832258146256208
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 253
average number of affinization = 145.3972602739726
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 180
average number of affinization = 145.86486486486487
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 216
average number of affinization = 146.8
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 110
average number of affinization = 146.31578947368422
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 110
average number of affinization = 145.84415584415584
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 222
average number of affinization = 146.82051282051282
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 61.9     |
| Iteration     | 11       |
| MaximumReturn | 67.5     |
| MinimumReturn | 56.4     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005235894583165646
Validation loss = 0.005396503489464521
Validation loss = 0.005201069638133049
Validation loss = 0.005063263233751059
Validation loss = 0.0055456035770475864
Validation loss = 0.005026878789067268
Validation loss = 0.006800102535635233
Validation loss = 0.005296577233821154
Validation loss = 0.005842312704771757
Validation loss = 0.0061684781685471535
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005779211409389973
Validation loss = 0.004969939589500427
Validation loss = 0.005410938523709774
Validation loss = 0.004734803922474384
Validation loss = 0.00502379285171628
Validation loss = 0.005785263143479824
Validation loss = 0.005300792399793863
Validation loss = 0.005199869628995657
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005482039414346218
Validation loss = 0.004850668832659721
Validation loss = 0.004860374610871077
Validation loss = 0.0051221949979662895
Validation loss = 0.0059448848478496075
Validation loss = 0.004760058596730232
Validation loss = 0.0059404755011200905
Validation loss = 0.00500148581340909
Validation loss = 0.004718530457466841
Validation loss = 0.0047444733791053295
Validation loss = 0.0054548499174416065
Validation loss = 0.005248699802905321
Validation loss = 0.0047902269288897514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005914819426834583
Validation loss = 0.005044200457632542
Validation loss = 0.0054574632085859776
Validation loss = 0.0055235824547708035
Validation loss = 0.005213926546275616
Validation loss = 0.006047083996236324
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006270847748965025
Validation loss = 0.0060021462850272655
Validation loss = 0.005572172347456217
Validation loss = 0.005625269375741482
Validation loss = 0.005655201151967049
Validation loss = 0.00618381891399622
Validation loss = 0.0058103944174945354
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
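
The policy-training block that just finished always has the same shape in this log: init_std is re-initialized, then twenty inner iterations each obtain a fresh batch of samples before the next real on-policy rollouts are generated. Because this happens between refits of the dynamics ensemble, the sampling here presumably uses the learned models rather than the real environment. A rough outline of such an inner loop under that assumption; sample_with_model, trpo_update and reinit_logstd are hypothetical placeholders:

    def train_policy_with_models(policy, sample_with_model, trpo_update, n_iters=20):
        """sample_with_model(policy): rollouts generated from the learned dynamics.
        trpo_update(policy, paths): one constrained (TRPO-style) policy improvement step."""
        policy.reinit_logstd()  # hypothetical; corresponds to "Re-initialize init_std." above
        for it in range(n_iters):
            print("Obtaining samples for iteration %d..." % it)
            paths = sample_with_model(policy)
            trpo_update(policy, paths)
        print("Done training policy.")
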
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 199
average number of affinization = 147.48101265822785
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 252
average number of affinization = 148.7875
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 255
average number of affinization = 150.09876543209876
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 370
average number of affinization = 152.78048780487805
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 215
average number of affinization = 153.53012048192772
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 287
average number of affinization = 155.11904761904762
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 101      |
| Iteration     | 12       |
| MaximumReturn | 113      |
| MinimumReturn | 83.7     |
| TotalSamples  | 56000    |
----------------------------
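
Stepping back, every "itr #N" block in this log follows the same sequence: fit the dynamics ensemble, update randomness, train the policy with TRPO, generate real on-policy rollouts, update the input normalization, and print the return summary table. A compact higher-order sketch of that outer loop as it can be read off the log; every callback name below is illustrative, not taken from the code:

    def run_outer_loop(n_iters, fit_dynamics, update_randomness, train_policy,
                       collect_real_rollouts, update_normalization, log_stats):
        for itr in range(n_iters):
            print("itr #%d | " % itr)
            fit_dynamics()                   # "Fitting dynamics." ... "Done fitting dynamics."
            update_randomness()              # "Updating randomness." / "Done updating randomness."
            train_policy()                   # "Training policy using TRPO." ... "Done training policy."
            stats = collect_real_rollouts()  # "Generating on-policy rollouts." on the real environment
            update_normalization()           # "Updating normalization."
            log_stats(itr, stats)            # prints the AverageReturn / MaximumReturn / ... table
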
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005416874308139086
Validation loss = 0.006383764091879129
Validation loss = 0.004805604927241802
Validation loss = 0.005023621022701263
Validation loss = 0.005237273406237364
Validation loss = 0.0054166680201888084
Validation loss = 0.004833925981074572
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004862545523792505
Validation loss = 0.006003764923661947
Validation loss = 0.004925072193145752
Validation loss = 0.00497345719486475
Validation loss = 0.004932793322950602
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004962884355336428
Validation loss = 0.00548770185559988
Validation loss = 0.004833129700273275
Validation loss = 0.004721572156995535
Validation loss = 0.004765960853546858
Validation loss = 0.00448599411174655
Validation loss = 0.004994572140276432
Validation loss = 0.006128325127065182
Validation loss = 0.004623073153197765
Validation loss = 0.004444928374141455
Validation loss = 0.004479420371353626
Validation loss = 0.004999822471290827
Validation loss = 0.005324640776962042
Validation loss = 0.004467287100851536
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0052539026364684105
Validation loss = 0.005530908238142729
Validation loss = 0.005149994045495987
Validation loss = 0.005313741974532604
Validation loss = 0.00535694370046258
Validation loss = 0.0048819067887961864
Validation loss = 0.004854976665228605
Validation loss = 0.004563327878713608
Validation loss = 0.004878831095993519
Validation loss = 0.005458328407257795
Validation loss = 0.005307583604007959
Validation loss = 0.005467458162456751
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006334593053907156
Validation loss = 0.005100171081721783
Validation loss = 0.005438750144094229
Validation loss = 0.0068767257034778595
Validation loss = 0.004927797708660364
Validation loss = 0.005203669425100088
Validation loss = 0.004928094334900379
Validation loss = 0.006444879807531834
Validation loss = 0.005668133497238159
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 51
average number of affinization = 153.89411764705883
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 93
average number of affinization = 153.1860465116279
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 108
average number of affinization = 152.66666666666666
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 85
average number of affinization = 151.89772727272728
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 43
average number of affinization = 150.67415730337078
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 46
average number of affinization = 149.51111111111112
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 105      |
| Iteration     | 13       |
| MaximumReturn | 124      |
| MinimumReturn | 88.4     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01109354104846716
Validation loss = 0.00975037645548582
Validation loss = 0.007365925703197718
Validation loss = 0.0088163698092103
Validation loss = 0.008152071386575699
Validation loss = 0.007339908741414547
Validation loss = 0.008609747514128685
Validation loss = 0.007509562186896801
Validation loss = 0.007023792248219252
Validation loss = 0.008010548539459705
Validation loss = 0.006463359110057354
Validation loss = 0.007058151997625828
Validation loss = 0.007367157842963934
Validation loss = 0.007158827967941761
Validation loss = 0.007892782799899578
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010166055522859097
Validation loss = 0.009406598284840584
Validation loss = 0.007606462575495243
Validation loss = 0.008891584351658821
Validation loss = 0.007830263115465641
Validation loss = 0.007838348858058453
Validation loss = 0.00794974248856306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009840265847742558
Validation loss = 0.007900896482169628
Validation loss = 0.006284491159021854
Validation loss = 0.007054300978779793
Validation loss = 0.007010390516370535
Validation loss = 0.006601403467357159
Validation loss = 0.006948602385818958
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011154893785715103
Validation loss = 0.007326612249016762
Validation loss = 0.007398878689855337
Validation loss = 0.008378226310014725
Validation loss = 0.0070476531982421875
Validation loss = 0.007616195362061262
Validation loss = 0.008286060765385628
Validation loss = 0.007706127129495144
Validation loss = 0.007667866535484791
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009522292762994766
Validation loss = 0.008287733420729637
Validation loss = 0.008096178993582726
Validation loss = 0.007969166152179241
Validation loss = 0.010110720992088318
Validation loss = 0.007230612449347973
Validation loss = 0.006967869121581316
Validation loss = 0.006867013871669769
Validation loss = 0.006514155305922031
Validation loss = 0.0076365540735423565
Validation loss = 0.00733522791415453
Validation loss = 0.00699003878980875
Validation loss = 0.007316987030208111
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 105
average number of affinization = 149.02197802197801
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 58
average number of affinization = 148.0326086956522
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 85
average number of affinization = 147.3548387096774
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 70
average number of affinization = 146.53191489361703
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 103
average number of affinization = 146.07368421052632
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 82
average number of affinization = 145.40625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 162      |
| Iteration     | 14       |
| MaximumReturn | 173      |
| MinimumReturn | 156      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008404483087360859
Validation loss = 0.007172806654125452
Validation loss = 0.008396698161959648
Validation loss = 0.0077942414209246635
Validation loss = 0.007109181024134159
Validation loss = 0.007083553820848465
Validation loss = 0.007371528074145317
Validation loss = 0.007535499986261129
Validation loss = 0.010053999722003937
Validation loss = 0.008641366846859455
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006989017128944397
Validation loss = 0.007214344572275877
Validation loss = 0.007441964000463486
Validation loss = 0.007003938313573599
Validation loss = 0.00781472958624363
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0066967918537557125
Validation loss = 0.008067484013736248
Validation loss = 0.006873462814837694
Validation loss = 0.006306575611233711
Validation loss = 0.006676941178739071
Validation loss = 0.006416762247681618
Validation loss = 0.007075917907059193
Validation loss = 0.007716100197285414
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007541082799434662
Validation loss = 0.006233792752027512
Validation loss = 0.00714361434802413
Validation loss = 0.007284887135028839
Validation loss = 0.0073336344212293625
Validation loss = 0.00648850854486227
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007707899436354637
Validation loss = 0.006482194177806377
Validation loss = 0.007738545536994934
Validation loss = 0.0067061325535178185
Validation loss = 0.007651950232684612
Validation loss = 0.0070445057936012745
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 88
average number of affinization = 144.81443298969072
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 85
average number of affinization = 144.20408163265307
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 81
average number of affinization = 143.56565656565655
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 78
average number of affinization = 142.91
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 52
average number of affinization = 142.009900990099
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 72
average number of affinization = 141.3235294117647
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 210      |
| Iteration     | 15       |
| MaximumReturn | 217      |
| MinimumReturn | 200      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00693741999566555
Validation loss = 0.010936310514807701
Validation loss = 0.009337306022644043
Validation loss = 0.007241350598633289
Validation loss = 0.008199036121368408
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0067813582718372345
Validation loss = 0.006154698319733143
Validation loss = 0.007867416366934776
Validation loss = 0.0063436515629291534
Validation loss = 0.00626583956182003
Validation loss = 0.007513429969549179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0071203080005943775
Validation loss = 0.00702483206987381
Validation loss = 0.006996721960604191
Validation loss = 0.007011450361460447
Validation loss = 0.006059418898075819
Validation loss = 0.006367086432874203
Validation loss = 0.007495526224374771
Validation loss = 0.006807701662182808
Validation loss = 0.006675019860267639
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006623078137636185
Validation loss = 0.007622596807777882
Validation loss = 0.006867700722068548
Validation loss = 0.005861112382262945
Validation loss = 0.007081252988427877
Validation loss = 0.006367478519678116
Validation loss = 0.00572859076783061
Validation loss = 0.005807582288980484
Validation loss = 0.005911076907068491
Validation loss = 0.006110555492341518
Validation loss = 0.0059992303140461445
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005900645162910223
Validation loss = 0.006894844584167004
Validation loss = 0.005883027799427509
Validation loss = 0.006260822992771864
Validation loss = 0.005769640207290649
Validation loss = 0.005616858601570129
Validation loss = 0.006752035580575466
Validation loss = 0.005932819098234177
Validation loss = 0.005943057592958212
Validation loss = 0.007320281118154526
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 25
average number of affinization = 140.19417475728156
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 91
average number of affinization = 139.72115384615384
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 77
average number of affinization = 139.1238095238095
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 67
average number of affinization = 138.4433962264151
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 83
average number of affinization = 137.92523364485982
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 71
average number of affinization = 137.30555555555554
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 223      |
| Iteration     | 16       |
| MaximumReturn | 233      |
| MinimumReturn | 204      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007572251837700605
Validation loss = 0.009585876017808914
Validation loss = 0.00961969792842865
Validation loss = 0.007833698764443398
Validation loss = 0.007789196912199259
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00620612408965826
Validation loss = 0.006747701670974493
Validation loss = 0.00773240439593792
Validation loss = 0.0068381293676793575
Validation loss = 0.006221551913768053
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006786237470805645
Validation loss = 0.005926979705691338
Validation loss = 0.005942059680819511
Validation loss = 0.0068644313141703606
Validation loss = 0.006051680073142052
Validation loss = 0.005994345992803574
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006499742157757282
Validation loss = 0.006008947733789682
Validation loss = 0.006777891889214516
Validation loss = 0.007067551836371422
Validation loss = 0.006234052591025829
Validation loss = 0.006662273779511452
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005757996812462807
Validation loss = 0.006553628947585821
Validation loss = 0.005807772278785706
Validation loss = 0.00633638771250844
Validation loss = 0.00654895044863224
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 135
average number of affinization = 137.28440366972478
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 141
average number of affinization = 137.3181818181818
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 132
average number of affinization = 137.27027027027026
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 158
average number of affinization = 137.45535714285714
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 116
average number of affinization = 137.26548672566372
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 0
average number of affinization = 136.06140350877192
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 249      |
| Iteration     | 17       |
| MaximumReturn | 256      |
| MinimumReturn | 240      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008170774206519127
Validation loss = 0.006778216455131769
Validation loss = 0.012277284637093544
Validation loss = 0.008827435784041882
Validation loss = 0.011210191063582897
Validation loss = 0.006292631849646568
Validation loss = 0.012730617076158524
Validation loss = 0.009420563466846943
Validation loss = 0.009870300069451332
Validation loss = 0.00924596469849348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006020387168973684
Validation loss = 0.006298856344074011
Validation loss = 0.005973335355520248
Validation loss = 0.007691592909395695
Validation loss = 0.0068380325101315975
Validation loss = 0.0062671261839568615
Validation loss = 0.007059257943183184
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006881291046738625
Validation loss = 0.005681303795427084
Validation loss = 0.006198105867952108
Validation loss = 0.006129543762654066
Validation loss = 0.00553110521286726
Validation loss = 0.0060605239123106
Validation loss = 0.006103323306888342
Validation loss = 0.006165511906147003
Validation loss = 0.0062788743525743484
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005786011926829815
Validation loss = 0.005803961306810379
Validation loss = 0.006399905774742365
Validation loss = 0.00535191036760807
Validation loss = 0.005923959892243147
Validation loss = 0.006065616849809885
Validation loss = 0.005690271966159344
Validation loss = 0.006137832533568144
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006098689045757055
Validation loss = 0.006836319342255592
Validation loss = 0.006113818380981684
Validation loss = 0.0062451548874378204
Validation loss = 0.005688713863492012
Validation loss = 0.006495643872767687
Validation loss = 0.005529904272407293
Validation loss = 0.007893635891377926
Validation loss = 0.005619566421955824
Validation loss = 0.005920715164393187
Validation loss = 0.005601381883025169
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 83
average number of affinization = 135.6
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 112
average number of affinization = 135.39655172413794
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 103
average number of affinization = 135.1196581196581
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 91
average number of affinization = 134.74576271186442
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 81
average number of affinization = 134.2941176470588
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 84
average number of affinization = 133.875
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 249      |
| Iteration     | 18       |
| MaximumReturn | 261      |
| MinimumReturn | 236      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009801836684346199
Validation loss = 0.0126317348331213
Validation loss = 0.007364827208220959
Validation loss = 0.008093687705695629
Validation loss = 0.008611944504082203
Validation loss = 0.00791558064520359
Validation loss = 0.00806922186166048
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009742144495248795
Validation loss = 0.006644226610660553
Validation loss = 0.006272240541875362
Validation loss = 0.006087802350521088
Validation loss = 0.006479545030742884
Validation loss = 0.005874199327081442
Validation loss = 0.005640369839966297
Validation loss = 0.006563619710505009
Validation loss = 0.006661688443273306
Validation loss = 0.005909495055675507
Validation loss = 0.007256154902279377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005828057881444693
Validation loss = 0.006576612591743469
Validation loss = 0.006762057542800903
Validation loss = 0.0053056082688272
Validation loss = 0.005684831645339727
Validation loss = 0.0062212273478507996
Validation loss = 0.006254342384636402
Validation loss = 0.005481471307575703
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0053365351632237434
Validation loss = 0.005075075663626194
Validation loss = 0.005809650756418705
Validation loss = 0.005150866694748402
Validation loss = 0.005098247434943914
Validation loss = 0.006263272371143103
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005726946983486414
Validation loss = 0.007147506810724735
Validation loss = 0.0057840351946651936
Validation loss = 0.005771000869572163
Validation loss = 0.005975225009024143
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 125
average number of affinization = 133.80165289256198
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 129
average number of affinization = 133.7622950819672
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 101
average number of affinization = 133.4959349593496
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 125
average number of affinization = 133.42741935483872
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 127
average number of affinization = 133.376
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 126
average number of affinization = 133.31746031746033
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 268      |
| Iteration     | 19       |
| MaximumReturn | 277      |
| MinimumReturn | 261      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007102824281901121
Validation loss = 0.008602798916399479
Validation loss = 0.00858290120959282
Validation loss = 0.007553768344223499
Validation loss = 0.009236342273652554
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0063301390036940575
Validation loss = 0.005866474937647581
Validation loss = 0.006582390516996384
Validation loss = 0.007024677004665136
Validation loss = 0.00543222576379776
Validation loss = 0.006357075180858374
Validation loss = 0.0059140510857105255
Validation loss = 0.006478272844105959
Validation loss = 0.006577052641659975
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005900385323911905
Validation loss = 0.006167334970086813
Validation loss = 0.006428154651075602
Validation loss = 0.005817517172545195
Validation loss = 0.006390491966158152
Validation loss = 0.005689790938049555
Validation loss = 0.0057384236715734005
Validation loss = 0.007121552247554064
Validation loss = 0.0058213407173752785
Validation loss = 0.005426119081676006
Validation loss = 0.005536817014217377
Validation loss = 0.007552264723926783
Validation loss = 0.005904791411012411
Validation loss = 0.005134291481226683
Validation loss = 0.0056852055713534355
Validation loss = 0.006637666840106249
Validation loss = 0.005539592355489731
Validation loss = 0.006001844070851803
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00576723413541913
Validation loss = 0.004978958051651716
Validation loss = 0.005110056605190039
Validation loss = 0.004802199080586433
Validation loss = 0.0060584223829209805
Validation loss = 0.0052154031582176685
Validation loss = 0.005767542868852615
Validation loss = 0.006037554703652859
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005346584599465132
Validation loss = 0.005567337851971388
Validation loss = 0.005777508486062288
Validation loss = 0.005653894040733576
Validation loss = 0.005717724561691284
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 106
average number of affinization = 133.10236220472441
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 42
average number of affinization = 132.390625
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 131
average number of affinization = 132.3798449612403
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 20
average number of affinization = 131.51538461538462
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 131
average number of affinization = 131.5114503816794
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 116
average number of affinization = 131.3939393939394
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 282      |
| Iteration     | 20       |
| MaximumReturn | 297      |
| MinimumReturn | 257      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008115825243294239
Validation loss = 0.008657516911625862
Validation loss = 0.009237282909452915
Validation loss = 0.006456299219280481
Validation loss = 0.0068620117381215096
Validation loss = 0.007935027591884136
Validation loss = 0.008227173238992691
Validation loss = 0.007767627947032452
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006031131837517023
Validation loss = 0.007004177663475275
Validation loss = 0.0060593863017857075
Validation loss = 0.0068299425765872
Validation loss = 0.005332048516720533
Validation loss = 0.005744585767388344
Validation loss = 0.007569904904812574
Validation loss = 0.009238701313734055
Validation loss = 0.006640926003456116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006783055141568184
Validation loss = 0.006277571897953749
Validation loss = 0.008218943141400814
Validation loss = 0.005239252466708422
Validation loss = 0.005355131346732378
Validation loss = 0.0054796780459582806
Validation loss = 0.005237262696027756
Validation loss = 0.004894569981843233
Validation loss = 0.005245871376246214
Validation loss = 0.006587735842913389
Validation loss = 0.004915118217468262
Validation loss = 0.005004188045859337
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0050894650630652905
Validation loss = 0.005620067473500967
Validation loss = 0.0053956229239702225
Validation loss = 0.00551035488024354
Validation loss = 0.005401612259447575
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006076669320464134
Validation loss = 0.005954916588962078
Validation loss = 0.005426411051303148
Validation loss = 0.006956508383154869
Validation loss = 0.007778472732752562
Validation loss = 0.0062867505475878716
Validation loss = 0.005425113718956709
Validation loss = 0.005286743864417076
Validation loss = 0.0056270151399075985
Validation loss = 0.006382322404533625
Validation loss = 0.005393592640757561
Validation loss = 0.00758044607937336
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 84
average number of affinization = 131.0375939849624
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 132
average number of affinization = 131.044776119403
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 133
average number of affinization = 131.05925925925925
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 63
average number of affinization = 130.55882352941177
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 148
average number of affinization = 130.68613138686132
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 142
average number of affinization = 130.768115942029
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 290      |
| Iteration     | 21       |
| MaximumReturn | 298      |
| MinimumReturn | 271      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009626504965126514
Validation loss = 0.005675279535353184
Validation loss = 0.0065226079896092415
Validation loss = 0.008391856215894222
Validation loss = 0.0064333281479775906
Validation loss = 0.005652689374983311
Validation loss = 0.007129262201488018
Validation loss = 0.00878381822258234
Validation loss = 0.007157303858548403
Validation loss = 0.006336363963782787
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005800732411444187
Validation loss = 0.0078578544780612
Validation loss = 0.006730593275278807
Validation loss = 0.007134756539016962
Validation loss = 0.009325710125267506
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0053678350523114204
Validation loss = 0.005700837820768356
Validation loss = 0.00577257527038455
Validation loss = 0.0058782449923455715
Validation loss = 0.005013035610318184
Validation loss = 0.006054861471056938
Validation loss = 0.00540585070848465
Validation loss = 0.00521572632715106
Validation loss = 0.004940498620271683
Validation loss = 0.005336154717952013
Validation loss = 0.004871666897088289
Validation loss = 0.004892551805824041
Validation loss = 0.004693587776273489
Validation loss = 0.004718938376754522
Validation loss = 0.005019348580390215
Validation loss = 0.005047326907515526
Validation loss = 0.005175426602363586
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005430266726762056
Validation loss = 0.00484264874830842
Validation loss = 0.0057486738078296185
Validation loss = 0.005120478570461273
Validation loss = 0.005716666579246521
Validation loss = 0.005155182909220457
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005905604921281338
Validation loss = 0.0057140556164085865
Validation loss = 0.006662464234977961
Validation loss = 0.005556211341172457
Validation loss = 0.005659789312630892
Validation loss = 0.005718191619962454
Validation loss = 0.006479906849563122
Validation loss = 0.005929510109126568
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 69
average number of affinization = 130.32374100719426
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 88
average number of affinization = 130.02142857142857
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 108
average number of affinization = 129.86524822695034
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 92
average number of affinization = 129.59859154929578
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 103
average number of affinization = 129.4125874125874
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 116
average number of affinization = 129.31944444444446
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 302      |
| Iteration     | 22       |
| MaximumReturn | 304      |
| MinimumReturn | 301      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007517263293266296
Validation loss = 0.00931018590927124
Validation loss = 0.005936510860919952
Validation loss = 0.006320390850305557
Validation loss = 0.007794937584549189
Validation loss = 0.006554406136274338
Validation loss = 0.005914872977882624
Validation loss = 0.01035389769822359
Validation loss = 0.005885189864784479
Validation loss = 0.007749969605356455
Validation loss = 0.007191268727183342
Validation loss = 0.0053445082157850266
Validation loss = 0.005408404860645533
Validation loss = 0.007047597318887711
Validation loss = 0.006487678736448288
Validation loss = 0.004952969495207071
Validation loss = 0.0058534047566354275
Validation loss = 0.005621167365461588
Validation loss = 0.0057023330591619015
Validation loss = 0.006571053992956877
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006517017725855112
Validation loss = 0.006267666816711426
Validation loss = 0.009491527453064919
Validation loss = 0.0063967518508434296
Validation loss = 0.005498824175447226
Validation loss = 0.007053875830024481
Validation loss = 0.007197033613920212
Validation loss = 0.011537757702171803
Validation loss = 0.007423679810017347
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005017304793000221
Validation loss = 0.004937826190143824
Validation loss = 0.005277479533106089
Validation loss = 0.004597008228302002
Validation loss = 0.004770656581968069
Validation loss = 0.0050451611168682575
Validation loss = 0.005171625409275293
Validation loss = 0.004696577787399292
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0049917069263756275
Validation loss = 0.0048192404210567474
Validation loss = 0.004671749193221331
Validation loss = 0.0050056190229952335
Validation loss = 0.005244725849479437
Validation loss = 0.005137699656188488
Validation loss = 0.005189552903175354
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005580015014857054
Validation loss = 0.005206551868468523
Validation loss = 0.0056002214550971985
Validation loss = 0.005191613920032978
Validation loss = 0.007673633750528097
Validation loss = 0.005100848153233528
Validation loss = 0.005330302286893129
Validation loss = 0.005576461553573608
Validation loss = 0.006044744048267603
Validation loss = 0.006917124148458242
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 88
average number of affinization = 129.0344827586207
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 110
average number of affinization = 128.9041095890411
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 97
average number of affinization = 128.68707482993196
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 95
average number of affinization = 128.45945945945945
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 83
average number of affinization = 128.15436241610738
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 72
average number of affinization = 127.78
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 299      |
| Iteration     | 23       |
| MaximumReturn | 305      |
| MinimumReturn | 292      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00625609653070569
Validation loss = 0.006074418313801289
Validation loss = 0.006001404952257872
Validation loss = 0.00593913858756423
Validation loss = 0.006545639131218195
Validation loss = 0.005795385222882032
Validation loss = 0.005877071525901556
Validation loss = 0.00596141442656517
Validation loss = 0.006598471198230982
Validation loss = 0.006150352768599987
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00539914146065712
Validation loss = 0.006641447078436613
Validation loss = 0.006258086767047644
Validation loss = 0.006283795461058617
Validation loss = 0.0063862549141049385
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004534718580543995
Validation loss = 0.0047376505099236965
Validation loss = 0.004797290079295635
Validation loss = 0.004458616487681866
Validation loss = 0.005444498732686043
Validation loss = 0.0052206628024578094
Validation loss = 0.004562986548990011
Validation loss = 0.004426715895533562
Validation loss = 0.004545687232166529
Validation loss = 0.004578567575663328
Validation loss = 0.004450100474059582
Validation loss = 0.004823632538318634
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00512612983584404
Validation loss = 0.005106289405375719
Validation loss = 0.004684661980718374
Validation loss = 0.004966130945831537
Validation loss = 0.005194151308387518
Validation loss = 0.004986800253391266
Validation loss = 0.00501471059396863
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0064390795305371284
Validation loss = 0.005308100953698158
Validation loss = 0.0054628970101475716
Validation loss = 0.005034900736063719
Validation loss = 0.005351993255317211
Validation loss = 0.005139467306435108
Validation loss = 0.005719133652746677
Validation loss = 0.005413556005805731
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 110
average number of affinization = 127.66225165562913
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 84
average number of affinization = 127.375
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 33
average number of affinization = 126.75816993464052
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 94
average number of affinization = 126.54545454545455
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 94
average number of affinization = 126.33548387096774
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 89
average number of affinization = 126.09615384615384
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 298      |
| Iteration     | 24       |
| MaximumReturn | 303      |
| MinimumReturn | 293      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005435888655483723
Validation loss = 0.006105139851570129
Validation loss = 0.00827235821634531
Validation loss = 0.005832984112203121
Validation loss = 0.005913893226534128
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007593939080834389
Validation loss = 0.00502000842243433
Validation loss = 0.005957658868283033
Validation loss = 0.006440476514399052
Validation loss = 0.004943090490996838
Validation loss = 0.005917854607105255
Validation loss = 0.006439580582082272
Validation loss = 0.008392943069338799
Validation loss = 0.005995656829327345
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004991351626813412
Validation loss = 0.004611148964613676
Validation loss = 0.005029613617807627
Validation loss = 0.004732880275696516
Validation loss = 0.004692090209573507
Validation loss = 0.005145368631929159
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004708654712885618
Validation loss = 0.00582482386380434
Validation loss = 0.004735276568681002
Validation loss = 0.004981817677617073
Validation loss = 0.0050377268344163895
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005329569801688194
Validation loss = 0.0050858864560723305
Validation loss = 0.005744508933275938
Validation loss = 0.005246680695563555
Validation loss = 0.004718487150967121
Validation loss = 0.004690655041486025
Validation loss = 0.005047032609581947
Validation loss = 0.0054069687612354755
Validation loss = 0.005663213320076466
Validation loss = 0.005094734486192465
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 64
average number of affinization = 125.70063694267516
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 51
average number of affinization = 125.22784810126582
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 88
average number of affinization = 124.99371069182389
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 88
average number of affinization = 124.7625
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 65
average number of affinization = 124.3913043478261
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 91
average number of affinization = 124.18518518518519
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 305      |
| Iteration     | 25       |
| MaximumReturn | 307      |
| MinimumReturn | 304      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0054609691724181175
Validation loss = 0.005368703510612249
Validation loss = 0.005011516157537699
Validation loss = 0.005893299821764231
Validation loss = 0.006410238333046436
Validation loss = 0.00541037879884243
Validation loss = 0.0065572443418204784
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006403601728379726
Validation loss = 0.0056434995494782925
Validation loss = 0.005644191522151232
Validation loss = 0.006154506932944059
Validation loss = 0.005375570151954889
Validation loss = 0.008801629766821861
Validation loss = 0.004991075024008751
Validation loss = 0.006789663340896368
Validation loss = 0.007029074709862471
Validation loss = 0.005872231442481279
Validation loss = 0.005532689858227968
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005517241079360247
Validation loss = 0.004479857161641121
Validation loss = 0.005205863621085882
Validation loss = 0.004311942961066961
Validation loss = 0.0044556790962815285
Validation loss = 0.00443242909386754
Validation loss = 0.0051134866662323475
Validation loss = 0.0042598736472427845
Validation loss = 0.004747898317873478
Validation loss = 0.004666459280997515
Validation loss = 0.004394294694066048
Validation loss = 0.004748290404677391
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005738625768572092
Validation loss = 0.005139052402228117
Validation loss = 0.00470811827108264
Validation loss = 0.004502492491155863
Validation loss = 0.004474927671253681
Validation loss = 0.004870304837822914
Validation loss = 0.004772587679326534
Validation loss = 0.005286138504743576
Validation loss = 0.005031136330217123
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005271163769066334
Validation loss = 0.004995620809495449
Validation loss = 0.005083716940134764
Validation loss = 0.004966794978827238
Validation loss = 0.004769245628267527
Validation loss = 0.004661257844418287
Validation loss = 0.004465981386601925
Validation loss = 0.0047028400003910065
Validation loss = 0.00480446545407176
Validation loss = 0.00495657604187727
Validation loss = 0.0057082646526396275
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 45
average number of affinization = 123.69938650306749
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 85
average number of affinization = 123.46341463414635
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 107
average number of affinization = 123.36363636363636
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 112
average number of affinization = 123.29518072289157
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 89
average number of affinization = 123.08982035928143
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 93
average number of affinization = 122.91071428571429
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 304      |
| Iteration     | 26       |
| MaximumReturn | 308      |
| MinimumReturn | 301      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0050732409581542015
Validation loss = 0.0050119501538574696
Validation loss = 0.006424176041036844
Validation loss = 0.005103671457618475
Validation loss = 0.0056197806261479855
Validation loss = 0.006035688798874617
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007755550090223551
Validation loss = 0.00780175207182765
Validation loss = 0.006124158855527639
Validation loss = 0.005940552800893784
Validation loss = 0.006929623894393444
Validation loss = 0.00572185218334198
Validation loss = 0.007076037582010031
Validation loss = 0.005101453047245741
Validation loss = 0.005196091253310442
Validation loss = 0.005693577229976654
Validation loss = 0.005093019921332598
Validation loss = 0.005551342386752367
Validation loss = 0.006385100539773703
Validation loss = 0.004830049816519022
Validation loss = 0.006503072567284107
Validation loss = 0.006064478307962418
Validation loss = 0.006005025468766689
Validation loss = 0.005055611487478018
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005339946132153273
Validation loss = 0.004487114492803812
Validation loss = 0.004088074434548616
Validation loss = 0.004746351856738329
Validation loss = 0.00435592420399189
Validation loss = 0.004952775780111551
Validation loss = 0.0047968486323952675
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004811667837202549
Validation loss = 0.005265598651021719
Validation loss = 0.005025613587349653
Validation loss = 0.005124840885400772
Validation loss = 0.005223091226071119
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004571742378175259
Validation loss = 0.00496376259252429
Validation loss = 0.004866410046815872
Validation loss = 0.005033611785620451
Validation loss = 0.005034086760133505
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 88
average number of affinization = 122.70414201183432
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 74
average number of affinization = 122.41764705882353
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 110
average number of affinization = 122.34502923976608
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 98
average number of affinization = 122.20348837209302
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 86
average number of affinization = 121.99421965317919
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 81
average number of affinization = 121.75862068965517
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 284      |
| Iteration     | 27       |
| MaximumReturn | 292      |
| MinimumReturn | 272      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005308692809194326
Validation loss = 0.0051960633136332035
Validation loss = 0.004603992681950331
Validation loss = 0.004917324520647526
Validation loss = 0.00472661480307579
Validation loss = 0.004607899580150843
Validation loss = 0.005058536306023598
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005788340233266354
Validation loss = 0.005103570409119129
Validation loss = 0.004439166747033596
Validation loss = 0.006015400402247906
Validation loss = 0.005288764368742704
Validation loss = 0.004636853002011776
Validation loss = 0.005210768431425095
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005564802326261997
Validation loss = 0.0049507287330925465
Validation loss = 0.004820290021598339
Validation loss = 0.004533958621323109
Validation loss = 0.004508179146796465
Validation loss = 0.00471448851749301
Validation loss = 0.0050181979313492775
Validation loss = 0.00455417251214385
Validation loss = 0.004722306504845619
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005007653962820768
Validation loss = 0.00548790767788887
Validation loss = 0.005547753535211086
Validation loss = 0.005117286927998066
Validation loss = 0.005796209443360567
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005420784931629896
Validation loss = 0.005523327738046646
Validation loss = 0.0051191421225667
Validation loss = 0.005251954309642315
Validation loss = 0.005198984406888485
Validation loss = 0.005172558594495058
Validation loss = 0.005840728525072336
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 83
average number of affinization = 121.53714285714285
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 82
average number of affinization = 121.3125
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 110
average number of affinization = 121.24858757062147
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 97
average number of affinization = 121.11235955056179
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 98
average number of affinization = 120.98324022346368
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 86
average number of affinization = 120.78888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 295      |
| Iteration     | 28       |
| MaximumReturn | 301      |
| MinimumReturn | 280      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0058487714268267155
Validation loss = 0.00639778608456254
Validation loss = 0.005748285446316004
Validation loss = 0.005626599304378033
Validation loss = 0.0047687953338027
Validation loss = 0.004971400368958712
Validation loss = 0.00490576820448041
Validation loss = 0.005055178888142109
Validation loss = 0.0055320290848612785
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005292857065796852
Validation loss = 0.004876834340393543
Validation loss = 0.0054959882982075214
Validation loss = 0.0058092987164855
Validation loss = 0.005148880183696747
Validation loss = 0.004727900493890047
Validation loss = 0.00458558602258563
Validation loss = 0.004448762629181147
Validation loss = 0.005221845582127571
Validation loss = 0.0043580555357038975
Validation loss = 0.005243942141532898
Validation loss = 0.004815603606402874
Validation loss = 0.004433168563991785
Validation loss = 0.005311502609401941
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004585847724229097
Validation loss = 0.0045260838232934475
Validation loss = 0.00528687471523881
Validation loss = 0.004548032768070698
Validation loss = 0.005015777889639139
Validation loss = 0.0050002275966107845
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005352267064154148
Validation loss = 0.004720381461083889
Validation loss = 0.004668287932872772
Validation loss = 0.004731288179755211
Validation loss = 0.004458224400877953
Validation loss = 0.0044945115223526955
Validation loss = 0.005028007552027702
Validation loss = 0.0048922584392130375
Validation loss = 0.00481905834749341
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005495580844581127
Validation loss = 0.004536218009889126
Validation loss = 0.005047517362982035
Validation loss = 0.004927969072014093
Validation loss = 0.0052093868143856525
Validation loss = 0.004480141680687666
Validation loss = 0.005064138676971197
Validation loss = 0.00516927195712924
Validation loss = 0.004963946063071489
Validation loss = 0.0058050379157066345
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 20
average number of affinization = 120.23204419889503
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 88
average number of affinization = 120.05494505494505
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 96
average number of affinization = 119.92349726775956
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 79
average number of affinization = 119.70108695652173
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 74
average number of affinization = 119.45405405405405
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 86
average number of affinization = 119.2741935483871
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 297      |
| Iteration     | 29       |
| MaximumReturn | 302      |
| MinimumReturn | 294      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004875008948147297
Validation loss = 0.0048461430706083775
Validation loss = 0.005844480823725462
Validation loss = 0.004563353024423122
Validation loss = 0.004742718767374754
Validation loss = 0.0048942482098937035
Validation loss = 0.004558895714581013
Validation loss = 0.004755901638418436
Validation loss = 0.004928802605718374
Validation loss = 0.004535386338829994
Validation loss = 0.004766537807881832
Validation loss = 0.005026410799473524
Validation loss = 0.005508845206350088
Validation loss = 0.004541176371276379
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006031571421772242
Validation loss = 0.00497708935290575
Validation loss = 0.0049822330474853516
Validation loss = 0.004654434509575367
Validation loss = 0.004832640755921602
Validation loss = 0.00423318799585104
Validation loss = 0.005955610424280167
Validation loss = 0.005599642172455788
Validation loss = 0.004694378934800625
Validation loss = 0.00461024884134531
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0046433256939053535
Validation loss = 0.0043815006501972675
Validation loss = 0.0045431009493768215
Validation loss = 0.006033833138644695
Validation loss = 0.004893728531897068
Validation loss = 0.004592876881361008
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004986135754734278
Validation loss = 0.004597044549882412
Validation loss = 0.0043911319226026535
Validation loss = 0.005014645867049694
Validation loss = 0.00455091567710042
Validation loss = 0.005083908326923847
Validation loss = 0.004976834636181593
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004645921755582094
Validation loss = 0.004476959817111492
Validation loss = 0.0063278088346123695
Validation loss = 0.005556163843721151
Validation loss = 0.004917957354336977
Validation loss = 0.004684755112975836
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 59
average number of affinization = 118.95187165775401
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 78
average number of affinization = 118.73404255319149
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 89
average number of affinization = 118.57671957671958
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 107
average number of affinization = 118.51578947368421
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 77
average number of affinization = 118.29842931937172
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 106
average number of affinization = 118.234375
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 294      |
| Iteration     | 30       |
| MaximumReturn | 299      |
| MinimumReturn | 285      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004189525730907917
Validation loss = 0.004200845956802368
Validation loss = 0.005512990057468414
Validation loss = 0.004277602769434452
Validation loss = 0.005282293539494276
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0045000286772847176
Validation loss = 0.004426694940775633
Validation loss = 0.005056215450167656
Validation loss = 0.0046849800273776054
Validation loss = 0.0048765139654278755
Validation loss = 0.005047223996371031
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0042788125574588776
Validation loss = 0.004498559050261974
Validation loss = 0.005248155444860458
Validation loss = 0.004476167727261782
Validation loss = 0.0045790099538862705
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00422033853828907
Validation loss = 0.004656593780964613
Validation loss = 0.004806915298104286
Validation loss = 0.00466183852404356
Validation loss = 0.004765388555824757
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004847628064453602
Validation loss = 0.004801020957529545
Validation loss = 0.005815025418996811
Validation loss = 0.004802673123776913
Validation loss = 0.0056304363533854485
Validation loss = 0.004804066848009825
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 72
average number of affinization = 117.99481865284974
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 78
average number of affinization = 117.78865979381443
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 65
average number of affinization = 117.51794871794871
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 95
average number of affinization = 117.40306122448979
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 96
average number of affinization = 117.29441624365482
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 100
average number of affinization = 117.20707070707071
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 272      |
| Iteration     | 31       |
| MaximumReturn | 276      |
| MinimumReturn | 265      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00465633999556303
Validation loss = 0.004348586779087782
Validation loss = 0.006051723379641771
Validation loss = 0.004805796314030886
Validation loss = 0.004547599237412214
Validation loss = 0.004214043263345957
Validation loss = 0.004746412392705679
Validation loss = 0.00442091329023242
Validation loss = 0.004458314273506403
Validation loss = 0.004768219776451588
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005383288953453302
Validation loss = 0.005256754346191883
Validation loss = 0.0044214543886482716
Validation loss = 0.004735960159450769
Validation loss = 0.004421253688633442
Validation loss = 0.004426568280905485
Validation loss = 0.004818717949092388
Validation loss = 0.00450480729341507
Validation loss = 0.004383557941764593
Validation loss = 0.006294327322393656
Validation loss = 0.004908241331577301
Validation loss = 0.0046915593557059765
Validation loss = 0.004277800675481558
Validation loss = 0.005896823480725288
Validation loss = 0.004282341804355383
Validation loss = 0.004423833917826414
Validation loss = 0.004941149149090052
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004712305497378111
Validation loss = 0.005000318866223097
Validation loss = 0.004786121193319559
Validation loss = 0.005176623817533255
Validation loss = 0.005088789854198694
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004725894425064325
Validation loss = 0.004505230113863945
Validation loss = 0.004611891694366932
Validation loss = 0.005008930340409279
Validation loss = 0.004524906165897846
Validation loss = 0.004515748471021652
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0048053632490336895
Validation loss = 0.004861811175942421
Validation loss = 0.004648706875741482
Validation loss = 0.005088783334940672
Validation loss = 0.004477524198591709
Validation loss = 0.0045457459054887295
Validation loss = 0.004641084466129541
Validation loss = 0.005749907810240984
Validation loss = 0.004511750768870115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 89
average number of affinization = 117.06532663316582
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 71
average number of affinization = 116.835
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 88
average number of affinization = 116.69154228855722
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 87
average number of affinization = 116.54455445544555
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 93
average number of affinization = 116.42857142857143
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 95
average number of affinization = 116.32352941176471
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 303      |
| Iteration     | 32       |
| MaximumReturn | 306      |
| MinimumReturn | 301      |
| TotalSamples  | 136000   |
----------------------------
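[Editor's note] The per-iteration summary tables are the easiest part of this log to post-process. A small, hypothetical parser that pulls (Iteration, AverageReturn) pairs for plotting; the regex matches only the table layout printed above:

import re

def parse_returns(log_path):
    """Return a list of (iteration, average_return) parsed from the summary tables."""
    rows, avg = [], None
    pat = re.compile(r"\|\s*(\w+)\s*\|\s*([-\d.]+)\s*\|")
    with open(log_path) as f:
        for line in f:
            m = pat.match(line.strip())
            if not m:
                continue
            key, val = m.group(1), float(m.group(2))
            if key == "AverageReturn":
                avg = val
            elif key == "Iteration":
                rows.append((int(val), avg))
    return rows

# Example usage (path is illustrative): parse_returns("run.log")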
