Logging to logs/HalfCheetah-v4/IRL/2025_02_01_11_36_42
2025-02-01 11:36:56.198072 Eastern Standard Time
| Itration            | 0        |
| Real Det Return     | -0.37    |
| Real Sto Return     | -24.5    |
| Reward Loss         | 36.9     |
| Running Env Steps   | 0        |
| Running Forward KL  | 9.91     |
| Running Reverse KL  | 7.16     |
| Running Update Time | 0        |
----------------------------------
2025-02-01 11:37:10.497713 Eastern Standard Time
| Itration            | 1        |
| Real Det Return     | -0.31    |
| Real Sto Return     | -39.2    |
| Reward Loss         | 34.3     |
| Running Env Steps   | 500      |
| Running Forward KL  | 9.59     |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1        |
----------------------------------
2025-02-01 11:37:24.791541 Eastern Standard Time
| Itration            | 2        |
| Real Det Return     | -1.11    |
| Real Sto Return     | -26      |
| Reward Loss         | 33.4     |
| Running Env Steps   | 1000     |
| Running Forward KL  | 9.66     |
| Running Reverse KL  | 7.12     |
| Running Update Time | 2        |
----------------------------------
2025-02-01 11:37:39.006787 Eastern Standard Time
| Itration            | 3        |
| Real Det Return     | -0.84    |
| Real Sto Return     | -32.6    |
| Reward Loss         | 30.8     |
| Running Env Steps   | 1500     |
| Running Forward KL  | 9.55     |
| Running Reverse KL  | 7.18     |
| Running Update Time | 3        |
----------------------------------
2025-02-01 11:37:53.265000 Eastern Standard Time
| Itration            | 4        |
| Real Det Return     | -1.3     |
| Real Sto Return     | -25.8    |
| Reward Loss         | 29.1     |
| Running Env Steps   | 2000     |
| Running Forward KL  | 9.31     |
| Running Reverse KL  | 7.1      |
| Running Update Time | 4        |
----------------------------------
2025-02-01 11:38:07.490126 Eastern Standard Time
| Itration            | 5        |
| Real Det Return     | -2.23    |
| Real Sto Return     | -17.6    |
| Reward Loss         | 27.5     |
| Running Env Steps   | 2500     |
| Running Forward KL  | 9.72     |
| Running Reverse KL  | 7.39     |
| Running Update Time | 5        |
----------------------------------
2025-02-01 11:38:21.716609 Eastern Standard Time
| Itration            | 6        |
| Real Det Return     | -2       |
| Real Sto Return     | -24.6    |
| Reward Loss         | 28.9     |
| Running Env Steps   | 3000     |
| Running Forward KL  | 9.7      |
| Running Reverse KL  | 7.57     |
| Running Update Time | 6        |
----------------------------------
2025-02-01 11:38:35.885978 Eastern Standard Time
| Itration            | 7        |
| Real Det Return     | -1.4     |
| Real Sto Return     | -14.4    |
| Reward Loss         | 31       |
| Running Env Steps   | 3500     |
| Running Forward KL  | 10       |
| Running Reverse KL  | 8.32     |
| Running Update Time | 7        |
----------------------------------
2025-02-01 11:38:50.182334 Eastern Standard Time
| Itration            | 8        |
| Real Det Return     | -1.91    |
| Real Sto Return     | -10.8    |
| Reward Loss         | 29.4     |
| Running Env Steps   | 4000     |
| Running Forward KL  | 9.29     |
| Running Reverse KL  | 7.83     |
| Running Update Time | 8        |
----------------------------------
2025-02-01 11:39:04.353123 Eastern Standard Time
| Itration            | 9        |
| Real Det Return     | -1.36    |
| Real Sto Return     | -17.9    |
| Reward Loss         | 22.3     |
| Running Env Steps   | 4500     |
| Running Forward KL  | 9.79     |
| Running Reverse KL  | 7.8      |
| Running Update Time | 9        |
----------------------------------
2025-02-01 11:39:18.597565 Eastern Standard Time
| Itration            | 10       |
| Real Det Return     | -2.11    |
| Real Sto Return     | -0.43    |
| Reward Loss         | 27       |
| Running Env Steps   | 5000     |
| Running Forward KL  | 9.85     |
| Running Reverse KL  | 8.3      |
| Running Update Time | 10       |
----------------------------------
2025-02-01 11:39:32.795714 Eastern Standard Time
| Itration            | 11       |
| Real Det Return     | -4.13    |
| Real Sto Return     | -3.5     |
| Reward Loss         | 25.8     |
| Running Env Steps   | 5500     |
| Running Forward KL  | 9.48     |
| Running Reverse KL  | 8.71     |
| Running Update Time | 11       |
----------------------------------
2025-02-01 11:39:47.072693 Eastern Standard Time
| Itration            | 12       |
| Real Det Return     | -0.55    |
| Real Sto Return     | -3.1     |
| Reward Loss         | 21.4     |
| Running Env Steps   | 6000     |
| Running Forward KL  | 9.62     |
| Running Reverse KL  | 8.26     |
| Running Update Time | 12       |
----------------------------------
2025-02-01 11:40:01.271172 Eastern Standard Time
| Itration            | 13       |
| Real Det Return     | 11.8     |
| Real Sto Return     | -3.53    |
| Reward Loss         | 20.7     |
| Running Env Steps   | 6500     |
| Running Forward KL  | 9.58     |
| Running Reverse KL  | 8.61     |
| Running Update Time | 13       |
----------------------------------
2025-02-01 11:40:15.469853 Eastern Standard Time
| Itration            | 14       |
| Real Det Return     | 11.7     |
| Real Sto Return     | -0.61    |
| Reward Loss         | 20.8     |
| Running Env Steps   | 7000     |
| Running Forward KL  | 9.41     |
| Running Reverse KL  | 8.43     |
| Running Update Time | 14       |
----------------------------------
2025-02-01 11:40:29.697015 Eastern Standard Time
| Itration            | 15       |
| Real Det Return     | 9.02     |
| Real Sto Return     | -0.03    |
| Reward Loss         | 19.6     |
| Running Env Steps   | 7500     |
| Running Forward KL  | 10.1     |
| Running Reverse KL  | 9.3      |
| Running Update Time | 15       |
----------------------------------
2025-02-01 11:40:44.257077 Eastern Standard Time
| Itration            | 16       |
| Real Det Return     | 11.9     |
| Real Sto Return     | -2.14    |
| Reward Loss         | 12.6     |
| Running Env Steps   | 8000     |
| Running Forward KL  | 9.47     |
| Running Reverse KL  | 8.16     |
| Running Update Time | 16       |
----------------------------------
2025-02-01 11:40:58.467040 Eastern Standard Time
| Itration            | 17       |
| Real Det Return     | 12.1     |
| Real Sto Return     | -2.62    |
| Reward Loss         | 13.4     |
| Running Env Steps   | 8500     |
| Running Forward KL  | 9.42     |
| Running Reverse KL  | 8.59     |
| Running Update Time | 17       |
----------------------------------
2025-02-01 11:41:13.131619 Eastern Standard Time
| Itration            | 18       |
| Real Det Return     | 8.61     |
| Real Sto Return     | -3.61    |
| Reward Loss         | 10.7     |
| Running Env Steps   | 9000     |
| Running Forward KL  | 9.69     |
| Running Reverse KL  | 8.38     |
| Running Update Time | 18       |
----------------------------------
2025-02-01 11:41:27.390635 Eastern Standard Time
| Itration            | 19       |
| Real Det Return     | 9.14     |
| Real Sto Return     | -1.09    |
| Reward Loss         | 9.27     |
| Running Env Steps   | 9500     |
| Running Forward KL  | 9.55     |
| Running Reverse KL  | 8.71     |
| Running Update Time | 19       |
----------------------------------
2025-02-01 11:41:41.673564 Eastern Standard Time
| Itration            | 20       |
| Real Det Return     | 8.59     |
| Real Sto Return     | -4.31    |
| Reward Loss         | 9.08     |
| Running Env Steps   | 10000    |
| Running Forward KL  | 9.59     |
| Running Reverse KL  | 8.98     |
| Running Update Time | 20       |
----------------------------------
2025-02-01 11:41:56.037062 Eastern Standard Time
| Itration            | 21       |
| Real Det Return     | 10.6     |
| Real Sto Return     | -0.97    |
| Reward Loss         | 3.65     |
| Running Env Steps   | 10500    |
| Running Forward KL  | 9.71     |
| Running Reverse KL  | 8.67     |
| Running Update Time | 21       |
----------------------------------
2025-02-01 11:42:10.230939 Eastern Standard Time
| Itration            | 22       |
| Real Det Return     | 8.24     |
| Real Sto Return     | 0.21     |
| Reward Loss         | 3.45     |
| Running Env Steps   | 11000    |
| Running Forward KL  | 9.62     |
| Running Reverse KL  | 8.65     |
| Running Update Time | 22       |
----------------------------------
2025-02-01 11:42:24.388380 Eastern Standard Time
| Itration            | 23       |
| Real Det Return     | 6.5      |
| Real Sto Return     | -1.47    |
| Reward Loss         | 1.96     |
| Running Env Steps   | 11500    |
| Running Forward KL  | 10       |
| Running Reverse KL  | 9.67     |
| Running Update Time | 23       |
----------------------------------
2025-02-01 11:42:38.590658 Eastern Standard Time
| Itration            | 24       |
| Real Det Return     | 11.2     |
| Real Sto Return     | 1.04     |
| Reward Loss         | -4.91    |
| Running Env Steps   | 12000    |
| Running Forward KL  | 9.74     |
| Running Reverse KL  | 8.09     |
| Running Update Time | 24       |
----------------------------------
2025-02-01 11:42:52.815318 Eastern Standard Time
| Itration            | 25       |
| Real Det Return     | 4.98     |
| Real Sto Return     | 1.29     |
| Reward Loss         | -3.14    |
| Running Env Steps   | 12500    |
| Running Forward KL  | 9.15     |
| Running Reverse KL  | 8.83     |
| Running Update Time | 25       |
----------------------------------
2025-02-01 11:43:06.907008 Eastern Standard Time
| Itration            | 26       |
| Real Det Return     | 7.56     |
| Real Sto Return     | 2.9      |
| Reward Loss         | -5.68    |
| Running Env Steps   | 13000    |
| Running Forward KL  | 9.42     |
| Running Reverse KL  | 9.08     |
| Running Update Time | 26       |
----------------------------------
2025-02-01 11:43:20.963225 Eastern Standard Time
| Itration            | 27       |
| Real Det Return     | 8.41     |
| Real Sto Return     | 3.68     |
| Reward Loss         | -9.57    |
| Running Env Steps   | 13500    |
| Running Forward KL  | 9.55     |
| Running Reverse KL  | 8.91     |
| Running Update Time | 27       |
----------------------------------
2025-02-01 11:43:35.028819 Eastern Standard Time
| Itration            | 28       |
| Real Det Return     | 7.83     |
| Real Sto Return     | 2.37     |
| Reward Loss         | -11.3    |
| Running Env Steps   | 14000    |
| Running Forward KL  | 9.45     |
| Running Reverse KL  | 9.09     |
| Running Update Time | 28       |
----------------------------------
2025-02-01 11:43:49.067121 Eastern Standard Time
| Itration            | 29       |
| Real Det Return     | 6.75     |
| Real Sto Return     | 0.05     |
| Reward Loss         | -12      |
| Running Env Steps   | 14500    |
| Running Forward KL  | 9.72     |
| Running Reverse KL  | 9.28     |
| Running Update Time | 29       |
----------------------------------
2025-02-01 11:44:03.081336 Eastern Standard Time
| Itration            | 30       |
| Real Det Return     | 6.65     |
| Real Sto Return     | -0.04    |
| Reward Loss         | -14.9    |
| Running Env Steps   | 15000    |
| Running Forward KL  | 9.58     |
| Running Reverse KL  | 9.13     |
| Running Update Time | 30       |
----------------------------------
2025-02-01 11:44:17.130858 Eastern Standard Time
| Itration            | 31       |
| Real Det Return     | 6.03     |
| Real Sto Return     | -0.1     |
| Reward Loss         | -18.7    |
| Running Env Steps   | 15500    |
| Running Forward KL  | 9.77     |
| Running Reverse KL  | 9.04     |
| Running Update Time | 31       |
----------------------------------
2025-02-01 11:44:31.213460 Eastern Standard Time
| Itration            | 32       |
| Real Det Return     | 6.39     |
| Real Sto Return     | -1.31    |
| Reward Loss         | -21.6    |
| Running Env Steps   | 16000    |
| Running Forward KL  | 9.52     |
| Running Reverse KL  | 8.73     |
| Running Update Time | 32       |
----------------------------------
2025-02-01 11:44:45.283984 Eastern Standard Time
| Itration            | 33       |
| Real Det Return     | 7.12     |
| Real Sto Return     | 0.96     |
| Reward Loss         | -22.3    |
| Running Env Steps   | 16500    |
| Running Forward KL  | 9.59     |
| Running Reverse KL  | 8.92     |
| Running Update Time | 33       |
----------------------------------
2025-02-01 11:44:59.353357 Eastern Standard Time
| Itration            | 34       |
| Real Det Return     | 8.38     |
| Real Sto Return     | 2.33     |
| Reward Loss         | -24.1    |
| Running Env Steps   | 17000    |
| Running Forward KL  | 9.55     |
| Running Reverse KL  | 9.05     |
| Running Update Time | 34       |
----------------------------------
2025-02-01 11:45:13.460432 Eastern Standard Time
| Itration            | 35       |
| Real Det Return     | 7.75     |
| Real Sto Return     | 3.06     |
| Reward Loss         | -24      |
| Running Env Steps   | 17500    |
| Running Forward KL  | 9.19     |
| Running Reverse KL  | 9.12     |
| Running Update Time | 35       |
----------------------------------
2025-02-01 11:45:27.512271 Eastern Standard Time
| Itration            | 36       |
| Real Det Return     | 7.89     |
| Real Sto Return     | 0.71     |
| Reward Loss         | -26.6    |
| Running Env Steps   | 18000    |
| Running Forward KL  | 9.6      |
| Running Reverse KL  | 9.34     |
| Running Update Time | 36       |
----------------------------------
2025-02-01 11:45:41.681554 Eastern Standard Time
| Itration            | 37       |
| Real Det Return     | 7.87     |
| Real Sto Return     | 0.79     |
| Reward Loss         | -28.8    |
| Running Env Steps   | 18500    |
| Running Forward KL  | 9.5      |
| Running Reverse KL  | 9.31     |
| Running Update Time | 37       |
----------------------------------
2025-02-01 11:45:55.754588 Eastern Standard Time
| Itration            | 38       |
| Real Det Return     | 8.52     |
| Real Sto Return     | 3.87     |
| Reward Loss         | -31.8    |
| Running Env Steps   | 19000    |
| Running Forward KL  | 9.4      |
| Running Reverse KL  | 8.99     |
| Running Update Time | 38       |
----------------------------------
2025-02-01 11:46:09.964580 Eastern Standard Time
| Itration            | 39       |
| Real Det Return     | 6.4      |
| Real Sto Return     | 1.51     |
| Reward Loss         | -34.5    |
| Running Env Steps   | 19500    |
| Running Forward KL  | 9.6      |
| Running Reverse KL  | 9.19     |
| Running Update Time | 39       |
----------------------------------
2025-02-01 11:46:24.011321 Eastern Standard Time
| Itration            | 40       |
| Real Det Return     | 7.31     |
| Real Sto Return     | 2.65     |
| Reward Loss         | -37.3    |
| Running Env Steps   | 20000    |
| Running Forward KL  | 9.34     |
| Running Reverse KL  | 9.44     |
| Running Update Time | 40       |
----------------------------------
2025-02-01 11:46:38.119178 Eastern Standard Time
| Itration            | 41       |
| Real Det Return     | 7.29     |
| Real Sto Return     | 1.24     |
| Reward Loss         | -37.8    |
| Running Env Steps   | 20500    |
| Running Forward KL  | 9.42     |
| Running Reverse KL  | 8.88     |
| Running Update Time | 41       |
----------------------------------
2025-02-01 11:46:52.272809 Eastern Standard Time
| Itration            | 42       |
| Real Det Return     | 7.29     |
| Real Sto Return     | 3.78     |
| Reward Loss         | -42      |
| Running Env Steps   | 21000    |
| Running Forward KL  | 8.95     |
| Running Reverse KL  | 8.73     |
| Running Update Time | 42       |
----------------------------------
2025-02-01 11:47:06.637491 Eastern Standard Time
| Itration            | 43       |
| Real Det Return     | 7.27     |
| Real Sto Return     | 3.38     |
| Reward Loss         | -42.3    |
| Running Env Steps   | 21500    |
| Running Forward KL  | 9.48     |
| Running Reverse KL  | 9.32     |
| Running Update Time | 43       |
----------------------------------
2025-02-01 11:47:20.899571 Eastern Standard Time
| Itration            | 44       |
| Real Det Return     | 7.81     |
| Real Sto Return     | 2.88     |
| Reward Loss         | -44.5    |
| Running Env Steps   | 22000    |
| Running Forward KL  | 9.45     |
| Running Reverse KL  | 8.7      |
| Running Update Time | 44       |
----------------------------------
2025-02-01 11:47:35.003956 Eastern Standard Time
| Itration            | 45       |
| Real Det Return     | 9.01     |
| Real Sto Return     | 4.26     |
| Reward Loss         | -48      |
| Running Env Steps   | 22500    |
| Running Forward KL  | 9.26     |
| Running Reverse KL  | 8.67     |
| Running Update Time | 45       |
----------------------------------
2025-02-01 11:47:49.025245 Eastern Standard Time
| Itration            | 46       |
| Real Det Return     | 6.85     |
| Real Sto Return     | 0.64     |
| Reward Loss         | -48.5    |
| Running Env Steps   | 23000    |
| Running Forward KL  | 9        |
| Running Reverse KL  | 8.7      |
| Running Update Time | 46       |
----------------------------------
2025-02-01 11:48:03.063355 Eastern Standard Time
| Itration            | 47       |
| Real Det Return     | 7.4      |
| Real Sto Return     | 6.93     |
| Reward Loss         | -54.4    |
| Running Env Steps   | 23500    |
| Running Forward KL  | 9.15     |
| Running Reverse KL  | 8.77     |
| Running Update Time | 47       |
----------------------------------
2025-02-01 11:48:17.071839 Eastern Standard Time
| Itration            | 48       |
| Real Det Return     | 7.99     |
| Real Sto Return     | 4        |
| Reward Loss         | -53.2    |
| Running Env Steps   | 24000    |
| Running Forward KL  | 9.3      |
| Running Reverse KL  | 9.06     |
| Running Update Time | 48       |
----------------------------------
2025-02-01 11:48:31.168312 Eastern Standard Time
| Itration            | 49       |
| Real Det Return     | 8.1      |
| Real Sto Return     | 2.16     |
| Reward Loss         | -53.8    |
| Running Env Steps   | 24500    |
| Running Forward KL  | 9.13     |
| Running Reverse KL  | 9.27     |
| Running Update Time | 49       |
----------------------------------
2025-02-01 11:48:45.285223 Eastern Standard Time
| Itration            | 50       |
| Real Det Return     | 7.55     |
| Real Sto Return     | 2.47     |
| Reward Loss         | -57.4    |
| Running Env Steps   | 25000    |
| Running Forward KL  | 9.32     |
| Running Reverse KL  | 9.05     |
| Running Update Time | 50       |
----------------------------------
2025-02-01 11:48:59.396148 Eastern Standard Time
| Itration            | 51       |
| Real Det Return     | 6.88     |
| Real Sto Return     | 1.13     |
| Reward Loss         | -58.6    |
| Running Env Steps   | 25500    |
| Running Forward KL  | 8.94     |
| Running Reverse KL  | 8.99     |
| Running Update Time | 51       |
----------------------------------
2025-02-01 11:49:13.418088 Eastern Standard Time
| Itration            | 52       |
| Real Det Return     | 6.99     |
| Real Sto Return     | 3.23     |
| Reward Loss         | -60.6    |
| Running Env Steps   | 26000    |
| Running Forward KL  | 9.58     |
| Running Reverse KL  | 9.76     |
| Running Update Time | 52       |
----------------------------------
2025-02-01 11:49:27.520905 Eastern Standard Time
| Itration            | 53       |
| Real Det Return     | 7.52     |
| Real Sto Return     | 3.64     |
| Reward Loss         | -65.7    |
| Running Env Steps   | 26500    |
| Running Forward KL  | 8.97     |
| Running Reverse KL  | 8.99     |
| Running Update Time | 53       |
----------------------------------
2025-02-01 11:49:41.694016 Eastern Standard Time
| Itration            | 54       |
| Real Det Return     | 5.24     |
| Real Sto Return     | 2.49     |
| Reward Loss         | -64.3    |
| Running Env Steps   | 27000    |
| Running Forward KL  | 9.05     |
| Running Reverse KL  | 8.94     |
| Running Update Time | 54       |
----------------------------------
2025-02-01 11:49:55.778397 Eastern Standard Time
| Itration            | 55       |
| Real Det Return     | 5.48     |
| Real Sto Return     | 1.72     |
| Reward Loss         | -68.2    |
| Running Env Steps   | 27500    |
| Running Forward KL  | 9.02     |
| Running Reverse KL  | 9.36     |
| Running Update Time | 55       |
----------------------------------
2025-02-01 11:50:09.835215 Eastern Standard Time
| Itration            | 56       |
| Real Det Return     | 7.46     |
| Real Sto Return     | 6.92     |
| Reward Loss         | -69      |
| Running Env Steps   | 28000    |
| Running Forward KL  | 9.42     |
| Running Reverse KL  | 9.41     |
| Running Update Time | 56       |
----------------------------------
2025-02-01 11:50:23.870487 Eastern Standard Time
| Itration            | 57       |
| Real Det Return     | 6.89     |
| Real Sto Return     | 5.37     |
| Reward Loss         | -72.4    |
| Running Env Steps   | 28500    |
| Running Forward KL  | 9.43     |
| Running Reverse KL  | 9.13     |
| Running Update Time | 57       |
----------------------------------
2025-02-01 11:50:37.895059 Eastern Standard Time
| Itration            | 58       |
| Real Det Return     | 6.26     |
| Real Sto Return     | 7.67     |
| Reward Loss         | -72.9    |
| Running Env Steps   | 29000    |
| Running Forward KL  | 9.11     |
| Running Reverse KL  | 8.86     |
| Running Update Time | 58       |
----------------------------------
2025-02-01 11:50:51.900354 Eastern Standard Time
| Itration            | 59       |
| Real Det Return     | 8.83     |
| Real Sto Return     | 4.88     |
| Reward Loss         | -76      |
| Running Env Steps   | 29500    |
| Running Forward KL  | 9.11     |
| Running Reverse KL  | 8.94     |
| Running Update Time | 59       |
----------------------------------
2025-02-01 11:51:05.915131 Eastern Standard Time
| Itration            | 60       |
| Real Det Return     | 7.06     |
| Real Sto Return     | 9.62     |
| Reward Loss         | -79.2    |
| Running Env Steps   | 30000    |
| Running Forward KL  | 8.86     |
| Running Reverse KL  | 7.99     |
| Running Update Time | 60       |
----------------------------------
2025-02-01 11:51:19.938854 Eastern Standard Time
| Itration            | 61       |
| Real Det Return     | 6.74     |
| Real Sto Return     | 8.28     |
| Reward Loss         | -80.3    |
| Running Env Steps   | 30500    |
| Running Forward KL  | 8.97     |
| Running Reverse KL  | 8.9      |
| Running Update Time | 61       |
----------------------------------
2025-02-01 11:51:34.012115 Eastern Standard Time
| Itration            | 62       |
| Real Det Return     | 5.92     |
| Real Sto Return     | 4.85     |
| Reward Loss         | -80.2    |
| Running Env Steps   | 31000    |
| Running Forward KL  | 8.93     |
| Running Reverse KL  | 9.14     |
| Running Update Time | 62       |
----------------------------------
2025-02-01 11:51:47.978700 Eastern Standard Time
| Itration            | 63       |
| Real Det Return     | 6.86     |
| Real Sto Return     | 7.99     |
| Reward Loss         | -83.3    |
| Running Env Steps   | 31500    |
| Running Forward KL  | 8.96     |
| Running Reverse KL  | 8.87     |
| Running Update Time | 63       |
----------------------------------
2025-02-01 11:52:01.949348 Eastern Standard Time
| Itration            | 64       |
| Real Det Return     | 7.13     |
| Real Sto Return     | 6.34     |
| Reward Loss         | -89      |
| Running Env Steps   | 32000    |
| Running Forward KL  | 8.93     |
| Running Reverse KL  | 8.79     |
| Running Update Time | 64       |
----------------------------------
2025-02-01 11:52:15.945622 Eastern Standard Time
| Itration            | 65       |
| Real Det Return     | 5.73     |
| Real Sto Return     | 6.07     |
| Reward Loss         | -84.3    |
| Running Env Steps   | 32500    |
| Running Forward KL  | 8.79     |
| Running Reverse KL  | 8.97     |
| Running Update Time | 65       |
----------------------------------
2025-02-01 11:52:29.982129 Eastern Standard Time
| Itration            | 66       |
| Real Det Return     | 7.75     |
| Real Sto Return     | 7.36     |
| Reward Loss         | -91.1    |
| Running Env Steps   | 33000    |
| Running Forward KL  | 9.09     |
| Running Reverse KL  | 8.44     |
| Running Update Time | 66       |
----------------------------------
2025-02-01 11:52:43.929959 Eastern Standard Time
| Itration            | 67       |
| Real Det Return     | 8.8      |
| Real Sto Return     | 9.96     |
| Reward Loss         | -92.8    |
| Running Env Steps   | 33500    |
| Running Forward KL  | 8.57     |
| Running Reverse KL  | 8.55     |
| Running Update Time | 67       |
----------------------------------
2025-02-01 11:52:57.884638 Eastern Standard Time
| Itration            | 68       |
| Real Det Return     | 7.33     |
| Real Sto Return     | 7.97     |
| Reward Loss         | -93.9    |
| Running Env Steps   | 34000    |
| Running Forward KL  | 8.93     |
| Running Reverse KL  | 9.06     |
| Running Update Time | 68       |
----------------------------------
2025-02-01 11:53:11.952595 Eastern Standard Time
| Itration            | 69       |
| Real Det Return     | 7.22     |
| Real Sto Return     | 8.28     |
| Reward Loss         | -93.8    |
| Running Env Steps   | 34500    |
| Running Forward KL  | 8.63     |
| Running Reverse KL  | 8.22     |
| Running Update Time | 69       |
----------------------------------
2025-02-01 11:53:25.966866 Eastern Standard Time
| Itration            | 70       |
| Real Det Return     | 11.4     |
| Real Sto Return     | 14.3     |
| Reward Loss         | -95.5    |
| Running Env Steps   | 35000    |
| Running Forward KL  | 8.77     |
| Running Reverse KL  | 9.27     |
| Running Update Time | 70       |
----------------------------------
2025-02-01 11:53:39.975444 Eastern Standard Time
| Itration            | 71       |
| Real Det Return     | 9.19     |
| Real Sto Return     | 11.1     |
| Reward Loss         | -95.3    |
| Running Env Steps   | 35500    |
| Running Forward KL  | 8.61     |
| Running Reverse KL  | 9.12     |
| Running Update Time | 71       |
----------------------------------
2025-02-01 11:53:53.977960 Eastern Standard Time
| Itration            | 72       |
| Real Det Return     | 4.69     |
| Real Sto Return     | 6.87     |
| Reward Loss         | -101     |
| Running Env Steps   | 36000    |
| Running Forward KL  | 8.22     |
| Running Reverse KL  | 8.29     |
| Running Update Time | 72       |
----------------------------------
2025-02-01 11:54:07.941976 Eastern Standard Time
| Itration            | 73       |
| Real Det Return     | 8.86     |
| Real Sto Return     | 11.5     |
| Reward Loss         | -98.9    |
| Running Env Steps   | 36500    |
| Running Forward KL  | 8.57     |
| Running Reverse KL  | 9.23     |
| Running Update Time | 73       |
----------------------------------
2025-02-01 11:54:21.939691 Eastern Standard Time
| Itration            | 74       |
| Real Det Return     | 5.7      |
| Real Sto Return     | 10.6     |
| Reward Loss         | -105     |
| Running Env Steps   | 37000    |
| Running Forward KL  | 8.63     |
| Running Reverse KL  | 8.2      |
| Running Update Time | 74       |
----------------------------------
2025-02-01 11:54:35.937813 Eastern Standard Time
| Itration            | 75       |
| Real Det Return     | 5.31     |
| Real Sto Return     | 7.99     |
| Reward Loss         | -107     |
| Running Env Steps   | 37500    |
| Running Forward KL  | 8.27     |
| Running Reverse KL  | 8.76     |
| Running Update Time | 75       |
----------------------------------
2025-02-01 11:54:49.929440 Eastern Standard Time
| Itration            | 76       |
| Real Det Return     | 15.8     |
| Real Sto Return     | 13.1     |
| Reward Loss         | -106     |
| Running Env Steps   | 38000    |
| Running Forward KL  | 8.25     |
| Running Reverse KL  | 9.13     |
| Running Update Time | 76       |
----------------------------------
2025-02-01 11:55:03.982154 Eastern Standard Time
| Itration            | 77       |
| Real Det Return     | 7.54     |
| Real Sto Return     | 17.5     |
| Reward Loss         | -109     |
| Running Env Steps   | 38500    |
| Running Forward KL  | 8.02     |
| Running Reverse KL  | 8.33     |
| Running Update Time | 77       |
----------------------------------
2025-02-01 11:55:18.044354 Eastern Standard Time
| Itration            | 78       |
| Real Det Return     | 6.92     |
| Real Sto Return     | 8.89     |
| Reward Loss         | -110     |
| Running Env Steps   | 39000    |
| Running Forward KL  | 7.9      |
| Running Reverse KL  | 8.76     |
| Running Update Time | 78       |
----------------------------------
2025-02-01 11:55:32.057906 Eastern Standard Time
| Itration            | 79       |
| Real Det Return     | 10.5     |
| Real Sto Return     | 13.5     |
| Reward Loss         | -113     |
| Running Env Steps   | 39500    |
| Running Forward KL  | 8.1      |
| Running Reverse KL  | 8.33     |
| Running Update Time | 79       |
----------------------------------
2025-02-01 11:55:46.044813 Eastern Standard Time
| Itration            | 80       |
| Real Det Return     | -0.09    |
| Real Sto Return     | 10.6     |
| Reward Loss         | -115     |
| Running Env Steps   | 40000    |
| Running Forward KL  | 8.1      |
| Running Reverse KL  | 7.99     |
| Running Update Time | 80       |
----------------------------------
2025-02-01 11:55:59.973807 Eastern Standard Time
| Itration            | 81       |
| Real Det Return     | 13.8     |
| Real Sto Return     | 16.7     |
| Reward Loss         | -116     |
| Running Env Steps   | 40500    |
| Running Forward KL  | 8.03     |
| Running Reverse KL  | 8.69     |
| Running Update Time | 81       |
----------------------------------
2025-02-01 11:56:13.929646 Eastern Standard Time
| Itration            | 82       |
| Real Det Return     | 0.82     |
| Real Sto Return     | 15.7     |
| Reward Loss         | -116     |
| Running Env Steps   | 41000    |
| Running Forward KL  | 7.81     |
| Running Reverse KL  | 8.19     |
| Running Update Time | 82       |
----------------------------------
2025-02-01 11:56:27.936091 Eastern Standard Time
| Itration            | 83       |
| Real Det Return     | 7.99     |
| Real Sto Return     | 14.4     |
| Reward Loss         | -120     |
| Running Env Steps   | 41500    |
| Running Forward KL  | 7.83     |
| Running Reverse KL  | 8.65     |
| Running Update Time | 83       |
----------------------------------
2025-02-01 11:56:41.876308 Eastern Standard Time
| Itration            | 84       |
| Real Det Return     | 14.8     |
| Real Sto Return     | 14.1     |
| Reward Loss         | -119     |
| Running Env Steps   | 42000    |
| Running Forward KL  | 7.9      |
| Running Reverse KL  | 8.83     |
| Running Update Time | 84       |
----------------------------------
2025-02-01 11:56:55.841310 Eastern Standard Time
| Itration            | 85       |
| Real Det Return     | 8.95     |
| Real Sto Return     | 18.8     |
| Reward Loss         | -118     |
| Running Env Steps   | 42500    |
| Running Forward KL  | 8.21     |
| Running Reverse KL  | 9.13     |
| Running Update Time | 85       |
----------------------------------
2025-02-01 11:57:09.857604 Eastern Standard Time
| Itration            | 86       |
| Real Det Return     | 20.2     |
| Real Sto Return     | 17.5     |
| Reward Loss         | -124     |
| Running Env Steps   | 43000    |
| Running Forward KL  | 7.98     |
| Running Reverse KL  | 8.82     |
| Running Update Time | 86       |
----------------------------------
2025-02-01 11:57:23.838907 Eastern Standard Time
| Itration            | 87       |
| Real Det Return     | 20.1     |
| Real Sto Return     | 13.5     |
| Reward Loss         | -123     |
| Running Env Steps   | 43500    |
| Running Forward KL  | 7.63     |
| Running Reverse KL  | 8.24     |
| Running Update Time | 87       |
----------------------------------
2025-02-01 11:57:37.775440 Eastern Standard Time
| Itration            | 88       |
| Real Det Return     | 12.8     |
| Real Sto Return     | 18.5     |
| Reward Loss         | -127     |
| Running Env Steps   | 44000    |
| Running Forward KL  | 7.92     |
| Running Reverse KL  | 9        |
| Running Update Time | 88       |
----------------------------------
2025-02-01 11:57:51.775042 Eastern Standard Time
| Itration            | 89       |
| Real Det Return     | 28       |
| Real Sto Return     | 22.7     |
| Reward Loss         | -127     |
| Running Env Steps   | 44500    |
| Running Forward KL  | 7.83     |
| Running Reverse KL  | 8.36     |
| Running Update Time | 89       |
----------------------------------
2025-02-01 11:58:05.748572 Eastern Standard Time
| Itration            | 90       |
| Real Det Return     | 20.8     |
| Real Sto Return     | 22.2     |
| Reward Loss         | -129     |
| Running Env Steps   | 45000    |
| Running Forward KL  | 7.61     |
| Running Reverse KL  | 8.35     |
| Running Update Time | 90       |
----------------------------------
2025-02-01 11:58:19.725158 Eastern Standard Time
| Itration            | 91       |
| Real Det Return     | 9        |
| Real Sto Return     | 12.4     |
| Reward Loss         | -127     |
| Running Env Steps   | 45500    |
| Running Forward KL  | 7.38     |
| Running Reverse KL  | 8.13     |
| Running Update Time | 91       |
----------------------------------
2025-02-01 11:58:33.775870 Eastern Standard Time
| Itration            | 92       |
| Real Det Return     | 31.7     |
| Real Sto Return     | 14.3     |
| Reward Loss         | -130     |
| Running Env Steps   | 46000    |
| Running Forward KL  | 7.58     |
| Running Reverse KL  | 8.5      |
| Running Update Time | 92       |
----------------------------------
2025-02-01 11:58:47.681126 Eastern Standard Time
| Itration            | 93       |
| Real Det Return     | 15.6     |
| Real Sto Return     | 24.4     |
| Reward Loss         | -130     |
| Running Env Steps   | 46500    |
| Running Forward KL  | 7.19     |
| Running Reverse KL  | 8.63     |
| Running Update Time | 93       |
----------------------------------
2025-02-01 11:59:01.585798 Eastern Standard Time
| Itration            | 94       |
| Real Det Return     | 19       |
| Real Sto Return     | 19.8     |
| Reward Loss         | -136     |
| Running Env Steps   | 47000    |
| Running Forward KL  | 7.53     |
| Running Reverse KL  | 8.26     |
| Running Update Time | 94       |
----------------------------------
2025-02-01 11:59:15.600660 Eastern Standard Time
| Itration            | 95       |
| Real Det Return     | 39.4     |
| Real Sto Return     | 20.3     |
| Reward Loss         | -134     |
| Running Env Steps   | 47500    |
| Running Forward KL  | 7.65     |
| Running Reverse KL  | 8.4      |
| Running Update Time | 95       |
----------------------------------
2025-02-01 11:59:29.544718 Eastern Standard Time
| Itration            | 96       |
| Real Det Return     | 21.8     |
| Real Sto Return     | 20.9     |
| Reward Loss         | -133     |
| Running Env Steps   | 48000    |
| Running Forward KL  | 7.21     |
| Running Reverse KL  | 7.49     |
| Running Update Time | 96       |
----------------------------------
2025-02-01 11:59:43.486764 Eastern Standard Time
| Itration            | 97       |
| Real Det Return     | 21.2     |
| Real Sto Return     | 18.3     |
| Reward Loss         | -137     |
| Running Env Steps   | 48500    |
| Running Forward KL  | 7.39     |
| Running Reverse KL  | 8.86     |
| Running Update Time | 97       |
----------------------------------
2025-02-01 11:59:57.437316 Eastern Standard Time
| Itration            | 98       |
| Real Det Return     | 24.2     |
| Real Sto Return     | 18.4     |
| Reward Loss         | -139     |
| Running Env Steps   | 49000    |
| Running Forward KL  | 7        |
| Running Reverse KL  | 8.48     |
| Running Update Time | 98       |
----------------------------------
2025-02-01 12:00:11.658523 Eastern Standard Time
| Itration            | 99       |
| Real Det Return     | 51.8     |
| Real Sto Return     | 30.8     |
| Reward Loss         | -133     |
| Running Env Steps   | 49500    |
| Running Forward KL  | 6.47     |
| Running Reverse KL  | 7.44     |
| Running Update Time | 99       |
----------------------------------
2025-02-01 12:00:25.612936 Eastern Standard Time
| Itration            | 100      |
| Real Det Return     | 19.9     |
| Real Sto Return     | 18.5     |
| Reward Loss         | -138     |
| Running Env Steps   | 50000    |
| Running Forward KL  | 6.63     |
| Running Reverse KL  | 6.75     |
| Running Update Time | 100      |
----------------------------------
2025-02-01 12:00:39.556712 Eastern Standard Time
| Itration            | 101      |
| Real Det Return     | 38.1     |
| Real Sto Return     | 25.6     |
| Reward Loss         | -143     |
| Running Env Steps   | 50500    |
| Running Forward KL  | 7.02     |
| Running Reverse KL  | 7.83     |
| Running Update Time | 101      |
----------------------------------
2025-02-01 12:00:53.511990 Eastern Standard Time
| Itration            | 102      |
| Real Det Return     | 66       |
| Real Sto Return     | 30.9     |
| Reward Loss         | -142     |
| Running Env Steps   | 51000    |
| Running Forward KL  | 6.42     |
| Running Reverse KL  | 7.18     |
| Running Update Time | 102      |
----------------------------------
2025-02-01 12:01:07.546392 Eastern Standard Time
| Itration            | 103      |
| Real Det Return     | 40.3     |
| Real Sto Return     | 30.9     |
| Reward Loss         | -150     |
| Running Env Steps   | 51500    |
| Running Forward KL  | 7.15     |
| Running Reverse KL  | 8.39     |
| Running Update Time | 103      |
----------------------------------
2025-02-01 12:01:21.486861 Eastern Standard Time
| Itration            | 104      |
| Real Det Return     | 45.9     |
| Real Sto Return     | 23.7     |
| Reward Loss         | -150     |
| Running Env Steps   | 52000    |
| Running Forward KL  | 6.67     |
| Running Reverse KL  | 7.29     |
| Running Update Time | 104      |
----------------------------------
2025-02-01 12:01:35.431868 Eastern Standard Time
| Itration            | 105      |
| Real Det Return     | 20.3     |
| Real Sto Return     | 26.8     |
| Reward Loss         | -142     |
| Running Env Steps   | 52500    |
| Running Forward KL  | 6.26     |
| Running Reverse KL  | 6.92     |
| Running Update Time | 105      |
----------------------------------
2025-02-01 12:01:49.374517 Eastern Standard Time
| Itration            | 106      |
| Real Det Return     | 49.6     |
| Real Sto Return     | 33.5     |
| Reward Loss         | -146     |
| Running Env Steps   | 53000    |
| Running Forward KL  | 6.47     |
| Running Reverse KL  | 7.44     |
| Running Update Time | 106      |
----------------------------------
2025-02-01 12:02:03.568564 Eastern Standard Time
| Itration            | 107      |
| Real Det Return     | 60.9     |
| Real Sto Return     | 22.5     |
| Reward Loss         | -151     |
| Running Env Steps   | 53500    |
| Running Forward KL  | 6.9      |
| Running Reverse KL  | 7.77     |
| Running Update Time | 107      |
----------------------------------
2025-02-01 12:02:17.607005 Eastern Standard Time
| Itration            | 108      |
| Real Det Return     | 67       |
| Real Sto Return     | 34.6     |
| Reward Loss         | -153     |
| Running Env Steps   | 54000    |
| Running Forward KL  | 6.69     |
| Running Reverse KL  | 8.46     |
| Running Update Time | 108      |
----------------------------------
2025-02-01 12:02:31.561969 Eastern Standard Time
| Itration            | 109      |
| Real Det Return     | 54.8     |
| Real Sto Return     | 19.9     |
| Reward Loss         | -124     |
| Running Env Steps   | 54500    |
| Running Forward KL  | 6.08     |
| Running Reverse KL  | 5.79     |
| Running Update Time | 109      |
----------------------------------
2025-02-01 12:02:45.562937 Eastern Standard Time
| Itration            | 110      |
| Real Det Return     | 57.6     |
| Real Sto Return     | 21.8     |
| Reward Loss         | -145     |
| Running Env Steps   | 55000    |
| Running Forward KL  | 6.37     |
| Running Reverse KL  | 7.53     |
| Running Update Time | 110      |
----------------------------------
2025-02-01 12:02:59.533486 Eastern Standard Time
| Itration            | 111      |
| Real Det Return     | 63.5     |
| Real Sto Return     | 29.8     |
| Reward Loss         | -153     |
| Running Env Steps   | 55500    |
| Running Forward KL  | 6.69     |
| Running Reverse KL  | 8.11     |
| Running Update Time | 111      |
----------------------------------
2025-02-01 12:03:13.463847 Eastern Standard Time
| Itration            | 112      |
| Real Det Return     | 20.3     |
| Real Sto Return     | 15.6     |
| Reward Loss         | -148     |
| Running Env Steps   | 56000    |
| Running Forward KL  | 6.42     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 112      |
----------------------------------
2025-02-01 12:03:27.421132 Eastern Standard Time
| Itration            | 113      |
| Real Det Return     | 54.1     |
| Real Sto Return     | 38       |
| Reward Loss         | -128     |
| Running Env Steps   | 56500    |
| Running Forward KL  | 6.3      |
| Running Reverse KL  | 6.72     |
| Running Update Time | 113      |
----------------------------------
2025-02-01 12:03:41.461487 Eastern Standard Time
| Itration            | 114      |
| Real Det Return     | -33      |
| Real Sto Return     | 18.4     |
| Reward Loss         | -130     |
| Running Env Steps   | 57000    |
| Running Forward KL  | 5.88     |
| Running Reverse KL  | 6.66     |
| Running Update Time | 114      |
----------------------------------
2025-02-01 12:03:55.473718 Eastern Standard Time
| Itration            | 115      |
| Real Det Return     | -19.4    |
| Real Sto Return     | 26.1     |
| Reward Loss         | -149     |
| Running Env Steps   | 57500    |
| Running Forward KL  | 6.08     |
| Running Reverse KL  | 6.95     |
| Running Update Time | 115      |
----------------------------------
2025-02-01 12:04:09.457441 Eastern Standard Time
| Itration            | 116      |
| Real Det Return     | 4.53     |
| Real Sto Return     | 13.6     |
| Reward Loss         | -155     |
| Running Env Steps   | 58000    |
| Running Forward KL  | 6.04     |
| Running Reverse KL  | 7.05     |
| Running Update Time | 116      |
----------------------------------
2025-02-01 12:04:23.445127 Eastern Standard Time
| Itration            | 117      |
| Real Det Return     | 5.49     |
| Real Sto Return     | 29.7     |
| Reward Loss         | -123     |
| Running Env Steps   | 58500    |
| Running Forward KL  | 5.74     |
| Running Reverse KL  | 6.39     |
| Running Update Time | 117      |
----------------------------------
2025-02-01 12:04:37.384278 Eastern Standard Time
| Itration            | 118      |
| Real Det Return     | 17.3     |
| Real Sto Return     | 30.9     |
| Reward Loss         | -135     |
| Running Env Steps   | 59000    |
| Running Forward KL  | 6.38     |
| Running Reverse KL  | 5.86     |
| Running Update Time | 118      |
----------------------------------
2025-02-01 12:04:51.373981 Eastern Standard Time
| Itration            | 119      |
| Real Det Return     | 10.3     |
| Real Sto Return     | 11.3     |
| Reward Loss         | -112     |
| Running Env Steps   | 59500    |
| Running Forward KL  | 6.28     |
| Running Reverse KL  | 6.75     |
| Running Update Time | 119      |
----------------------------------
2025-02-01 12:05:05.343601 Eastern Standard Time
| Itration            | 120      |
| Real Det Return     | 17.2     |
| Real Sto Return     | 20.9     |
| Reward Loss         | -94.7    |
| Running Env Steps   | 60000    |
| Running Forward KL  | 6.92     |
| Running Reverse KL  | 8.07     |
| Running Update Time | 120      |
----------------------------------
2025-02-01 12:05:19.283990 Eastern Standard Time
| Itration            | 121      |
| Real Det Return     | 14       |
| Real Sto Return     | 20.3     |
| Reward Loss         | -117     |
| Running Env Steps   | 60500    |
| Running Forward KL  | 6.55     |
| Running Reverse KL  | 6.42     |
| Running Update Time | 121      |
----------------------------------
2025-02-01 12:05:33.193357 Eastern Standard Time
| Itration            | 122      |
| Real Det Return     | 6.18     |
| Real Sto Return     | 10.7     |
| Reward Loss         | -114     |
| Running Env Steps   | 61000    |
| Running Forward KL  | 6.29     |
| Running Reverse KL  | 6.2      |
| Running Update Time | 122      |
----------------------------------
2025-02-01 12:05:47.157278 Eastern Standard Time
| Itration            | 123      |
| Real Det Return     | 22.1     |
| Real Sto Return     | 12.5     |
| Reward Loss         | -119     |
| Running Env Steps   | 61500    |
| Running Forward KL  | 6.27     |
| Running Reverse KL  | 6.5      |
| Running Update Time | 123      |
----------------------------------
2025-02-01 12:06:01.063079 Eastern Standard Time
| Itration            | 124      |
| Real Det Return     | 13.4     |
| Real Sto Return     | 3.79     |
| Reward Loss         | -125     |
| Running Env Steps   | 62000    |
| Running Forward KL  | 6.39     |
| Running Reverse KL  | 6.22     |
| Running Update Time | 124      |
----------------------------------
2025-02-01 12:06:15.005254 Eastern Standard Time
| Itration            | 125      |
| Real Det Return     | 20.1     |
| Real Sto Return     | 11.5     |
| Reward Loss         | -111     |
| Running Env Steps   | 62500    |
| Running Forward KL  | 6.21     |
| Running Reverse KL  | 6.09     |
| Running Update Time | 125      |
----------------------------------
2025-02-01 12:06:28.990880 Eastern Standard Time
| Itration            | 126      |
| Real Det Return     | 10.6     |
| Real Sto Return     | 11.4     |
| Reward Loss         | -103     |
| Running Env Steps   | 63000    |
| Running Forward KL  | 6.78     |
| Running Reverse KL  | 7.22     |
| Running Update Time | 126      |
----------------------------------
2025-02-01 12:06:43.002979 Eastern Standard Time
| Itration            | 127      |
| Real Det Return     | 10.2     |
| Real Sto Return     | 0.77     |
| Reward Loss         | -110     |
| Running Env Steps   | 63500    |
| Running Forward KL  | 6.94     |
| Running Reverse KL  | 7.71     |
| Running Update Time | 127      |
----------------------------------
2025-02-01 12:06:56.922709 Eastern Standard Time
| Itration            | 128      |
| Real Det Return     | 19.5     |
| Real Sto Return     | 9.86     |
| Reward Loss         | -110     |
| Running Env Steps   | 64000    |
| Running Forward KL  | 7.61     |
| Running Reverse KL  | 8.97     |
| Running Update Time | 128      |
----------------------------------
2025-02-01 12:07:10.885020 Eastern Standard Time
| Itration            | 129      |
| Real Det Return     | 22.1     |
| Real Sto Return     | 4.83     |
| Reward Loss         | -120     |
| Running Env Steps   | 64500    |
| Running Forward KL  | 7.68     |
| Running Reverse KL  | 7.45     |
| Running Update Time | 129      |
----------------------------------
2025-02-01 12:07:24.764426 Eastern Standard Time
| Itration            | 130      |
| Real Det Return     | 18.4     |
| Real Sto Return     | 2.41     |
| Reward Loss         | -117     |
| Running Env Steps   | 65000    |
| Running Forward KL  | 7.31     |
| Running Reverse KL  | 8.03     |
| Running Update Time | 130      |
----------------------------------
2025-02-01 12:07:38.727828 Eastern Standard Time
| Itration            | 131      |
| Real Det Return     | 22.6     |
| Real Sto Return     | 14.6     |
| Reward Loss         | -119     |
| Running Env Steps   | 65500    |
| Running Forward KL  | 7.21     |
| Running Reverse KL  | 7.37     |
| Running Update Time | 131      |
----------------------------------
2025-02-01 12:07:52.696918 Eastern Standard Time
| Itration            | 132      |
| Real Det Return     | 24       |
| Real Sto Return     | 8.44     |
| Reward Loss         | -122     |
| Running Env Steps   | 66000    |
| Running Forward KL  | 7.1      |
| Running Reverse KL  | 7.16     |
| Running Update Time | 132      |
----------------------------------
2025-02-01 12:08:06.694527 Eastern Standard Time
| Itration            | 133      |
| Real Det Return     | 25       |
| Real Sto Return     | 8.61     |
| Reward Loss         | -130     |
| Running Env Steps   | 66500    |
| Running Forward KL  | 6.7      |
| Running Reverse KL  | 6.86     |
| Running Update Time | 133      |
----------------------------------
2025-02-01 12:08:20.772412 Eastern Standard Time
| Itration            | 134      |
| Real Det Return     | 25.3     |
| Real Sto Return     | 7.34     |
| Reward Loss         | -131     |
| Running Env Steps   | 67000    |
| Running Forward KL  | 6.64     |
| Running Reverse KL  | 6.98     |
| Running Update Time | 134      |
----------------------------------
2025-02-01 12:08:34.822753 Eastern Standard Time
| Itration            | 135      |
| Real Det Return     | 21.9     |
| Real Sto Return     | 5.92     |
| Reward Loss         | -141     |
| Running Env Steps   | 67500    |
| Running Forward KL  | 7.16     |
| Running Reverse KL  | 6.96     |
| Running Update Time | 135      |
----------------------------------
2025-02-01 12:08:49.474938 Eastern Standard Time
| Itration            | 136      |
| Real Det Return     | 21       |
| Real Sto Return     | 7.89     |
| Reward Loss         | -137     |
| Running Env Steps   | 68000    |
| Running Forward KL  | 7.2      |
| Running Reverse KL  | 7.11     |
| Running Update Time | 136      |
----------------------------------
2025-02-01 12:09:04.121635 Eastern Standard Time
| Itration            | 137      |
| Real Det Return     | 13.5     |
| Real Sto Return     | 5.31     |
| Reward Loss         | -142     |
| Running Env Steps   | 68500    |
| Running Forward KL  | 7.03     |
| Running Reverse KL  | 6.91     |
| Running Update Time | 137      |
----------------------------------
2025-02-01 12:09:18.335262 Eastern Standard Time
| Itration            | 138      |
| Real Det Return     | 25.5     |
| Real Sto Return     | 9.75     |
| Reward Loss         | -150     |
| Running Env Steps   | 69000    |
| Running Forward KL  | 6.84     |
| Running Reverse KL  | 6.79     |
| Running Update Time | 138      |
----------------------------------
2025-02-01 12:09:32.500195 Eastern Standard Time
| Itration            | 139      |
| Real Det Return     | 1.45     |
| Real Sto Return     | 3.21     |
| Reward Loss         | -145     |
| Running Env Steps   | 69500    |
| Running Forward KL  | 6.31     |
| Running Reverse KL  | 6.77     |
| Running Update Time | 139      |
----------------------------------
2025-02-01 12:09:46.624868 Eastern Standard Time
| Itration            | 140      |
| Real Det Return     | 10.2     |
| Real Sto Return     | 4.4      |
| Reward Loss         | -152     |
| Running Env Steps   | 70000    |
| Running Forward KL  | 7        |
| Running Reverse KL  | 7.18     |
| Running Update Time | 140      |
----------------------------------
2025-02-01 12:10:00.921495 Eastern Standard Time
| Itration            | 141      |
| Real Det Return     | 1.44     |
| Real Sto Return     | 6.18     |
| Reward Loss         | -154     |
| Running Env Steps   | 70500    |
| Running Forward KL  | 6.66     |
| Running Reverse KL  | 6.86     |
| Running Update Time | 141      |
----------------------------------
2025-02-01 12:10:15.219537 Eastern Standard Time
| Itration            | 142      |
| Real Det Return     | 8.22     |
| Real Sto Return     | 3.56     |
| Reward Loss         | -154     |
| Running Env Steps   | 71000    |
| Running Forward KL  | 6.39     |
| Running Reverse KL  | 6.57     |
| Running Update Time | 142      |
----------------------------------
2025-02-01 12:10:29.556648 Eastern Standard Time
| Itration            | 143      |
| Real Det Return     | 3.66     |
| Real Sto Return     | -2.02    |
| Reward Loss         | -153     |
| Running Env Steps   | 71500    |
| Running Forward KL  | 6.47     |
| Running Reverse KL  | 6.93     |
| Running Update Time | 143      |
----------------------------------
2025-02-01 12:10:43.888026 Eastern Standard Time
| Itration            | 144      |
| Real Det Return     | 9.03     |
| Real Sto Return     | 4.46     |
| Reward Loss         | -158     |
| Running Env Steps   | 72000    |
| Running Forward KL  | 6.58     |
| Running Reverse KL  | 6.44     |
| Running Update Time | 144      |
----------------------------------
2025-02-01 12:10:58.336130 Eastern Standard Time
| Itration            | 145      |
| Real Det Return     | 8.58     |
| Real Sto Return     | 0.5      |
| Reward Loss         | -156     |
| Running Env Steps   | 72500    |
| Running Forward KL  | 6.24     |
| Running Reverse KL  | 6.76     |
| Running Update Time | 145      |
----------------------------------
2025-02-01 12:11:12.849856 Eastern Standard Time
| Itration            | 146      |
| Real Det Return     | 0.2      |
| Real Sto Return     | 3.22     |
| Reward Loss         | -155     |
| Running Env Steps   | 73000    |
| Running Forward KL  | 6.22     |
| Running Reverse KL  | 6.69     |
| Running Update Time | 146      |
----------------------------------
2025-02-01 12:11:27.474782 Eastern Standard Time
| Itration            | 147      |
| Real Det Return     | 4.75     |
| Real Sto Return     | -2.54    |
| Reward Loss         | -161     |
| Running Env Steps   | 73500    |
| Running Forward KL  | 6.16     |
| Running Reverse KL  | 6.61     |
| Running Update Time | 147      |
----------------------------------
2025-02-01 12:11:42.218190 Eastern Standard Time
| Itration            | 148      |
| Real Det Return     | -4.98    |
| Real Sto Return     | -1.22    |
| Reward Loss         | -152     |
| Running Env Steps   | 74000    |
| Running Forward KL  | 6.01     |
| Running Reverse KL  | 6.69     |
| Running Update Time | 148      |
----------------------------------
2025-02-01 12:11:56.994434 Eastern Standard Time
| Itration            | 149      |
| Real Det Return     | 2.86     |
| Real Sto Return     | 1.29     |
| Reward Loss         | -164     |
| Running Env Steps   | 74500    |
| Running Forward KL  | 6.22     |
| Running Reverse KL  | 6.05     |
| Running Update Time | 149      |
----------------------------------
2025-02-01 12:12:11.870506 Eastern Standard Time
| Itration            | 150      |
| Real Det Return     | -3.4     |
| Real Sto Return     | -3.6     |
| Reward Loss         | -166     |
| Running Env Steps   | 75000    |
| Running Forward KL  | 6.18     |
| Running Reverse KL  | 6.67     |
| Running Update Time | 150      |
----------------------------------
2025-02-01 12:12:26.770303 Eastern Standard Time
| Itration            | 151      |
| Real Det Return     | -3.63    |
| Real Sto Return     | -2.77    |
| Reward Loss         | -170     |
| Running Env Steps   | 75500    |
| Running Forward KL  | 6.44     |
| Running Reverse KL  | 6.73     |
| Running Update Time | 151      |
----------------------------------
2025-02-01 12:12:41.742020 Eastern Standard Time
| Itration            | 152      |
| Real Det Return     | -2.7     |
| Real Sto Return     | -5.98    |
| Reward Loss         | -168     |
| Running Env Steps   | 76000    |
| Running Forward KL  | 5.89     |
| Running Reverse KL  | 6.73     |
| Running Update Time | 152      |
----------------------------------
2025-02-01 12:12:56.815354 Eastern Standard Time
| Itration            | 153      |
| Real Det Return     | 5.22     |
| Real Sto Return     | 3.71     |
| Reward Loss         | -172     |
| Running Env Steps   | 76500    |
| Running Forward KL  | 6.25     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 153      |
----------------------------------
2025-02-01 12:13:11.784937 Eastern Standard Time
| Itration            | 154      |
| Real Det Return     | -9.35    |
| Real Sto Return     | -5.83    |
| Reward Loss         | -173     |
| Running Env Steps   | 77000    |
| Running Forward KL  | 5.72     |
| Running Reverse KL  | 6.28     |
| Running Update Time | 154      |
----------------------------------
2025-02-01 12:13:26.854931 Eastern Standard Time
| Itration            | 155      |
| Real Det Return     | -1.44    |
| Real Sto Return     | -0.58    |
| Reward Loss         | -176     |
| Running Env Steps   | 77500    |
| Running Forward KL  | 6.13     |
| Running Reverse KL  | 6.04     |
| Running Update Time | 155      |
----------------------------------
2025-02-01 12:13:41.895293 Eastern Standard Time
| Itration            | 156      |
| Real Det Return     | -4.17    |
| Real Sto Return     | 1.96     |
| Reward Loss         | -174     |
| Running Env Steps   | 78000    |
| Running Forward KL  | 5.72     |
| Running Reverse KL  | 6.79     |
| Running Update Time | 156      |
----------------------------------
2025-02-01 12:13:56.988055 Eastern Standard Time
| Itration            | 157      |
| Real Det Return     | -0.71    |
| Real Sto Return     | -2.2     |
| Reward Loss         | -174     |
| Running Env Steps   | 78500    |
| Running Forward KL  | 5.75     |
| Running Reverse KL  | 6.38     |
| Running Update Time | 157      |
----------------------------------
2025-02-01 12:14:12.171485 Eastern Standard Time
| Itration            | 158      |
| Real Det Return     | 15.8     |
| Real Sto Return     | -0.07    |
| Reward Loss         | -177     |
| Running Env Steps   | 79000    |
| Running Forward KL  | 5.99     |
| Running Reverse KL  | 6.46     |
| Running Update Time | 158      |
----------------------------------
2025-02-01 12:14:27.305934 Eastern Standard Time
| Itration            | 159      |
| Real Det Return     | -9.39    |
| Real Sto Return     | 7.83     |
| Reward Loss         | -181     |
| Running Env Steps   | 79500    |
| Running Forward KL  | 6        |
| Running Reverse KL  | 6.23     |
| Running Update Time | 159      |
----------------------------------
2025-02-01 12:14:42.488210 Eastern Standard Time
| Itration            | 160      |
| Real Det Return     | 6.03     |
| Real Sto Return     | 12.6     |
| Reward Loss         | -189     |
| Running Env Steps   | 80000    |
| Running Forward KL  | 5.72     |
| Running Reverse KL  | 5.91     |
| Running Update Time | 160      |
----------------------------------
2025-02-01 12:14:57.698927 Eastern Standard Time
| Itration            | 161      |
| Real Det Return     | 11.4     |
| Real Sto Return     | 24       |
| Reward Loss         | -183     |
| Running Env Steps   | 80500    |
| Running Forward KL  | 5.54     |
| Running Reverse KL  | 6.37     |
| Running Update Time | 161      |
----------------------------------
2025-02-01 12:15:12.896622 Eastern Standard Time
| Itration            | 162      |
| Real Det Return     | 36.3     |
| Real Sto Return     | 30.5     |
| Reward Loss         | -181     |
| Running Env Steps   | 81000    |
| Running Forward KL  | 5.34     |
| Running Reverse KL  | 6.18     |
| Running Update Time | 162      |
----------------------------------
2025-02-01 12:15:28.088955 Eastern Standard Time
| Itration            | 163      |
| Real Det Return     | 14.2     |
| Real Sto Return     | 29.7     |
| Reward Loss         | -195     |
| Running Env Steps   | 81500    |
| Running Forward KL  | 5.7      |
| Running Reverse KL  | 6.45     |
| Running Update Time | 163      |
----------------------------------
2025-02-01 12:15:43.288853 Eastern Standard Time
| Itration            | 164      |
| Real Det Return     | 20.7     |
| Real Sto Return     | 20.9     |
| Reward Loss         | -192     |
| Running Env Steps   | 82000    |
| Running Forward KL  | 5.44     |
| Running Reverse KL  | 6.15     |
| Running Update Time | 164      |
----------------------------------
2025-02-01 12:15:58.522471 Eastern Standard Time
| Itration            | 165      |
| Real Det Return     | 81       |
| Real Sto Return     | 55.1     |
| Reward Loss         | -179     |
| Running Env Steps   | 82500    |
| Running Forward KL  | 5.31     |
| Running Reverse KL  | 6.54     |
| Running Update Time | 165      |
----------------------------------
2025-02-01 12:16:13.756134 Eastern Standard Time
| Itration            | 166      |
| Real Det Return     | 64.9     |
| Real Sto Return     | 36.3     |
| Reward Loss         | -196     |
| Running Env Steps   | 83000    |
| Running Forward KL  | 5.66     |
| Running Reverse KL  | 5.98     |
| Running Update Time | 166      |
----------------------------------
2025-02-01 12:16:28.977126 Eastern Standard Time
| Itration            | 167      |
| Real Det Return     | 115      |
| Real Sto Return     | 71.1     |
| Reward Loss         | -186     |
| Running Env Steps   | 83500    |
| Running Forward KL  | 4.98     |
| Running Reverse KL  | 6.76     |
| Running Update Time | 167      |
----------------------------------
2025-02-01 12:16:44.221201 Eastern Standard Time
| Itration            | 168      |
| Real Det Return     | 77.5     |
| Real Sto Return     | 49       |
| Reward Loss         | -193     |
| Running Env Steps   | 84000    |
| Running Forward KL  | 5.36     |
| Running Reverse KL  | 6.99     |
| Running Update Time | 168      |
----------------------------------
2025-02-01 12:16:59.490571 Eastern Standard Time
| Itration            | 169      |
| Real Det Return     | 96.1     |
| Real Sto Return     | 73.1     |
| Reward Loss         | -194     |
| Running Env Steps   | 84500    |
| Running Forward KL  | 5.41     |
| Running Reverse KL  | 6.78     |
| Running Update Time | 169      |
----------------------------------
2025-02-01 12:17:15.017471 Eastern Standard Time
| Itration            | 170      |
| Real Det Return     | 85       |
| Real Sto Return     | 75.3     |
| Reward Loss         | -185     |
| Running Env Steps   | 85000    |
| Running Forward KL  | 5.14     |
| Running Reverse KL  | 7.41     |
| Running Update Time | 170      |
----------------------------------
2025-02-01 12:17:30.311359 Eastern Standard Time
| Itration            | 171      |
| Real Det Return     | 118      |
| Real Sto Return     | 51       |
| Reward Loss         | -193     |
| Running Env Steps   | 85500    |
| Running Forward KL  | 5.27     |
| Running Reverse KL  | 6.77     |
| Running Update Time | 171      |
----------------------------------
2025-02-01 12:17:45.630205 Eastern Standard Time
| Itration            | 172      |
| Real Det Return     | 161      |
| Real Sto Return     | 93.1     |
| Reward Loss         | -191     |
| Running Env Steps   | 86000    |
| Running Forward KL  | 5.41     |
| Running Reverse KL  | 6.83     |
| Running Update Time | 172      |
----------------------------------
2025-02-01 12:18:00.968797 Eastern Standard Time
| Itration            | 173      |
| Real Det Return     | 153      |
| Real Sto Return     | 104      |
| Reward Loss         | -196     |
| Running Env Steps   | 86500    |
| Running Forward KL  | 5.38     |
| Running Reverse KL  | 6.9      |
| Running Update Time | 173      |
----------------------------------
2025-02-01 12:18:16.266268 Eastern Standard Time
| Itration            | 174      |
| Real Det Return     | 170      |
| Real Sto Return     | 106      |
| Reward Loss         | -191     |
| Running Env Steps   | 87000    |
| Running Forward KL  | 5.41     |
| Running Reverse KL  | 7.88     |
| Running Update Time | 174      |
----------------------------------
2025-02-01 12:18:31.560686 Eastern Standard Time
| Itration            | 175      |
| Real Det Return     | 178      |
| Real Sto Return     | 98.6     |
| Reward Loss         | -191     |
| Running Env Steps   | 87500    |
| Running Forward KL  | 5.79     |
| Running Reverse KL  | 8.58     |
| Running Update Time | 175      |
----------------------------------
2025-02-01 12:18:46.944410 Eastern Standard Time
| Itration            | 176      |
| Real Det Return     | 163      |
| Real Sto Return     | 107      |
| Reward Loss         | -195     |
| Running Env Steps   | 88000    |
| Running Forward KL  | 5.37     |
| Running Reverse KL  | 7.51     |
| Running Update Time | 176      |
----------------------------------
2025-02-01 12:19:02.265814 Eastern Standard Time
| Itration            | 177      |
| Real Det Return     | 167      |
| Real Sto Return     | 110      |
| Reward Loss         | -196     |
| Running Env Steps   | 88500    |
| Running Forward KL  | 5.24     |
| Running Reverse KL  | 7.28     |
| Running Update Time | 177      |
----------------------------------
2025-02-01 12:19:17.540950 Eastern Standard Time
| Itration            | 178      |
| Real Det Return     | 161      |
| Real Sto Return     | 102      |
| Reward Loss         | -211     |
| Running Env Steps   | 89000    |
| Running Forward KL  | 5.6      |
| Running Reverse KL  | 7.48     |
| Running Update Time | 178      |
----------------------------------
2025-02-01 12:19:32.791164 Eastern Standard Time
| Itration            | 179      |
| Real Det Return     | 150      |
| Real Sto Return     | 122      |
| Reward Loss         | -193     |
| Running Env Steps   | 89500    |
| Running Forward KL  | 5.09     |
| Running Reverse KL  | 7.55     |
| Running Update Time | 179      |
----------------------------------
2025-02-01 12:19:48.161350 Eastern Standard Time
| Itration            | 180      |
| Real Det Return     | 178      |
| Real Sto Return     | 113      |
| Reward Loss         | -202     |
| Running Env Steps   | 90000    |
| Running Forward KL  | 4.92     |
| Running Reverse KL  | 6.67     |
| Running Update Time | 180      |
----------------------------------
2025-02-01 12:20:03.464767 Eastern Standard Time
| Itration            | 181      |
| Real Det Return     | 165      |
| Real Sto Return     | 104      |
| Reward Loss         | -207     |
| Running Env Steps   | 90500    |
| Running Forward KL  | 5.62     |
| Running Reverse KL  | 8.17     |
| Running Update Time | 181      |
----------------------------------
2025-02-01 12:20:18.899020 Eastern Standard Time
| Itration            | 182      |
| Real Det Return     | 167      |
| Real Sto Return     | 110      |
| Reward Loss         | -208     |
| Running Env Steps   | 91000    |
| Running Forward KL  | 5.5      |
| Running Reverse KL  | 8        |
| Running Update Time | 182      |
----------------------------------
2025-02-01 12:20:34.310543 Eastern Standard Time
| Itration            | 183      |
| Real Det Return     | 124      |
| Real Sto Return     | 104      |
| Reward Loss         | -191     |
| Running Env Steps   | 91500    |
| Running Forward KL  | 4.38     |
| Running Reverse KL  | 6.98     |
| Running Update Time | 183      |
----------------------------------
2025-02-01 12:20:49.667960 Eastern Standard Time
| Itration            | 184      |
| Real Det Return     | 146      |
| Real Sto Return     | 112      |
| Reward Loss         | -197     |
| Running Env Steps   | 92000    |
| Running Forward KL  | 5.06     |
| Running Reverse KL  | 6.77     |
| Running Update Time | 184      |
----------------------------------
2025-02-01 12:21:05.066145 Eastern Standard Time
| Itration            | 185      |
| Real Det Return     | 164      |
| Real Sto Return     | 116      |
| Reward Loss         | -207     |
| Running Env Steps   | 92500    |
| Running Forward KL  | 4.97     |
| Running Reverse KL  | 7.52     |
| Running Update Time | 185      |
----------------------------------
2025-02-01 12:21:20.494827 Eastern Standard Time
| Itration            | 186      |
| Real Det Return     | 159      |
| Real Sto Return     | 120      |
| Reward Loss         | -214     |
| Running Env Steps   | 93000    |
| Running Forward KL  | 5.22     |
| Running Reverse KL  | 7.46     |
| Running Update Time | 186      |
----------------------------------
2025-02-01 12:21:35.965906 Eastern Standard Time
| Itration            | 187      |
| Real Det Return     | 177      |
| Real Sto Return     | 121      |
| Reward Loss         | -207     |
| Running Env Steps   | 93500    |
| Running Forward KL  | 5.1      |
| Running Reverse KL  | 7.52     |
| Running Update Time | 187      |
----------------------------------
2025-02-01 12:21:51.368025 Eastern Standard Time
| Itration            | 188      |
| Real Det Return     | 160      |
| Real Sto Return     | 117      |
| Reward Loss         | -200     |
| Running Env Steps   | 94000    |
| Running Forward KL  | 4.74     |
| Running Reverse KL  | 6.73     |
| Running Update Time | 188      |
----------------------------------
2025-02-01 12:22:06.766321 Eastern Standard Time
| Itration            | 189      |
| Real Det Return     | 176      |
| Real Sto Return     | 126      |
| Reward Loss         | -204     |
| Running Env Steps   | 94500    |
| Running Forward KL  | 4.77     |
| Running Reverse KL  | 7.08     |
| Running Update Time | 189      |
----------------------------------
2025-02-01 12:22:22.186521 Eastern Standard Time
| Itration            | 190      |
| Real Det Return     | 173      |
| Real Sto Return     | 131      |
| Reward Loss         | -200     |
| Running Env Steps   | 95000    |
| Running Forward KL  | 4.2      |
| Running Reverse KL  | 6.6      |
| Running Update Time | 190      |
----------------------------------
2025-02-01 12:22:37.629891 Eastern Standard Time
| Itration            | 191      |
| Real Det Return     | 142      |
| Real Sto Return     | 125      |
| Reward Loss         | -220     |
| Running Env Steps   | 95500    |
| Running Forward KL  | 5.15     |
| Running Reverse KL  | 6.9      |
| Running Update Time | 191      |
----------------------------------
2025-02-01 12:22:53.184941 Eastern Standard Time
| Itration            | 192      |
| Real Det Return     | 159      |
| Real Sto Return     | 122      |
| Reward Loss         | -238     |
| Running Env Steps   | 96000    |
| Running Forward KL  | 4.79     |
| Running Reverse KL  | 5.5      |
| Running Update Time | 192      |
----------------------------------
2025-02-01 12:23:08.670213 Eastern Standard Time
| Itration            | 193      |
| Real Det Return     | 183      |
| Real Sto Return     | 132      |
| Reward Loss         | -211     |
| Running Env Steps   | 96500    |
| Running Forward KL  | 4.68     |
| Running Reverse KL  | 6.44     |
| Running Update Time | 193      |
----------------------------------
2025-02-01 12:23:24.829589 Eastern Standard Time
| Itration            | 194      |
| Real Det Return     | 137      |
| Real Sto Return     | 142      |
| Reward Loss         | -208     |
| Running Env Steps   | 97000    |
| Running Forward KL  | 4.47     |
| Running Reverse KL  | 6.82     |
| Running Update Time | 194      |
----------------------------------
2025-02-01 12:23:40.738750 Eastern Standard Time
| Itration            | 195      |
| Real Det Return     | 175      |
| Real Sto Return     | 133      |
| Reward Loss         | -222     |
| Running Env Steps   | 97500    |
| Running Forward KL  | 4.37     |
| Running Reverse KL  | 7.4      |
| Running Update Time | 195      |
----------------------------------
2025-02-01 12:23:56.844034 Eastern Standard Time
| Itration            | 196      |
| Real Det Return     | 200      |
| Real Sto Return     | 146      |
| Reward Loss         | -210     |
| Running Env Steps   | 98000    |
| Running Forward KL  | 4.86     |
| Running Reverse KL  | 7.21     |
| Running Update Time | 196      |
----------------------------------
2025-02-01 12:24:13.216725 Eastern Standard Time
| Itration            | 197      |
| Real Det Return     | 159      |
| Real Sto Return     | 119      |
| Reward Loss         | -209     |
| Running Env Steps   | 98500    |
| Running Forward KL  | 3.92     |
| Running Reverse KL  | 6.28     |
| Running Update Time | 197      |
----------------------------------
2025-02-01 12:24:29.771963 Eastern Standard Time
| Itration            | 198      |
| Real Det Return     | 181      |
| Real Sto Return     | 128      |
| Reward Loss         | -218     |
| Running Env Steps   | 99000    |
| Running Forward KL  | 4.47     |
| Running Reverse KL  | 6        |
| Running Update Time | 198      |
----------------------------------
2025-02-01 12:24:45.422665 Eastern Standard Time
| Itration            | 199      |
| Real Det Return     | 149      |
| Real Sto Return     | 125      |
| Reward Loss         | -234     |
| Running Env Steps   | 99500    |
| Running Forward KL  | 4.63     |
| Running Reverse KL  | 5.39     |
| Running Update Time | 199      |
----------------------------------
2025-02-01 12:25:01.360868 Eastern Standard Time
| Itration            | 200      |
| Real Det Return     | 183      |
| Real Sto Return     | 127      |
| Reward Loss         | -214     |
| Running Env Steps   | 100000   |
| Running Forward KL  | 4.6      |
| Running Reverse KL  | 7.13     |
| Running Update Time | 200      |
----------------------------------
2025-02-01 12:25:17.140954 Eastern Standard Time
| Itration            | 201      |
| Real Det Return     | 188      |
| Real Sto Return     | 150      |
| Reward Loss         | -207     |
| Running Env Steps   | 100500   |
| Running Forward KL  | 4.08     |
| Running Reverse KL  | 6.97     |
| Running Update Time | 201      |
----------------------------------
2025-02-01 12:25:33.083241 Eastern Standard Time
| Itration            | 202      |
| Real Det Return     | 201      |
| Real Sto Return     | 146      |
| Reward Loss         | -216     |
| Running Env Steps   | 101000   |
| Running Forward KL  | 4.24     |
| Running Reverse KL  | 6.05     |
| Running Update Time | 202      |
----------------------------------
2025-02-01 12:25:49.095528 Eastern Standard Time
| Itration            | 203      |
| Real Det Return     | 198      |
| Real Sto Return     | 149      |
| Reward Loss         | -223     |
| Running Env Steps   | 101500   |
| Running Forward KL  | 4.34     |
| Running Reverse KL  | 6.83     |
| Running Update Time | 203      |
----------------------------------
2025-02-01 12:26:06.251608 Eastern Standard Time
| Itration            | 204      |
| Real Det Return     | 204      |
| Real Sto Return     | 155      |
| Reward Loss         | -218     |
| Running Env Steps   | 102000   |
| Running Forward KL  | 4.37     |
| Running Reverse KL  | 6.99     |
| Running Update Time | 204      |
----------------------------------
2025-02-01 12:26:22.074845 Eastern Standard Time
| Itration            | 205      |
| Real Det Return     | 194      |
| Real Sto Return     | 159      |
| Reward Loss         | -210     |
| Running Env Steps   | 102500   |
| Running Forward KL  | 3.84     |
| Running Reverse KL  | 6.59     |
| Running Update Time | 205      |
----------------------------------
2025-02-01 12:26:37.810186 Eastern Standard Time
| Itration            | 206      |
| Real Det Return     | 199      |
| Real Sto Return     | 148      |
| Reward Loss         | -224     |
| Running Env Steps   | 103000   |
| Running Forward KL  | 4.09     |
| Running Reverse KL  | 6.63     |
| Running Update Time | 206      |
----------------------------------
2025-02-01 12:26:53.916528 Eastern Standard Time
| Itration            | 207      |
| Real Det Return     | 207      |
| Real Sto Return     | 165      |
| Reward Loss         | -228     |
| Running Env Steps   | 103500   |
| Running Forward KL  | 4.38     |
| Running Reverse KL  | 6.61     |
| Running Update Time | 207      |
----------------------------------
2025-02-01 12:27:10.569404 Eastern Standard Time
| Itration            | 208      |
| Real Det Return     | 201      |
| Real Sto Return     | 160      |
| Reward Loss         | -226     |
| Running Env Steps   | 104000   |
| Running Forward KL  | 4.1      |
| Running Reverse KL  | 6.02     |
| Running Update Time | 208      |
----------------------------------
2025-02-01 12:27:26.263816 Eastern Standard Time
| Itration            | 209      |
| Real Det Return     | 193      |
| Real Sto Return     | 138      |
| Reward Loss         | -226     |
| Running Env Steps   | 104500   |
| Running Forward KL  | 3.89     |
| Running Reverse KL  | 6.49     |
| Running Update Time | 209      |
----------------------------------
2025-02-01 12:27:41.864332 Eastern Standard Time
| Itration            | 210      |
| Real Det Return     | 185      |
| Real Sto Return     | 152      |
| Reward Loss         | -217     |
| Running Env Steps   | 105000   |
| Running Forward KL  | 3.47     |
| Running Reverse KL  | 5.79     |
| Running Update Time | 210      |
----------------------------------
2025-02-01 12:27:57.413872 Eastern Standard Time
| Itration            | 211      |
| Real Det Return     | 171      |
| Real Sto Return     | 154      |
| Reward Loss         | -241     |
| Running Env Steps   | 105500   |
| Running Forward KL  | 4.09     |
| Running Reverse KL  | 5.77     |
| Running Update Time | 211      |
----------------------------------
2025-02-01 12:28:12.894841 Eastern Standard Time
| Itration            | 212      |
| Real Det Return     | 216      |
| Real Sto Return     | 185      |
| Reward Loss         | -211     |
| Running Env Steps   | 106000   |
| Running Forward KL  | 3.59     |
| Running Reverse KL  | 6.34     |
| Running Update Time | 212      |
----------------------------------
2025-02-01 12:28:28.410686 Eastern Standard Time
| Itration            | 213      |
| Real Det Return     | 206      |
| Real Sto Return     | 180      |
| Reward Loss         | -220     |
| Running Env Steps   | 106500   |
| Running Forward KL  | 3.69     |
| Running Reverse KL  | 5.32     |
| Running Update Time | 213      |
----------------------------------
2025-02-01 12:28:43.899490 Eastern Standard Time
| Itration            | 214      |
| Real Det Return     | 207      |
| Real Sto Return     | 164      |
| Reward Loss         | -237     |
| Running Env Steps   | 107000   |
| Running Forward KL  | 4.63     |
| Running Reverse KL  | 6.21     |
| Running Update Time | 214      |
----------------------------------
2025-02-01 12:29:00.247932 Eastern Standard Time
| Itration            | 215      |
| Real Det Return     | 174      |
| Real Sto Return     | 161      |
| Reward Loss         | -208     |
| Running Env Steps   | 107500   |
| Running Forward KL  | 4.05     |
| Running Reverse KL  | 6.1      |
| Running Update Time | 215      |
----------------------------------
2025-02-01 12:29:23.552749 Eastern Standard Time
| Itration            | 216      |
| Real Det Return     | 225      |
| Real Sto Return     | 179      |
| Reward Loss         | -219     |
| Running Env Steps   | 108000   |
| Running Forward KL  | 3.7      |
| Running Reverse KL  | 6.08     |
| Running Update Time | 216      |
----------------------------------
2025-02-01 12:29:44.260826 Eastern Standard Time
| Itration            | 217      |
| Real Det Return     | 222      |
| Real Sto Return     | 169      |
| Reward Loss         | -210     |
| Running Env Steps   | 108500   |
| Running Forward KL  | 3.55     |
| Running Reverse KL  | 5.99     |
| Running Update Time | 217      |
----------------------------------
2025-02-01 12:30:01.116362 Eastern Standard Time
| Itration            | 218      |
| Real Det Return     | 209      |
| Real Sto Return     | 181      |
| Reward Loss         | -209     |
| Running Env Steps   | 109000   |
| Running Forward KL  | 3.34     |
| Running Reverse KL  | 6.24     |
| Running Update Time | 218      |
----------------------------------
2025-02-01 12:30:17.317946 Eastern Standard Time
| Itration            | 219      |
| Real Det Return     | 181      |
| Real Sto Return     | 169      |
| Reward Loss         | -206     |
| Running Env Steps   | 109500   |
| Running Forward KL  | 3.88     |
| Running Reverse KL  | 6.06     |
| Running Update Time | 219      |
----------------------------------
2025-02-01 12:30:33.583908 Eastern Standard Time
| Itration            | 220      |
| Real Det Return     | 207      |
| Real Sto Return     | 177      |
| Reward Loss         | -225     |
| Running Env Steps   | 110000   |
| Running Forward KL  | 3.79     |
| Running Reverse KL  | 6.65     |
| Running Update Time | 220      |
----------------------------------
2025-02-01 12:30:49.561688 Eastern Standard Time
| Itration            | 221      |
| Real Det Return     | 240      |
| Real Sto Return     | 181      |
| Reward Loss         | -221     |
| Running Env Steps   | 110500   |
| Running Forward KL  | 3.86     |
| Running Reverse KL  | 6.57     |
| Running Update Time | 221      |
----------------------------------
2025-02-01 12:31:05.416895 Eastern Standard Time
| Itration            | 222      |
| Real Det Return     | 220      |
| Real Sto Return     | 182      |
| Reward Loss         | -222     |
| Running Env Steps   | 111000   |
| Running Forward KL  | 3.81     |
| Running Reverse KL  | 6.44     |
| Running Update Time | 222      |
----------------------------------
2025-02-01 12:31:21.959839 Eastern Standard Time
| Itration            | 223      |
| Real Det Return     | 213      |
| Real Sto Return     | 189      |
| Reward Loss         | -214     |
| Running Env Steps   | 111500   |
| Running Forward KL  | 3.76     |
| Running Reverse KL  | 6.38     |
| Running Update Time | 223      |
----------------------------------
2025-02-01 12:31:37.674458 Eastern Standard Time
| Itration            | 224      |
| Real Det Return     | 239      |
| Real Sto Return     | 177      |
| Reward Loss         | -198     |
| Running Env Steps   | 112000   |
| Running Forward KL  | 3.25     |
| Running Reverse KL  | 6.25     |
| Running Update Time | 224      |
----------------------------------
2025-02-01 12:31:53.787548 Eastern Standard Time
| Itration            | 225      |
| Real Det Return     | 229      |
| Real Sto Return     | 195      |
| Reward Loss         | -230     |
| Running Env Steps   | 112500   |
| Running Forward KL  | 3.53     |
| Running Reverse KL  | 5.54     |
| Running Update Time | 225      |
----------------------------------
2025-02-01 12:32:09.886381 Eastern Standard Time
| Itration            | 226      |
| Real Det Return     | 211      |
| Real Sto Return     | 181      |
| Reward Loss         | -218     |
| Running Env Steps   | 113000   |
| Running Forward KL  | 3.3      |
| Running Reverse KL  | 6.06     |
| Running Update Time | 226      |
----------------------------------
2025-02-01 12:32:25.760208 Eastern Standard Time
| Itration            | 227      |
| Real Det Return     | 226      |
| Real Sto Return     | 196      |
| Reward Loss         | -221     |
| Running Env Steps   | 113500   |
| Running Forward KL  | 3.47     |
| Running Reverse KL  | 5.52     |
| Running Update Time | 227      |
----------------------------------
2025-02-01 12:32:41.620333 Eastern Standard Time
| Itration            | 228      |
| Real Det Return     | 229      |
| Real Sto Return     | 187      |
| Reward Loss         | -212     |
| Running Env Steps   | 114000   |
| Running Forward KL  | 3.49     |
| Running Reverse KL  | 5.94     |
| Running Update Time | 228      |
----------------------------------
2025-02-01 12:32:57.105213 Eastern Standard Time
| Itration            | 229      |
| Real Det Return     | 238      |
| Real Sto Return     | 211      |
| Reward Loss         | -242     |
| Running Env Steps   | 114500   |
| Running Forward KL  | 3.89     |
| Running Reverse KL  | 5.73     |
| Running Update Time | 229      |
----------------------------------
2025-02-01 12:33:13.027829 Eastern Standard Time
| Itration            | 230      |
| Real Det Return     | 254      |
| Real Sto Return     | 207      |
| Reward Loss         | -206     |
| Running Env Steps   | 115000   |
| Running Forward KL  | 2.81     |
| Running Reverse KL  | 5.82     |
| Running Update Time | 230      |
----------------------------------
2025-02-01 12:33:28.970777 Eastern Standard Time
| Itration            | 231      |
| Real Det Return     | 230      |
| Real Sto Return     | 190      |
| Reward Loss         | -219     |
| Running Env Steps   | 115500   |
| Running Forward KL  | 2.96     |
| Running Reverse KL  | 5.17     |
| Running Update Time | 231      |
----------------------------------
2025-02-01 12:33:45.211293 Eastern Standard Time
| Itration            | 232      |
| Real Det Return     | 260      |
| Real Sto Return     | 202      |
| Reward Loss         | -224     |
| Running Env Steps   | 116000   |
| Running Forward KL  | 3.57     |
| Running Reverse KL  | 6.06     |
| Running Update Time | 232      |
----------------------------------
2025-02-01 12:34:01.455596 Eastern Standard Time
| Itration            | 233      |
| Real Det Return     | 266      |
| Real Sto Return     | 202      |
| Reward Loss         | -234     |
| Running Env Steps   | 116500   |
| Running Forward KL  | 3.91     |
| Running Reverse KL  | 5.9      |
| Running Update Time | 233      |
----------------------------------
2025-02-01 12:34:17.299071 Eastern Standard Time
| Itration            | 234      |
| Real Det Return     | 256      |
| Real Sto Return     | 215      |
| Reward Loss         | -230     |
| Running Env Steps   | 117000   |
| Running Forward KL  | 3.7      |
| Running Reverse KL  | 6.14     |
| Running Update Time | 234      |
----------------------------------
2025-02-01 12:34:33.096267 Eastern Standard Time
| Itration            | 235      |
| Real Det Return     | 258      |
| Real Sto Return     | 200      |
| Reward Loss         | -234     |
| Running Env Steps   | 117500   |
| Running Forward KL  | 3.69     |
| Running Reverse KL  | 5.94     |
| Running Update Time | 235      |
----------------------------------
2025-02-01 12:34:49.007608 Eastern Standard Time
| Itration            | 236      |
| Real Det Return     | 263      |
| Real Sto Return     | 200      |
| Reward Loss         | -202     |
| Running Env Steps   | 118000   |
| Running Forward KL  | 3.05     |
| Running Reverse KL  | 6.15     |
| Running Update Time | 236      |
----------------------------------
2025-02-01 12:35:04.821071 Eastern Standard Time
| Itration            | 237      |
| Real Det Return     | 244      |
| Real Sto Return     | 198      |
| Reward Loss         | -233     |
| Running Env Steps   | 118500   |
| Running Forward KL  | 3.58     |
| Running Reverse KL  | 6.09     |
| Running Update Time | 237      |
----------------------------------
2025-02-01 12:35:20.910896 Eastern Standard Time
| Itration            | 238      |
| Real Det Return     | 270      |
| Real Sto Return     | 225      |
| Reward Loss         | -224     |
| Running Env Steps   | 119000   |
| Running Forward KL  | 3.15     |
| Running Reverse KL  | 5.86     |
| Running Update Time | 238      |
----------------------------------
2025-02-01 12:35:36.705574 Eastern Standard Time
| Itration            | 239      |
| Real Det Return     | 287      |
| Real Sto Return     | 225      |
| Reward Loss         | -210     |
| Running Env Steps   | 119500   |
| Running Forward KL  | 3.19     |
| Running Reverse KL  | 6.12     |
| Running Update Time | 239      |
----------------------------------
2025-02-01 12:35:52.994299 Eastern Standard Time
| Itration            | 240      |
| Real Det Return     | 268      |
| Real Sto Return     | 204      |
| Reward Loss         | -216     |
| Running Env Steps   | 120000   |
| Running Forward KL  | 2.61     |
| Running Reverse KL  | 5.67     |
| Running Update Time | 240      |
----------------------------------
2025-02-01 12:36:10.000141 Eastern Standard Time
| Itration            | 241      |
| Real Det Return     | 245      |
| Real Sto Return     | 198      |
| Reward Loss         | -227     |
| Running Env Steps   | 120500   |
| Running Forward KL  | 4.12     |
| Running Reverse KL  | 6.08     |
| Running Update Time | 241      |
----------------------------------
2025-02-01 12:36:26.630797 Eastern Standard Time
| Itration            | 242      |
| Real Det Return     | 272      |
| Real Sto Return     | 220      |
| Reward Loss         | -215     |
| Running Env Steps   | 121000   |
| Running Forward KL  | 3.45     |
| Running Reverse KL  | 5.77     |
| Running Update Time | 242      |
----------------------------------
2025-02-01 12:36:42.799879 Eastern Standard Time
| Itration            | 243      |
| Real Det Return     | 270      |
| Real Sto Return     | 220      |
| Reward Loss         | -228     |
| Running Env Steps   | 121500   |
| Running Forward KL  | 3.62     |
| Running Reverse KL  | 6.2      |
| Running Update Time | 243      |
----------------------------------
2025-02-01 12:36:59.137380 Eastern Standard Time
| Itration            | 244      |
| Real Det Return     | 259      |
| Real Sto Return     | 233      |
| Reward Loss         | -222     |
| Running Env Steps   | 122000   |
| Running Forward KL  | 3.58     |
| Running Reverse KL  | 5.95     |
| Running Update Time | 244      |
----------------------------------
2025-02-01 12:37:16.336101 Eastern Standard Time
| Itration            | 245      |
| Real Det Return     | 271      |
| Real Sto Return     | 234      |
| Reward Loss         | -226     |
| Running Env Steps   | 122500   |
| Running Forward KL  | 3.42     |
| Running Reverse KL  | 5.76     |
| Running Update Time | 245      |
----------------------------------
2025-02-01 12:37:32.613428 Eastern Standard Time
| Itration            | 246      |
| Real Det Return     | 287      |
| Real Sto Return     | 216      |
| Reward Loss         | -218     |
| Running Env Steps   | 123000   |
| Running Forward KL  | 3.29     |
| Running Reverse KL  | 6.04     |
| Running Update Time | 246      |
----------------------------------
2025-02-01 12:37:49.255707 Eastern Standard Time
| Itration            | 247      |
| Real Det Return     | 292      |
| Real Sto Return     | 224      |
| Reward Loss         | -219     |
| Running Env Steps   | 123500   |
| Running Forward KL  | 3.21     |
| Running Reverse KL  | 5.74     |
| Running Update Time | 247      |
----------------------------------
2025-02-01 12:38:09.033525 Eastern Standard Time
| Itration            | 248      |
| Real Det Return     | 281      |
| Real Sto Return     | 231      |
| Reward Loss         | -212     |
| Running Env Steps   | 124000   |
| Running Forward KL  | 3.34     |
| Running Reverse KL  | 6.52     |
| Running Update Time | 248      |
----------------------------------
2025-02-01 12:38:24.632121 Eastern Standard Time
| Itration            | 249      |
| Real Det Return     | 273      |
| Real Sto Return     | 245      |
| Reward Loss         | -243     |
| Running Env Steps   | 124500   |
| Running Forward KL  | 3.47     |
| Running Reverse KL  | 5.92     |
| Running Update Time | 249      |
----------------------------------
2025-02-01 12:38:40.413771 Eastern Standard Time
| Itration            | 250      |
| Real Det Return     | 290      |
| Real Sto Return     | 245      |
| Reward Loss         | -226     |
| Running Env Steps   | 125000   |
| Running Forward KL  | 3.25     |
| Running Reverse KL  | 6.23     |
| Running Update Time | 250      |
----------------------------------
2025-02-01 12:38:57.058912 Eastern Standard Time
| Itration            | 251      |
| Real Det Return     | 279      |
| Real Sto Return     | 227      |
| Reward Loss         | -224     |
| Running Env Steps   | 125500   |
| Running Forward KL  | 3.4      |
| Running Reverse KL  | 6.23     |
| Running Update Time | 251      |
----------------------------------
2025-02-01 12:39:13.076661 Eastern Standard Time
| Itration            | 252      |
| Real Det Return     | 267      |
| Real Sto Return     | 234      |
| Reward Loss         | -231     |
| Running Env Steps   | 126000   |
| Running Forward KL  | 2.93     |
| Running Reverse KL  | 5.76     |
| Running Update Time | 252      |
----------------------------------
2025-02-01 12:39:28.823327 Eastern Standard Time
| Itration            | 253      |
| Real Det Return     | 291      |
| Real Sto Return     | 266      |
| Reward Loss         | -226     |
| Running Env Steps   | 126500   |
| Running Forward KL  | 2.83     |
| Running Reverse KL  | 5.63     |
| Running Update Time | 253      |
----------------------------------
2025-02-01 12:39:44.596182 Eastern Standard Time
| Itration            | 254      |
| Real Det Return     | 290      |
| Real Sto Return     | 243      |
| Reward Loss         | -224     |
| Running Env Steps   | 127000   |
| Running Forward KL  | 3.31     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 254      |
----------------------------------
2025-02-01 12:40:00.553713 Eastern Standard Time
| Itration            | 255      |
| Real Det Return     | 265      |
| Real Sto Return     | 207      |
| Reward Loss         | -237     |
| Running Env Steps   | 127500   |
| Running Forward KL  | 2.77     |
| Running Reverse KL  | 5.9      |
| Running Update Time | 255      |
----------------------------------
2025-02-01 12:40:16.208283 Eastern Standard Time
| Itration            | 256      |
| Real Det Return     | 275      |
| Real Sto Return     | 250      |
| Reward Loss         | -239     |
| Running Env Steps   | 128000   |
| Running Forward KL  | 3.25     |
| Running Reverse KL  | 5.63     |
| Running Update Time | 256      |
----------------------------------
2025-02-01 12:40:31.878558 Eastern Standard Time
| Itration            | 257      |
| Real Det Return     | 286      |
| Real Sto Return     | 242      |
| Reward Loss         | -218     |
| Running Env Steps   | 128500   |
| Running Forward KL  | 2.6      |
| Running Reverse KL  | 5.55     |
| Running Update Time | 257      |
----------------------------------
2025-02-01 12:40:47.933857 Eastern Standard Time
| Itration            | 258      |
| Real Det Return     | 312      |
| Real Sto Return     | 254      |
| Reward Loss         | -235     |
| Running Env Steps   | 129000   |
| Running Forward KL  | 3.22     |
| Running Reverse KL  | 5.98     |
| Running Update Time | 258      |
----------------------------------
2025-02-01 12:41:04.670600 Eastern Standard Time
| Itration            | 259      |
| Real Det Return     | 301      |
| Real Sto Return     | 249      |
| Reward Loss         | -230     |
| Running Env Steps   | 129500   |
| Running Forward KL  | 3.44     |
| Running Reverse KL  | 6.22     |
| Running Update Time | 259      |
----------------------------------
2025-02-01 12:41:21.303882 Eastern Standard Time
| Itration            | 260      |
| Real Det Return     | 305      |
| Real Sto Return     | 236      |
| Reward Loss         | -224     |
| Running Env Steps   | 130000   |
| Running Forward KL  | 3.25     |
| Running Reverse KL  | 6.01     |
| Running Update Time | 260      |
----------------------------------
2025-02-01 12:41:37.511871 Eastern Standard Time
| Itration            | 261      |
| Real Det Return     | 294      |
| Real Sto Return     | 237      |
| Reward Loss         | -243     |
| Running Env Steps   | 130500   |
| Running Forward KL  | 3.01     |
| Running Reverse KL  | 4.97     |
| Running Update Time | 261      |
----------------------------------
2025-02-01 12:41:53.220396 Eastern Standard Time
| Itration            | 262      |
| Real Det Return     | 300      |
| Real Sto Return     | 266      |
| Reward Loss         | -233     |
| Running Env Steps   | 131000   |
| Running Forward KL  | 2.95     |
| Running Reverse KL  | 5.6      |
| Running Update Time | 262      |
----------------------------------
2025-02-01 12:42:08.781520 Eastern Standard Time
| Itration            | 263      |
| Real Det Return     | 306      |
| Real Sto Return     | 263      |
| Reward Loss         | -224     |
| Running Env Steps   | 131500   |
| Running Forward KL  | 3.09     |
| Running Reverse KL  | 6.12     |
| Running Update Time | 263      |
----------------------------------
2025-02-01 12:42:24.624694 Eastern Standard Time
| Itration            | 264      |
| Real Det Return     | 314      |
| Real Sto Return     | 263      |
| Reward Loss         | -227     |
| Running Env Steps   | 132000   |
| Running Forward KL  | 2.76     |
| Running Reverse KL  | 5.94     |
| Running Update Time | 264      |
----------------------------------
2025-02-01 12:42:40.706959 Eastern Standard Time
| Itration            | 265      |
| Real Det Return     | 318      |
| Real Sto Return     | 273      |
| Reward Loss         | -221     |
| Running Env Steps   | 132500   |
| Running Forward KL  | 3.58     |
| Running Reverse KL  | 6.14     |
| Running Update Time | 265      |
----------------------------------
2025-02-01 12:42:57.364299 Eastern Standard Time
| Itration            | 266      |
| Real Det Return     | 293      |
| Real Sto Return     | 267      |
| Reward Loss         | -212     |
| Running Env Steps   | 133000   |
| Running Forward KL  | 3.26     |
| Running Reverse KL  | 6.6      |
| Running Update Time | 266      |
----------------------------------
2025-02-01 12:43:13.152947 Eastern Standard Time
| Itration            | 267      |
| Real Det Return     | 307      |
| Real Sto Return     | 282      |
| Reward Loss         | -216     |
| Running Env Steps   | 133500   |
| Running Forward KL  | 2.98     |
| Running Reverse KL  | 6.16     |
| Running Update Time | 267      |
----------------------------------
2025-02-01 12:43:28.653276 Eastern Standard Time
| Itration            | 268      |
| Real Det Return     | 308      |
| Real Sto Return     | 269      |
| Reward Loss         | -233     |
| Running Env Steps   | 134000   |
| Running Forward KL  | 3.28     |
| Running Reverse KL  | 5.92     |
| Running Update Time | 268      |
----------------------------------
2025-02-01 12:43:44.267307 Eastern Standard Time
| Itration            | 269      |
| Real Det Return     | 328      |
| Real Sto Return     | 284      |
| Reward Loss         | -212     |
| Running Env Steps   | 134500   |
| Running Forward KL  | 2.46     |
| Running Reverse KL  | 6.15     |
| Running Update Time | 269      |
----------------------------------
2025-02-01 12:43:59.791969 Eastern Standard Time
| Itration            | 270      |
| Real Det Return     | 351      |
| Real Sto Return     | 300      |
| Reward Loss         | -218     |
| Running Env Steps   | 135000   |
| Running Forward KL  | 3.27     |
| Running Reverse KL  | 5.92     |
| Running Update Time | 270      |
----------------------------------
2025-02-01 12:44:15.424304 Eastern Standard Time
| Itration            | 271      |
| Real Det Return     | 314      |
| Real Sto Return     | 255      |
| Reward Loss         | -241     |
| Running Env Steps   | 135500   |
| Running Forward KL  | 2.85     |
| Running Reverse KL  | 5.9      |
| Running Update Time | 271      |
----------------------------------
2025-02-01 12:44:30.991396 Eastern Standard Time
| Itration            | 272      |
| Real Det Return     | 304      |
| Real Sto Return     | 258      |
| Reward Loss         | -242     |
| Running Env Steps   | 136000   |
| Running Forward KL  | 3.35     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 272      |
----------------------------------
2025-02-01 12:44:46.608168 Eastern Standard Time
| Itration            | 273      |
| Real Det Return     | 308      |
| Real Sto Return     | 276      |
| Reward Loss         | -232     |
| Running Env Steps   | 136500   |
| Running Forward KL  | 2.92     |
| Running Reverse KL  | 6.14     |
| Running Update Time | 273      |
----------------------------------
2025-02-01 12:45:02.137391 Eastern Standard Time
| Itration            | 274      |
| Real Det Return     | 336      |
| Real Sto Return     | 283      |
| Reward Loss         | -230     |
| Running Env Steps   | 137000   |
| Running Forward KL  | 2.74     |
| Running Reverse KL  | 6.13     |
| Running Update Time | 274      |
----------------------------------
2025-02-01 12:45:17.818408 Eastern Standard Time
| Itration            | 275      |
| Real Det Return     | 333      |
| Real Sto Return     | 289      |
| Reward Loss         | -214     |
| Running Env Steps   | 137500   |
| Running Forward KL  | 3.01     |
| Running Reverse KL  | 6.52     |
| Running Update Time | 275      |
----------------------------------
2025-02-01 12:45:33.378035 Eastern Standard Time
| Itration            | 276      |
| Real Det Return     | 322      |
| Real Sto Return     | 275      |
| Reward Loss         | -231     |
| Running Env Steps   | 138000   |
| Running Forward KL  | 2.91     |
| Running Reverse KL  | 5.53     |
| Running Update Time | 276      |
----------------------------------
2025-02-01 12:45:48.940964 Eastern Standard Time
| Itration            | 277      |
| Real Det Return     | 333      |
| Real Sto Return     | 286      |
| Reward Loss         | -232     |
| Running Env Steps   | 138500   |
| Running Forward KL  | 2.39     |
| Running Reverse KL  | 6.08     |
| Running Update Time | 277      |
----------------------------------
2025-02-01 12:46:04.483793 Eastern Standard Time
| Itration            | 278      |
| Real Det Return     | 359      |
| Real Sto Return     | 292      |
| Reward Loss         | -240     |
| Running Env Steps   | 139000   |
| Running Forward KL  | 2.82     |
| Running Reverse KL  | 6.73     |
| Running Update Time | 278      |
----------------------------------
2025-02-01 12:46:20.109549 Eastern Standard Time
| Itration            | 279      |
| Real Det Return     | 337      |
| Real Sto Return     | 285      |
| Reward Loss         | -245     |
| Running Env Steps   | 139500   |
| Running Forward KL  | 2.85     |
| Running Reverse KL  | 6.35     |
| Running Update Time | 279      |
----------------------------------
2025-02-01 12:46:35.749268 Eastern Standard Time
| Itration            | 280      |
| Real Det Return     | 334      |
| Real Sto Return     | 295      |
| Reward Loss         | -243     |
| Running Env Steps   | 140000   |
| Running Forward KL  | 3.05     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 280      |
----------------------------------
2025-02-01 12:46:51.324350 Eastern Standard Time
| Itration            | 281      |
| Real Det Return     | 364      |
| Real Sto Return     | 296      |
| Reward Loss         | -221     |
| Running Env Steps   | 140500   |
| Running Forward KL  | 2.52     |
| Running Reverse KL  | 6.12     |
| Running Update Time | 281      |
----------------------------------
2025-02-01 12:47:08.183871 Eastern Standard Time
| Itration            | 282      |
| Real Det Return     | 362      |
| Real Sto Return     | 311      |
| Reward Loss         | -253     |
| Running Env Steps   | 141000   |
| Running Forward KL  | 2.63     |
| Running Reverse KL  | 6.49     |
| Running Update Time | 282      |
----------------------------------
2025-02-01 12:47:24.942430 Eastern Standard Time
| Itration            | 283      |
| Real Det Return     | 358      |
| Real Sto Return     | 302      |
| Reward Loss         | -225     |
| Running Env Steps   | 141500   |
| Running Forward KL  | 2.56     |
| Running Reverse KL  | 5.94     |
| Running Update Time | 283      |
----------------------------------
2025-02-01 12:47:40.502715 Eastern Standard Time
| Itration            | 284      |
| Real Det Return     | 327      |
| Real Sto Return     | 300      |
| Reward Loss         | -254     |
| Running Env Steps   | 142000   |
| Running Forward KL  | 2.66     |
| Running Reverse KL  | 6.37     |
| Running Update Time | 284      |
----------------------------------
2025-02-01 12:47:57.522498 Eastern Standard Time
| Itration            | 285      |
| Real Det Return     | 376      |
| Real Sto Return     | 295      |
| Reward Loss         | -232     |
| Running Env Steps   | 142500   |
| Running Forward KL  | 2.55     |
| Running Reverse KL  | 6.23     |
| Running Update Time | 285      |
----------------------------------
2025-02-01 12:48:13.554766 Eastern Standard Time
| Itration            | 286      |
| Real Det Return     | 365      |
| Real Sto Return     | 298      |
| Reward Loss         | -231     |
| Running Env Steps   | 143000   |
| Running Forward KL  | 2.84     |
| Running Reverse KL  | 6.56     |
| Running Update Time | 286      |
----------------------------------
2025-02-01 12:48:29.416895 Eastern Standard Time
| Itration            | 287      |
| Real Det Return     | 359      |
| Real Sto Return     | 309      |
| Reward Loss         | -213     |
| Running Env Steps   | 143500   |
| Running Forward KL  | 2.37     |
| Running Reverse KL  | 5.99     |
| Running Update Time | 287      |
----------------------------------
2025-02-01 12:48:45.129767 Eastern Standard Time
| Itration            | 288      |
| Real Det Return     | 361      |
| Real Sto Return     | 311      |
| Reward Loss         | -248     |
| Running Env Steps   | 144000   |
| Running Forward KL  | 2.74     |
| Running Reverse KL  | 6.01     |
| Running Update Time | 288      |
----------------------------------
2025-02-01 12:49:01.275832 Eastern Standard Time
| Itration            | 289      |
| Real Det Return     | 350      |
| Real Sto Return     | 303      |
| Reward Loss         | -223     |
| Running Env Steps   | 144500   |
| Running Forward KL  | 3.03     |
| Running Reverse KL  | 6.67     |
| Running Update Time | 289      |
----------------------------------
2025-02-01 12:49:16.996334 Eastern Standard Time
| Itration            | 290      |
| Real Det Return     | 352      |
| Real Sto Return     | 328      |
| Reward Loss         | -225     |
| Running Env Steps   | 145000   |
| Running Forward KL  | 3.12     |
| Running Reverse KL  | 5.94     |
| Running Update Time | 290      |
----------------------------------
2025-02-01 12:49:33.094414 Eastern Standard Time
| Itration            | 291      |
| Real Det Return     | 361      |
| Real Sto Return     | 303      |
| Reward Loss         | -227     |
| Running Env Steps   | 145500   |
| Running Forward KL  | 2.38     |
| Running Reverse KL  | 6.27     |
| Running Update Time | 291      |
----------------------------------
2025-02-01 12:49:48.994370 Eastern Standard Time
| Itration            | 292      |
| Real Det Return     | 346      |
| Real Sto Return     | 310      |
| Reward Loss         | -238     |
| Running Env Steps   | 146000   |
| Running Forward KL  | 3.16     |
| Running Reverse KL  | 6.27     |
| Running Update Time | 292      |
----------------------------------
2025-02-01 12:50:04.531057 Eastern Standard Time
| Itration            | 293      |
| Real Det Return     | 351      |
| Real Sto Return     | 312      |
| Reward Loss         | -233     |
| Running Env Steps   | 146500   |
| Running Forward KL  | 2.1      |
| Running Reverse KL  | 6.01     |
| Running Update Time | 293      |
----------------------------------
2025-02-01 12:50:21.361577 Eastern Standard Time
| Itration            | 294      |
| Real Det Return     | 376      |
| Real Sto Return     | 317      |
| Reward Loss         | -211     |
| Running Env Steps   | 147000   |
| Running Forward KL  | 2.78     |
| Running Reverse KL  | 5.96     |
| Running Update Time | 294      |
----------------------------------
2025-02-01 12:50:37.229763 Eastern Standard Time
| Itration            | 295      |
| Real Det Return     | 346      |
| Real Sto Return     | 325      |
| Reward Loss         | -227     |
| Running Env Steps   | 147500   |
| Running Forward KL  | 2.61     |
| Running Reverse KL  | 6.23     |
| Running Update Time | 295      |
----------------------------------
2025-02-01 12:50:53.208909 Eastern Standard Time
| Itration            | 296      |
| Real Det Return     | 376      |
| Real Sto Return     | 335      |
| Reward Loss         | -246     |
| Running Env Steps   | 148000   |
| Running Forward KL  | 2.79     |
| Running Reverse KL  | 6.39     |
| Running Update Time | 296      |
----------------------------------
2025-02-01 12:51:11.264901 Eastern Standard Time
| Itration            | 297      |
| Real Det Return     | 358      |
| Real Sto Return     | 338      |
| Reward Loss         | -230     |
| Running Env Steps   | 148500   |
| Running Forward KL  | 2.72     |
| Running Reverse KL  | 6.57     |
| Running Update Time | 297      |
----------------------------------
2025-02-01 12:51:27.139789 Eastern Standard Time
| Itration            | 298      |
| Real Det Return     | 374      |
| Real Sto Return     | 347      |
| Reward Loss         | -219     |
| Running Env Steps   | 149000   |
| Running Forward KL  | 2.5      |
| Running Reverse KL  | 6.05     |
| Running Update Time | 298      |
----------------------------------
2025-02-01 12:51:45.951866 Eastern Standard Time
| Itration            | 299      |
| Real Det Return     | 380      |
| Real Sto Return     | 340      |
| Reward Loss         | -237     |
| Running Env Steps   | 149500   |
| Running Forward KL  | 2.03     |
| Running Reverse KL  | 6.04     |
| Running Update Time | 299      |
----------------------------------
2025-02-01 12:52:02.512195 Eastern Standard Time
| Itration            | 300      |
| Real Det Return     | 395      |
| Real Sto Return     | 339      |
| Reward Loss         | -237     |
| Running Env Steps   | 150000   |
| Running Forward KL  | 2.76     |
| Running Reverse KL  | 6.07     |
| Running Update Time | 300      |
----------------------------------
2025-02-01 12:52:18.586149 Eastern Standard Time
| Itration            | 301      |
| Real Det Return     | 385      |
| Real Sto Return     | 347      |
| Reward Loss         | -212     |
| Running Env Steps   | 150500   |
| Running Forward KL  | 2.13     |
| Running Reverse KL  | 6.49     |
| Running Update Time | 301      |
----------------------------------
2025-02-01 12:52:36.850113 Eastern Standard Time
| Itration            | 302      |
| Real Det Return     | 382      |
| Real Sto Return     | 345      |
| Reward Loss         | -244     |
| Running Env Steps   | 151000   |
| Running Forward KL  | 2.33     |
| Running Reverse KL  | 6.18     |
| Running Update Time | 302      |
----------------------------------
2025-02-01 12:52:53.263056 Eastern Standard Time
| Itration            | 303      |
| Real Det Return     | 386      |
| Real Sto Return     | 342      |
| Reward Loss         | -228     |
| Running Env Steps   | 151500   |
| Running Forward KL  | 2.05     |
| Running Reverse KL  | 5.6      |
| Running Update Time | 303      |
----------------------------------
2025-02-01 12:53:09.732788 Eastern Standard Time
| Itration            | 304      |
| Real Det Return     | 374      |
| Real Sto Return     | 333      |
| Reward Loss         | -244     |
| Running Env Steps   | 152000   |
| Running Forward KL  | 2.08     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 304      |
----------------------------------
2025-02-01 12:53:26.080763 Eastern Standard Time
| Itration            | 305      |
| Real Det Return     | 375      |
| Real Sto Return     | 348      |
| Reward Loss         | -238     |
| Running Env Steps   | 152500   |
| Running Forward KL  | 2.6      |
| Running Reverse KL  | 6.24     |
| Running Update Time | 305      |
----------------------------------
2025-02-01 12:53:42.398176 Eastern Standard Time
| Itration            | 306      |
| Real Det Return     | 384      |
| Real Sto Return     | 343      |
| Reward Loss         | -212     |
| Running Env Steps   | 153000   |
| Running Forward KL  | 2.27     |
| Running Reverse KL  | 6.51     |
| Running Update Time | 306      |
----------------------------------
2025-02-01 12:53:57.976317 Eastern Standard Time
| Itration            | 307      |
| Real Det Return     | 395      |
| Real Sto Return     | 367      |
| Reward Loss         | -201     |
| Running Env Steps   | 153500   |
| Running Forward KL  | 2.09     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 307      |
----------------------------------
2025-02-01 12:54:13.744138 Eastern Standard Time
| Itration            | 308      |
| Real Det Return     | 394      |
| Real Sto Return     | 348      |
| Reward Loss         | -221     |
| Running Env Steps   | 154000   |
| Running Forward KL  | 2.9      |
| Running Reverse KL  | 6.62     |
| Running Update Time | 308      |
----------------------------------
2025-02-01 12:54:29.484519 Eastern Standard Time
| Itration            | 309      |
| Real Det Return     | 381      |
| Real Sto Return     | 349      |
| Reward Loss         | -243     |
| Running Env Steps   | 154500   |
| Running Forward KL  | 2.52     |
| Running Reverse KL  | 6.01     |
| Running Update Time | 309      |
----------------------------------
2025-02-01 12:54:46.520354 Eastern Standard Time
| Itration            | 310      |
| Real Det Return     | 406      |
| Real Sto Return     | 328      |
| Reward Loss         | -240     |
| Running Env Steps   | 155000   |
| Running Forward KL  | 2.26     |
| Running Reverse KL  | 5.88     |
| Running Update Time | 310      |
----------------------------------
2025-02-01 12:55:02.668091 Eastern Standard Time
| Itration            | 311      |
| Real Det Return     | 406      |
| Real Sto Return     | 350      |
| Reward Loss         | -228     |
| Running Env Steps   | 155500   |
| Running Forward KL  | 2.72     |
| Running Reverse KL  | 6.18     |
| Running Update Time | 311      |
----------------------------------
2025-02-01 12:55:18.276177 Eastern Standard Time
| Itration            | 312      |
| Real Det Return     | 407      |
| Real Sto Return     | 365      |
| Reward Loss         | -254     |
| Running Env Steps   | 156000   |
| Running Forward KL  | 2.71     |
| Running Reverse KL  | 6.15     |
| Running Update Time | 312      |
----------------------------------
2025-02-01 12:55:33.854195 Eastern Standard Time
| Itration            | 313      |
| Real Det Return     | 407      |
| Real Sto Return     | 353      |
| Reward Loss         | -240     |
| Running Env Steps   | 156500   |
| Running Forward KL  | 2.3      |
| Running Reverse KL  | 6        |
| Running Update Time | 313      |
----------------------------------
2025-02-01 12:55:49.826230 Eastern Standard Time
| Itration            | 314      |
| Real Det Return     | 395      |
| Real Sto Return     | 341      |
| Reward Loss         | -233     |
| Running Env Steps   | 157000   |
| Running Forward KL  | 2.56     |
| Running Reverse KL  | 6.25     |
| Running Update Time | 314      |
----------------------------------
2025-02-01 12:56:05.730174 Eastern Standard Time
| Itration            | 315      |
| Real Det Return     | 382      |
| Real Sto Return     | 337      |
| Reward Loss         | -243     |
| Running Env Steps   | 157500   |
| Running Forward KL  | 2.25     |
| Running Reverse KL  | 5.89     |
| Running Update Time | 315      |
----------------------------------
2025-02-01 12:56:21.614710 Eastern Standard Time
| Itration            | 316      |
| Real Det Return     | 426      |
| Real Sto Return     | 360      |
| Reward Loss         | -238     |
| Running Env Steps   | 158000   |
| Running Forward KL  | 2.62     |
| Running Reverse KL  | 6.14     |
| Running Update Time | 316      |
----------------------------------
2025-02-01 12:56:37.405442 Eastern Standard Time
| Itration            | 317      |
| Real Det Return     | 418      |
| Real Sto Return     | 369      |
| Reward Loss         | -247     |
| Running Env Steps   | 158500   |
| Running Forward KL  | 2.51     |
| Running Reverse KL  | 6.07     |
| Running Update Time | 317      |
----------------------------------
2025-02-01 12:56:53.052155 Eastern Standard Time
| Itration            | 318      |
| Real Det Return     | 401      |
| Real Sto Return     | 365      |
| Reward Loss         | -228     |
| Running Env Steps   | 159000   |
| Running Forward KL  | 1.94     |
| Running Reverse KL  | 5.89     |
| Running Update Time | 318      |
----------------------------------
2025-02-01 12:57:08.724717 Eastern Standard Time
| Itration            | 319      |
| Real Det Return     | 409      |
| Real Sto Return     | 364      |
| Reward Loss         | -237     |
| Running Env Steps   | 159500   |
| Running Forward KL  | 2.25     |
| Running Reverse KL  | 5.82     |
| Running Update Time | 319      |
----------------------------------
2025-02-01 12:57:24.385777 Eastern Standard Time
| Itration            | 320      |
| Real Det Return     | 401      |
| Real Sto Return     | 376      |
| Reward Loss         | -230     |
| Running Env Steps   | 160000   |
| Running Forward KL  | 1.86     |
| Running Reverse KL  | 6.28     |
| Running Update Time | 320      |
----------------------------------
2025-02-01 12:57:39.990344 Eastern Standard Time
| Itration            | 321      |
| Real Det Return     | 406      |
| Real Sto Return     | 377      |
| Reward Loss         | -225     |
| Running Env Steps   | 160500   |
| Running Forward KL  | 2.46     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 321      |
----------------------------------
2025-02-01 12:57:55.705427 Eastern Standard Time
| Itration            | 322      |
| Real Det Return     | 420      |
| Real Sto Return     | 389      |
| Reward Loss         | -234     |
| Running Env Steps   | 161000   |
| Running Forward KL  | 1.98     |
| Running Reverse KL  | 6.21     |
| Running Update Time | 322      |
----------------------------------
2025-02-01 12:58:11.862473 Eastern Standard Time
| Itration            | 323      |
| Real Det Return     | 404      |
| Real Sto Return     | 349      |
| Reward Loss         | -252     |
| Running Env Steps   | 161500   |
| Running Forward KL  | 2.44     |
| Running Reverse KL  | 6.12     |
| Running Update Time | 323      |
----------------------------------
2025-02-01 12:58:27.617796 Eastern Standard Time
| Itration            | 324      |
| Real Det Return     | 420      |
| Real Sto Return     | 373      |
| Reward Loss         | -224     |
| Running Env Steps   | 162000   |
| Running Forward KL  | 1.55     |
| Running Reverse KL  | 5.89     |
| Running Update Time | 324      |
----------------------------------
2025-02-01 12:58:43.373647 Eastern Standard Time
| Itration            | 325      |
| Real Det Return     | 424      |
| Real Sto Return     | 374      |
| Reward Loss         | -230     |
| Running Env Steps   | 162500   |
| Running Forward KL  | 2.27     |
| Running Reverse KL  | 6.35     |
| Running Update Time | 325      |
----------------------------------
2025-02-01 12:58:59.941699 Eastern Standard Time
| Itration            | 326      |
| Real Det Return     | 414      |
| Real Sto Return     | 380      |
| Reward Loss         | -214     |
| Running Env Steps   | 163000   |
| Running Forward KL  | 1.91     |
| Running Reverse KL  | 5.97     |
| Running Update Time | 326      |
----------------------------------
2025-02-01 12:59:16.256928 Eastern Standard Time
| Itration            | 327      |
| Real Det Return     | 398      |
| Real Sto Return     | 376      |
| Reward Loss         | -222     |
| Running Env Steps   | 163500   |
| Running Forward KL  | 1.59     |
| Running Reverse KL  | 5.31     |
| Running Update Time | 327      |
----------------------------------
2025-02-01 12:59:32.732456 Eastern Standard Time
| Itration            | 328      |
| Real Det Return     | 418      |
| Real Sto Return     | 372      |
| Reward Loss         | -233     |
| Running Env Steps   | 164000   |
| Running Forward KL  | 1.97     |
| Running Reverse KL  | 6.04     |
| Running Update Time | 328      |
----------------------------------
2025-02-01 12:59:48.970228 Eastern Standard Time
| Itration            | 329      |
| Real Det Return     | 423      |
| Real Sto Return     | 373      |
| Reward Loss         | -235     |
| Running Env Steps   | 164500   |
| Running Forward KL  | 2.56     |
| Running Reverse KL  | 6.43     |
| Running Update Time | 329      |
----------------------------------
2025-02-01 13:00:04.780585 Eastern Standard Time
| Itration            | 330      |
| Real Det Return     | 419      |
| Real Sto Return     | 384      |
| Reward Loss         | -203     |
| Running Env Steps   | 165000   |
| Running Forward KL  | 2.46     |
| Running Reverse KL  | 5.92     |
| Running Update Time | 330      |
----------------------------------
2025-02-01 13:00:21.186859 Eastern Standard Time
| Itration            | 331      |
| Real Det Return     | 416      |
| Real Sto Return     | 374      |
| Reward Loss         | -245     |
| Running Env Steps   | 165500   |
| Running Forward KL  | 1.89     |
| Running Reverse KL  | 5.68     |
| Running Update Time | 331      |
----------------------------------
2025-02-01 13:00:37.281675 Eastern Standard Time
| Itration            | 332      |
| Real Det Return     | 441      |
| Real Sto Return     | 393      |
| Reward Loss         | -224     |
| Running Env Steps   | 166000   |
| Running Forward KL  | 1.87     |
| Running Reverse KL  | 5.62     |
| Running Update Time | 332      |
----------------------------------
2025-02-01 13:00:53.141987 Eastern Standard Time
| Itration            | 333      |
| Real Det Return     | 403      |
| Real Sto Return     | 360      |
| Reward Loss         | -255     |
| Running Env Steps   | 166500   |
| Running Forward KL  | 2.47     |
| Running Reverse KL  | 5.48     |
| Running Update Time | 333      |
----------------------------------
2025-02-01 13:01:09.590741 Eastern Standard Time
| Itration            | 334      |
| Real Det Return     | 437      |
| Real Sto Return     | 391      |
| Reward Loss         | -254     |
| Running Env Steps   | 167000   |
| Running Forward KL  | 2.1      |
| Running Reverse KL  | 5.26     |
| Running Update Time | 334      |
----------------------------------
2025-02-01 13:01:26.306153 Eastern Standard Time
| Itration            | 335      |
| Real Det Return     | 390      |
| Real Sto Return     | 389      |
| Reward Loss         | -230     |
| Running Env Steps   | 167500   |
| Running Forward KL  | 2.28     |
| Running Reverse KL  | 6.2      |
| Running Update Time | 335      |
----------------------------------
2025-02-01 13:01:42.917892 Eastern Standard Time
| Itration            | 336      |
| Real Det Return     | 411      |
| Real Sto Return     | 375      |
| Reward Loss         | -245     |
| Running Env Steps   | 168000   |
| Running Forward KL  | 1.82     |
| Running Reverse KL  | 5.85     |
| Running Update Time | 336      |
----------------------------------
2025-02-01 13:01:59.609933 Eastern Standard Time
| Itration            | 337      |
| Real Det Return     | 428      |
| Real Sto Return     | 393      |
| Reward Loss         | -248     |
| Running Env Steps   | 168500   |
| Running Forward KL  | 2.25     |
| Running Reverse KL  | 5.89     |
| Running Update Time | 337      |
----------------------------------
2025-02-01 13:02:15.870252 Eastern Standard Time
| Itration            | 338      |
| Real Det Return     | 407      |
| Real Sto Return     | 386      |
| Reward Loss         | -258     |
| Running Env Steps   | 169000   |
| Running Forward KL  | 2.02     |
| Running Reverse KL  | 6.69     |
| Running Update Time | 338      |
----------------------------------
2025-02-01 13:02:31.509433 Eastern Standard Time
| Itration            | 339      |
| Real Det Return     | 433      |
| Real Sto Return     | 394      |
| Reward Loss         | -237     |
| Running Env Steps   | 169500   |
| Running Forward KL  | 2.52     |
| Running Reverse KL  | 6.21     |
| Running Update Time | 339      |
----------------------------------
2025-02-01 13:02:47.373548 Eastern Standard Time
| Itration            | 340      |
| Real Det Return     | 428      |
| Real Sto Return     | 388      |
| Reward Loss         | -254     |
| Running Env Steps   | 170000   |
| Running Forward KL  | 2.56     |
| Running Reverse KL  | 6.38     |
| Running Update Time | 340      |
----------------------------------
2025-02-01 13:03:03.006221 Eastern Standard Time
| Itration            | 341      |
| Real Det Return     | 417      |
| Real Sto Return     | 362      |
| Reward Loss         | -278     |
| Running Env Steps   | 170500   |
| Running Forward KL  | 2.52     |
| Running Reverse KL  | 6.16     |
| Running Update Time | 341      |
----------------------------------
2025-02-01 13:03:18.620353 Eastern Standard Time
| Itration            | 342      |
| Real Det Return     | 434      |
| Real Sto Return     | 418      |
| Reward Loss         | -214     |
| Running Env Steps   | 171000   |
| Running Forward KL  | 1.93     |
| Running Reverse KL  | 5.98     |
| Running Update Time | 342      |
----------------------------------
2025-02-01 13:03:34.343079 Eastern Standard Time
| Itration            | 343      |
| Real Det Return     | 425      |
| Real Sto Return     | 414      |
| Reward Loss         | -244     |
| Running Env Steps   | 171500   |
| Running Forward KL  | 1.88     |
| Running Reverse KL  | 6.58     |
| Running Update Time | 343      |
----------------------------------
2025-02-01 13:03:49.945853 Eastern Standard Time
| Itration            | 344      |
| Real Det Return     | 419      |
| Real Sto Return     | 386      |
| Reward Loss         | -241     |
| Running Env Steps   | 172000   |
| Running Forward KL  | 2.11     |
| Running Reverse KL  | 6.79     |
| Running Update Time | 344      |
----------------------------------
2025-02-01 13:04:05.523568 Eastern Standard Time
| Itration            | 345      |
| Real Det Return     | 445      |
| Real Sto Return     | 382      |
| Reward Loss         | -231     |
| Running Env Steps   | 172500   |
| Running Forward KL  | 1.76     |
| Running Reverse KL  | 6.02     |
| Running Update Time | 345      |
----------------------------------
2025-02-01 13:04:21.088311 Eastern Standard Time
| Itration            | 346      |
| Real Det Return     | 436      |
| Real Sto Return     | 395      |
| Reward Loss         | -231     |
| Running Env Steps   | 173000   |
| Running Forward KL  | 1.89     |
| Running Reverse KL  | 5.65     |
| Running Update Time | 346      |
----------------------------------
2025-02-01 13:04:36.688134 Eastern Standard Time
| Itration            | 347      |
| Real Det Return     | 453      |
| Real Sto Return     | 407      |
| Reward Loss         | -222     |
| Running Env Steps   | 173500   |
| Running Forward KL  | 1.94     |
| Running Reverse KL  | 6.07     |
| Running Update Time | 347      |
----------------------------------
2025-02-01 13:04:52.309657 Eastern Standard Time
| Itration            | 348      |
| Real Det Return     | 451      |
| Real Sto Return     | 403      |
| Reward Loss         | -212     |
| Running Env Steps   | 174000   |
| Running Forward KL  | 2.12     |
| Running Reverse KL  | 5.86     |
| Running Update Time | 348      |
----------------------------------
2025-02-01 13:05:08.016859 Eastern Standard Time
| Itration            | 349      |
| Real Det Return     | 457      |
| Real Sto Return     | 412      |
| Reward Loss         | -208     |
| Running Env Steps   | 174500   |
| Running Forward KL  | 1.4      |
| Running Reverse KL  | 6.05     |
| Running Update Time | 349      |
----------------------------------
2025-02-01 13:05:24.665384 Eastern Standard Time
| Itration            | 350      |
| Real Det Return     | 453      |
| Real Sto Return     | 412      |
| Reward Loss         | -246     |
| Running Env Steps   | 175000   |
| Running Forward KL  | 1.52     |
| Running Reverse KL  | 6.21     |
| Running Update Time | 350      |
----------------------------------
2025-02-01 13:05:40.694377 Eastern Standard Time
| Itration            | 351      |
| Real Det Return     | 436      |
| Real Sto Return     | 381      |
| Reward Loss         | -240     |
| Running Env Steps   | 175500   |
| Running Forward KL  | 2        |
| Running Reverse KL  | 5.91     |
| Running Update Time | 351      |
----------------------------------
2025-02-01 13:05:56.662748 Eastern Standard Time
| Itration            | 352      |
| Real Det Return     | 464      |
| Real Sto Return     | 420      |
| Reward Loss         | -237     |
| Running Env Steps   | 176000   |
| Running Forward KL  | 1.56     |
| Running Reverse KL  | 5.98     |
| Running Update Time | 352      |
----------------------------------
2025-02-01 13:06:13.164659 Eastern Standard Time
| Itration            | 353      |
| Real Det Return     | 427      |
| Real Sto Return     | 400      |
| Reward Loss         | -228     |
| Running Env Steps   | 176500   |
| Running Forward KL  | 1.82     |
| Running Reverse KL  | 6.04     |
| Running Update Time | 353      |
----------------------------------
2025-02-01 13:06:29.331175 Eastern Standard Time
| Itration            | 354      |
| Real Det Return     | 411      |
| Real Sto Return     | 387      |
| Reward Loss         | -247     |
| Running Env Steps   | 177000   |
| Running Forward KL  | 1.47     |
| Running Reverse KL  | 5.99     |
| Running Update Time | 354      |
----------------------------------
2025-02-01 13:06:44.966368 Eastern Standard Time
| Itration            | 355      |
| Real Det Return     | 422      |
| Real Sto Return     | 397      |
| Reward Loss         | -254     |
| Running Env Steps   | 177500   |
| Running Forward KL  | 2.12     |
| Running Reverse KL  | 6.3      |
| Running Update Time | 355      |
----------------------------------
2025-02-01 13:07:00.506157 Eastern Standard Time
| Itration            | 356      |
| Real Det Return     | 445      |
| Real Sto Return     | 415      |
| Reward Loss         | -228     |
| Running Env Steps   | 178000   |
| Running Forward KL  | 1.25     |
| Running Reverse KL  | 6.31     |
| Running Update Time | 356      |
----------------------------------
2025-02-01 13:07:16.210731 Eastern Standard Time
| Itration            | 357      |
| Real Det Return     | 433      |
| Real Sto Return     | 391      |
| Reward Loss         | -245     |
| Running Env Steps   | 178500   |
| Running Forward KL  | 2.3      |
| Running Reverse KL  | 6.29     |
| Running Update Time | 357      |
----------------------------------
2025-02-01 13:07:31.770831 Eastern Standard Time
| Itration            | 358      |
| Real Det Return     | 464      |
| Real Sto Return     | 409      |
| Reward Loss         | -266     |
| Running Env Steps   | 179000   |
| Running Forward KL  | 1.29     |
| Running Reverse KL  | 6.08     |
| Running Update Time | 358      |
----------------------------------
2025-02-01 13:07:48.072109 Eastern Standard Time
| Itration            | 359      |
| Real Det Return     | 443      |
| Real Sto Return     | 412      |
| Reward Loss         | -244     |
| Running Env Steps   | 179500   |
| Running Forward KL  | 1.93     |
| Running Reverse KL  | 6.38     |
| Running Update Time | 359      |
----------------------------------
2025-02-01 13:08:03.688036 Eastern Standard Time
| Itration            | 360      |
| Real Det Return     | 472      |
| Real Sto Return     | 438      |
| Reward Loss         | -240     |
| Running Env Steps   | 180000   |
| Running Forward KL  | 0.824    |
| Running Reverse KL  | 5.76     |
| Running Update Time | 360      |
----------------------------------
2025-02-01 13:08:19.356639 Eastern Standard Time
| Itration            | 361      |
| Real Det Return     | 455      |
| Real Sto Return     | 409      |
| Reward Loss         | -234     |
| Running Env Steps   | 180500   |
| Running Forward KL  | 1.94     |
| Running Reverse KL  | 6.02     |
| Running Update Time | 361      |
----------------------------------
2025-02-01 13:08:34.913991 Eastern Standard Time
| Itration            | 362      |
| Real Det Return     | 440      |
| Real Sto Return     | 397      |
| Reward Loss         | -261     |
| Running Env Steps   | 181000   |
| Running Forward KL  | 2.05     |
| Running Reverse KL  | 5.8      |
| Running Update Time | 362      |
----------------------------------
2025-02-01 13:08:50.651089 Eastern Standard Time
| Itration            | 363      |
| Real Det Return     | 431      |
| Real Sto Return     | 416      |
| Reward Loss         | -237     |
| Running Env Steps   | 181500   |
| Running Forward KL  | 1.93     |
| Running Reverse KL  | 5.64     |
| Running Update Time | 363      |
----------------------------------
2025-02-01 13:09:06.266748 Eastern Standard Time
| Itration            | 364      |
| Real Det Return     | 475      |
| Real Sto Return     | 450      |
| Reward Loss         | -250     |
| Running Env Steps   | 182000   |
| Running Forward KL  | 1.51     |
| Running Reverse KL  | 5.71     |
| Running Update Time | 364      |
----------------------------------
2025-02-01 13:09:21.814818 Eastern Standard Time
| Itration            | 365      |
| Real Det Return     | 452      |
| Real Sto Return     | 415      |
| Reward Loss         | -234     |
| Running Env Steps   | 182500   |
| Running Forward KL  | 1.88     |
| Running Reverse KL  | 6.13     |
| Running Update Time | 365      |
----------------------------------
2025-02-01 13:09:37.446732 Eastern Standard Time
| Itration            | 366      |
| Real Det Return     | 452      |
| Real Sto Return     | 415      |
| Reward Loss         | -240     |
| Running Env Steps   | 183000   |
| Running Forward KL  | 1.63     |
| Running Reverse KL  | 5.9      |
| Running Update Time | 366      |
----------------------------------
2025-02-01 13:09:53.122240 Eastern Standard Time
| Itration            | 367      |
| Real Det Return     | 443      |
| Real Sto Return     | 393      |
| Reward Loss         | -271     |
| Running Env Steps   | 183500   |
| Running Forward KL  | 1.73     |
| Running Reverse KL  | 6.77     |
| Running Update Time | 367      |
----------------------------------
2025-02-01 13:10:08.724153 Eastern Standard Time
| Itration            | 368      |
| Real Det Return     | 424      |
| Real Sto Return     | 397      |
| Reward Loss         | -259     |
| Running Env Steps   | 184000   |
| Running Forward KL  | 2.23     |
| Running Reverse KL  | 6.8      |
| Running Update Time | 368      |
----------------------------------
2025-02-01 13:10:24.466340 Eastern Standard Time
| Itration            | 369      |
| Real Det Return     | 448      |
| Real Sto Return     | 412      |
| Reward Loss         | -259     |
| Running Env Steps   | 184500   |
| Running Forward KL  | 1.82     |
| Running Reverse KL  | 5.89     |
| Running Update Time | 369      |
----------------------------------
2025-02-01 13:10:40.102378 Eastern Standard Time
| Itration            | 370      |
| Real Det Return     | 434      |
| Real Sto Return     | 401      |
| Reward Loss         | -255     |
| Running Env Steps   | 185000   |
| Running Forward KL  | 1.3      |
| Running Reverse KL  | 5.79     |
| Running Update Time | 370      |
----------------------------------
2025-02-01 13:10:55.857874 Eastern Standard Time
| Itration            | 371      |
| Real Det Return     | 459      |
| Real Sto Return     | 416      |
| Reward Loss         | -242     |
| Running Env Steps   | 185500   |
| Running Forward KL  | 2.18     |
| Running Reverse KL  | 5.73     |
| Running Update Time | 371      |
----------------------------------
2025-02-01 13:11:11.735192 Eastern Standard Time
| Itration            | 372      |
| Real Det Return     | 437      |
| Real Sto Return     | 398      |
| Reward Loss         | -218     |
| Running Env Steps   | 186000   |
| Running Forward KL  | 2.11     |
| Running Reverse KL  | 6.18     |
| Running Update Time | 372      |
----------------------------------
2025-02-01 13:11:27.335815 Eastern Standard Time
| Itration            | 373      |
| Real Det Return     | 452      |
| Real Sto Return     | 420      |
| Reward Loss         | -237     |
| Running Env Steps   | 186500   |
| Running Forward KL  | 2.01     |
| Running Reverse KL  | 6.08     |
| Running Update Time | 373      |
----------------------------------
2025-02-01 13:11:42.990420 Eastern Standard Time
| Itration            | 374      |
| Real Det Return     | 448      |
| Real Sto Return     | 415      |
| Reward Loss         | -219     |
| Running Env Steps   | 187000   |
| Running Forward KL  | 1.67     |
| Running Reverse KL  | 5.7      |
| Running Update Time | 374      |
----------------------------------
2025-02-01 13:11:58.591586 Eastern Standard Time
| Itration            | 375      |
| Real Det Return     | 472      |
| Real Sto Return     | 435      |
| Reward Loss         | -206     |
| Running Env Steps   | 187500   |
| Running Forward KL  | 1.27     |
| Running Reverse KL  | 5.6      |
| Running Update Time | 375      |
----------------------------------
2025-02-01 13:12:14.270053 Eastern Standard Time
| Itration            | 376      |
| Real Det Return     | 467      |
| Real Sto Return     | 435      |
| Reward Loss         | -228     |
| Running Env Steps   | 188000   |
| Running Forward KL  | 0.975    |
| Running Reverse KL  | 5.51     |
| Running Update Time | 376      |
----------------------------------
2025-02-01 13:12:30.205771 Eastern Standard Time
| Itration            | 377      |
| Real Det Return     | 453      |
| Real Sto Return     | 416      |
| Reward Loss         | -266     |
| Running Env Steps   | 188500   |
| Running Forward KL  | 1.8      |
| Running Reverse KL  | 6.49     |
| Running Update Time | 377      |
----------------------------------
2025-02-01 13:12:45.905466 Eastern Standard Time
| Itration            | 378      |
| Real Det Return     | 460      |
| Real Sto Return     | 423      |
| Reward Loss         | -251     |
| Running Env Steps   | 189000   |
| Running Forward KL  | 1.31     |
| Running Reverse KL  | 5.66     |
| Running Update Time | 378      |
----------------------------------
2025-02-01 13:13:01.596482 Eastern Standard Time
| Itration            | 379      |
| Real Det Return     | 468      |
| Real Sto Return     | 438      |
| Reward Loss         | -252     |
| Running Env Steps   | 189500   |
| Running Forward KL  | 1.86     |
| Running Reverse KL  | 6.42     |
| Running Update Time | 379      |
----------------------------------
2025-02-01 13:13:17.223266 Eastern Standard Time
| Itration            | 380      |
| Real Det Return     | 483      |
| Real Sto Return     | 427      |
| Reward Loss         | -225     |
| Running Env Steps   | 190000   |
| Running Forward KL  | 1.12     |
| Running Reverse KL  | 5.92     |
| Running Update Time | 380      |
----------------------------------
2025-02-01 13:13:32.819327 Eastern Standard Time
| Itration            | 381      |
| Real Det Return     | 450      |
| Real Sto Return     | 420      |
| Reward Loss         | -232     |
| Running Env Steps   | 190500   |
| Running Forward KL  | 1.2      |
| Running Reverse KL  | 5.68     |
| Running Update Time | 381      |
----------------------------------
2025-02-01 13:13:48.382727 Eastern Standard Time
| Itration            | 382      |
| Real Det Return     | 496      |
| Real Sto Return     | 421      |
| Reward Loss         | -205     |
| Running Env Steps   | 191000   |
| Running Forward KL  | 1.69     |
| Running Reverse KL  | 5.82     |
| Running Update Time | 382      |
----------------------------------
2025-02-01 13:14:04.036497 Eastern Standard Time
| Itration            | 383      |
| Real Det Return     | 462      |
| Real Sto Return     | 421      |
| Reward Loss         | -236     |
| Running Env Steps   | 191500   |
| Running Forward KL  | 1.29     |
| Running Reverse KL  | 5.26     |
| Running Update Time | 383      |
----------------------------------
2025-02-01 13:14:19.674244 Eastern Standard Time
| Itration            | 384      |
| Real Det Return     | 457      |
| Real Sto Return     | 434      |
| Reward Loss         | -211     |
| Running Env Steps   | 192000   |
| Running Forward KL  | 1.17     |
| Running Reverse KL  | 5.68     |
| Running Update Time | 384      |
----------------------------------
2025-02-01 13:14:35.224491 Eastern Standard Time
| Itration            | 385      |
| Real Det Return     | 473      |
| Real Sto Return     | 418      |
| Reward Loss         | -223     |
| Running Env Steps   | 192500   |
| Running Forward KL  | 1.66     |
| Running Reverse KL  | 5.77     |
| Running Update Time | 385      |
----------------------------------
2025-02-01 13:14:51.590321 Eastern Standard Time
| Itration            | 386      |
| Real Det Return     | 476      |
| Real Sto Return     | 443      |
| Reward Loss         | -245     |
| Running Env Steps   | 193000   |
| Running Forward KL  | 1.47     |
| Running Reverse KL  | 5.54     |
| Running Update Time | 386      |
----------------------------------
2025-02-01 13:15:07.253817 Eastern Standard Time
| Itration            | 387      |
| Real Det Return     | 496      |
| Real Sto Return     | 452      |
| Reward Loss         | -232     |
| Running Env Steps   | 193500   |
| Running Forward KL  | 0.987    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 387      |
----------------------------------
2025-02-01 13:15:23.199166 Eastern Standard Time
| Itration            | 388      |
| Real Det Return     | 470      |
| Real Sto Return     | 440      |
| Reward Loss         | -233     |
| Running Env Steps   | 194000   |
| Running Forward KL  | 1.96     |
| Running Reverse KL  | 5.91     |
| Running Update Time | 388      |
----------------------------------
2025-02-01 13:15:38.758816 Eastern Standard Time
| Itration            | 389      |
| Real Det Return     | 484      |
| Real Sto Return     | 438      |
| Reward Loss         | -242     |
| Running Env Steps   | 194500   |
| Running Forward KL  | 1.32     |
| Running Reverse KL  | 5.72     |
| Running Update Time | 389      |
----------------------------------
2025-02-01 13:15:54.633129 Eastern Standard Time
| Itration            | 390      |
| Real Det Return     | 465      |
| Real Sto Return     | 428      |
| Reward Loss         | -240     |
| Running Env Steps   | 195000   |
| Running Forward KL  | 1.01     |
| Running Reverse KL  | 5.33     |
| Running Update Time | 390      |
----------------------------------
2025-02-01 13:16:11.536340 Eastern Standard Time
| Itration            | 391      |
| Real Det Return     | 478      |
| Real Sto Return     | 444      |
| Reward Loss         | -224     |
| Running Env Steps   | 195500   |
| Running Forward KL  | 1.45     |
| Running Reverse KL  | 5.48     |
| Running Update Time | 391      |
----------------------------------
2025-02-01 13:16:28.887562 Eastern Standard Time
| Itration            | 392      |
| Real Det Return     | 502      |
| Real Sto Return     | 446      |
| Reward Loss         | -239     |
| Running Env Steps   | 196000   |
| Running Forward KL  | 1.6      |
| Running Reverse KL  | 5.49     |
| Running Update Time | 392      |
----------------------------------
2025-02-01 13:16:46.351303 Eastern Standard Time
| Itration            | 393      |
| Real Det Return     | 458      |
| Real Sto Return     | 434      |
| Reward Loss         | -234     |
| Running Env Steps   | 196500   |
| Running Forward KL  | 0.817    |
| Running Reverse KL  | 5.77     |
| Running Update Time | 393      |
----------------------------------
2025-02-01 13:17:02.070228 Eastern Standard Time
| Itration            | 394      |
| Real Det Return     | 471      |
| Real Sto Return     | 454      |
| Reward Loss         | -246     |
| Running Env Steps   | 197000   |
| Running Forward KL  | 1.71     |
| Running Reverse KL  | 6.1      |
| Running Update Time | 394      |
----------------------------------
2025-02-01 13:17:18.271897 Eastern Standard Time
| Itration            | 395      |
| Real Det Return     | 511      |
| Real Sto Return     | 458      |
| Reward Loss         | -246     |
| Running Env Steps   | 197500   |
| Running Forward KL  | 1.1      |
| Running Reverse KL  | 6.36     |
| Running Update Time | 395      |
----------------------------------
2025-02-01 13:17:33.981998 Eastern Standard Time
| Itration            | 396      |
| Real Det Return     | 502      |
| Real Sto Return     | 443      |
| Reward Loss         | -224     |
| Running Env Steps   | 198000   |
| Running Forward KL  | 1.54     |
| Running Reverse KL  | 6.32     |
| Running Update Time | 396      |
----------------------------------
2025-02-01 13:17:50.490626 Eastern Standard Time
| Itration            | 397      |
| Real Det Return     | 504      |
| Real Sto Return     | 455      |
| Reward Loss         | -237     |
| Running Env Steps   | 198500   |
| Running Forward KL  | 1.15     |
| Running Reverse KL  | 5.16     |
| Running Update Time | 397      |
----------------------------------
2025-02-01 13:18:07.172278 Eastern Standard Time
| Itration            | 398      |
| Real Det Return     | 494      |
| Real Sto Return     | 451      |
| Reward Loss         | -207     |
| Running Env Steps   | 199000   |
| Running Forward KL  | 1.61     |
| Running Reverse KL  | 5.75     |
| Running Update Time | 398      |
----------------------------------
2025-02-01 13:18:23.602097 Eastern Standard Time
| Itration            | 399      |
| Real Det Return     | 479      |
| Real Sto Return     | 452      |
| Reward Loss         | -240     |
| Running Env Steps   | 199500   |
| Running Forward KL  | 1.66     |
| Running Reverse KL  | 6.43     |
| Running Update Time | 399      |
----------------------------------
2025-02-01 13:18:40.607348 Eastern Standard Time
| Itration            | 400      |
| Real Det Return     | 491      |
| Real Sto Return     | 443      |
| Reward Loss         | -260     |
| Running Env Steps   | 200000   |
| Running Forward KL  | 1.64     |
| Running Reverse KL  | 6.05     |
| Running Update Time | 400      |
----------------------------------
2025-02-01 13:18:57.459547 Eastern Standard Time
| Itration            | 401      |
| Real Det Return     | 515      |
| Real Sto Return     | 457      |
| Reward Loss         | -222     |
| Running Env Steps   | 200500   |
| Running Forward KL  | 0.72     |
| Running Reverse KL  | 5.89     |
| Running Update Time | 401      |
----------------------------------
2025-02-01 13:19:13.266073 Eastern Standard Time
| Itration            | 402      |
| Real Det Return     | 482      |
| Real Sto Return     | 453      |
| Reward Loss         | -254     |
| Running Env Steps   | 201000   |
| Running Forward KL  | 1.56     |
| Running Reverse KL  | 5.82     |
| Running Update Time | 402      |
----------------------------------
2025-02-01 13:19:28.864702 Eastern Standard Time
| Itration            | 403      |
| Real Det Return     | 510      |
| Real Sto Return     | 461      |
| Reward Loss         | -230     |
| Running Env Steps   | 201500   |
| Running Forward KL  | 1.64     |
| Running Reverse KL  | 5.48     |
| Running Update Time | 403      |
----------------------------------
2025-02-01 13:19:44.432481 Eastern Standard Time
| Itration            | 404      |
| Real Det Return     | 497      |
| Real Sto Return     | 466      |
| Reward Loss         | -210     |
| Running Env Steps   | 202000   |
| Running Forward KL  | 0.863    |
| Running Reverse KL  | 5.43     |
| Running Update Time | 404      |
----------------------------------
2025-02-01 13:20:00.121212 Eastern Standard Time
| Itration            | 405      |
| Real Det Return     | 504      |
| Real Sto Return     | 462      |
| Reward Loss         | -249     |
| Running Env Steps   | 202500   |
| Running Forward KL  | 2.02     |
| Running Reverse KL  | 5.92     |
| Running Update Time | 405      |
----------------------------------
2025-02-01 13:20:15.650560 Eastern Standard Time
| Itration            | 406      |
| Real Det Return     | 498      |
| Real Sto Return     | 461      |
| Reward Loss         | -236     |
| Running Env Steps   | 203000   |
| Running Forward KL  | 1.93     |
| Running Reverse KL  | 6.37     |
| Running Update Time | 406      |
----------------------------------
2025-02-01 13:20:31.312575 Eastern Standard Time
| Itration            | 407      |
| Real Det Return     | 504      |
| Real Sto Return     | 471      |
| Reward Loss         | -219     |
| Running Env Steps   | 203500   |
| Running Forward KL  | 1.11     |
| Running Reverse KL  | 5.59     |
| Running Update Time | 407      |
----------------------------------
2025-02-01 13:20:46.848754 Eastern Standard Time
| Itration            | 408      |
| Real Det Return     | 506      |
| Real Sto Return     | 468      |
| Reward Loss         | -236     |
| Running Env Steps   | 204000   |
| Running Forward KL  | 0.983    |
| Running Reverse KL  | 5.48     |
| Running Update Time | 408      |
----------------------------------
2025-02-01 13:21:02.441837 Eastern Standard Time
| Itration            | 409      |
| Real Det Return     | 501      |
| Real Sto Return     | 469      |
| Reward Loss         | -203     |
| Running Env Steps   | 204500   |
| Running Forward KL  | 1.41     |
| Running Reverse KL  | 5.96     |
| Running Update Time | 409      |
----------------------------------
2025-02-01 13:21:18.816428 Eastern Standard Time
| Itration            | 410      |
| Real Det Return     | 500      |
| Real Sto Return     | 456      |
| Reward Loss         | -226     |
| Running Env Steps   | 205000   |
| Running Forward KL  | 1.34     |
| Running Reverse KL  | 5.99     |
| Running Update Time | 410      |
----------------------------------
2025-02-01 13:21:34.459878 Eastern Standard Time
| Itration            | 411      |
| Real Det Return     | 483      |
| Real Sto Return     | 420      |
| Reward Loss         | -276     |
| Running Env Steps   | 205500   |
| Running Forward KL  | 1.15     |
| Running Reverse KL  | 5.99     |
| Running Update Time | 411      |
----------------------------------
2025-02-01 13:21:50.166308 Eastern Standard Time
| Itration            | 412      |
| Real Det Return     | 515      |
| Real Sto Return     | 458      |
| Reward Loss         | -222     |
| Running Env Steps   | 206000   |
| Running Forward KL  | 1.32     |
| Running Reverse KL  | 6.31     |
| Running Update Time | 412      |
----------------------------------
2025-02-01 13:22:06.327526 Eastern Standard Time
| Itration            | 413      |
| Real Det Return     | 516      |
| Real Sto Return     | 465      |
| Reward Loss         | -224     |
| Running Env Steps   | 206500   |
| Running Forward KL  | 1.31     |
| Running Reverse KL  | 5.6      |
| Running Update Time | 413      |
----------------------------------
2025-02-01 13:22:22.768454 Eastern Standard Time
| Itration            | 414      |
| Real Det Return     | 495      |
| Real Sto Return     | 445      |
| Reward Loss         | -253     |
| Running Env Steps   | 207000   |
| Running Forward KL  | 1.09     |
| Running Reverse KL  | 5.71     |
| Running Update Time | 414      |
----------------------------------
2025-02-01 13:22:39.053784 Eastern Standard Time
| Itration            | 415      |
| Real Det Return     | 517      |
| Real Sto Return     | 477      |
| Reward Loss         | -221     |
| Running Env Steps   | 207500   |
| Running Forward KL  | 1.6      |
| Running Reverse KL  | 5.86     |
| Running Update Time | 415      |
----------------------------------
2025-02-01 13:22:54.774201 Eastern Standard Time
| Itration            | 416      |
| Real Det Return     | 513      |
| Real Sto Return     | 442      |
| Reward Loss         | -210     |
| Running Env Steps   | 208000   |
| Running Forward KL  | 1.55     |
| Running Reverse KL  | 5.63     |
| Running Update Time | 416      |
----------------------------------
2025-02-01 13:23:10.531121 Eastern Standard Time
| Itration            | 417      |
| Real Det Return     | 477      |
| Real Sto Return     | 448      |
| Reward Loss         | -244     |
| Running Env Steps   | 208500   |
| Running Forward KL  | 1.4      |
| Running Reverse KL  | 5.64     |
| Running Update Time | 417      |
----------------------------------
2025-02-01 13:23:26.477834 Eastern Standard Time
| Itration            | 418      |
| Real Det Return     | 501      |
| Real Sto Return     | 478      |
| Reward Loss         | -231     |
| Running Env Steps   | 209000   |
| Running Forward KL  | 1.6      |
| Running Reverse KL  | 5.95     |
| Running Update Time | 418      |
----------------------------------
2025-02-01 13:23:47.633775 Eastern Standard Time
| Itration            | 419      |
| Real Det Return     | 503      |
| Real Sto Return     | 461      |
| Reward Loss         | -242     |
| Running Env Steps   | 209500   |
| Running Forward KL  | 1.65     |
| Running Reverse KL  | 5.52     |
| Running Update Time | 419      |
----------------------------------
2025-02-01 13:24:07.403546 Eastern Standard Time
| Itration            | 420      |
| Real Det Return     | 524      |
| Real Sto Return     | 478      |
| Reward Loss         | -221     |
| Running Env Steps   | 210000   |
| Running Forward KL  | 0.659    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 420      |
----------------------------------
2025-02-01 13:24:23.507740 Eastern Standard Time
| Itration            | 421      |
| Real Det Return     | 504      |
| Real Sto Return     | 467      |
| Reward Loss         | -225     |
| Running Env Steps   | 210500   |
| Running Forward KL  | 1.29     |
| Running Reverse KL  | 5.73     |
| Running Update Time | 421      |
----------------------------------
2025-02-01 13:24:39.187387 Eastern Standard Time
| Itration            | 422      |
| Real Det Return     | 489      |
| Real Sto Return     | 441      |
| Reward Loss         | -223     |
| Running Env Steps   | 211000   |
| Running Forward KL  | 0.756    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 422      |
----------------------------------
2025-02-01 13:24:54.780615 Eastern Standard Time
| Itration            | 423      |
| Real Det Return     | 514      |
| Real Sto Return     | 465      |
| Reward Loss         | -272     |
| Running Env Steps   | 211500   |
| Running Forward KL  | 1.25     |
| Running Reverse KL  | 5.27     |
| Running Update Time | 423      |
----------------------------------
2025-02-01 13:25:10.345846 Eastern Standard Time
| Itration            | 424      |
| Real Det Return     | 497      |
| Real Sto Return     | 471      |
| Reward Loss         | -226     |
| Running Env Steps   | 212000   |
| Running Forward KL  | 0.656    |
| Running Reverse KL  | 5.27     |
| Running Update Time | 424      |
----------------------------------
2025-02-01 13:25:25.937022 Eastern Standard Time
| Itration            | 425      |
| Real Det Return     | 497      |
| Real Sto Return     | 464      |
| Reward Loss         | -224     |
| Running Env Steps   | 212500   |
| Running Forward KL  | 1.37     |
| Running Reverse KL  | 5.84     |
| Running Update Time | 425      |
----------------------------------
2025-02-01 13:25:41.481024 Eastern Standard Time
| Itration            | 426      |
| Real Det Return     | 502      |
| Real Sto Return     | 462      |
| Reward Loss         | -236     |
| Running Env Steps   | 213000   |
| Running Forward KL  | 1.56     |
| Running Reverse KL  | 5.82     |
| Running Update Time | 426      |
----------------------------------
2025-02-01 13:25:57.021991 Eastern Standard Time
| Itration            | 427      |
| Real Det Return     | 508      |
| Real Sto Return     | 465      |
| Reward Loss         | -290     |
| Running Env Steps   | 213500   |
| Running Forward KL  | 1.62     |
| Running Reverse KL  | 5.25     |
| Running Update Time | 427      |
----------------------------------
2025-02-01 13:26:12.631463 Eastern Standard Time
| Itration            | 428      |
| Real Det Return     | 506      |
| Real Sto Return     | 482      |
| Reward Loss         | -264     |
| Running Env Steps   | 214000   |
| Running Forward KL  | 1.18     |
| Running Reverse KL  | 5.72     |
| Running Update Time | 428      |
----------------------------------
2025-02-01 13:26:28.168989 Eastern Standard Time
| Itration            | 429      |
| Real Det Return     | 517      |
| Real Sto Return     | 468      |
| Reward Loss         | -235     |
| Running Env Steps   | 214500   |
| Running Forward KL  | 1.31     |
| Running Reverse KL  | 5.43     |
| Running Update Time | 429      |
----------------------------------
2025-02-01 13:26:43.739126 Eastern Standard Time
| Itration            | 430      |
| Real Det Return     | 545      |
| Real Sto Return     | 476      |
| Reward Loss         | -196     |
| Running Env Steps   | 215000   |
| Running Forward KL  | 1.3      |
| Running Reverse KL  | 5.96     |
| Running Update Time | 430      |
----------------------------------
2025-02-01 13:26:59.485304 Eastern Standard Time
| Itration            | 431      |
| Real Det Return     | 507      |
| Real Sto Return     | 469      |
| Reward Loss         | -226     |
| Running Env Steps   | 215500   |
| Running Forward KL  | 1.47     |
| Running Reverse KL  | 5.37     |
| Running Update Time | 431      |
----------------------------------
2025-02-01 13:27:15.086752 Eastern Standard Time
| Itration            | 432      |
| Real Det Return     | 485      |
| Real Sto Return     | 446      |
| Reward Loss         | -271     |
| Running Env Steps   | 216000   |
| Running Forward KL  | 1.53     |
| Running Reverse KL  | 6        |
| Running Update Time | 432      |
----------------------------------
2025-02-01 13:27:30.681757 Eastern Standard Time
| Itration            | 433      |
| Real Det Return     | 491      |
| Real Sto Return     | 468      |
| Reward Loss         | -209     |
| Running Env Steps   | 216500   |
| Running Forward KL  | 1.25     |
| Running Reverse KL  | 5.45     |
| Running Update Time | 433      |
----------------------------------
2025-02-01 13:27:46.199494 Eastern Standard Time
| Itration            | 434      |
| Real Det Return     | 543      |
| Real Sto Return     | 481      |
| Reward Loss         | -212     |
| Running Env Steps   | 217000   |
| Running Forward KL  | 0.81     |
| Running Reverse KL  | 5.92     |
| Running Update Time | 434      |
----------------------------------
2025-02-01 13:28:02.031261 Eastern Standard Time
| Itration            | 435      |
| Real Det Return     | 521      |
| Real Sto Return     | 488      |
| Reward Loss         | -213     |
| Running Env Steps   | 217500   |
| Running Forward KL  | 1.3      |
| Running Reverse KL  | 6.27     |
| Running Update Time | 435      |
----------------------------------
2025-02-01 13:28:17.541308 Eastern Standard Time
| Itration            | 436      |
| Real Det Return     | 503      |
| Real Sto Return     | 467      |
| Reward Loss         | -238     |
| Running Env Steps   | 218000   |
| Running Forward KL  | 1.14     |
| Running Reverse KL  | 5.93     |
| Running Update Time | 436      |
----------------------------------
2025-02-01 13:28:33.230014 Eastern Standard Time
| Itration            | 437      |
| Real Det Return     | 534      |
| Real Sto Return     | 496      |
| Reward Loss         | -233     |
| Running Env Steps   | 218500   |
| Running Forward KL  | 1.93     |
| Running Reverse KL  | 6.49     |
| Running Update Time | 437      |
----------------------------------
2025-02-01 13:28:49.046285 Eastern Standard Time
| Itration            | 438      |
| Real Det Return     | 518      |
| Real Sto Return     | 471      |
| Reward Loss         | -232     |
| Running Env Steps   | 219000   |
| Running Forward KL  | 0.861    |
| Running Reverse KL  | 5.47     |
| Running Update Time | 438      |
----------------------------------
2025-02-01 13:29:04.682271 Eastern Standard Time
| Itration            | 439      |
| Real Det Return     | 506      |
| Real Sto Return     | 481      |
| Reward Loss         | -200     |
| Running Env Steps   | 219500   |
| Running Forward KL  | 1.14     |
| Running Reverse KL  | 5.91     |
| Running Update Time | 439      |
----------------------------------
2025-02-01 13:29:20.166727 Eastern Standard Time
| Itration            | 440      |
| Real Det Return     | 537      |
| Real Sto Return     | 492      |
| Reward Loss         | -194     |
| Running Env Steps   | 220000   |
| Running Forward KL  | 0.893    |
| Running Reverse KL  | 6.09     |
| Running Update Time | 440      |
----------------------------------
2025-02-01 13:29:35.720102 Eastern Standard Time
| Itration            | 441      |
| Real Det Return     | 524      |
| Real Sto Return     | 502      |
| Reward Loss         | -215     |
| Running Env Steps   | 220500   |
| Running Forward KL  | 1.84     |
| Running Reverse KL  | 6.29     |
| Running Update Time | 441      |
----------------------------------
2025-02-01 13:29:51.269390 Eastern Standard Time
| Itration            | 442      |
| Real Det Return     | 521      |
| Real Sto Return     | 496      |
| Reward Loss         | -214     |
| Running Env Steps   | 221000   |
| Running Forward KL  | 0.698    |
| Running Reverse KL  | 5.61     |
| Running Update Time | 442      |
----------------------------------
2025-02-01 13:30:06.861459 Eastern Standard Time
| Itration            | 443      |
| Real Det Return     | 487      |
| Real Sto Return     | 475      |
| Reward Loss         | -258     |
| Running Env Steps   | 221500   |
| Running Forward KL  | 1.36     |
| Running Reverse KL  | 6.06     |
| Running Update Time | 443      |
----------------------------------
2025-02-01 13:30:22.508374 Eastern Standard Time
| Itration            | 444      |
| Real Det Return     | 525      |
| Real Sto Return     | 462      |
| Reward Loss         | -213     |
| Running Env Steps   | 222000   |
| Running Forward KL  | 1.05     |
| Running Reverse KL  | 5.28     |
| Running Update Time | 444      |
----------------------------------
2025-02-01 13:30:38.185376 Eastern Standard Time
| Itration            | 445      |
| Real Det Return     | 538      |
| Real Sto Return     | 487      |
| Reward Loss         | -225     |
| Running Env Steps   | 222500   |
| Running Forward KL  | 0.984    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 445      |
----------------------------------
2025-02-01 13:30:53.804044 Eastern Standard Time
| Itration            | 446      |
| Real Det Return     | 530      |
| Real Sto Return     | 490      |
| Reward Loss         | -223     |
| Running Env Steps   | 223000   |
| Running Forward KL  | 0.0927   |
| Running Reverse KL  | 5.28     |
| Running Update Time | 446      |
----------------------------------
2025-02-01 13:31:09.368671 Eastern Standard Time
| Itration            | 447      |
| Real Det Return     | 539      |
| Real Sto Return     | 497      |
| Reward Loss         | -215     |
| Running Env Steps   | 223500   |
| Running Forward KL  | 0.528    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 447      |
----------------------------------
2025-02-01 13:31:24.921036 Eastern Standard Time
| Itration            | 448      |
| Real Det Return     | 534      |
| Real Sto Return     | 480      |
| Reward Loss         | -191     |
| Running Env Steps   | 224000   |
| Running Forward KL  | 1.05     |
| Running Reverse KL  | 5.53     |
| Running Update Time | 448      |
----------------------------------
2025-02-01 13:31:40.487294 Eastern Standard Time
| Itration            | 449      |
| Real Det Return     | 526      |
| Real Sto Return     | 478      |
| Reward Loss         | -238     |
| Running Env Steps   | 224500   |
| Running Forward KL  | 1.98     |
| Running Reverse KL  | 5.93     |
| Running Update Time | 449      |
----------------------------------
2025-02-01 13:31:56.080514 Eastern Standard Time
| Itration            | 450      |
| Real Det Return     | 517      |
| Real Sto Return     | 475      |
| Reward Loss         | -228     |
| Running Env Steps   | 225000   |
| Running Forward KL  | 0.874    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 450      |
----------------------------------
2025-02-01 13:32:11.702935 Eastern Standard Time
| Itration            | 451      |
| Real Det Return     | 552      |
| Real Sto Return     | 505      |
| Reward Loss         | -231     |
| Running Env Steps   | 225500   |
| Running Forward KL  | 1.36     |
| Running Reverse KL  | 5.49     |
| Running Update Time | 451      |
----------------------------------
2025-02-01 13:32:27.239300 Eastern Standard Time
| Itration            | 452      |
| Real Det Return     | 533      |
| Real Sto Return     | 497      |
| Reward Loss         | -206     |
| Running Env Steps   | 226000   |
| Running Forward KL  | 1.24     |
| Running Reverse KL  | 5.82     |
| Running Update Time | 452      |
----------------------------------
2025-02-01 13:32:42.763048 Eastern Standard Time
| Itration            | 453      |
| Real Det Return     | 527      |
| Real Sto Return     | 477      |
| Reward Loss         | -216     |
| Running Env Steps   | 226500   |
| Running Forward KL  | 0.815    |
| Running Reverse KL  | 5.93     |
| Running Update Time | 453      |
----------------------------------
2025-02-01 13:32:58.300326 Eastern Standard Time
| Itration            | 454      |
| Real Det Return     | 531      |
| Real Sto Return     | 467      |
| Reward Loss         | -214     |
| Running Env Steps   | 227000   |
| Running Forward KL  | 1.59     |
| Running Reverse KL  | 6.6      |
| Running Update Time | 454      |
----------------------------------
2025-02-01 13:33:13.914231 Eastern Standard Time
| Itration            | 455      |
| Real Det Return     | 536      |
| Real Sto Return     | 507      |
| Reward Loss         | -206     |
| Running Env Steps   | 227500   |
| Running Forward KL  | 0.751    |
| Running Reverse KL  | 5.77     |
| Running Update Time | 455      |
----------------------------------
2025-02-01 13:33:29.458420 Eastern Standard Time
| Itration            | 456      |
| Real Det Return     | 531      |
| Real Sto Return     | 505      |
| Reward Loss         | -216     |
| Running Env Steps   | 228000   |
| Running Forward KL  | 0.682    |
| Running Reverse KL  | 5.85     |
| Running Update Time | 456      |
----------------------------------
2025-02-01 13:33:44.953558 Eastern Standard Time
| Itration            | 457      |
| Real Det Return     | 531      |
| Real Sto Return     | 495      |
| Reward Loss         | -203     |
| Running Env Steps   | 228500   |
| Running Forward KL  | 0.903    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 457      |
----------------------------------
2025-02-01 13:34:00.488631 Eastern Standard Time
| Itration            | 458      |
| Real Det Return     | 532      |
| Real Sto Return     | 490      |
| Reward Loss         | -212     |
| Running Env Steps   | 229000   |
| Running Forward KL  | 0.611    |
| Running Reverse KL  | 5.63     |
| Running Update Time | 458      |
----------------------------------
2025-02-01 13:34:16.043155 Eastern Standard Time
| Itration            | 459      |
| Real Det Return     | 529      |
| Real Sto Return     | 490      |
| Reward Loss         | -194     |
| Running Env Steps   | 229500   |
| Running Forward KL  | 0.11     |
| Running Reverse KL  | 5.58     |
| Running Update Time | 459      |
----------------------------------
2025-02-01 13:34:31.728390 Eastern Standard Time
| Itration            | 460      |
| Real Det Return     | 557      |
| Real Sto Return     | 493      |
| Reward Loss         | -205     |
| Running Env Steps   | 230000   |
| Running Forward KL  | 0.877    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 460      |
----------------------------------
2025-02-01 13:34:47.315186 Eastern Standard Time
| Itration            | 461      |
| Real Det Return     | 533      |
| Real Sto Return     | 472      |
| Reward Loss         | -228     |
| Running Env Steps   | 230500   |
| Running Forward KL  | 0.879    |
| Running Reverse KL  | 5.81     |
| Running Update Time | 461      |
----------------------------------
2025-02-01 13:35:02.962148 Eastern Standard Time
| Itration            | 462      |
| Real Det Return     | 527      |
| Real Sto Return     | 500      |
| Reward Loss         | -231     |
| Running Env Steps   | 231000   |
| Running Forward KL  | 0.403    |
| Running Reverse KL  | 5.63     |
| Running Update Time | 462      |
----------------------------------
2025-02-01 13:35:18.571077 Eastern Standard Time
| Itration            | 463      |
| Real Det Return     | 506      |
| Real Sto Return     | 462      |
| Reward Loss         | -282     |
| Running Env Steps   | 231500   |
| Running Forward KL  | 1.32     |
| Running Reverse KL  | 5.29     |
| Running Update Time | 463      |
----------------------------------
2025-02-01 13:35:34.051493 Eastern Standard Time
| Itration            | 464      |
| Real Det Return     | 539      |
| Real Sto Return     | 497      |
| Reward Loss         | -200     |
| Running Env Steps   | 232000   |
| Running Forward KL  | -0.112   |
| Running Reverse KL  | 5.47     |
| Running Update Time | 464      |
----------------------------------
2025-02-01 13:35:49.614064 Eastern Standard Time
| Itration            | 465      |
| Real Det Return     | 548      |
| Real Sto Return     | 501      |
| Reward Loss         | -207     |
| Running Env Steps   | 232500   |
| Running Forward KL  | -0.215   |
| Running Reverse KL  | 4.93     |
| Running Update Time | 465      |
----------------------------------
2025-02-01 13:36:05.220822 Eastern Standard Time
| Itration            | 466      |
| Real Det Return     | 519      |
| Real Sto Return     | 489      |
| Reward Loss         | -216     |
| Running Env Steps   | 233000   |
| Running Forward KL  | 0.598    |
| Running Reverse KL  | 5.96     |
| Running Update Time | 466      |
----------------------------------
2025-02-01 13:36:21.102240 Eastern Standard Time
| Itration            | 467      |
| Real Det Return     | 531      |
| Real Sto Return     | 502      |
| Reward Loss         | -203     |
| Running Env Steps   | 233500   |
| Running Forward KL  | 0.972    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 467      |
----------------------------------
2025-02-01 13:36:36.633635 Eastern Standard Time
| Itration            | 468      |
| Real Det Return     | 532      |
| Real Sto Return     | 510      |
| Reward Loss         | -215     |
| Running Env Steps   | 234000   |
| Running Forward KL  | 1.25     |
| Running Reverse KL  | 5.53     |
| Running Update Time | 468      |
----------------------------------
2025-02-01 13:36:52.272258 Eastern Standard Time
| Itration            | 469      |
| Real Det Return     | 524      |
| Real Sto Return     | 479      |
| Reward Loss         | -205     |
| Running Env Steps   | 234500   |
| Running Forward KL  | 0.362    |
| Running Reverse KL  | 5.54     |
| Running Update Time | 469      |
----------------------------------
2025-02-01 13:37:07.919417 Eastern Standard Time
| Itration            | 470      |
| Real Det Return     | 546      |
| Real Sto Return     | 504      |
| Reward Loss         | -200     |
| Running Env Steps   | 235000   |
| Running Forward KL  | 0.471    |
| Running Reverse KL  | 5.43     |
| Running Update Time | 470      |
----------------------------------
2025-02-01 13:37:23.457270 Eastern Standard Time
| Itration            | 471      |
| Real Det Return     | 560      |
| Real Sto Return     | 506      |
| Reward Loss         | -192     |
| Running Env Steps   | 235500   |
| Running Forward KL  | 0.409    |
| Running Reverse KL  | 5.15     |
| Running Update Time | 471      |
----------------------------------
2025-02-01 13:37:39.137779 Eastern Standard Time
| Itration            | 472      |
| Real Det Return     | 562      |
| Real Sto Return     | 504      |
| Reward Loss         | -188     |
| Running Env Steps   | 236000   |
| Running Forward KL  | 0.199    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 472      |
----------------------------------
2025-02-01 13:37:54.786823 Eastern Standard Time
| Itration            | 473      |
| Real Det Return     | 546      |
| Real Sto Return     | 502      |
| Reward Loss         | -196     |
| Running Env Steps   | 236500   |
| Running Forward KL  | 0.759    |
| Running Reverse KL  | 5.59     |
| Running Update Time | 473      |
----------------------------------
2025-02-01 13:38:10.355751 Eastern Standard Time
| Itration            | 474      |
| Real Det Return     | 537      |
| Real Sto Return     | 485      |
| Reward Loss         | -237     |
| Running Env Steps   | 237000   |
| Running Forward KL  | 0.767    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 474      |
----------------------------------
2025-02-01 13:38:25.996565 Eastern Standard Time
| Itration            | 475      |
| Real Det Return     | 550      |
| Real Sto Return     | 505      |
| Reward Loss         | -182     |
| Running Env Steps   | 237500   |
| Running Forward KL  | 0.343    |
| Running Reverse KL  | 5.57     |
| Running Update Time | 475      |
----------------------------------
2025-02-01 13:38:41.593108 Eastern Standard Time
| Itration            | 476      |
| Real Det Return     | 553      |
| Real Sto Return     | 504      |
| Reward Loss         | -192     |
| Running Env Steps   | 238000   |
| Running Forward KL  | 0.366    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 476      |
----------------------------------
2025-02-01 13:38:57.155258 Eastern Standard Time
| Itration            | 477      |
| Real Det Return     | 526      |
| Real Sto Return     | 484      |
| Reward Loss         | -213     |
| Running Env Steps   | 238500   |
| Running Forward KL  | 0.543    |
| Running Reverse KL  | 5.54     |
| Running Update Time | 477      |
----------------------------------
2025-02-01 13:39:12.744759 Eastern Standard Time
| Itration            | 478      |
| Real Det Return     | 541      |
| Real Sto Return     | 513      |
| Reward Loss         | -222     |
| Running Env Steps   | 239000   |
| Running Forward KL  | 0.779    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 478      |
----------------------------------
2025-02-01 13:39:28.340914 Eastern Standard Time
| Itration            | 479      |
| Real Det Return     | 555      |
| Real Sto Return     | 514      |
| Reward Loss         | -238     |
| Running Env Steps   | 239500   |
| Running Forward KL  | 1.56     |
| Running Reverse KL  | 5.81     |
| Running Update Time | 479      |
----------------------------------
2025-02-01 13:39:43.955923 Eastern Standard Time
| Itration            | 480      |
| Real Det Return     | 543      |
| Real Sto Return     | 508      |
| Reward Loss         | -217     |
| Running Env Steps   | 240000   |
| Running Forward KL  | 0.664    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 480      |
----------------------------------
2025-02-01 13:39:59.550380 Eastern Standard Time
| Itration            | 481      |
| Real Det Return     | 562      |
| Real Sto Return     | 514      |
| Reward Loss         | -188     |
| Running Env Steps   | 240500   |
| Running Forward KL  | -0.192   |
| Running Reverse KL  | 5.28     |
| Running Update Time | 481      |
----------------------------------
2025-02-01 13:40:15.192324 Eastern Standard Time
| Itration            | 482      |
| Real Det Return     | 549      |
| Real Sto Return     | 506      |
| Reward Loss         | -227     |
| Running Env Steps   | 241000   |
| Running Forward KL  | 0.957    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 482      |
----------------------------------
2025-02-01 13:40:30.811340 Eastern Standard Time
| Itration            | 483      |
| Real Det Return     | 547      |
| Real Sto Return     | 511      |
| Reward Loss         | -190     |
| Running Env Steps   | 241500   |
| Running Forward KL  | -0.432   |
| Running Reverse KL  | 5.54     |
| Running Update Time | 483      |
----------------------------------
2025-02-01 13:40:46.429621 Eastern Standard Time
| Itration            | 484      |
| Real Det Return     | 557      |
| Real Sto Return     | 506      |
| Reward Loss         | -231     |
| Running Env Steps   | 242000   |
| Running Forward KL  | 0.773    |
| Running Reverse KL  | 5.88     |
| Running Update Time | 484      |
----------------------------------
2025-02-01 13:41:01.977621 Eastern Standard Time
| Itration            | 485      |
| Real Det Return     | 529      |
| Real Sto Return     | 496      |
| Reward Loss         | -198     |
| Running Env Steps   | 242500   |
| Running Forward KL  | -0.835   |
| Running Reverse KL  | 5.11     |
| Running Update Time | 485      |
----------------------------------
2025-02-01 13:41:17.552322 Eastern Standard Time
| Itration            | 486      |
| Real Det Return     | 566      |
| Real Sto Return     | 505      |
| Reward Loss         | -211     |
| Running Env Steps   | 243000   |
| Running Forward KL  | 0.643    |
| Running Reverse KL  | 5.36     |
| Running Update Time | 486      |
----------------------------------
2025-02-01 13:41:33.242518 Eastern Standard Time
| Itration            | 487      |
| Real Det Return     | 542      |
| Real Sto Return     | 506      |
| Reward Loss         | -205     |
| Running Env Steps   | 243500   |
| Running Forward KL  | -0.156   |
| Running Reverse KL  | 4.86     |
| Running Update Time | 487      |
----------------------------------
2025-02-01 13:41:48.832442 Eastern Standard Time
| Itration            | 488      |
| Real Det Return     | 530      |
| Real Sto Return     | 504      |
| Reward Loss         | -181     |
| Running Env Steps   | 244000   |
| Running Forward KL  | -0.43    |
| Running Reverse KL  | 5.24     |
| Running Update Time | 488      |
----------------------------------
2025-02-01 13:42:04.481493 Eastern Standard Time
| Itration            | 489      |
| Real Det Return     | 557      |
| Real Sto Return     | 504      |
| Reward Loss         | -230     |
| Running Env Steps   | 244500   |
| Running Forward KL  | 0.224    |
| Running Reverse KL  | 5.51     |
| Running Update Time | 489      |
----------------------------------
2025-02-01 13:42:20.088828 Eastern Standard Time
| Itration            | 490      |
| Real Det Return     | 538      |
| Real Sto Return     | 506      |
| Reward Loss         | -196     |
| Running Env Steps   | 245000   |
| Running Forward KL  | 0.666    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 490      |
----------------------------------
2025-02-01 13:42:35.696667 Eastern Standard Time
| Itration            | 491      |
| Real Det Return     | 538      |
| Real Sto Return     | 487      |
| Reward Loss         | -210     |
| Running Env Steps   | 245500   |
| Running Forward KL  | 0.0899   |
| Running Reverse KL  | 5.83     |
| Running Update Time | 491      |
----------------------------------
2025-02-01 13:42:51.344801 Eastern Standard Time
| Itration            | 492      |
| Real Det Return     | 558      |
| Real Sto Return     | 498      |
| Reward Loss         | -187     |
| Running Env Steps   | 246000   |
| Running Forward KL  | 0.106    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 492      |
----------------------------------
2025-02-01 13:43:07.110187 Eastern Standard Time
| Itration            | 493      |
| Real Det Return     | 534      |
| Real Sto Return     | 500      |
| Reward Loss         | -212     |
| Running Env Steps   | 246500   |
| Running Forward KL  | 0.287    |
| Running Reverse KL  | 5.84     |
| Running Update Time | 493      |
----------------------------------
2025-02-01 13:43:22.812213 Eastern Standard Time
| Itration            | 494      |
| Real Det Return     | 545      |
| Real Sto Return     | 511      |
| Reward Loss         | -190     |
| Running Env Steps   | 247000   |
| Running Forward KL  | -0.468   |
| Running Reverse KL  | 5.51     |
| Running Update Time | 494      |
----------------------------------
2025-02-01 13:43:38.698831 Eastern Standard Time
| Itration            | 495      |
| Real Det Return     | 552      |
| Real Sto Return     | 511      |
| Reward Loss         | -204     |
| Running Env Steps   | 247500   |
| Running Forward KL  | -0.571   |
| Running Reverse KL  | 5.09     |
| Running Update Time | 495      |
----------------------------------
2025-02-01 13:43:54.616763 Eastern Standard Time
| Itration            | 496      |
| Real Det Return     | 553      |
| Real Sto Return     | 510      |
| Reward Loss         | -228     |
| Running Env Steps   | 248000   |
| Running Forward KL  | 0.454    |
| Running Reverse KL  | 5.82     |
| Running Update Time | 496      |
----------------------------------
2025-02-01 13:44:10.270886 Eastern Standard Time
| Itration            | 497      |
| Real Det Return     | 568      |
| Real Sto Return     | 517      |
| Reward Loss         | -212     |
| Running Env Steps   | 248500   |
| Running Forward KL  | 0.813    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 497      |
----------------------------------
2025-02-01 13:44:25.882351 Eastern Standard Time
| Itration            | 498      |
| Real Det Return     | 553      |
| Real Sto Return     | 514      |
| Reward Loss         | -206     |
| Running Env Steps   | 249000   |
| Running Forward KL  | 0.983    |
| Running Reverse KL  | 5.47     |
| Running Update Time | 498      |
----------------------------------
2025-02-01 13:44:41.512804 Eastern Standard Time
| Itration            | 499      |
| Real Det Return     | 558      |
| Real Sto Return     | 492      |
| Reward Loss         | -204     |
| Running Env Steps   | 249500   |
| Running Forward KL  | 1.42     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 499      |
----------------------------------
2025-02-01 13:44:57.093412 Eastern Standard Time
| Itration            | 500      |
| Real Det Return     | 568      |
| Real Sto Return     | 527      |
| Reward Loss         | -191     |
| Running Env Steps   | 250000   |
| Running Forward KL  | -0.252   |
| Running Reverse KL  | 5.45     |
| Running Update Time | 500      |
----------------------------------
2025-02-01 13:45:12.682102 Eastern Standard Time
| Itration            | 501      |
| Real Det Return     | 550      |
| Real Sto Return     | 518      |
| Reward Loss         | -215     |
| Running Env Steps   | 250500   |
| Running Forward KL  | -0.429   |
| Running Reverse KL  | 5.33     |
| Running Update Time | 501      |
----------------------------------
2025-02-01 13:45:28.260678 Eastern Standard Time
| Itration            | 502      |
| Real Det Return     | 553      |
| Real Sto Return     | 506      |
| Reward Loss         | -210     |
| Running Env Steps   | 251000   |
| Running Forward KL  | -0.203   |
| Running Reverse KL  | 5.27     |
| Running Update Time | 502      |
----------------------------------
2025-02-01 13:45:43.899516 Eastern Standard Time
| Itration            | 503      |
| Real Det Return     | 563      |
| Real Sto Return     | 506      |
| Reward Loss         | -190     |
| Running Env Steps   | 251500   |
| Running Forward KL  | -0.765   |
| Running Reverse KL  | 5.04     |
| Running Update Time | 503      |
----------------------------------
2025-02-01 13:45:59.537568 Eastern Standard Time
| Itration            | 504      |
| Real Det Return     | 550      |
| Real Sto Return     | 510      |
| Reward Loss         | -172     |
| Running Env Steps   | 252000   |
| Running Forward KL  | -0.628   |
| Running Reverse KL  | 5.56     |
| Running Update Time | 504      |
----------------------------------
2025-02-01 13:46:15.152761 Eastern Standard Time
| Itration            | 505      |
| Real Det Return     | 546      |
| Real Sto Return     | 506      |
| Reward Loss         | -226     |
| Running Env Steps   | 252500   |
| Running Forward KL  | -0.767   |
| Running Reverse KL  | 5.15     |
| Running Update Time | 505      |
----------------------------------
2025-02-01 13:46:30.816700 Eastern Standard Time
| Itration            | 506      |
| Real Det Return     | 536      |
| Real Sto Return     | 509      |
| Reward Loss         | -220     |
| Running Env Steps   | 253000   |
| Running Forward KL  | -0.636   |
| Running Reverse KL  | 5.03     |
| Running Update Time | 506      |
----------------------------------
2025-02-01 13:46:46.585876 Eastern Standard Time
| Itration            | 507      |
| Real Det Return     | 545      |
| Real Sto Return     | 520      |
| Reward Loss         | -200     |
| Running Env Steps   | 253500   |
| Running Forward KL  | -0.468   |
| Running Reverse KL  | 5.05     |
| Running Update Time | 507      |
----------------------------------
2025-02-01 13:47:02.240076 Eastern Standard Time
| Itration            | 508      |
| Real Det Return     | 555      |
| Real Sto Return     | 515      |
| Reward Loss         | -207     |
| Running Env Steps   | 254000   |
| Running Forward KL  | -0.451   |
| Running Reverse KL  | 5.25     |
| Running Update Time | 508      |
----------------------------------
2025-02-01 13:47:17.876396 Eastern Standard Time
| Itration            | 509      |
| Real Det Return     | 566      |
| Real Sto Return     | 514      |
| Reward Loss         | -186     |
| Running Env Steps   | 254500   |
| Running Forward KL  | -1.14    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 509      |
----------------------------------
2025-02-01 13:47:33.466816 Eastern Standard Time
| Itration            | 510      |
| Real Det Return     | 555      |
| Real Sto Return     | 509      |
| Reward Loss         | -214     |
| Running Env Steps   | 255000   |
| Running Forward KL  | 0.207    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 510      |
----------------------------------
2025-02-01 13:47:49.110255 Eastern Standard Time
| Itration            | 511      |
| Real Det Return     | 559      |
| Real Sto Return     | 534      |
| Reward Loss         | -191     |
| Running Env Steps   | 255500   |
| Running Forward KL  | -0.508   |
| Running Reverse KL  | 5.34     |
| Running Update Time | 511      |
----------------------------------
2025-02-01 13:48:04.741664 Eastern Standard Time
| Itration            | 512      |
| Real Det Return     | 550      |
| Real Sto Return     | 514      |
| Reward Loss         | -206     |
| Running Env Steps   | 256000   |
| Running Forward KL  | -0.113   |
| Running Reverse KL  | 5.34     |
| Running Update Time | 512      |
----------------------------------
2025-02-01 13:48:20.378811 Eastern Standard Time
| Itration            | 513      |
| Real Det Return     | 550      |
| Real Sto Return     | 523      |
| Reward Loss         | -190     |
| Running Env Steps   | 256500   |
| Running Forward KL  | -1.1     |
| Running Reverse KL  | 4.59     |
| Running Update Time | 513      |
----------------------------------
2025-02-01 13:48:36.016479 Eastern Standard Time
| Itration            | 514      |
| Real Det Return     | 544      |
| Real Sto Return     | 515      |
| Reward Loss         | -193     |
| Running Env Steps   | 257000   |
| Running Forward KL  | -0.537   |
| Running Reverse KL  | 4.42     |
| Running Update Time | 514      |
----------------------------------
2025-02-01 13:48:51.669361 Eastern Standard Time
| Itration            | 515      |
| Real Det Return     | 553      |
| Real Sto Return     | 531      |
| Reward Loss         | -200     |
| Running Env Steps   | 257500   |
| Running Forward KL  | -0.675   |
| Running Reverse KL  | 5.18     |
| Running Update Time | 515      |
----------------------------------
2025-02-01 13:49:07.317666 Eastern Standard Time
| Itration            | 516      |
| Real Det Return     | 541      |
| Real Sto Return     | 508      |
| Reward Loss         | -210     |
| Running Env Steps   | 258000   |
| Running Forward KL  | -0.667   |
| Running Reverse KL  | 5.13     |
| Running Update Time | 516      |
----------------------------------
2025-02-01 13:49:22.971436 Eastern Standard Time
| Itration            | 517      |
| Real Det Return     | 564      |
| Real Sto Return     | 523      |
| Reward Loss         | -218     |
| Running Env Steps   | 258500   |
| Running Forward KL  | -0.284   |
| Running Reverse KL  | 5.04     |
| Running Update Time | 517      |
----------------------------------
2025-02-01 13:49:38.488499 Eastern Standard Time
| Itration            | 518      |
| Real Det Return     | 554      |
| Real Sto Return     | 525      |
| Reward Loss         | -212     |
| Running Env Steps   | 259000   |
| Running Forward KL  | -0.904   |
| Running Reverse KL  | 4.54     |
| Running Update Time | 518      |
----------------------------------
2025-02-01 13:49:54.131971 Eastern Standard Time
| Itration            | 519      |
| Real Det Return     | 542      |
| Real Sto Return     | 518      |
| Reward Loss         | -203     |
| Running Env Steps   | 259500   |
| Running Forward KL  | 0.346    |
| Running Reverse KL  | 5.65     |
| Running Update Time | 519      |
----------------------------------
2025-02-01 13:50:09.787931 Eastern Standard Time
| Itration            | 520      |
| Real Det Return     | 570      |
| Real Sto Return     | 523      |
| Reward Loss         | -184     |
| Running Env Steps   | 260000   |
| Running Forward KL  | -0.172   |
| Running Reverse KL  | 5.41     |
| Running Update Time | 520      |
----------------------------------
2025-02-01 13:50:25.408656 Eastern Standard Time
| Itration            | 521      |
| Real Det Return     | 553      |
| Real Sto Return     | 501      |
| Reward Loss         | -219     |
| Running Env Steps   | 260500   |
| Running Forward KL  | -0.197   |
| Running Reverse KL  | 5.03     |
| Running Update Time | 521      |
----------------------------------
2025-02-01 13:50:41.041531 Eastern Standard Time
| Itration            | 522      |
| Real Det Return     | 550      |
| Real Sto Return     | 498      |
| Reward Loss         | -178     |
| Running Env Steps   | 261000   |
| Running Forward KL  | -0.51    |
| Running Reverse KL  | 5.86     |
| Running Update Time | 522      |
----------------------------------
2025-02-01 13:50:56.691805 Eastern Standard Time
| Itration            | 523      |
| Real Det Return     | 592      |
| Real Sto Return     | 552      |
| Reward Loss         | -163     |
| Running Env Steps   | 261500   |
| Running Forward KL  | -0.657   |
| Running Reverse KL  | 5.17     |
| Running Update Time | 523      |
----------------------------------
2025-02-01 13:51:12.373406 Eastern Standard Time
| Itration            | 524      |
| Real Det Return     | 555      |
| Real Sto Return     | 512      |
| Reward Loss         | -218     |
| Running Env Steps   | 262000   |
| Running Forward KL  | -0.315   |
| Running Reverse KL  | 5.18     |
| Running Update Time | 524      |
----------------------------------
2025-02-01 13:51:28.002250 Eastern Standard Time
| Itration            | 525      |
| Real Det Return     | 581      |
| Real Sto Return     | 533      |
| Reward Loss         | -183     |
| Running Env Steps   | 262500   |
| Running Forward KL  | 0.0225   |
| Running Reverse KL  | 5.28     |
| Running Update Time | 525      |
----------------------------------
2025-02-01 13:51:43.576283 Eastern Standard Time
| Itration            | 526      |
| Real Det Return     | 546      |
| Real Sto Return     | 502      |
| Reward Loss         | -216     |
| Running Env Steps   | 263000   |
| Running Forward KL  | -0.192   |
| Running Reverse KL  | 5.51     |
| Running Update Time | 526      |
----------------------------------
2025-02-01 13:51:59.137294 Eastern Standard Time
| Itration            | 527      |
| Real Det Return     | 577      |
| Real Sto Return     | 536      |
| Reward Loss         | -181     |
| Running Env Steps   | 263500   |
| Running Forward KL  | -1.33    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 527      |
----------------------------------
2025-02-01 13:52:14.740988 Eastern Standard Time
| Itration            | 528      |
| Real Det Return     | 542      |
| Real Sto Return     | 510      |
| Reward Loss         | -196     |
| Running Env Steps   | 264000   |
| Running Forward KL  | -0.733   |
| Running Reverse KL  | 5.33     |
| Running Update Time | 528      |
----------------------------------
2025-02-01 13:52:30.416184 Eastern Standard Time
| Itration            | 529      |
| Real Det Return     | 557      |
| Real Sto Return     | 529      |
| Reward Loss         | -181     |
| Running Env Steps   | 264500   |
| Running Forward KL  | -1.17    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 529      |
----------------------------------
2025-02-01 13:52:46.124197 Eastern Standard Time
| Itration            | 530      |
| Real Det Return     | 579      |
| Real Sto Return     | 539      |
| Reward Loss         | -178     |
| Running Env Steps   | 265000   |
| Running Forward KL  | -1.65    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 530      |
----------------------------------
2025-02-01 13:53:01.808413 Eastern Standard Time
| Itration            | 531      |
| Real Det Return     | 570      |
| Real Sto Return     | 518      |
| Reward Loss         | -163     |
| Running Env Steps   | 265500   |
| Running Forward KL  | -1.18    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 531      |
----------------------------------
2025-02-01 13:53:17.385584 Eastern Standard Time
| Itration            | 532      |
| Real Det Return     | 578      |
| Real Sto Return     | 533      |
| Reward Loss         | -178     |
| Running Env Steps   | 266000   |
| Running Forward KL  | -0.625   |
| Running Reverse KL  | 5.07     |
| Running Update Time | 532      |
----------------------------------
2025-02-01 13:53:32.992278 Eastern Standard Time
| Itration            | 533      |
| Real Det Return     | 565      |
| Real Sto Return     | 536      |
| Reward Loss         | -175     |
| Running Env Steps   | 266500   |
| Running Forward KL  | -0.697   |
| Running Reverse KL  | 5.64     |
| Running Update Time | 533      |
----------------------------------
2025-02-01 13:53:48.622052 Eastern Standard Time
| Itration            | 534      |
| Real Det Return     | 550      |
| Real Sto Return     | 522      |
| Reward Loss         | -207     |
| Running Env Steps   | 267000   |
| Running Forward KL  | -0.899   |
| Running Reverse KL  | 4.96     |
| Running Update Time | 534      |
----------------------------------
2025-02-01 13:54:04.285243 Eastern Standard Time
| Itration            | 535      |
| Real Det Return     | 551      |
| Real Sto Return     | 519      |
| Reward Loss         | -186     |
| Running Env Steps   | 267500   |
| Running Forward KL  | -0.775   |
| Running Reverse KL  | 4.46     |
| Running Update Time | 535      |
----------------------------------
2025-02-01 13:54:19.938503 Eastern Standard Time
| Itration            | 536      |
| Real Det Return     | 568      |
| Real Sto Return     | 520      |
| Reward Loss         | -162     |
| Running Env Steps   | 268000   |
| Running Forward KL  | -1.17    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 536      |
----------------------------------
2025-02-01 13:54:35.517559 Eastern Standard Time
| Itration            | 537      |
| Real Det Return     | 588      |
| Real Sto Return     | 533      |
| Reward Loss         | -184     |
| Running Env Steps   | 268500   |
| Running Forward KL  | -0.72    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 537      |
----------------------------------
2025-02-01 13:54:51.202722 Eastern Standard Time
| Itration            | 538      |
| Real Det Return     | 568      |
| Real Sto Return     | 529      |
| Reward Loss         | -206     |
| Running Env Steps   | 269000   |
| Running Forward KL  | 0.423    |
| Running Reverse KL  | 5.72     |
| Running Update Time | 538      |
----------------------------------
2025-02-01 13:55:06.954484 Eastern Standard Time
| Itration            | 539      |
| Real Det Return     | 563      |
| Real Sto Return     | 530      |
| Reward Loss         | -263     |
| Running Env Steps   | 269500   |
| Running Forward KL  | 0.712    |
| Running Reverse KL  | 5.68     |
| Running Update Time | 539      |
----------------------------------
2025-02-01 13:55:22.626901 Eastern Standard Time
| Itration            | 540      |
| Real Det Return     | 605      |
| Real Sto Return     | 539      |
| Reward Loss         | -196     |
| Running Env Steps   | 270000   |
| Running Forward KL  | -1.3     |
| Running Reverse KL  | 4.86     |
| Running Update Time | 540      |
----------------------------------
2025-02-01 13:55:38.331389 Eastern Standard Time
| Itration            | 541      |
| Real Det Return     | 584      |
| Real Sto Return     | 541      |
| Reward Loss         | -156     |
| Running Env Steps   | 270500   |
| Running Forward KL  | -1.84    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 541      |
----------------------------------
2025-02-01 13:55:53.977658 Eastern Standard Time
| Itration            | 542      |
| Real Det Return     | 558      |
| Real Sto Return     | 514      |
| Reward Loss         | -188     |
| Running Env Steps   | 271000   |
| Running Forward KL  | -1.31    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 542      |
----------------------------------
2025-02-01 13:56:09.706629 Eastern Standard Time
| Itration            | 543      |
| Real Det Return     | 575      |
| Real Sto Return     | 530      |
| Reward Loss         | -185     |
| Running Env Steps   | 271500   |
| Running Forward KL  | -0.783   |
| Running Reverse KL  | 5.69     |
| Running Update Time | 543      |
----------------------------------
2025-02-01 13:56:25.267612 Eastern Standard Time
| Itration            | 544      |
| Real Det Return     | 566      |
| Real Sto Return     | 519      |
| Reward Loss         | -193     |
| Running Env Steps   | 272000   |
| Running Forward KL  | -1.01    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 544      |
----------------------------------
2025-02-01 13:56:40.917721 Eastern Standard Time
| Itration            | 545      |
| Real Det Return     | 582      |
| Real Sto Return     | 534      |
| Reward Loss         | -184     |
| Running Env Steps   | 272500   |
| Running Forward KL  | -1.12    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 545      |
----------------------------------
2025-02-01 13:56:56.547034 Eastern Standard Time
| Itration            | 546      |
| Real Det Return     | 570      |
| Real Sto Return     | 523      |
| Reward Loss         | -195     |
| Running Env Steps   | 273000   |
| Running Forward KL  | -1.22    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 546      |
----------------------------------
2025-02-01 13:57:12.248584 Eastern Standard Time
| Itration            | 547      |
| Real Det Return     | 537      |
| Real Sto Return     | 502      |
| Reward Loss         | -190     |
| Running Env Steps   | 273500   |
| Running Forward KL  | -1.04    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 547      |
----------------------------------
2025-02-01 13:57:27.871428 Eastern Standard Time
| Itration            | 548      |
| Real Det Return     | 581      |
| Real Sto Return     | 535      |
| Reward Loss         | -193     |
| Running Env Steps   | 274000   |
| Running Forward KL  | -0.976   |
| Running Reverse KL  | 4.48     |
| Running Update Time | 548      |
----------------------------------
2025-02-01 13:57:43.555239 Eastern Standard Time
| Itration            | 549      |
| Real Det Return     | 581      |
| Real Sto Return     | 536      |
| Reward Loss         | -198     |
| Running Env Steps   | 274500   |
| Running Forward KL  | -0.576   |
| Running Reverse KL  | 4.74     |
| Running Update Time | 549      |
----------------------------------
2025-02-01 13:57:59.232442 Eastern Standard Time
| Itration            | 550      |
| Real Det Return     | 543      |
| Real Sto Return     | 515      |
| Reward Loss         | -200     |
| Running Env Steps   | 275000   |
| Running Forward KL  | -1.54    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 550      |
----------------------------------
2025-02-01 13:58:14.850356 Eastern Standard Time
| Itration            | 551      |
| Real Det Return     | 570      |
| Real Sto Return     | 543      |
| Reward Loss         | -148     |
| Running Env Steps   | 275500   |
| Running Forward KL  | -2.65    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 551      |
----------------------------------
2025-02-01 13:58:30.567700 Eastern Standard Time
| Itration            | 552      |
| Real Det Return     | 621      |
| Real Sto Return     | 547      |
| Reward Loss         | -145     |
| Running Env Steps   | 276000   |
| Running Forward KL  | -1.59    |
| Running Reverse KL  | 5.8      |
| Running Update Time | 552      |
----------------------------------
2025-02-01 13:58:46.368339 Eastern Standard Time
| Itration            | 553      |
| Real Det Return     | 558      |
| Real Sto Return     | 535      |
| Reward Loss         | -175     |
| Running Env Steps   | 276500   |
| Running Forward KL  | -1.58    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 553      |
----------------------------------
2025-02-01 13:59:02.188061 Eastern Standard Time
| Itration            | 554      |
| Real Det Return     | 541      |
| Real Sto Return     | 508      |
| Reward Loss         | -202     |
| Running Env Steps   | 277000   |
| Running Forward KL  | -0.767   |
| Running Reverse KL  | 5.13     |
| Running Update Time | 554      |
----------------------------------
2025-02-01 13:59:17.938824 Eastern Standard Time
| Itration            | 555      |
| Real Det Return     | 580      |
| Real Sto Return     | 533      |
| Reward Loss         | -198     |
| Running Env Steps   | 277500   |
| Running Forward KL  | -1.44    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 555      |
----------------------------------
2025-02-01 13:59:33.652281 Eastern Standard Time
| Itration            | 556      |
| Real Det Return     | 575      |
| Real Sto Return     | 521      |
| Reward Loss         | -210     |
| Running Env Steps   | 278000   |
| Running Forward KL  | -1.84    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 556      |
----------------------------------
2025-02-01 13:59:49.259589 Eastern Standard Time
| Itration            | 557      |
| Real Det Return     | 606      |
| Real Sto Return     | 556      |
| Reward Loss         | -185     |
| Running Env Steps   | 278500   |
| Running Forward KL  | -0.745   |
| Running Reverse KL  | 5.38     |
| Running Update Time | 557      |
----------------------------------
2025-02-01 14:00:04.905223 Eastern Standard Time
| Itration            | 558      |
| Real Det Return     | 602      |
| Real Sto Return     | 540      |
| Reward Loss         | -191     |
| Running Env Steps   | 279000   |
| Running Forward KL  | -0.615   |
| Running Reverse KL  | 5.39     |
| Running Update Time | 558      |
----------------------------------
2025-02-01 14:00:20.604971 Eastern Standard Time
| Itration            | 559      |
| Real Det Return     | 582      |
| Real Sto Return     | 558      |
| Reward Loss         | -134     |
| Running Env Steps   | 279500   |
| Running Forward KL  | -1.96    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 559      |
----------------------------------
2025-02-01 14:00:36.238376 Eastern Standard Time
| Itration            | 560      |
| Real Det Return     | 599      |
| Real Sto Return     | 560      |
| Reward Loss         | -187     |
| Running Env Steps   | 280000   |
| Running Forward KL  | -0.793   |
| Running Reverse KL  | 5.15     |
| Running Update Time | 560      |
----------------------------------
2025-02-01 14:00:51.926324 Eastern Standard Time
| Itration            | 561      |
| Real Det Return     | 528      |
| Real Sto Return     | 500      |
| Reward Loss         | -178     |
| Running Env Steps   | 280500   |
| Running Forward KL  | -1.57    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 561      |
----------------------------------
2025-02-01 14:01:07.544256 Eastern Standard Time
| Itration            | 562      |
| Real Det Return     | 578      |
| Real Sto Return     | 546      |
| Reward Loss         | -179     |
| Running Env Steps   | 281000   |
| Running Forward KL  | -0.953   |
| Running Reverse KL  | 5.27     |
| Running Update Time | 562      |
----------------------------------
2025-02-01 14:01:23.124566 Eastern Standard Time
| Itration            | 563      |
| Real Det Return     | 591      |
| Real Sto Return     | 544      |
| Reward Loss         | -145     |
| Running Env Steps   | 281500   |
| Running Forward KL  | -2.12    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 563      |
----------------------------------
2025-02-01 14:01:38.733178 Eastern Standard Time
| Itration            | 564      |
| Real Det Return     | 565      |
| Real Sto Return     | 536      |
| Reward Loss         | -171     |
| Running Env Steps   | 282000   |
| Running Forward KL  | -1.99    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 564      |
----------------------------------
2025-02-01 14:01:54.368863 Eastern Standard Time
| Itration            | 565      |
| Real Det Return     | 557      |
| Real Sto Return     | 517      |
| Reward Loss         | -194     |
| Running Env Steps   | 282500   |
| Running Forward KL  | -1.88    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 565      |
----------------------------------
2025-02-01 14:02:10.021797 Eastern Standard Time
| Itration            | 566      |
| Real Det Return     | 596      |
| Real Sto Return     | 547      |
| Reward Loss         | -186     |
| Running Env Steps   | 283000   |
| Running Forward KL  | -1.97    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 566      |
----------------------------------
2025-02-01 14:02:25.714450 Eastern Standard Time
| Itration            | 567      |
| Real Det Return     | 591      |
| Real Sto Return     | 546      |
| Reward Loss         | -139     |
| Running Env Steps   | 283500   |
| Running Forward KL  | -2.4     |
| Running Reverse KL  | 4.97     |
| Running Update Time | 567      |
----------------------------------
2025-02-01 14:02:41.390854 Eastern Standard Time
| Itration            | 568      |
| Real Det Return     | 587      |
| Real Sto Return     | 526      |
| Reward Loss         | -230     |
| Running Env Steps   | 284000   |
| Running Forward KL  | -0.193   |
| Running Reverse KL  | 5.03     |
| Running Update Time | 568      |
----------------------------------
2025-02-01 14:02:57.009980 Eastern Standard Time
| Itration            | 569      |
| Real Det Return     | 566      |
| Real Sto Return     | 520      |
| Reward Loss         | -176     |
| Running Env Steps   | 284500   |
| Running Forward KL  | -2       |
| Running Reverse KL  | 4.99     |
| Running Update Time | 569      |
----------------------------------
2025-02-01 14:03:12.677119 Eastern Standard Time
| Itration            | 570      |
| Real Det Return     | 594      |
| Real Sto Return     | 527      |
| Reward Loss         | -192     |
| Running Env Steps   | 285000   |
| Running Forward KL  | -1.33    |
| Running Reverse KL  | 5.63     |
| Running Update Time | 570      |
----------------------------------
2025-02-01 14:03:28.259967 Eastern Standard Time
| Itration            | 571      |
| Real Det Return     | 570      |
| Real Sto Return     | 542      |
| Reward Loss         | -193     |
| Running Env Steps   | 285500   |
| Running Forward KL  | -1.95    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 571      |
----------------------------------
2025-02-01 14:03:43.884050 Eastern Standard Time
| Itration            | 572      |
| Real Det Return     | 580      |
| Real Sto Return     | 556      |
| Reward Loss         | -162     |
| Running Env Steps   | 286000   |
| Running Forward KL  | -1.83    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 572      |
----------------------------------
2025-02-01 14:03:59.528327 Eastern Standard Time
| Itration            | 573      |
| Real Det Return     | 598      |
| Real Sto Return     | 562      |
| Reward Loss         | -185     |
| Running Env Steps   | 286500   |
| Running Forward KL  | -1.15    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 573      |
----------------------------------
2025-02-01 14:04:15.183153 Eastern Standard Time
| Itration            | 574      |
| Real Det Return     | 581      |
| Real Sto Return     | 532      |
| Reward Loss         | -165     |
| Running Env Steps   | 287000   |
| Running Forward KL  | -1.94    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 574      |
----------------------------------
2025-02-01 14:04:30.861587 Eastern Standard Time
| Itration            | 575      |
| Real Det Return     | 578      |
| Real Sto Return     | 521      |
| Reward Loss         | -212     |
| Running Env Steps   | 287500   |
| Running Forward KL  | -0.906   |
| Running Reverse KL  | 5.13     |
| Running Update Time | 575      |
----------------------------------
2025-02-01 14:04:46.445207 Eastern Standard Time
| Itration            | 576      |
| Real Det Return     | 613      |
| Real Sto Return     | 569      |
| Reward Loss         | -182     |
| Running Env Steps   | 288000   |
| Running Forward KL  | -1.76    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 576      |
----------------------------------
2025-02-01 14:05:02.403545 Eastern Standard Time
| Itration            | 577      |
| Real Det Return     | 555      |
| Real Sto Return     | 532      |
| Reward Loss         | -180     |
| Running Env Steps   | 288500   |
| Running Forward KL  | -2.37    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 577      |
----------------------------------
2025-02-01 14:05:18.078309 Eastern Standard Time
| Itration            | 578      |
| Real Det Return     | 605      |
| Real Sto Return     | 563      |
| Reward Loss         | -186     |
| Running Env Steps   | 289000   |
| Running Forward KL  | -2.07    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 578      |
----------------------------------
2025-02-01 14:05:33.764279 Eastern Standard Time
| Itration            | 579      |
| Real Det Return     | 605      |
| Real Sto Return     | 544      |
| Reward Loss         | -163     |
| Running Env Steps   | 289500   |
| Running Forward KL  | -2.22    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 579      |
----------------------------------
2025-02-01 14:05:49.436172 Eastern Standard Time
| Itration            | 580      |
| Real Det Return     | 604      |
| Real Sto Return     | 550      |
| Reward Loss         | -174     |
| Running Env Steps   | 290000   |
| Running Forward KL  | -1.9     |
| Running Reverse KL  | 5.09     |
| Running Update Time | 580      |
----------------------------------
2025-02-01 14:06:05.067316 Eastern Standard Time
| Itration            | 581      |
| Real Det Return     | 623      |
| Real Sto Return     | 566      |
| Reward Loss         | -143     |
| Running Env Steps   | 290500   |
| Running Forward KL  | -2.3     |
| Running Reverse KL  | 3.9      |
| Running Update Time | 581      |
----------------------------------
2025-02-01 14:06:20.751574 Eastern Standard Time
| Itration            | 582      |
| Real Det Return     | 570      |
| Real Sto Return     | 556      |
| Reward Loss         | -157     |
| Running Env Steps   | 291000   |
| Running Forward KL  | -1.98    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 582      |
----------------------------------
2025-02-01 14:06:36.468710 Eastern Standard Time
| Itration            | 583      |
| Real Det Return     | 600      |
| Real Sto Return     | 520      |
| Reward Loss         | -196     |
| Running Env Steps   | 291500   |
| Running Forward KL  | -1.28    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 583      |
----------------------------------
2025-02-01 14:06:52.136290 Eastern Standard Time
| Itration            | 584      |
| Real Det Return     | 612      |
| Real Sto Return     | 591      |
| Reward Loss         | -151     |
| Running Env Steps   | 292000   |
| Running Forward KL  | -2.14    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 584      |
----------------------------------
2025-02-01 14:07:07.893297 Eastern Standard Time
| Itration            | 585      |
| Real Det Return     | 588      |
| Real Sto Return     | 543      |
| Reward Loss         | -158     |
| Running Env Steps   | 292500   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 585      |
----------------------------------
2025-02-01 14:07:23.526937 Eastern Standard Time
| Itration            | 586      |
| Real Det Return     | 610      |
| Real Sto Return     | 561      |
| Reward Loss         | -175     |
| Running Env Steps   | 293000   |
| Running Forward KL  | -1.53    |
| Running Reverse KL  | 5.26     |
| Running Update Time | 586      |
----------------------------------
2025-02-01 14:07:39.151623 Eastern Standard Time
| Itration            | 587      |
| Real Det Return     | 598      |
| Real Sto Return     | 555      |
| Reward Loss         | -189     |
| Running Env Steps   | 293500   |
| Running Forward KL  | -1.7     |
| Running Reverse KL  | 4.1      |
| Running Update Time | 587      |
----------------------------------
2025-02-01 14:07:54.773521 Eastern Standard Time
| Itration            | 588      |
| Real Det Return     | 602      |
| Real Sto Return     | 547      |
| Reward Loss         | -159     |
| Running Env Steps   | 294000   |
| Running Forward KL  | -2.15    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 588      |
----------------------------------
2025-02-01 14:08:10.469870 Eastern Standard Time
| Itration            | 589      |
| Real Det Return     | 566      |
| Real Sto Return     | 538      |
| Reward Loss         | -211     |
| Running Env Steps   | 294500   |
| Running Forward KL  | -1.96    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 589      |
----------------------------------
2025-02-01 14:08:26.076681 Eastern Standard Time
| Itration            | 590      |
| Real Det Return     | 587      |
| Real Sto Return     | 539      |
| Reward Loss         | -194     |
| Running Env Steps   | 295000   |
| Running Forward KL  | -1.48    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 590      |
----------------------------------
2025-02-01 14:08:41.742324 Eastern Standard Time
| Itration            | 591      |
| Real Det Return     | 612      |
| Real Sto Return     | 567      |
| Reward Loss         | -148     |
| Running Env Steps   | 295500   |
| Running Forward KL  | -2.83    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 591      |
----------------------------------
2025-02-01 14:08:57.313745 Eastern Standard Time
| Itration            | 592      |
| Real Det Return     | 588      |
| Real Sto Return     | 553      |
| Reward Loss         | -204     |
| Running Env Steps   | 296000   |
| Running Forward KL  | -1.92    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 592      |
----------------------------------
2025-02-01 14:09:12.968038 Eastern Standard Time
| Itration            | 593      |
| Real Det Return     | 560      |
| Real Sto Return     | 536      |
| Reward Loss         | -152     |
| Running Env Steps   | 296500   |
| Running Forward KL  | -2       |
| Running Reverse KL  | 5.2      |
| Running Update Time | 593      |
----------------------------------
2025-02-01 14:09:28.644779 Eastern Standard Time
| Itration            | 594      |
| Real Det Return     | 589      |
| Real Sto Return     | 542      |
| Reward Loss         | -142     |
| Running Env Steps   | 297000   |
| Running Forward KL  | -1.85    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 594      |
----------------------------------
2025-02-01 14:09:44.365890 Eastern Standard Time
| Itration            | 595      |
| Real Det Return     | 599      |
| Real Sto Return     | 515      |
| Reward Loss         | -179     |
| Running Env Steps   | 297500   |
| Running Forward KL  | -2.12    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 595      |
----------------------------------
2025-02-01 14:10:00.058455 Eastern Standard Time
| Itration            | 596      |
| Real Det Return     | 608      |
| Real Sto Return     | 574      |
| Reward Loss         | -143     |
| Running Env Steps   | 298000   |
| Running Forward KL  | -2.48    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 596      |
----------------------------------
2025-02-01 14:10:15.726418 Eastern Standard Time
| Itration            | 597      |
| Real Det Return     | 612      |
| Real Sto Return     | 560      |
| Reward Loss         | -172     |
| Running Env Steps   | 298500   |
| Running Forward KL  | -2.08    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 597      |
----------------------------------
2025-02-01 14:10:31.363986 Eastern Standard Time
| Itration            | 598      |
| Real Det Return     | 589      |
| Real Sto Return     | 550      |
| Reward Loss         | -161     |
| Running Env Steps   | 299000   |
| Running Forward KL  | -1.84    |
| Running Reverse KL  | 5.42     |
| Running Update Time | 598      |
----------------------------------
2025-02-01 14:10:47.061887 Eastern Standard Time
| Itration            | 599      |
| Real Det Return     | 598      |
| Real Sto Return     | 558      |
| Reward Loss         | -160     |
| Running Env Steps   | 299500   |
| Running Forward KL  | -2.15    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 599      |
----------------------------------
2025-02-01 14:11:02.764630 Eastern Standard Time
| Itration            | 600      |
| Real Det Return     | 564      |
| Real Sto Return     | 532      |
| Reward Loss         | -206     |
| Running Env Steps   | 300000   |
| Running Forward KL  | -0.194   |
| Running Reverse KL  | 5.58     |
| Running Update Time | 600      |
----------------------------------
2025-02-01 14:11:18.485602 Eastern Standard Time
| Itration            | 601      |
| Real Det Return     | 616      |
| Real Sto Return     | 556      |
| Reward Loss         | -172     |
| Running Env Steps   | 300500   |
| Running Forward KL  | -1.39    |
| Running Reverse KL  | 5.49     |
| Running Update Time | 601      |
----------------------------------
2025-02-01 14:11:34.166438 Eastern Standard Time
| Itration            | 602      |
| Real Det Return     | 561      |
| Real Sto Return     | 521      |
| Reward Loss         | -168     |
| Running Env Steps   | 301000   |
| Running Forward KL  | -1.92    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 602      |
----------------------------------
2025-02-01 14:11:49.872929 Eastern Standard Time
| Itration            | 603      |
| Real Det Return     | 594      |
| Real Sto Return     | 563      |
| Reward Loss         | -170     |
| Running Env Steps   | 301500   |
| Running Forward KL  | -2.32    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 603      |
----------------------------------
2025-02-01 14:12:05.580218 Eastern Standard Time
| Itration            | 604      |
| Real Det Return     | 601      |
| Real Sto Return     | 554      |
| Reward Loss         | -173     |
| Running Env Steps   | 302000   |
| Running Forward KL  | -2.06    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 604      |
----------------------------------
2025-02-01 14:12:21.173602 Eastern Standard Time
| Itration            | 605      |
| Real Det Return     | 617      |
| Real Sto Return     | 566      |
| Reward Loss         | -157     |
| Running Env Steps   | 302500   |
| Running Forward KL  | -1.72    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 605      |
----------------------------------
2025-02-01 14:12:36.805244 Eastern Standard Time
| Itration            | 606      |
| Real Det Return     | 586      |
| Real Sto Return     | 532      |
| Reward Loss         | -140     |
| Running Env Steps   | 303000   |
| Running Forward KL  | -2.91    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 606      |
----------------------------------
2025-02-01 14:12:52.527368 Eastern Standard Time
| Itration            | 607      |
| Real Det Return     | 581      |
| Real Sto Return     | 538      |
| Reward Loss         | -156     |
| Running Env Steps   | 303500   |
| Running Forward KL  | -2       |
| Running Reverse KL  | 5.47     |
| Running Update Time | 607      |
----------------------------------
2025-02-01 14:13:08.169726 Eastern Standard Time
| Itration            | 608      |
| Real Det Return     | 610      |
| Real Sto Return     | 532      |
| Reward Loss         | -199     |
| Running Env Steps   | 304000   |
| Running Forward KL  | -1.78    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 608      |
----------------------------------
2025-02-01 14:13:23.831968 Eastern Standard Time
| Itration            | 609      |
| Real Det Return     | 596      |
| Real Sto Return     | 561      |
| Reward Loss         | -137     |
| Running Env Steps   | 304500   |
| Running Forward KL  | -1.96    |
| Running Reverse KL  | 5.54     |
| Running Update Time | 609      |
----------------------------------
2025-02-01 14:13:39.475556 Eastern Standard Time
| Itration            | 610      |
| Real Det Return     | 584      |
| Real Sto Return     | 542      |
| Reward Loss         | -176     |
| Running Env Steps   | 305000   |
| Running Forward KL  | -1.25    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 610      |
----------------------------------
2025-02-01 14:13:55.290170 Eastern Standard Time
| Itration            | 611      |
| Real Det Return     | 617      |
| Real Sto Return     | 563      |
| Reward Loss         | -131     |
| Running Env Steps   | 305500   |
| Running Forward KL  | -2.17    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 611      |
----------------------------------
2025-02-01 14:14:10.896761 Eastern Standard Time
| Itration            | 612      |
| Real Det Return     | 625      |
| Real Sto Return     | 557      |
| Reward Loss         | -124     |
| Running Env Steps   | 306000   |
| Running Forward KL  | -2.03    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 612      |
----------------------------------
2025-02-01 14:14:26.623314 Eastern Standard Time
| Itration            | 613      |
| Real Det Return     | 552      |
| Real Sto Return     | 533      |
| Reward Loss         | -172     |
| Running Env Steps   | 306500   |
| Running Forward KL  | -2.27    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 613      |
----------------------------------
2025-02-01 14:14:42.244766 Eastern Standard Time
| Itration            | 614      |
| Real Det Return     | 588      |
| Real Sto Return     | 539      |
| Reward Loss         | -245     |
| Running Env Steps   | 307000   |
| Running Forward KL  | -0.847   |
| Running Reverse KL  | 5.01     |
| Running Update Time | 614      |
----------------------------------
2025-02-01 14:14:57.921119 Eastern Standard Time
| Itration            | 615      |
| Real Det Return     | 580      |
| Real Sto Return     | 551      |
| Reward Loss         | -173     |
| Running Env Steps   | 307500   |
| Running Forward KL  | -1.08    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 615      |
----------------------------------
2025-02-01 14:15:13.601463 Eastern Standard Time
| Itration            | 616      |
| Real Det Return     | 635      |
| Real Sto Return     | 592      |
| Reward Loss         | -136     |
| Running Env Steps   | 308000   |
| Running Forward KL  | -1.6     |
| Running Reverse KL  | 5.3      |
| Running Update Time | 616      |
----------------------------------
2025-02-01 14:15:29.242555 Eastern Standard Time
| Itration            | 617      |
| Real Det Return     | 621      |
| Real Sto Return     | 552      |
| Reward Loss         | -187     |
| Running Env Steps   | 308500   |
| Running Forward KL  | -2.23    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 617      |
----------------------------------
2025-02-01 14:15:44.897228 Eastern Standard Time
| Itration            | 618      |
| Real Det Return     | 626      |
| Real Sto Return     | 562      |
| Reward Loss         | -217     |
| Running Env Steps   | 309000   |
| Running Forward KL  | -1.64    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 618      |
----------------------------------
2025-02-01 14:16:00.440283 Eastern Standard Time
| Itration            | 619      |
| Real Det Return     | 616      |
| Real Sto Return     | 559      |
| Reward Loss         | -134     |
| Running Env Steps   | 309500   |
| Running Forward KL  | -2.44    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 619      |
----------------------------------
2025-02-01 14:16:16.035998 Eastern Standard Time
| Itration            | 620      |
| Real Det Return     | 584      |
| Real Sto Return     | 553      |
| Reward Loss         | -161     |
| Running Env Steps   | 310000   |
| Running Forward KL  | -1.8     |
| Running Reverse KL  | 4.84     |
| Running Update Time | 620      |
----------------------------------
2025-02-01 14:16:31.691567 Eastern Standard Time
| Itration            | 621      |
| Real Det Return     | 604      |
| Real Sto Return     | 544      |
| Reward Loss         | -145     |
| Running Env Steps   | 310500   |
| Running Forward KL  | -2.81    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 621      |
----------------------------------
2025-02-01 14:16:47.326369 Eastern Standard Time
| Itration            | 622      |
| Real Det Return     | 587      |
| Real Sto Return     | 532      |
| Reward Loss         | -173     |
| Running Env Steps   | 311000   |
| Running Forward KL  | -2.48    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 622      |
----------------------------------
2025-02-01 14:17:02.992787 Eastern Standard Time
| Itration            | 623      |
| Real Det Return     | 601      |
| Real Sto Return     | 566      |
| Reward Loss         | -167     |
| Running Env Steps   | 311500   |
| Running Forward KL  | -2.75    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 623      |
----------------------------------
2025-02-01 14:17:18.840075 Eastern Standard Time
| Itration            | 624      |
| Real Det Return     | 601      |
| Real Sto Return     | 559      |
| Reward Loss         | -167     |
| Running Env Steps   | 312000   |
| Running Forward KL  | -3.09    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 624      |
----------------------------------
2025-02-01 14:17:34.565218 Eastern Standard Time
| Itration            | 625      |
| Real Det Return     | 598      |
| Real Sto Return     | 541      |
| Reward Loss         | -153     |
| Running Env Steps   | 312500   |
| Running Forward KL  | -2.41    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 625      |
----------------------------------
2025-02-01 14:17:50.205718 Eastern Standard Time
| Itration            | 626      |
| Real Det Return     | 618      |
| Real Sto Return     | 566      |
| Reward Loss         | -160     |
| Running Env Steps   | 313000   |
| Running Forward KL  | -3.05    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 626      |
----------------------------------
2025-02-01 14:18:05.810025 Eastern Standard Time
| Itration            | 627      |
| Real Det Return     | 607      |
| Real Sto Return     | 572      |
| Reward Loss         | -142     |
| Running Env Steps   | 313500   |
| Running Forward KL  | -1.96    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 627      |
----------------------------------
2025-02-01 14:18:21.381506 Eastern Standard Time
| Itration            | 628      |
| Real Det Return     | 587      |
| Real Sto Return     | 546      |
| Reward Loss         | -174     |
| Running Env Steps   | 314000   |
| Running Forward KL  | -2.39    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 628      |
----------------------------------
2025-02-01 14:18:37.019914 Eastern Standard Time
| Itration            | 629      |
| Real Det Return     | 621      |
| Real Sto Return     | 571      |
| Reward Loss         | -133     |
| Running Env Steps   | 314500   |
| Running Forward KL  | -2.25    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 629      |
----------------------------------
2025-02-01 14:18:52.569880 Eastern Standard Time
| Itration            | 630      |
| Real Det Return     | 606      |
| Real Sto Return     | 560      |
| Reward Loss         | -127     |
| Running Env Steps   | 315000   |
| Running Forward KL  | -3.08    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 630      |
----------------------------------
2025-02-01 14:19:08.293201 Eastern Standard Time
| Itration            | 631      |
| Real Det Return     | 592      |
| Real Sto Return     | 564      |
| Reward Loss         | -141     |
| Running Env Steps   | 315500   |
| Running Forward KL  | -3.23    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 631      |
----------------------------------
2025-02-01 14:19:23.988803 Eastern Standard Time
| Itration            | 632      |
| Real Det Return     | 597      |
| Real Sto Return     | 563      |
| Reward Loss         | -178     |
| Running Env Steps   | 316000   |
| Running Forward KL  | -1.95    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 632      |
----------------------------------
2025-02-01 14:19:39.606159 Eastern Standard Time
| Itration            | 633      |
| Real Det Return     | 579      |
| Real Sto Return     | 547      |
| Reward Loss         | -164     |
| Running Env Steps   | 316500   |
| Running Forward KL  | -2.8     |
| Running Reverse KL  | 4.88     |
| Running Update Time | 633      |
----------------------------------
2025-02-01 14:19:55.262922 Eastern Standard Time
| Itration            | 634      |
| Real Det Return     | 617      |
| Real Sto Return     | 550      |
| Reward Loss         | -130     |
| Running Env Steps   | 317000   |
| Running Forward KL  | -2.03    |
| Running Reverse KL  | 5.69     |
| Running Update Time | 634      |
----------------------------------
2025-02-01 14:20:10.924016 Eastern Standard Time
| Itration            | 635      |
| Real Det Return     | 565      |
| Real Sto Return     | 544      |
| Reward Loss         | -195     |
| Running Env Steps   | 317500   |
| Running Forward KL  | -1.46    |
| Running Reverse KL  | 5.24     |
| Running Update Time | 635      |
----------------------------------
2025-02-01 14:20:26.602554 Eastern Standard Time
| Itration            | 636      |
| Real Det Return     | 616      |
| Real Sto Return     | 578      |
| Reward Loss         | -144     |
| Running Env Steps   | 318000   |
| Running Forward KL  | -2.65    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 636      |
----------------------------------
2025-02-01 14:20:42.254801 Eastern Standard Time
| Itration            | 637      |
| Real Det Return     | 595      |
| Real Sto Return     | 579      |
| Reward Loss         | -172     |
| Running Env Steps   | 318500   |
| Running Forward KL  | -2.54    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 637      |
----------------------------------
2025-02-01 14:20:57.901477 Eastern Standard Time
| Itration            | 638      |
| Real Det Return     | 619      |
| Real Sto Return     | 579      |
| Reward Loss         | -127     |
| Running Env Steps   | 319000   |
| Running Forward KL  | -2.68    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 638      |
----------------------------------
2025-02-01 14:21:13.504469 Eastern Standard Time
| Itration            | 639      |
| Real Det Return     | 621      |
| Real Sto Return     | 573      |
| Reward Loss         | -159     |
| Running Env Steps   | 319500   |
| Running Forward KL  | -2.27    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 639      |
----------------------------------
2025-02-01 14:21:29.099270 Eastern Standard Time
| Itration            | 640      |
| Real Det Return     | 611      |
| Real Sto Return     | 583      |
| Reward Loss         | -110     |
| Running Env Steps   | 320000   |
| Running Forward KL  | -2.92    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 640      |
----------------------------------
2025-02-01 14:21:44.787995 Eastern Standard Time
| Itration            | 641      |
| Real Det Return     | 613      |
| Real Sto Return     | 567      |
| Reward Loss         | -128     |
| Running Env Steps   | 320500   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 641      |
----------------------------------
2025-02-01 14:22:00.483086 Eastern Standard Time
| Itration            | 642      |
| Real Det Return     | 640      |
| Real Sto Return     | 570      |
| Reward Loss         | -154     |
| Running Env Steps   | 321000   |
| Running Forward KL  | -2.33    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 642      |
----------------------------------
2025-02-01 14:22:16.156204 Eastern Standard Time
| Itration            | 643      |
| Real Det Return     | 583      |
| Real Sto Return     | 557      |
| Reward Loss         | -207     |
| Running Env Steps   | 321500   |
| Running Forward KL  | -1.37    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 643      |
----------------------------------
2025-02-01 14:22:31.773247 Eastern Standard Time
| Itration            | 644      |
| Real Det Return     | 582      |
| Real Sto Return     | 523      |
| Reward Loss         | -145     |
| Running Env Steps   | 322000   |
| Running Forward KL  | -2.17    |
| Running Reverse KL  | 5.46     |
| Running Update Time | 644      |
----------------------------------
2025-02-01 14:22:47.384058 Eastern Standard Time
| Itration            | 645      |
| Real Det Return     | 592      |
| Real Sto Return     | 559      |
| Reward Loss         | -159     |
| Running Env Steps   | 322500   |
| Running Forward KL  | -2.25    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 645      |
----------------------------------
2025-02-01 14:23:02.948873 Eastern Standard Time
| Itration            | 646      |
| Real Det Return     | 606      |
| Real Sto Return     | 585      |
| Reward Loss         | -141     |
| Running Env Steps   | 323000   |
| Running Forward KL  | -2.49    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 646      |
----------------------------------
2025-02-01 14:23:18.541294 Eastern Standard Time
| Itration            | 647      |
| Real Det Return     | 600      |
| Real Sto Return     | 577      |
| Reward Loss         | -171     |
| Running Env Steps   | 323500   |
| Running Forward KL  | -1.41    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 647      |
----------------------------------
2025-02-01 14:23:34.157630 Eastern Standard Time
| Itration            | 648      |
| Real Det Return     | 621      |
| Real Sto Return     | 575      |
| Reward Loss         | -118     |
| Running Env Steps   | 324000   |
| Running Forward KL  | -2.49    |
| Running Reverse KL  | 5.56     |
| Running Update Time | 648      |
----------------------------------
2025-02-01 14:23:49.770111 Eastern Standard Time
| Itration            | 649      |
| Real Det Return     | 588      |
| Real Sto Return     | 571      |
| Reward Loss         | -143     |
| Running Env Steps   | 324500   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 5.27     |
| Running Update Time | 649      |
----------------------------------
2025-02-01 14:24:05.391335 Eastern Standard Time
| Itration            | 650      |
| Real Det Return     | 638      |
| Real Sto Return     | 600      |
| Reward Loss         | -109     |
| Running Env Steps   | 325000   |
| Running Forward KL  | -3       |
| Running Reverse KL  | 4.66     |
| Running Update Time | 650      |
----------------------------------
2025-02-01 14:24:21.323388 Eastern Standard Time
| Itration            | 651      |
| Real Det Return     | 627      |
| Real Sto Return     | 588      |
| Reward Loss         | -113     |
| Running Env Steps   | 325500   |
| Running Forward KL  | -2.87    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 651      |
----------------------------------
2025-02-01 14:24:37.006320 Eastern Standard Time
| Itration            | 652      |
| Real Det Return     | 619      |
| Real Sto Return     | 575      |
| Reward Loss         | -158     |
| Running Env Steps   | 326000   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 652      |
----------------------------------
2025-02-01 14:24:52.705249 Eastern Standard Time
| Itration            | 653      |
| Real Det Return     | 591      |
| Real Sto Return     | 553      |
| Reward Loss         | -171     |
| Running Env Steps   | 326500   |
| Running Forward KL  | -2.7     |
| Running Reverse KL  | 5.22     |
| Running Update Time | 653      |
----------------------------------
2025-02-01 14:25:08.266212 Eastern Standard Time
| Itration            | 654      |
| Real Det Return     | 595      |
| Real Sto Return     | 582      |
| Reward Loss         | -109     |
| Running Env Steps   | 327000   |
| Running Forward KL  | -2.94    |
| Running Reverse KL  | 5.63     |
| Running Update Time | 654      |
----------------------------------
2025-02-01 14:25:26.112291 Eastern Standard Time
| Itration            | 655      |
| Real Det Return     | 616      |
| Real Sto Return     | 549      |
| Reward Loss         | -186     |
| Running Env Steps   | 327500   |
| Running Forward KL  | -1.83    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 655      |
----------------------------------
2025-02-01 14:25:43.813419 Eastern Standard Time
| Itration            | 656      |
| Real Det Return     | 592      |
| Real Sto Return     | 565      |
| Reward Loss         | -161     |
| Running Env Steps   | 328000   |
| Running Forward KL  | -2.66    |
| Running Reverse KL  | 5.46     |
| Running Update Time | 656      |
----------------------------------
2025-02-01 14:26:01.901903 Eastern Standard Time
| Itration            | 657      |
| Real Det Return     | 627      |
| Real Sto Return     | 580      |
| Reward Loss         | -88.7    |
| Running Env Steps   | 328500   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 657      |
----------------------------------
2025-02-01 14:26:22.305571 Eastern Standard Time
| Itration            | 658      |
| Real Det Return     | 633      |
| Real Sto Return     | 591      |
| Reward Loss         | -151     |
| Running Env Steps   | 329000   |
| Running Forward KL  | -3       |
| Running Reverse KL  | 5.08     |
| Running Update Time | 658      |
----------------------------------
2025-02-01 14:26:39.422497 Eastern Standard Time
| Itration            | 659      |
| Real Det Return     | 617      |
| Real Sto Return     | 582      |
| Reward Loss         | -172     |
| Running Env Steps   | 329500   |
| Running Forward KL  | -2.64    |
| Running Reverse KL  | 5.52     |
| Running Update Time | 659      |
----------------------------------
2025-02-01 14:26:56.419385 Eastern Standard Time
| Itration            | 660      |
| Real Det Return     | 623      |
| Real Sto Return     | 583      |
| Reward Loss         | -134     |
| Running Env Steps   | 330000   |
| Running Forward KL  | -2.91    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 660      |
----------------------------------
2025-02-01 14:27:13.509618 Eastern Standard Time
| Itration            | 661      |
| Real Det Return     | 593      |
| Real Sto Return     | 568      |
| Reward Loss         | -140     |
| Running Env Steps   | 330500   |
| Running Forward KL  | -3.57    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 661      |
----------------------------------
2025-02-01 14:27:30.420168 Eastern Standard Time
| Itration            | 662      |
| Real Det Return     | 576      |
| Real Sto Return     | 558      |
| Reward Loss         | -145     |
| Running Env Steps   | 331000   |
| Running Forward KL  | -2.48    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 662      |
----------------------------------
2025-02-01 14:27:47.423941 Eastern Standard Time
| Itration            | 663      |
| Real Det Return     | 619      |
| Real Sto Return     | 575      |
| Reward Loss         | -133     |
| Running Env Steps   | 331500   |
| Running Forward KL  | -2.73    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 663      |
----------------------------------
2025-02-01 14:28:04.459488 Eastern Standard Time
| Itration            | 664      |
| Real Det Return     | 628      |
| Real Sto Return     | 579      |
| Reward Loss         | -120     |
| Running Env Steps   | 332000   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 664      |
----------------------------------
2025-02-01 14:28:21.645553 Eastern Standard Time
| Itration            | 665      |
| Real Det Return     | 555      |
| Real Sto Return     | 532      |
| Reward Loss         | -191     |
| Running Env Steps   | 332500   |
| Running Forward KL  | -1.98    |
| Running Reverse KL  | 5.77     |
| Running Update Time | 665      |
----------------------------------
2025-02-01 14:28:38.678489 Eastern Standard Time
| Itration            | 666      |
| Real Det Return     | 614      |
| Real Sto Return     | 589      |
| Reward Loss         | -158     |
| Running Env Steps   | 333000   |
| Running Forward KL  | -1.87    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 666      |
----------------------------------
2025-02-01 14:28:55.670888 Eastern Standard Time
| Itration            | 667      |
| Real Det Return     | 617      |
| Real Sto Return     | 583      |
| Reward Loss         | -127     |
| Running Env Steps   | 333500   |
| Running Forward KL  | -3.38    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 667      |
----------------------------------
2025-02-01 14:29:12.683602 Eastern Standard Time
| Itration            | 668      |
| Real Det Return     | 596      |
| Real Sto Return     | 557      |
| Reward Loss         | -154     |
| Running Env Steps   | 334000   |
| Running Forward KL  | -1.66    |
| Running Reverse KL  | 5.33     |
| Running Update Time | 668      |
----------------------------------
2025-02-01 14:29:29.683985 Eastern Standard Time
| Itration            | 669      |
| Real Det Return     | 614      |
| Real Sto Return     | 574      |
| Reward Loss         | -148     |
| Running Env Steps   | 334500   |
| Running Forward KL  | -3.23    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 669      |
----------------------------------
2025-02-01 14:29:46.657538 Eastern Standard Time
| Itration            | 670      |
| Real Det Return     | 618      |
| Real Sto Return     | 569      |
| Reward Loss         | -102     |
| Running Env Steps   | 335000   |
| Running Forward KL  | -3.28    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 670      |
----------------------------------
2025-02-01 14:30:03.598464 Eastern Standard Time
| Itration            | 671      |
| Real Det Return     | 607      |
| Real Sto Return     | 547      |
| Reward Loss         | -154     |
| Running Env Steps   | 335500   |
| Running Forward KL  | -3.26    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 671      |
----------------------------------
2025-02-01 14:30:20.571545 Eastern Standard Time
| Itration            | 672      |
| Real Det Return     | 629      |
| Real Sto Return     | 589      |
| Reward Loss         | -106     |
| Running Env Steps   | 336000   |
| Running Forward KL  | -3.44    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 672      |
----------------------------------
2025-02-01 14:30:37.580437 Eastern Standard Time
| Itration            | 673      |
| Real Det Return     | 636      |
| Real Sto Return     | 570      |
| Reward Loss         | -134     |
| Running Env Steps   | 336500   |
| Running Forward KL  | -3.55    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 673      |
----------------------------------
2025-02-01 14:30:54.641992 Eastern Standard Time
| Itration            | 674      |
| Real Det Return     | 612      |
| Real Sto Return     | 558      |
| Reward Loss         | -127     |
| Running Env Steps   | 337000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 4.88     |
| Running Update Time | 674      |
----------------------------------
2025-02-01 14:31:11.762922 Eastern Standard Time
| Itration            | 675      |
| Real Det Return     | 626      |
| Real Sto Return     | 575      |
| Reward Loss         | -142     |
| Running Env Steps   | 337500   |
| Running Forward KL  | -2.76    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 675      |
----------------------------------
2025-02-01 14:31:28.756187 Eastern Standard Time
| Itration            | 676      |
| Real Det Return     | 610      |
| Real Sto Return     | 581      |
| Reward Loss         | -166     |
| Running Env Steps   | 338000   |
| Running Forward KL  | -2.8     |
| Running Reverse KL  | 4.69     |
| Running Update Time | 676      |
----------------------------------
2025-02-01 14:31:45.700206 Eastern Standard Time
| Itration            | 677      |
| Real Det Return     | 610      |
| Real Sto Return     | 574      |
| Reward Loss         | -201     |
| Running Env Steps   | 338500   |
| Running Forward KL  | -0.711   |
| Running Reverse KL  | 5.2      |
| Running Update Time | 677      |
----------------------------------
2025-02-01 14:32:02.651480 Eastern Standard Time
| Itration            | 678      |
| Real Det Return     | 606      |
| Real Sto Return     | 588      |
| Reward Loss         | -133     |
| Running Env Steps   | 339000   |
| Running Forward KL  | -3.21    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 678      |
----------------------------------
2025-02-01 14:32:19.831978 Eastern Standard Time
| Itration            | 679      |
| Real Det Return     | 599      |
| Real Sto Return     | 578      |
| Reward Loss         | -140     |
| Running Env Steps   | 339500   |
| Running Forward KL  | -3.4     |
| Running Reverse KL  | 5.14     |
| Running Update Time | 679      |
----------------------------------
2025-02-01 14:32:36.770498 Eastern Standard Time
| Itration            | 680      |
| Real Det Return     | 624      |
| Real Sto Return     | 564      |
| Reward Loss         | -190     |
| Running Env Steps   | 340000   |
| Running Forward KL  | -2.19    |
| Running Reverse KL  | 5.49     |
| Running Update Time | 680      |
----------------------------------
2025-02-01 14:32:53.991474 Eastern Standard Time
| Itration            | 681      |
| Real Det Return     | 618      |
| Real Sto Return     | 584      |
| Reward Loss         | -197     |
| Running Env Steps   | 340500   |
| Running Forward KL  | -1.67    |
| Running Reverse KL  | 5.57     |
| Running Update Time | 681      |
----------------------------------
2025-02-01 14:33:11.136431 Eastern Standard Time
| Itration            | 682      |
| Real Det Return     | 631      |
| Real Sto Return     | 591      |
| Reward Loss         | -115     |
| Running Env Steps   | 341000   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 682      |
----------------------------------
2025-02-01 14:33:28.083700 Eastern Standard Time
| Itration            | 683      |
| Real Det Return     | 638      |
| Real Sto Return     | 603      |
| Reward Loss         | -106     |
| Running Env Steps   | 341500   |
| Running Forward KL  | -3.1     |
| Running Reverse KL  | 5.88     |
| Running Update Time | 683      |
----------------------------------
2025-02-01 14:33:45.074952 Eastern Standard Time
| Itration            | 684      |
| Real Det Return     | 589      |
| Real Sto Return     | 560      |
| Reward Loss         | -155     |
| Running Env Steps   | 342000   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 4.48     |
| Running Update Time | 684      |
----------------------------------
2025-02-01 14:34:02.043917 Eastern Standard Time
| Itration            | 685      |
| Real Det Return     | 589      |
| Real Sto Return     | 559      |
| Reward Loss         | -133     |
| Running Env Steps   | 342500   |
| Running Forward KL  | -2.67    |
| Running Reverse KL  | 5.34     |
| Running Update Time | 685      |
----------------------------------
2025-02-01 14:34:19.082009 Eastern Standard Time
| Itration            | 686      |
| Real Det Return     | 624      |
| Real Sto Return     | 579      |
| Reward Loss         | -109     |
| Running Env Steps   | 343000   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 686      |
----------------------------------
2025-02-01 14:34:36.054534 Eastern Standard Time
| Itration            | 687      |
| Real Det Return     | 615      |
| Real Sto Return     | 583      |
| Reward Loss         | -102     |
| Running Env Steps   | 343500   |
| Running Forward KL  | -3.77    |
| Running Reverse KL  | 5.51     |
| Running Update Time | 687      |
----------------------------------
2025-02-01 14:34:52.959206 Eastern Standard Time
| Itration            | 688      |
| Real Det Return     | 621      |
| Real Sto Return     | 583      |
| Reward Loss         | -120     |
| Running Env Steps   | 344000   |
| Running Forward KL  | -2.49    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 688      |
----------------------------------
2025-02-01 14:35:10.029363 Eastern Standard Time
| Itration            | 689      |
| Real Det Return     | 609      |
| Real Sto Return     | 570      |
| Reward Loss         | -136     |
| Running Env Steps   | 344500   |
| Running Forward KL  | -2.3     |
| Running Reverse KL  | 5.21     |
| Running Update Time | 689      |
----------------------------------
2025-02-01 14:35:26.967533 Eastern Standard Time
| Itration            | 690      |
| Real Det Return     | 599      |
| Real Sto Return     | 585      |
| Reward Loss         | -119     |
| Running Env Steps   | 345000   |
| Running Forward KL  | -3.61    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 690      |
----------------------------------
2025-02-01 14:35:44.024380 Eastern Standard Time
| Itration            | 691      |
| Real Det Return     | 614      |
| Real Sto Return     | 595      |
| Reward Loss         | -117     |
| Running Env Steps   | 345500   |
| Running Forward KL  | -1.93    |
| Running Reverse KL  | 6.14     |
| Running Update Time | 691      |
----------------------------------
2025-02-01 14:36:01.000341 Eastern Standard Time
| Itration            | 692      |
| Real Det Return     | 614      |
| Real Sto Return     | 576      |
| Reward Loss         | -141     |
| Running Env Steps   | 346000   |
| Running Forward KL  | -2.17    |
| Running Reverse KL  | 5.77     |
| Running Update Time | 692      |
----------------------------------
2025-02-01 14:36:17.933007 Eastern Standard Time
| Itration            | 693      |
| Real Det Return     | 585      |
| Real Sto Return     | 576      |
| Reward Loss         | -134     |
| Running Env Steps   | 346500   |
| Running Forward KL  | -3.06    |
| Running Reverse KL  | 5.71     |
| Running Update Time | 693      |
----------------------------------
2025-02-01 14:36:34.860362 Eastern Standard Time
| Itration            | 694      |
| Real Det Return     | 608      |
| Real Sto Return     | 572      |
| Reward Loss         | -124     |
| Running Env Steps   | 347000   |
| Running Forward KL  | -2.88    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 694      |
----------------------------------
2025-02-01 14:36:51.931174 Eastern Standard Time
| Itration            | 695      |
| Real Det Return     | 639      |
| Real Sto Return     | 601      |
| Reward Loss         | -80.7    |
| Running Env Steps   | 347500   |
| Running Forward KL  | -3.34    |
| Running Reverse KL  | 6.36     |
| Running Update Time | 695      |
----------------------------------
2025-02-01 14:37:09.095503 Eastern Standard Time
| Itration            | 696      |
| Real Det Return     | 608      |
| Real Sto Return     | 562      |
| Reward Loss         | -113     |
| Running Env Steps   | 348000   |
| Running Forward KL  | -2.94    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 696      |
----------------------------------
2025-02-01 14:37:26.140699 Eastern Standard Time
| Itration            | 697      |
| Real Det Return     | 619      |
| Real Sto Return     | 578      |
| Reward Loss         | -171     |
| Running Env Steps   | 348500   |
| Running Forward KL  | -3.39    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 697      |
----------------------------------
2025-02-01 14:37:43.053344 Eastern Standard Time
| Itration            | 698      |
| Real Det Return     | 624      |
| Real Sto Return     | 579      |
| Reward Loss         | -165     |
| Running Env Steps   | 349000   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 698      |
----------------------------------
2025-02-01 14:38:00.325821 Eastern Standard Time
| Itration            | 699      |
| Real Det Return     | 631      |
| Real Sto Return     | 593      |
| Reward Loss         | -124     |
| Running Env Steps   | 349500   |
| Running Forward KL  | -2.97    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 699      |
----------------------------------
2025-02-01 14:38:17.280411 Eastern Standard Time
| Itration            | 700      |
| Real Det Return     | 596      |
| Real Sto Return     | 550      |
| Reward Loss         | -130     |
| Running Env Steps   | 350000   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 5.72     |
| Running Update Time | 700      |
----------------------------------
2025-02-01 14:38:34.248365 Eastern Standard Time
| Itration            | 701      |
| Real Det Return     | 579      |
| Real Sto Return     | 564      |
| Reward Loss         | -142     |
| Running Env Steps   | 350500   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 6.12     |
| Running Update Time | 701      |
----------------------------------
2025-02-01 14:38:51.186799 Eastern Standard Time
| Itration            | 702      |
| Real Det Return     | 623      |
| Real Sto Return     | 599      |
| Reward Loss         | -139     |
| Running Env Steps   | 351000   |
| Running Forward KL  | -2.88    |
| Running Reverse KL  | 5.44     |
| Running Update Time | 702      |
----------------------------------
2025-02-01 14:39:08.270696 Eastern Standard Time
| Itration            | 703      |
| Real Det Return     | 603      |
| Real Sto Return     | 566      |
| Reward Loss         | -142     |
| Running Env Steps   | 351500   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 703      |
----------------------------------
2025-02-01 14:39:24.673160 Eastern Standard Time
| Itration            | 704      |
| Real Det Return     | 610      |
| Real Sto Return     | 598      |
| Reward Loss         | -155     |
| Running Env Steps   | 352000   |
| Running Forward KL  | -3.71    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 704      |
----------------------------------
2025-02-01 14:39:41.102914 Eastern Standard Time
| Itration            | 705      |
| Real Det Return     | 635      |
| Real Sto Return     | 589      |
| Reward Loss         | -134     |
| Running Env Steps   | 352500   |
| Running Forward KL  | -3.08    |
| Running Reverse KL  | 6.3      |
| Running Update Time | 705      |
----------------------------------
2025-02-01 14:39:57.461385 Eastern Standard Time
| Itration            | 706      |
| Real Det Return     | 634      |
| Real Sto Return     | 598      |
| Reward Loss         | -110     |
| Running Env Steps   | 353000   |
| Running Forward KL  | -2.48    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 706      |
----------------------------------
2025-02-01 14:40:13.893426 Eastern Standard Time
| Itration            | 707      |
| Real Det Return     | 600      |
| Real Sto Return     | 575      |
| Reward Loss         | -169     |
| Running Env Steps   | 353500   |
| Running Forward KL  | -2.89    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 707      |
----------------------------------
2025-02-01 14:40:30.335433 Eastern Standard Time
| Itration            | 708      |
| Real Det Return     | 611      |
| Real Sto Return     | 568      |
| Reward Loss         | -140     |
| Running Env Steps   | 354000   |
| Running Forward KL  | -3.25    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 708      |
----------------------------------
2025-02-01 14:40:49.855938 Eastern Standard Time
| Itration            | 709      |
| Real Det Return     | 588      |
| Real Sto Return     | 565      |
| Reward Loss         | -150     |
| Running Env Steps   | 354500   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 709      |
----------------------------------
2025-02-01 14:41:08.436821 Eastern Standard Time
| Itration            | 710      |
| Real Det Return     | 638      |
| Real Sto Return     | 588      |
| Reward Loss         | -102     |
| Running Env Steps   | 355000   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 710      |
----------------------------------
2025-02-01 14:41:25.502122 Eastern Standard Time
| Itration            | 711      |
| Real Det Return     | 640      |
| Real Sto Return     | 586      |
| Reward Loss         | -95.7    |
| Running Env Steps   | 355500   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 5.78     |
| Running Update Time | 711      |
----------------------------------
2025-02-01 14:41:42.890195 Eastern Standard Time
| Itration            | 712      |
| Real Det Return     | 620      |
| Real Sto Return     | 590      |
| Reward Loss         | -109     |
| Running Env Steps   | 356000   |
| Running Forward KL  | -3.04    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 712      |
----------------------------------
2025-02-01 14:41:59.901632 Eastern Standard Time
| Itration            | 713      |
| Real Det Return     | 620      |
| Real Sto Return     | 586      |
| Reward Loss         | -154     |
| Running Env Steps   | 356500   |
| Running Forward KL  | -2.89    |
| Running Reverse KL  | 5.81     |
| Running Update Time | 713      |
----------------------------------
2025-02-01 14:42:16.876680 Eastern Standard Time
| Itration            | 714      |
| Real Det Return     | 629      |
| Real Sto Return     | 597      |
| Reward Loss         | -138     |
| Running Env Steps   | 357000   |
| Running Forward KL  | -3.11    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 714      |
----------------------------------
2025-02-01 14:42:33.896996 Eastern Standard Time
| Itration            | 715      |
| Real Det Return     | 630      |
| Real Sto Return     | 583      |
| Reward Loss         | -112     |
| Running Env Steps   | 357500   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 5.79     |
| Running Update Time | 715      |
----------------------------------
2025-02-01 14:42:50.891316 Eastern Standard Time
| Itration            | 716      |
| Real Det Return     | 640      |
| Real Sto Return     | 566      |
| Reward Loss         | -141     |
| Running Env Steps   | 358000   |
| Running Forward KL  | -3.42    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 716      |
----------------------------------
2025-02-01 14:43:09.471382 Eastern Standard Time
| Itration            | 717      |
| Real Det Return     | 639      |
| Real Sto Return     | 589      |
| Reward Loss         | -99.9    |
| Running Env Steps   | 358500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 717      |
----------------------------------
2025-02-01 14:43:27.049887 Eastern Standard Time
| Itration            | 718      |
| Real Det Return     | 649      |
| Real Sto Return     | 578      |
| Reward Loss         | -123     |
| Running Env Steps   | 359000   |
| Running Forward KL  | -3.01    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 718      |
----------------------------------
2025-02-01 14:43:44.039053 Eastern Standard Time
| Itration            | 719      |
| Real Det Return     | 630      |
| Real Sto Return     | 554      |
| Reward Loss         | -131     |
| Running Env Steps   | 359500   |
| Running Forward KL  | -2.98    |
| Running Reverse KL  | 5.85     |
| Running Update Time | 719      |
----------------------------------
2025-02-01 14:44:00.989316 Eastern Standard Time
| Itration            | 720      |
| Real Det Return     | 629      |
| Real Sto Return     | 605      |
| Reward Loss         | -139     |
| Running Env Steps   | 360000   |
| Running Forward KL  | -3.67    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 720      |
----------------------------------
2025-02-01 14:44:18.007550 Eastern Standard Time
| Itration            | 721      |
| Real Det Return     | 630      |
| Real Sto Return     | 596      |
| Reward Loss         | -157     |
| Running Env Steps   | 360500   |
| Running Forward KL  | -2.68    |
| Running Reverse KL  | 5.51     |
| Running Update Time | 721      |
----------------------------------
2025-02-01 14:44:34.963576 Eastern Standard Time
| Itration            | 722      |
| Real Det Return     | 595      |
| Real Sto Return     | 568      |
| Reward Loss         | -150     |
| Running Env Steps   | 361000   |
| Running Forward KL  | -3.27    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 722      |
----------------------------------
2025-02-01 14:44:52.036570 Eastern Standard Time
| Itration            | 723      |
| Real Det Return     | 577      |
| Real Sto Return     | 563      |
| Reward Loss         | -171     |
| Running Env Steps   | 361500   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 723      |
----------------------------------
2025-02-01 14:45:09.043900 Eastern Standard Time
| Itration            | 724      |
| Real Det Return     | 592      |
| Real Sto Return     | 577      |
| Reward Loss         | -88.3    |
| Running Env Steps   | 362000   |
| Running Forward KL  | -2.93    |
| Running Reverse KL  | 6.42     |
| Running Update Time | 724      |
----------------------------------
2025-02-01 14:45:25.966778 Eastern Standard Time
| Itration            | 725      |
| Real Det Return     | 621      |
| Real Sto Return     | 584      |
| Reward Loss         | -118     |
| Running Env Steps   | 362500   |
| Running Forward KL  | -3.12    |
| Running Reverse KL  | 5.57     |
| Running Update Time | 725      |
----------------------------------
2025-02-01 14:45:42.974300 Eastern Standard Time
| Itration            | 726      |
| Real Det Return     | 650      |
| Real Sto Return     | 619      |
| Reward Loss         | -95.1    |
| Running Env Steps   | 363000   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 5.49     |
| Running Update Time | 726      |
----------------------------------
2025-02-01 14:46:00.353814 Eastern Standard Time
| Itration            | 727      |
| Real Det Return     | 651      |
| Real Sto Return     | 607      |
| Reward Loss         | -121     |
| Running Env Steps   | 363500   |
| Running Forward KL  | -2.54    |
| Running Reverse KL  | 6.15     |
| Running Update Time | 727      |
----------------------------------
2025-02-01 14:46:17.992274 Eastern Standard Time
| Itration            | 728      |
| Real Det Return     | 626      |
| Real Sto Return     | 570      |
| Reward Loss         | -121     |
| Running Env Steps   | 364000   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 5.92     |
| Running Update Time | 728      |
----------------------------------
2025-02-01 14:46:35.042205 Eastern Standard Time
| Itration            | 729      |
| Real Det Return     | 640      |
| Real Sto Return     | 599      |
| Reward Loss         | -105     |
| Running Env Steps   | 364500   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 5.54     |
| Running Update Time | 729      |
----------------------------------
2025-02-01 14:46:52.222996 Eastern Standard Time
| Itration            | 730      |
| Real Det Return     | 648      |
| Real Sto Return     | 607      |
| Reward Loss         | -86.1    |
| Running Env Steps   | 365000   |
| Running Forward KL  | -3.79    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 730      |
----------------------------------
2025-02-01 14:47:09.863117 Eastern Standard Time
| Itration            | 731      |
| Real Det Return     | 626      |
| Real Sto Return     | 582      |
| Reward Loss         | -117     |
| Running Env Steps   | 365500   |
| Running Forward KL  | -2.79    |
| Running Reverse KL  | 5.48     |
| Running Update Time | 731      |
----------------------------------
2025-02-01 14:47:27.198543 Eastern Standard Time
| Itration            | 732      |
| Real Det Return     | 637      |
| Real Sto Return     | 596      |
| Reward Loss         | -127     |
| Running Env Steps   | 366000   |
| Running Forward KL  | -2.94    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 732      |
----------------------------------
2025-02-01 14:47:45.078260 Eastern Standard Time
| Itration            | 733      |
| Real Det Return     | 600      |
| Real Sto Return     | 578      |
| Reward Loss         | -126     |
| Running Env Steps   | 366500   |
| Running Forward KL  | -3.52    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 733      |
----------------------------------
2025-02-01 14:48:02.168526 Eastern Standard Time
| Itration            | 734      |
| Real Det Return     | 629      |
| Real Sto Return     | 585      |
| Reward Loss         | -126     |
| Running Env Steps   | 367000   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 734      |
----------------------------------
2025-02-01 14:48:18.973908 Eastern Standard Time
| Itration            | 735      |
| Real Det Return     | 560      |
| Real Sto Return     | 515      |
| Reward Loss         | -212     |
| Running Env Steps   | 367500   |
| Running Forward KL  | -0.829   |
| Running Reverse KL  | 5.43     |
| Running Update Time | 735      |
----------------------------------
2025-02-01 14:48:35.434373 Eastern Standard Time
| Itration            | 736      |
| Real Det Return     | 599      |
| Real Sto Return     | 569      |
| Reward Loss         | -153     |
| Running Env Steps   | 368000   |
| Running Forward KL  | -3.11    |
| Running Reverse KL  | 5.33     |
| Running Update Time | 736      |
----------------------------------
2025-02-01 14:48:53.095916 Eastern Standard Time
| Itration            | 737      |
| Real Det Return     | 648      |
| Real Sto Return     | 581      |
| Reward Loss         | -106     |
| Running Env Steps   | 368500   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 737      |
----------------------------------
2025-02-01 14:49:11.492582 Eastern Standard Time
| Itration            | 738      |
| Real Det Return     | 636      |
| Real Sto Return     | 608      |
| Reward Loss         | -154     |
| Running Env Steps   | 369000   |
| Running Forward KL  | -2.56    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 738      |
----------------------------------
2025-02-01 14:49:28.005637 Eastern Standard Time
| Itration            | 739      |
| Real Det Return     | 602      |
| Real Sto Return     | 584      |
| Reward Loss         | -119     |
| Running Env Steps   | 369500   |
| Running Forward KL  | -3.18    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 739      |
----------------------------------
2025-02-01 14:49:44.365955 Eastern Standard Time
| Itration            | 740      |
| Real Det Return     | 652      |
| Real Sto Return     | 596      |
| Reward Loss         | -106     |
| Running Env Steps   | 370000   |
| Running Forward KL  | -3.24    |
| Running Reverse KL  | 5.77     |
| Running Update Time | 740      |
----------------------------------
2025-02-01 14:50:00.650390 Eastern Standard Time
| Itration            | 741      |
| Real Det Return     | 603      |
| Real Sto Return     | 591      |
| Reward Loss         | -177     |
| Running Env Steps   | 370500   |
| Running Forward KL  | -2.36    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 741      |
----------------------------------
2025-02-01 14:50:16.349885 Eastern Standard Time
| Itration            | 742      |
| Real Det Return     | 630      |
| Real Sto Return     | 611      |
| Reward Loss         | -118     |
| Running Env Steps   | 371000   |
| Running Forward KL  | -2.84    |
| Running Reverse KL  | 6.17     |
| Running Update Time | 742      |
----------------------------------
2025-02-01 14:50:32.023849 Eastern Standard Time
| Itration            | 743      |
| Real Det Return     | 650      |
| Real Sto Return     | 600      |
| Reward Loss         | -119     |
| Running Env Steps   | 371500   |
| Running Forward KL  | -3.39    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 743      |
----------------------------------
2025-02-01 14:50:47.572452 Eastern Standard Time
| Itration            | 744      |
| Real Det Return     | 628      |
| Real Sto Return     | 610      |
| Reward Loss         | -147     |
| Running Env Steps   | 372000   |
| Running Forward KL  | -2.44    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 744      |
----------------------------------
2025-02-01 14:51:03.110051 Eastern Standard Time
| Itration            | 745      |
| Real Det Return     | 613      |
| Real Sto Return     | 559      |
| Reward Loss         | -104     |
| Running Env Steps   | 372500   |
| Running Forward KL  | -3.18    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 745      |
----------------------------------
2025-02-01 14:51:18.628550 Eastern Standard Time
| Itration            | 746      |
| Real Det Return     | 645      |
| Real Sto Return     | 591      |
| Reward Loss         | -108     |
| Running Env Steps   | 373000   |
| Running Forward KL  | -2.68    |
| Running Reverse KL  | 6.35     |
| Running Update Time | 746      |
----------------------------------
2025-02-01 14:51:34.246998 Eastern Standard Time
| Itration            | 747      |
| Real Det Return     | 650      |
| Real Sto Return     | 586      |
| Reward Loss         | -111     |
| Running Env Steps   | 373500   |
| Running Forward KL  | -2.89    |
| Running Reverse KL  | 5.83     |
| Running Update Time | 747      |
----------------------------------
2025-02-01 14:51:49.850034 Eastern Standard Time
| Itration            | 748      |
| Real Det Return     | 601      |
| Real Sto Return     | 598      |
| Reward Loss         | -117     |
| Running Env Steps   | 374000   |
| Running Forward KL  | -2.31    |
| Running Reverse KL  | 5.79     |
| Running Update Time | 748      |
----------------------------------
2025-02-01 14:52:05.470444 Eastern Standard Time
| Itration            | 749      |
| Real Det Return     | 601      |
| Real Sto Return     | 517      |
| Reward Loss         | -229     |
| Running Env Steps   | 374500   |
| Running Forward KL  | -2.25    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 749      |
----------------------------------
2025-02-01 14:52:21.054560 Eastern Standard Time
| Itration            | 750      |
| Real Det Return     | 611      |
| Real Sto Return     | 581      |
| Reward Loss         | -117     |
| Running Env Steps   | 375000   |
| Running Forward KL  | -2.54    |
| Running Reverse KL  | 6.23     |
| Running Update Time | 750      |
----------------------------------
2025-02-01 14:52:36.659173 Eastern Standard Time
| Itration            | 751      |
| Real Det Return     | 620      |
| Real Sto Return     | 571      |
| Reward Loss         | -185     |
| Running Env Steps   | 375500   |
| Running Forward KL  | -2.62    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 751      |
----------------------------------
2025-02-01 14:52:52.323365 Eastern Standard Time
| Itration            | 752      |
| Real Det Return     | 611      |
| Real Sto Return     | 587      |
| Reward Loss         | -114     |
| Running Env Steps   | 376000   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 752      |
----------------------------------
2025-02-01 14:53:07.816977 Eastern Standard Time
| Itration            | 753      |
| Real Det Return     | 629      |
| Real Sto Return     | 592      |
| Reward Loss         | -153     |
| Running Env Steps   | 376500   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 6.02     |
| Running Update Time | 753      |
----------------------------------
2025-02-01 14:53:23.375921 Eastern Standard Time
| Itration            | 754      |
| Real Det Return     | 644      |
| Real Sto Return     | 596      |
| Reward Loss         | -135     |
| Running Env Steps   | 377000   |
| Running Forward KL  | -3.7     |
| Running Reverse KL  | 4.92     |
| Running Update Time | 754      |
----------------------------------
2025-02-01 14:53:38.931818 Eastern Standard Time
| Itration            | 755      |
| Real Det Return     | 615      |
| Real Sto Return     | 597      |
| Reward Loss         | -111     |
| Running Env Steps   | 377500   |
| Running Forward KL  | -3.36    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 755      |
----------------------------------
2025-02-01 14:53:54.528931 Eastern Standard Time
| Itration            | 756      |
| Real Det Return     | 594      |
| Real Sto Return     | 564      |
| Reward Loss         | -167     |
| Running Env Steps   | 378000   |
| Running Forward KL  | -3.13    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 756      |
----------------------------------
2025-02-01 14:54:10.343554 Eastern Standard Time
| Itration            | 757      |
| Real Det Return     | 655      |
| Real Sto Return     | 619      |
| Reward Loss         | -116     |
| Running Env Steps   | 378500   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 757      |
----------------------------------
2025-02-01 14:54:25.977549 Eastern Standard Time
| Itration            | 758      |
| Real Det Return     | 605      |
| Real Sto Return     | 580      |
| Reward Loss         | -129     |
| Running Env Steps   | 379000   |
| Running Forward KL  | -3.18    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 758      |
----------------------------------
2025-02-01 14:54:41.620491 Eastern Standard Time
| Itration            | 759      |
| Real Det Return     | 640      |
| Real Sto Return     | 607      |
| Reward Loss         | -111     |
| Running Env Steps   | 379500   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 5.47     |
| Running Update Time | 759      |
----------------------------------
2025-02-01 14:54:57.213880 Eastern Standard Time
| Itration            | 760      |
| Real Det Return     | 602      |
| Real Sto Return     | 580      |
| Reward Loss         | -133     |
| Running Env Steps   | 380000   |
| Running Forward KL  | -3.38    |
| Running Reverse KL  | 5.35     |
| Running Update Time | 760      |
----------------------------------
2025-02-01 14:55:13.749727 Eastern Standard Time
| Itration            | 761      |
| Real Det Return     | 645      |
| Real Sto Return     | 585      |
| Reward Loss         | -136     |
| Running Env Steps   | 380500   |
| Running Forward KL  | -3.18    |
| Running Reverse KL  | 5.74     |
| Running Update Time | 761      |
----------------------------------
2025-02-01 14:55:30.930564 Eastern Standard Time
| Itration            | 762      |
| Real Det Return     | 598      |
| Real Sto Return     | 577      |
| Reward Loss         | -126     |
| Running Env Steps   | 381000   |
| Running Forward KL  | -2.78    |
| Running Reverse KL  | 5.9      |
| Running Update Time | 762      |
----------------------------------
2025-02-01 14:55:47.661626 Eastern Standard Time
| Itration            | 763      |
| Real Det Return     | 653      |
| Real Sto Return     | 603      |
| Reward Loss         | -92.1    |
| Running Env Steps   | 381500   |
| Running Forward KL  | -3.38    |
| Running Reverse KL  | 6.12     |
| Running Update Time | 763      |
----------------------------------
2025-02-01 14:56:03.552641 Eastern Standard Time
| Itration            | 764      |
| Real Det Return     | 629      |
| Real Sto Return     | 601      |
| Reward Loss         | -128     |
| Running Env Steps   | 382000   |
| Running Forward KL  | -3.69    |
| Running Reverse KL  | 5.5      |
| Running Update Time | 764      |
----------------------------------
2025-02-01 14:56:19.657707 Eastern Standard Time
| Itration            | 765      |
| Real Det Return     | 629      |
| Real Sto Return     | 590      |
| Reward Loss         | -149     |
| Running Env Steps   | 382500   |
| Running Forward KL  | -3.1     |
| Running Reverse KL  | 6.02     |
| Running Update Time | 765      |
----------------------------------
2025-02-01 14:56:36.852314 Eastern Standard Time
| Itration            | 766      |
| Real Det Return     | 616      |
| Real Sto Return     | 567      |
| Reward Loss         | -157     |
| Running Env Steps   | 383000   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 5.92     |
| Running Update Time | 766      |
----------------------------------
2025-02-01 14:56:52.632670 Eastern Standard Time
| Itration            | 767      |
| Real Det Return     | 642      |
| Real Sto Return     | 608      |
| Reward Loss         | -106     |
| Running Env Steps   | 383500   |
| Running Forward KL  | -3.07    |
| Running Reverse KL  | 5.15     |
| Running Update Time | 767      |
----------------------------------
2025-02-01 14:57:08.564005 Eastern Standard Time
| Itration            | 768      |
| Real Det Return     | 632      |
| Real Sto Return     | 589      |
| Reward Loss         | -112     |
| Running Env Steps   | 384000   |
| Running Forward KL  | -3.93    |
| Running Reverse KL  | 5.68     |
| Running Update Time | 768      |
----------------------------------
2025-02-01 14:57:24.425034 Eastern Standard Time
| Itration            | 769      |
| Real Det Return     | 595      |
| Real Sto Return     | 568      |
| Reward Loss         | -134     |
| Running Env Steps   | 384500   |
| Running Forward KL  | -1.62    |
| Running Reverse KL  | 5.82     |
| Running Update Time | 769      |
----------------------------------
2025-02-01 14:57:40.238952 Eastern Standard Time
| Itration            | 770      |
| Real Det Return     | 649      |
| Real Sto Return     | 594      |
| Reward Loss         | -116     |
| Running Env Steps   | 385000   |
| Running Forward KL  | -3.49    |
| Running Reverse KL  | 6.61     |
| Running Update Time | 770      |
----------------------------------
2025-02-01 14:57:56.741791 Eastern Standard Time
| Itration            | 771      |
| Real Det Return     | 628      |
| Real Sto Return     | 601      |
| Reward Loss         | -131     |
| Running Env Steps   | 385500   |
| Running Forward KL  | -2.24    |
| Running Reverse KL  | 5.53     |
| Running Update Time | 771      |
----------------------------------
2025-02-01 14:58:14.055166 Eastern Standard Time
| Itration            | 772      |
| Real Det Return     | 645      |
| Real Sto Return     | 591      |
| Reward Loss         | -84.1    |
| Running Env Steps   | 386000   |
| Running Forward KL  | -2.44    |
| Running Reverse KL  | 7.1      |
| Running Update Time | 772      |
----------------------------------
2025-02-01 14:58:29.916083 Eastern Standard Time
| Itration            | 773      |
| Real Det Return     | 636      |
| Real Sto Return     | 605      |
| Reward Loss         | -136     |
| Running Env Steps   | 386500   |
| Running Forward KL  | -1.52    |
| Running Reverse KL  | 6.44     |
| Running Update Time | 773      |
----------------------------------
2025-02-01 14:58:45.662984 Eastern Standard Time
| Itration            | 774      |
| Real Det Return     | 613      |
| Real Sto Return     | 598      |
| Reward Loss         | -141     |
| Running Env Steps   | 387000   |
| Running Forward KL  | -3.59    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 774      |
----------------------------------
2025-02-01 14:59:01.439015 Eastern Standard Time
| Itration            | 775      |
| Real Det Return     | 607      |
| Real Sto Return     | 583      |
| Reward Loss         | -144     |
| Running Env Steps   | 387500   |
| Running Forward KL  | -3.56    |
| Running Reverse KL  | 5.66     |
| Running Update Time | 775      |
----------------------------------
2025-02-01 14:59:17.202635 Eastern Standard Time
| Itration            | 776      |
| Real Det Return     | 623      |
| Real Sto Return     | 603      |
| Reward Loss         | -160     |
| Running Env Steps   | 388000   |
| Running Forward KL  | -2.52    |
| Running Reverse KL  | 5.33     |
| Running Update Time | 776      |
----------------------------------
2025-02-01 14:59:32.970668 Eastern Standard Time
| Itration            | 777      |
| Real Det Return     | 600      |
| Real Sto Return     | 590      |
| Reward Loss         | -118     |
| Running Env Steps   | 388500   |
| Running Forward KL  | -3.01    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 777      |
----------------------------------
2025-02-01 14:59:48.727819 Eastern Standard Time
| Itration            | 778      |
| Real Det Return     | 620      |
| Real Sto Return     | 573      |
| Reward Loss         | -141     |
| Running Env Steps   | 389000   |
| Running Forward KL  | -3.24    |
| Running Reverse KL  | 5.69     |
| Running Update Time | 778      |
----------------------------------
2025-02-01 15:00:04.669384 Eastern Standard Time
| Itration            | 779      |
| Real Det Return     | 637      |
| Real Sto Return     | 604      |
| Reward Loss         | -144     |
| Running Env Steps   | 389500   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 5.72     |
| Running Update Time | 779      |
----------------------------------
2025-02-01 15:00:20.432707 Eastern Standard Time
| Itration            | 780      |
| Real Det Return     | 629      |
| Real Sto Return     | 598      |
| Reward Loss         | -131     |
| Running Env Steps   | 390000   |
| Running Forward KL  | -3.28    |
| Running Reverse KL  | 5.92     |
| Running Update Time | 780      |
----------------------------------
2025-02-01 15:00:36.274163 Eastern Standard Time
| Itration            | 781      |
| Real Det Return     | 605      |
| Real Sto Return     | 572      |
| Reward Loss         | -141     |
| Running Env Steps   | 390500   |
| Running Forward KL  | -2.72    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 781      |
----------------------------------
2025-02-01 15:00:51.981977 Eastern Standard Time
| Itration            | 782      |
| Real Det Return     | 634      |
| Real Sto Return     | 605      |
| Reward Loss         | -102     |
| Running Env Steps   | 391000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 782      |
----------------------------------
2025-02-01 15:01:07.793388 Eastern Standard Time
| Itration            | 783      |
| Real Det Return     | 637      |
| Real Sto Return     | 601      |
| Reward Loss         | -100     |
| Running Env Steps   | 391500   |
| Running Forward KL  | -2.96    |
| Running Reverse KL  | 5.69     |
| Running Update Time | 783      |
----------------------------------
2025-02-01 15:01:23.499739 Eastern Standard Time
| Itration            | 784      |
| Real Det Return     | 656      |
| Real Sto Return     | 605      |
| Reward Loss         | -148     |
| Running Env Steps   | 392000   |
| Running Forward KL  | -2.8     |
| Running Reverse KL  | 6.19     |
| Running Update Time | 784      |
----------------------------------
2025-02-01 15:01:39.219191 Eastern Standard Time
| Itration            | 785      |
| Real Det Return     | 578      |
| Real Sto Return     | 568      |
| Reward Loss         | -201     |
| Running Env Steps   | 392500   |
| Running Forward KL  | -3.12    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 785      |
----------------------------------
2025-02-01 15:01:54.952375 Eastern Standard Time
| Itration            | 786      |
| Real Det Return     | 633      |
| Real Sto Return     | 612      |
| Reward Loss         | -121     |
| Running Env Steps   | 393000   |
| Running Forward KL  | -3.38    |
| Running Reverse KL  | 5.82     |
| Running Update Time | 786      |
----------------------------------
2025-02-01 15:02:10.717796 Eastern Standard Time
| Itration            | 787      |
| Real Det Return     | 603      |
| Real Sto Return     | 566      |
| Reward Loss         | -182     |
| Running Env Steps   | 393500   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 5.69     |
| Running Update Time | 787      |
----------------------------------
2025-02-01 15:02:26.595214 Eastern Standard Time
| Itration            | 788      |
| Real Det Return     | 614      |
| Real Sto Return     | 583      |
| Reward Loss         | -144     |
| Running Env Steps   | 394000   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 788      |
----------------------------------
2025-02-01 15:02:42.373798 Eastern Standard Time
| Itration            | 789      |
| Real Det Return     | 623      |
| Real Sto Return     | 597      |
| Reward Loss         | -113     |
| Running Env Steps   | 394500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 789      |
----------------------------------
2025-02-01 15:02:58.003508 Eastern Standard Time
| Itration            | 790      |
| Real Det Return     | 612      |
| Real Sto Return     | 586      |
| Reward Loss         | -137     |
| Running Env Steps   | 395000   |
| Running Forward KL  | -2.96    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 790      |
----------------------------------
2025-02-01 15:03:13.919588 Eastern Standard Time
| Itration            | 791      |
| Real Det Return     | 632      |
| Real Sto Return     | 597      |
| Reward Loss         | -141     |
| Running Env Steps   | 395500   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 5.96     |
| Running Update Time | 791      |
----------------------------------
2025-02-01 15:03:29.634447 Eastern Standard Time
| Itration            | 792      |
| Real Det Return     | 619      |
| Real Sto Return     | 601      |
| Reward Loss         | -112     |
| Running Env Steps   | 396000   |
| Running Forward KL  | -3.48    |
| Running Reverse KL  | 5.86     |
| Running Update Time | 792      |
----------------------------------
2025-02-01 15:03:45.412851 Eastern Standard Time
| Itration            | 793      |
| Real Det Return     | 639      |
| Real Sto Return     | 607      |
| Reward Loss         | -138     |
| Running Env Steps   | 396500   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 5.27     |
| Running Update Time | 793      |
----------------------------------
2025-02-01 15:04:01.153590 Eastern Standard Time
| Itration            | 794      |
| Real Det Return     | 627      |
| Real Sto Return     | 594      |
| Reward Loss         | -129     |
| Running Env Steps   | 397000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 6.06     |
| Running Update Time | 794      |
----------------------------------
2025-02-01 15:04:16.937751 Eastern Standard Time
| Itration            | 795      |
| Real Det Return     | 631      |
| Real Sto Return     | 592      |
| Reward Loss         | -122     |
| Running Env Steps   | 397500   |
| Running Forward KL  | -3.04    |
| Running Reverse KL  | 6.15     |
| Running Update Time | 795      |
----------------------------------
2025-02-01 15:04:32.709850 Eastern Standard Time
| Itration            | 796      |
| Real Det Return     | 622      |
| Real Sto Return     | 579      |
| Reward Loss         | -188     |
| Running Env Steps   | 398000   |
| Running Forward KL  | -1.22    |
| Running Reverse KL  | 6.01     |
| Running Update Time | 796      |
----------------------------------
2025-02-01 15:04:48.413197 Eastern Standard Time
| Itration            | 797      |
| Real Det Return     | 591      |
| Real Sto Return     | 567      |
| Reward Loss         | -155     |
| Running Env Steps   | 398500   |
| Running Forward KL  | -3.39    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 797      |
----------------------------------
2025-02-01 15:05:04.430655 Eastern Standard Time
| Itration            | 798      |
| Real Det Return     | 623      |
| Real Sto Return     | 593      |
| Reward Loss         | -147     |
| Running Env Steps   | 399000   |
| Running Forward KL  | -1.26    |
| Running Reverse KL  | 6.96     |
| Running Update Time | 798      |
----------------------------------
2025-02-01 15:05:20.247452 Eastern Standard Time
| Itration            | 799      |
| Real Det Return     | 630      |
| Real Sto Return     | 602      |
| Reward Loss         | -118     |
| Running Env Steps   | 399500   |
| Running Forward KL  | -3.72    |
| Running Reverse KL  | 5.65     |
| Running Update Time | 799      |
----------------------------------
2025-02-01 15:05:35.990888 Eastern Standard Time
| Itration            | 800      |
| Real Det Return     | 630      |
| Real Sto Return     | 603      |
| Reward Loss         | -124     |
| Running Env Steps   | 400000   |
| Running Forward KL  | -2.46    |
| Running Reverse KL  | 5.63     |
| Running Update Time | 800      |
----------------------------------
2025-02-01 15:05:51.737988 Eastern Standard Time
| Itration            | 801      |
| Real Det Return     | 625      |
| Real Sto Return     | 615      |
| Reward Loss         | -72.2    |
| Running Env Steps   | 400500   |
| Running Forward KL  | -3.37    |
| Running Reverse KL  | 6        |
| Running Update Time | 801      |
----------------------------------
2025-02-01 15:06:07.485462 Eastern Standard Time
| Itration            | 802      |
| Real Det Return     | 632      |
| Real Sto Return     | 612      |
| Reward Loss         | -84.9    |
| Running Env Steps   | 401000   |
| Running Forward KL  | -3.76    |
| Running Reverse KL  | 6.43     |
| Running Update Time | 802      |
----------------------------------
2025-02-01 15:06:23.227385 Eastern Standard Time
| Itration            | 803      |
| Real Det Return     | 592      |
| Real Sto Return     | 552      |
| Reward Loss         | -157     |
| Running Env Steps   | 401500   |
| Running Forward KL  | -2.65    |
| Running Reverse KL  | 6.72     |
| Running Update Time | 803      |
----------------------------------
2025-02-01 15:06:38.997828 Eastern Standard Time
| Itration            | 804      |
| Real Det Return     | 631      |
| Real Sto Return     | 597      |
| Reward Loss         | -103     |
| Running Env Steps   | 402000   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 6.17     |
| Running Update Time | 804      |
----------------------------------
2025-02-01 15:06:54.713776 Eastern Standard Time
| Itration            | 805      |
| Real Det Return     | 633      |
| Real Sto Return     | 604      |
| Reward Loss         | -103     |
| Running Env Steps   | 402500   |
| Running Forward KL  | -3.21    |
| Running Reverse KL  | 5.93     |
| Running Update Time | 805      |
----------------------------------
2025-02-01 15:07:10.438229 Eastern Standard Time
| Itration            | 806      |
| Real Det Return     | 648      |
| Real Sto Return     | 618      |
| Reward Loss         | -103     |
| Running Env Steps   | 403000   |
| Running Forward KL  | -3.56    |
| Running Reverse KL  | 6.1      |
| Running Update Time | 806      |
----------------------------------
2025-02-01 15:07:26.211410 Eastern Standard Time
| Itration            | 807      |
| Real Det Return     | 633      |
| Real Sto Return     | 607      |
| Reward Loss         | -135     |
| Running Env Steps   | 403500   |
| Running Forward KL  | -2.71    |
| Running Reverse KL  | 5.9      |
| Running Update Time | 807      |
----------------------------------
2025-02-01 15:07:41.965484 Eastern Standard Time
| Itration            | 808      |
| Real Det Return     | 595      |
| Real Sto Return     | 590      |
| Reward Loss         | -125     |
| Running Env Steps   | 404000   |
| Running Forward KL  | -2.35    |
| Running Reverse KL  | 5.73     |
| Running Update Time | 808      |
----------------------------------
2025-02-01 15:07:57.731478 Eastern Standard Time
| Itration            | 809      |
| Real Det Return     | 641      |
| Real Sto Return     | 585      |
| Reward Loss         | -84.3    |
| Running Env Steps   | 404500   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 5.54     |
| Running Update Time | 809      |
----------------------------------
2025-02-01 15:08:13.427895 Eastern Standard Time
| Itration            | 810      |
| Real Det Return     | 641      |
| Real Sto Return     | 591      |
| Reward Loss         | -117     |
| Running Env Steps   | 405000   |
| Running Forward KL  | -3.65    |
| Running Reverse KL  | 6.12     |
| Running Update Time | 810      |
----------------------------------
2025-02-01 15:08:29.204330 Eastern Standard Time
| Itration            | 811      |
| Real Det Return     | 615      |
| Real Sto Return     | 568      |
| Reward Loss         | -191     |
| Running Env Steps   | 405500   |
| Running Forward KL  | -1.65    |
| Running Reverse KL  | 5.31     |
| Running Update Time | 811      |
----------------------------------
2025-02-01 15:08:44.945864 Eastern Standard Time
| Itration            | 812      |
| Real Det Return     | 624      |
| Real Sto Return     | 598      |
| Reward Loss         | -117     |
| Running Env Steps   | 406000   |
| Running Forward KL  | -3.76    |
| Running Reverse KL  | 5.58     |
| Running Update Time | 812      |
----------------------------------
2025-02-01 15:09:00.781346 Eastern Standard Time
| Itration            | 813      |
| Real Det Return     | 632      |
| Real Sto Return     | 590      |
| Reward Loss         | -101     |
| Running Env Steps   | 406500   |
| Running Forward KL  | -2.93    |
| Running Reverse KL  | 5.44     |
| Running Update Time | 813      |
----------------------------------
2025-02-01 15:09:16.795347 Eastern Standard Time
| Itration            | 814      |
| Real Det Return     | 640      |
| Real Sto Return     | 605      |
| Reward Loss         | -119     |
| Running Env Steps   | 407000   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 6.18     |
| Running Update Time | 814      |
----------------------------------
2025-02-01 15:09:32.521678 Eastern Standard Time
| Itration            | 815      |
| Real Det Return     | 622      |
| Real Sto Return     | 571      |
| Reward Loss         | -162     |
| Running Env Steps   | 407500   |
| Running Forward KL  | -2.79    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 815      |
----------------------------------
2025-02-01 15:09:48.225393 Eastern Standard Time
| Itration            | 816      |
| Real Det Return     | 653      |
| Real Sto Return     | 605      |
| Reward Loss         | -113     |
| Running Env Steps   | 408000   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 816      |
----------------------------------
2025-02-01 15:10:04.054913 Eastern Standard Time
| Itration            | 817      |
| Real Det Return     | 648      |
| Real Sto Return     | 588      |
| Reward Loss         | -215     |
| Running Env Steps   | 408500   |
| Running Forward KL  | -2.38    |
| Running Reverse KL  | 5.89     |
| Running Update Time | 817      |
----------------------------------
2025-02-01 15:10:19.761965 Eastern Standard Time
| Itration            | 818      |
| Real Det Return     | 627      |
| Real Sto Return     | 589      |
| Reward Loss         | -130     |
| Running Env Steps   | 409000   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 818      |
----------------------------------
2025-02-01 15:10:35.517407 Eastern Standard Time
| Itration            | 819      |
| Real Det Return     | 669      |
| Real Sto Return     | 626      |
| Reward Loss         | -96.2    |
| Running Env Steps   | 409500   |
| Running Forward KL  | -2.92    |
| Running Reverse KL  | 6.08     |
| Running Update Time | 819      |
----------------------------------
2025-02-01 15:10:51.291321 Eastern Standard Time
| Itration            | 820      |
| Real Det Return     | 622      |
| Real Sto Return     | 592      |
| Reward Loss         | -138     |
| Running Env Steps   | 410000   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 820      |
----------------------------------
2025-02-01 15:11:07.056113 Eastern Standard Time
| Itration            | 821      |
| Real Det Return     | 609      |
| Real Sto Return     | 578      |
| Reward Loss         | -173     |
| Running Env Steps   | 410500   |
| Running Forward KL  | -2.72    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 821      |
----------------------------------
2025-02-01 15:11:22.734818 Eastern Standard Time
| Itration            | 822      |
| Real Det Return     | 642      |
| Real Sto Return     | 605      |
| Reward Loss         | -90.3    |
| Running Env Steps   | 411000   |
| Running Forward KL  | -4       |
| Running Reverse KL  | 6.08     |
| Running Update Time | 822      |
----------------------------------
2025-02-01 15:11:38.500936 Eastern Standard Time
| Itration            | 823      |
| Real Det Return     | 609      |
| Real Sto Return     | 577      |
| Reward Loss         | -165     |
| Running Env Steps   | 411500   |
| Running Forward KL  | -2.53    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 823      |
----------------------------------
2025-02-01 15:11:54.196238 Eastern Standard Time
| Itration            | 824      |
| Real Det Return     | 648      |
| Real Sto Return     | 591      |
| Reward Loss         | -138     |
| Running Env Steps   | 412000   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 824      |
----------------------------------
2025-02-01 15:12:10.020466 Eastern Standard Time
| Itration            | 825      |
| Real Det Return     | 586      |
| Real Sto Return     | 553      |
| Reward Loss         | -198     |
| Running Env Steps   | 412500   |
| Running Forward KL  | -1.5     |
| Running Reverse KL  | 5.66     |
| Running Update Time | 825      |
----------------------------------
2025-02-01 15:12:25.786899 Eastern Standard Time
| Itration            | 826      |
| Real Det Return     | 644      |
| Real Sto Return     | 617      |
| Reward Loss         | -102     |
| Running Env Steps   | 413000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 6.22     |
| Running Update Time | 826      |
----------------------------------
2025-02-01 15:12:41.521706 Eastern Standard Time
| Itration            | 827      |
| Real Det Return     | 647      |
| Real Sto Return     | 592      |
| Reward Loss         | -104     |
| Running Env Steps   | 413500   |
| Running Forward KL  | -3.64    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 827      |
----------------------------------
2025-02-01 15:12:57.291597 Eastern Standard Time
| Itration            | 828      |
| Real Det Return     | 659      |
| Real Sto Return     | 630      |
| Reward Loss         | -78.9    |
| Running Env Steps   | 414000   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 6.83     |
| Running Update Time | 828      |
----------------------------------
2025-02-01 15:13:12.996861 Eastern Standard Time
| Itration            | 829      |
| Real Det Return     | 614      |
| Real Sto Return     | 604      |
| Reward Loss         | -104     |
| Running Env Steps   | 414500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 6.08     |
| Running Update Time | 829      |
----------------------------------
2025-02-01 15:13:28.733828 Eastern Standard Time
| Itration            | 830      |
| Real Det Return     | 617      |
| Real Sto Return     | 595      |
| Reward Loss         | -78.9    |
| Running Env Steps   | 415000   |
| Running Forward KL  | -3.01    |
| Running Reverse KL  | 6.35     |
| Running Update Time | 830      |
----------------------------------
2025-02-01 15:13:44.533678 Eastern Standard Time
| Itration            | 831      |
| Real Det Return     | 639      |
| Real Sto Return     | 620      |
| Reward Loss         | -135     |
| Running Env Steps   | 415500   |
| Running Forward KL  | -2.67    |
| Running Reverse KL  | 6.3      |
| Running Update Time | 831      |
----------------------------------
2025-02-01 15:14:00.282902 Eastern Standard Time
| Itration            | 832      |
| Real Det Return     | 634      |
| Real Sto Return     | 627      |
| Reward Loss         | -78.1    |
| Running Env Steps   | 416000   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 5.61     |
| Running Update Time | 832      |
----------------------------------
2025-02-01 15:14:16.062783 Eastern Standard Time
| Itration            | 833      |
| Real Det Return     | 652      |
| Real Sto Return     | 621      |
| Reward Loss         | -90.1    |
| Running Env Steps   | 416500   |
| Running Forward KL  | -3.6     |
| Running Reverse KL  | 6.11     |
| Running Update Time | 833      |
----------------------------------
2025-02-01 15:14:31.767913 Eastern Standard Time
| Itration            | 834      |
| Real Det Return     | 647      |
| Real Sto Return     | 610      |
| Reward Loss         | -108     |
| Running Env Steps   | 417000   |
| Running Forward KL  | -2.97    |
| Running Reverse KL  | 6.29     |
| Running Update Time | 834      |
----------------------------------
2025-02-01 15:14:47.494409 Eastern Standard Time
| Itration            | 835      |
| Real Det Return     | 646      |
| Real Sto Return     | 614      |
| Reward Loss         | -127     |
| Running Env Steps   | 417500   |
| Running Forward KL  | -3.12    |
| Running Reverse KL  | 5.53     |
| Running Update Time | 835      |
----------------------------------
2025-02-01 15:15:03.169123 Eastern Standard Time
| Itration            | 836      |
| Real Det Return     | 661      |
| Real Sto Return     | 605      |
| Reward Loss         | -121     |
| Running Env Steps   | 418000   |
| Running Forward KL  | -3.09    |
| Running Reverse KL  | 5.79     |
| Running Update Time | 836      |
----------------------------------
2025-02-01 15:15:18.995565 Eastern Standard Time
| Itration            | 837      |
| Real Det Return     | 613      |
| Real Sto Return     | 594      |
| Reward Loss         | -138     |
| Running Env Steps   | 418500   |
| Running Forward KL  | -3.14    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 837      |
----------------------------------
2025-02-01 15:15:34.708246 Eastern Standard Time
| Itration            | 838      |
| Real Det Return     | 645      |
| Real Sto Return     | 615      |
| Reward Loss         | -67.6    |
| Running Env Steps   | 419000   |
| Running Forward KL  | -3.24    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 838      |
----------------------------------
2025-02-01 15:15:50.556377 Eastern Standard Time
| Itration            | 839      |
| Real Det Return     | 650      |
| Real Sto Return     | 606      |
| Reward Loss         | -152     |
| Running Env Steps   | 419500   |
| Running Forward KL  | -2.78    |
| Running Reverse KL  | 5.57     |
| Running Update Time | 839      |
----------------------------------
2025-02-01 15:16:06.286753 Eastern Standard Time
| Itration            | 840      |
| Real Det Return     | 657      |
| Real Sto Return     | 599      |
| Reward Loss         | -89.5    |
| Running Env Steps   | 420000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 5.89     |
| Running Update Time | 840      |
----------------------------------
2025-02-01 15:16:21.966834 Eastern Standard Time
| Itration            | 841      |
| Real Det Return     | 657      |
| Real Sto Return     | 618      |
| Reward Loss         | -93.2    |
| Running Env Steps   | 420500   |
| Running Forward KL  | -3.12    |
| Running Reverse KL  | 6.16     |
| Running Update Time | 841      |
----------------------------------
2025-02-01 15:16:37.810127 Eastern Standard Time
| Itration            | 842      |
| Real Det Return     | 646      |
| Real Sto Return     | 593      |
| Reward Loss         | -113     |
| Running Env Steps   | 421000   |
| Running Forward KL  | -3.45    |
| Running Reverse KL  | 5.74     |
| Running Update Time | 842      |
----------------------------------
2025-02-01 15:16:53.616242 Eastern Standard Time
| Itration            | 843      |
| Real Det Return     | 642      |
| Real Sto Return     | 598      |
| Reward Loss         | -146     |
| Running Env Steps   | 421500   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 5.58     |
| Running Update Time | 843      |
----------------------------------
2025-02-01 15:17:09.451880 Eastern Standard Time
| Itration            | 844      |
| Real Det Return     | 624      |
| Real Sto Return     | 577      |
| Reward Loss         | -191     |
| Running Env Steps   | 422000   |
| Running Forward KL  | -2.67    |
| Running Reverse KL  | 5.86     |
| Running Update Time | 844      |
----------------------------------
2025-02-01 15:17:25.209829 Eastern Standard Time
| Itration            | 845      |
| Real Det Return     | 644      |
| Real Sto Return     | 606      |
| Reward Loss         | -139     |
| Running Env Steps   | 422500   |
| Running Forward KL  | -3.25    |
| Running Reverse KL  | 6.57     |
| Running Update Time | 845      |
----------------------------------
2025-02-01 15:17:40.956754 Eastern Standard Time
| Itration            | 846      |
| Real Det Return     | 631      |
| Real Sto Return     | 607      |
| Reward Loss         | -111     |
| Running Env Steps   | 423000   |
| Running Forward KL  | -3.44    |
| Running Reverse KL  | 6.47     |
| Running Update Time | 846      |
----------------------------------
2025-02-01 15:17:56.671431 Eastern Standard Time
| Itration            | 847      |
| Real Det Return     | 653      |
| Real Sto Return     | 631      |
| Reward Loss         | -96.4    |
| Running Env Steps   | 423500   |
| Running Forward KL  | -3.05    |
| Running Reverse KL  | 5.88     |
| Running Update Time | 847      |
----------------------------------
2025-02-01 15:18:12.446009 Eastern Standard Time
| Itration            | 848      |
| Real Det Return     | 642      |
| Real Sto Return     | 617      |
| Reward Loss         | -109     |
| Running Env Steps   | 424000   |
| Running Forward KL  | -3.48    |
| Running Reverse KL  | 6.25     |
| Running Update Time | 848      |
----------------------------------
2025-02-01 15:18:28.175972 Eastern Standard Time
| Itration            | 849      |
| Real Det Return     | 603      |
| Real Sto Return     | 588      |
| Reward Loss         | -145     |
| Running Env Steps   | 424500   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 849      |
----------------------------------
2025-02-01 15:18:43.990330 Eastern Standard Time
| Itration            | 850      |
| Real Det Return     | 617      |
| Real Sto Return     | 596      |
| Reward Loss         | -107     |
| Running Env Steps   | 425000   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 850      |
----------------------------------
2025-02-01 15:18:59.729797 Eastern Standard Time
| Itration            | 851      |
| Real Det Return     | 640      |
| Real Sto Return     | 611      |
| Reward Loss         | -103     |
| Running Env Steps   | 425500   |
| Running Forward KL  | -3.03    |
| Running Reverse KL  | 6.16     |
| Running Update Time | 851      |
----------------------------------
2025-02-01 15:19:15.538020 Eastern Standard Time
| Itration            | 852      |
| Real Det Return     | 638      |
| Real Sto Return     | 622      |
| Reward Loss         | -95      |
| Running Env Steps   | 426000   |
| Running Forward KL  | -3.25    |
| Running Reverse KL  | 6.4      |
| Running Update Time | 852      |
----------------------------------
2025-02-01 15:19:31.303276 Eastern Standard Time
| Itration            | 853      |
| Real Det Return     | 644      |
| Real Sto Return     | 605      |
| Reward Loss         | -121     |
| Running Env Steps   | 426500   |
| Running Forward KL  | -3.05    |
| Running Reverse KL  | 5.89     |
| Running Update Time | 853      |
----------------------------------
2025-02-01 15:19:47.042612 Eastern Standard Time
| Itration            | 854      |
| Real Det Return     | 650      |
| Real Sto Return     | 611      |
| Reward Loss         | -247     |
| Running Env Steps   | 427000   |
| Running Forward KL  | -2.62    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 854      |
----------------------------------
2025-02-01 15:20:02.794217 Eastern Standard Time
| Itration            | 855      |
| Real Det Return     | 606      |
| Real Sto Return     | 577      |
| Reward Loss         | -135     |
| Running Env Steps   | 427500   |
| Running Forward KL  | -3.2     |
| Running Reverse KL  | 5.78     |
| Running Update Time | 855      |
----------------------------------
2025-02-01 15:20:18.627336 Eastern Standard Time
| Itration            | 856      |
| Real Det Return     | 670      |
| Real Sto Return     | 631      |
| Reward Loss         | -102     |
| Running Env Steps   | 428000   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 6.3      |
| Running Update Time | 856      |
----------------------------------
2025-02-01 15:20:34.397702 Eastern Standard Time
| Itration            | 857      |
| Real Det Return     | 656      |
| Real Sto Return     | 618      |
| Reward Loss         | -119     |
| Running Env Steps   | 428500   |
| Running Forward KL  | -2.95    |
| Running Reverse KL  | 6.61     |
| Running Update Time | 857      |
----------------------------------
2025-02-01 15:20:50.213055 Eastern Standard Time
| Itration            | 858      |
| Real Det Return     | 642      |
| Real Sto Return     | 618      |
| Reward Loss         | -98.8    |
| Running Env Steps   | 429000   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 6.02     |
| Running Update Time | 858      |
----------------------------------
2025-02-01 15:21:06.001514 Eastern Standard Time
| Itration            | 859      |
| Real Det Return     | 666      |
| Real Sto Return     | 634      |
| Reward Loss         | -73.3    |
| Running Env Steps   | 429500   |
| Running Forward KL  | -2.61    |
| Running Reverse KL  | 6.81     |
| Running Update Time | 859      |
----------------------------------
2025-02-01 15:21:21.728081 Eastern Standard Time
| Itration            | 860      |
| Real Det Return     | 629      |
| Real Sto Return     | 591      |
| Reward Loss         | -94.8    |
| Running Env Steps   | 430000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 5.89     |
| Running Update Time | 860      |
----------------------------------
2025-02-01 15:21:37.548147 Eastern Standard Time
| Itration            | 861      |
| Real Det Return     | 655      |
| Real Sto Return     | 610      |
| Reward Loss         | -129     |
| Running Env Steps   | 430500   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 861      |
----------------------------------
2025-02-01 15:21:53.305430 Eastern Standard Time
| Itration            | 862      |
| Real Det Return     | 631      |
| Real Sto Return     | 621      |
| Reward Loss         | -122     |
| Running Env Steps   | 431000   |
| Running Forward KL  | -2.23    |
| Running Reverse KL  | 6.45     |
| Running Update Time | 862      |
----------------------------------
2025-02-01 15:22:09.087850 Eastern Standard Time
| Itration            | 863      |
| Real Det Return     | 649      |
| Real Sto Return     | 613      |
| Reward Loss         | -103     |
| Running Env Steps   | 431500   |
| Running Forward KL  | -2.83    |
| Running Reverse KL  | 5.93     |
| Running Update Time | 863      |
----------------------------------
2025-02-01 15:22:24.834203 Eastern Standard Time
| Itration            | 864      |
| Real Det Return     | 648      |
| Real Sto Return     | 596      |
| Reward Loss         | -114     |
| Running Env Steps   | 432000   |
| Running Forward KL  | -3.24    |
| Running Reverse KL  | 6.25     |
| Running Update Time | 864      |
----------------------------------
2025-02-01 15:22:40.632696 Eastern Standard Time
| Itration            | 865      |
| Real Det Return     | 608      |
| Real Sto Return     | 581      |
| Reward Loss         | -110     |
| Running Env Steps   | 432500   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 5.67     |
| Running Update Time | 865      |
----------------------------------
2025-02-01 15:22:56.437713 Eastern Standard Time
| Itration            | 866      |
| Real Det Return     | 615      |
| Real Sto Return     | 580      |
| Reward Loss         | -136     |
| Running Env Steps   | 433000   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 6.16     |
| Running Update Time | 866      |
----------------------------------
2025-02-01 15:23:12.172298 Eastern Standard Time
| Itration            | 867      |
| Real Det Return     | 632      |
| Real Sto Return     | 603      |
| Reward Loss         | -125     |
| Running Env Steps   | 433500   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 5.9      |
| Running Update Time | 867      |
----------------------------------
2025-02-01 15:23:27.899406 Eastern Standard Time
| Itration            | 868      |
| Real Det Return     | 667      |
| Real Sto Return     | 607      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 434000   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 868      |
----------------------------------
2025-02-01 15:23:43.738405 Eastern Standard Time
| Itration            | 869      |
| Real Det Return     | 665      |
| Real Sto Return     | 619      |
| Reward Loss         | -54.9    |
| Running Env Steps   | 434500   |
| Running Forward KL  | -3.29    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 869      |
----------------------------------
2025-02-01 15:24:00.605038 Eastern Standard Time
| Itration            | 870      |
| Real Det Return     | 653      |
| Real Sto Return     | 621      |
| Reward Loss         | -120     |
| Running Env Steps   | 435000   |
| Running Forward KL  | -2.84    |
| Running Reverse KL  | 6.92     |
| Running Update Time | 870      |
----------------------------------
2025-02-01 15:24:16.912172 Eastern Standard Time
| Itration            | 871      |
| Real Det Return     | 653      |
| Real Sto Return     | 605      |
| Reward Loss         | -105     |
| Running Env Steps   | 435500   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 5.79     |
| Running Update Time | 871      |
----------------------------------
2025-02-01 15:24:32.494252 Eastern Standard Time
| Itration            | 872      |
| Real Det Return     | 643      |
| Real Sto Return     | 621      |
| Reward Loss         | -116     |
| Running Env Steps   | 436000   |
| Running Forward KL  | -3.36    |
| Running Reverse KL  | 6.73     |
| Running Update Time | 872      |
----------------------------------
2025-02-01 15:24:48.121811 Eastern Standard Time
| Itration            | 873      |
| Real Det Return     | 644      |
| Real Sto Return     | 616      |
| Reward Loss         | -110     |
| Running Env Steps   | 436500   |
| Running Forward KL  | -3.13    |
| Running Reverse KL  | 5.72     |
| Running Update Time | 873      |
----------------------------------
2025-02-01 15:25:03.760903 Eastern Standard Time
| Itration            | 874      |
| Real Det Return     | 659      |
| Real Sto Return     | 616      |
| Reward Loss         | -125     |
| Running Env Steps   | 437000   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 5.42     |
| Running Update Time | 874      |
----------------------------------
2025-02-01 15:25:20.505115 Eastern Standard Time
| Itration            | 875      |
| Real Det Return     | 659      |
| Real Sto Return     | 639      |
| Reward Loss         | -65.3    |
| Running Env Steps   | 437500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 6.62     |
| Running Update Time | 875      |
----------------------------------
2025-02-01 15:25:36.476560 Eastern Standard Time
| Itration            | 876      |
| Real Det Return     | 598      |
| Real Sto Return     | 568      |
| Reward Loss         | -126     |
| Running Env Steps   | 438000   |
| Running Forward KL  | -3.1     |
| Running Reverse KL  | 5.86     |
| Running Update Time | 876      |
----------------------------------
2025-02-01 15:25:52.346303 Eastern Standard Time
| Itration            | 877      |
| Real Det Return     | 633      |
| Real Sto Return     | 613      |
| Reward Loss         | -103     |
| Running Env Steps   | 438500   |
| Running Forward KL  | -3.6     |
| Running Reverse KL  | 6.56     |
| Running Update Time | 877      |
----------------------------------
2025-02-01 15:26:08.243976 Eastern Standard Time
| Itration            | 878      |
| Real Det Return     | 654      |
| Real Sto Return     | 624      |
| Reward Loss         | -130     |
| Running Env Steps   | 439000   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 878      |
----------------------------------
2025-02-01 15:26:24.091178 Eastern Standard Time
| Itration            | 879      |
| Real Det Return     | 640      |
| Real Sto Return     | 620      |
| Reward Loss         | -98.2    |
| Running Env Steps   | 439500   |
| Running Forward KL  | -3.62    |
| Running Reverse KL  | 6.38     |
| Running Update Time | 879      |
----------------------------------
2025-02-01 15:26:40.205045 Eastern Standard Time
| Itration            | 880      |
| Real Det Return     | 652      |
| Real Sto Return     | 624      |
| Reward Loss         | -94.6    |
| Running Env Steps   | 440000   |
| Running Forward KL  | -3.5     |
| Running Reverse KL  | 6.47     |
| Running Update Time | 880      |
----------------------------------
2025-02-01 15:26:56.822554 Eastern Standard Time
| Itration            | 881      |
| Real Det Return     | 639      |
| Real Sto Return     | 600      |
| Reward Loss         | -84.4    |
| Running Env Steps   | 440500   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 6.07     |
| Running Update Time | 881      |
----------------------------------
2025-02-01 15:27:14.739028 Eastern Standard Time
| Itration            | 882      |
| Real Det Return     | 656      |
| Real Sto Return     | 615      |
| Reward Loss         | -117     |
| Running Env Steps   | 441000   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 5.48     |
| Running Update Time | 882      |
----------------------------------
2025-02-01 15:27:32.363114 Eastern Standard Time
| Itration            | 883      |
| Real Det Return     | 674      |
| Real Sto Return     | 634      |
| Reward Loss         | -60.1    |
| Running Env Steps   | 441500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 5.42     |
| Running Update Time | 883      |
----------------------------------
2025-02-01 15:27:49.376593 Eastern Standard Time
| Itration            | 884      |
| Real Det Return     | 626      |
| Real Sto Return     | 598      |
| Reward Loss         | -153     |
| Running Env Steps   | 442000   |
| Running Forward KL  | -3.07    |
| Running Reverse KL  | 5.72     |
| Running Update Time | 884      |
----------------------------------
2025-02-01 15:28:08.221990 Eastern Standard Time
| Itration            | 885      |
| Real Det Return     | 645      |
| Real Sto Return     | 602      |
| Reward Loss         | -110     |
| Running Env Steps   | 442500   |
| Running Forward KL  | -3.52    |
| Running Reverse KL  | 6.09     |
| Running Update Time | 885      |
----------------------------------
2025-02-01 15:28:26.033713 Eastern Standard Time
| Itration            | 886      |
| Real Det Return     | 639      |
| Real Sto Return     | 614      |
| Reward Loss         | -112     |
| Running Env Steps   | 443000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 6.65     |
| Running Update Time | 886      |
----------------------------------
2025-02-01 15:28:42.847390 Eastern Standard Time
| Itration            | 887      |
| Real Det Return     | 622      |
| Real Sto Return     | 568      |
| Reward Loss         | -150     |
| Running Env Steps   | 443500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 887      |
----------------------------------
2025-02-01 15:28:58.918922 Eastern Standard Time
| Itration            | 888      |
| Real Det Return     | 657      |
| Real Sto Return     | 615      |
| Reward Loss         | -89.6    |
| Running Env Steps   | 444000   |
| Running Forward KL  | -3.72    |
| Running Reverse KL  | 6.38     |
| Running Update Time | 888      |
----------------------------------
2025-02-01 15:29:15.022517 Eastern Standard Time
| Itration            | 889      |
| Real Det Return     | 655      |
| Real Sto Return     | 610      |
| Reward Loss         | -63.1    |
| Running Env Steps   | 444500   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 6.16     |
| Running Update Time | 889      |
----------------------------------
2025-02-01 15:29:30.773940 Eastern Standard Time
| Itration            | 890      |
| Real Det Return     | 650      |
| Real Sto Return     | 610      |
| Reward Loss         | -117     |
| Running Env Steps   | 445000   |
| Running Forward KL  | -2.76    |
| Running Reverse KL  | 6.24     |
| Running Update Time | 890      |
----------------------------------
2025-02-01 15:29:46.546035 Eastern Standard Time
| Itration            | 891      |
| Real Det Return     | 664      |
| Real Sto Return     | 617      |
| Reward Loss         | -99.7    |
| Running Env Steps   | 445500   |
| Running Forward KL  | -3.26    |
| Running Reverse KL  | 6.23     |
| Running Update Time | 891      |
----------------------------------
2025-02-01 15:30:02.532126 Eastern Standard Time
| Itration            | 892      |
| Real Det Return     | 617      |
| Real Sto Return     | 594      |
| Reward Loss         | -153     |
| Running Env Steps   | 446000   |
| Running Forward KL  | -2.5     |
| Running Reverse KL  | 6.65     |
| Running Update Time | 892      |
----------------------------------
2025-02-01 15:30:18.247374 Eastern Standard Time
| Itration            | 893      |
| Real Det Return     | 653      |
| Real Sto Return     | 621      |
| Reward Loss         | -101     |
| Running Env Steps   | 446500   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 6.65     |
| Running Update Time | 893      |
----------------------------------
2025-02-01 15:30:33.902351 Eastern Standard Time
| Itration            | 894      |
| Real Det Return     | 679      |
| Real Sto Return     | 639      |
| Reward Loss         | -35.1    |
| Running Env Steps   | 447000   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 894      |
----------------------------------
2025-02-01 15:30:49.569546 Eastern Standard Time
| Itration            | 895      |
| Real Det Return     | 632      |
| Real Sto Return     | 610      |
| Reward Loss         | -87.4    |
| Running Env Steps   | 447500   |
| Running Forward KL  | -4       |
| Running Reverse KL  | 6.1      |
| Running Update Time | 895      |
----------------------------------
2025-02-01 15:31:05.198267 Eastern Standard Time
| Itration            | 896      |
| Real Det Return     | 671      |
| Real Sto Return     | 630      |
| Reward Loss         | -110     |
| Running Env Steps   | 448000   |
| Running Forward KL  | -3.28    |
| Running Reverse KL  | 5.88     |
| Running Update Time | 896      |
----------------------------------
2025-02-01 15:31:20.935839 Eastern Standard Time
| Itration            | 897      |
| Real Det Return     | 641      |
| Real Sto Return     | 613      |
| Reward Loss         | -99.6    |
| Running Env Steps   | 448500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 6.05     |
| Running Update Time | 897      |
----------------------------------
2025-02-01 15:31:36.567281 Eastern Standard Time
| Itration            | 898      |
| Real Det Return     | 629      |
| Real Sto Return     | 610      |
| Reward Loss         | -103     |
| Running Env Steps   | 449000   |
| Running Forward KL  | -3.76    |
| Running Reverse KL  | 5.89     |
| Running Update Time | 898      |
----------------------------------
2025-02-01 15:31:52.156555 Eastern Standard Time
| Itration            | 899      |
| Real Det Return     | 628      |
| Real Sto Return     | 588      |
| Reward Loss         | -125     |
| Running Env Steps   | 449500   |
| Running Forward KL  | -3.25    |
| Running Reverse KL  | 6.82     |
| Running Update Time | 899      |
----------------------------------
2025-02-01 15:32:07.837454 Eastern Standard Time
| Itration            | 900      |
| Real Det Return     | 646      |
| Real Sto Return     | 625      |
| Reward Loss         | -121     |
| Running Env Steps   | 450000   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 900      |
----------------------------------
2025-02-01 15:32:23.480043 Eastern Standard Time
| Itration            | 901      |
| Real Det Return     | 649      |
| Real Sto Return     | 625      |
| Reward Loss         | -72.4    |
| Running Env Steps   | 450500   |
| Running Forward KL  | -3.38    |
| Running Reverse KL  | 6.32     |
| Running Update Time | 901      |
----------------------------------
2025-02-01 15:32:39.182353 Eastern Standard Time
| Itration            | 902      |
| Real Det Return     | 648      |
| Real Sto Return     | 613      |
| Reward Loss         | -83.9    |
| Running Env Steps   | 451000   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 7.4      |
| Running Update Time | 902      |
----------------------------------
2025-02-01 15:32:55.285487 Eastern Standard Time
| Itration            | 903      |
| Real Det Return     | 625      |
| Real Sto Return     | 614      |
| Reward Loss         | -101     |
| Running Env Steps   | 451500   |
| Running Forward KL  | -4       |
| Running Reverse KL  | 6.64     |
| Running Update Time | 903      |
----------------------------------
2025-02-01 15:33:11.083793 Eastern Standard Time
| Itration            | 904      |
| Real Det Return     | 662      |
| Real Sto Return     | 619      |
| Reward Loss         | -68.8    |
| Running Env Steps   | 452000   |
| Running Forward KL  | -3.78    |
| Running Reverse KL  | 5.94     |
| Running Update Time | 904      |
----------------------------------
2025-02-01 15:33:26.796876 Eastern Standard Time
| Itration            | 905      |
| Real Det Return     | 667      |
| Real Sto Return     | 612      |
| Reward Loss         | -114     |
| Running Env Steps   | 452500   |
| Running Forward KL  | -3.2     |
| Running Reverse KL  | 5.83     |
| Running Update Time | 905      |
----------------------------------
2025-02-01 15:33:42.536882 Eastern Standard Time
| Itration            | 906      |
| Real Det Return     | 640      |
| Real Sto Return     | 606      |
| Reward Loss         | -99.4    |
| Running Env Steps   | 453000   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 7.18     |
| Running Update Time | 906      |
----------------------------------
2025-02-01 15:33:58.271783 Eastern Standard Time
| Itration            | 907      |
| Real Det Return     | 634      |
| Real Sto Return     | 622      |
| Reward Loss         | -92.7    |
| Running Env Steps   | 453500   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 5.79     |
| Running Update Time | 907      |
----------------------------------
2025-02-01 15:34:14.038761 Eastern Standard Time
| Itration            | 908      |
| Real Det Return     | 646      |
| Real Sto Return     | 612      |
| Reward Loss         | -94.9    |
| Running Env Steps   | 454000   |
| Running Forward KL  | -2.78    |
| Running Reverse KL  | 6.74     |
| Running Update Time | 908      |
----------------------------------
2025-02-01 15:34:29.725153 Eastern Standard Time
| Itration            | 909      |
| Real Det Return     | 664      |
| Real Sto Return     | 630      |
| Reward Loss         | -57.8    |
| Running Env Steps   | 454500   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 909      |
----------------------------------
2025-02-01 15:34:45.424679 Eastern Standard Time
| Itration            | 910      |
| Real Det Return     | 652      |
| Real Sto Return     | 628      |
| Reward Loss         | -81.1    |
| Running Env Steps   | 455000   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 6.39     |
| Running Update Time | 910      |
----------------------------------
2025-02-01 15:35:01.171727 Eastern Standard Time
| Itration            | 911      |
| Real Det Return     | 652      |
| Real Sto Return     | 624      |
| Reward Loss         | -91.9    |
| Running Env Steps   | 455500   |
| Running Forward KL  | -3.45    |
| Running Reverse KL  | 6.49     |
| Running Update Time | 911      |
----------------------------------
2025-02-01 15:35:16.959417 Eastern Standard Time
| Itration            | 912      |
| Real Det Return     | 667      |
| Real Sto Return     | 628      |
| Reward Loss         | -52.8    |
| Running Env Steps   | 456000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 6.48     |
| Running Update Time | 912      |
----------------------------------
2025-02-01 15:35:32.705453 Eastern Standard Time
| Itration            | 913      |
| Real Det Return     | 656      |
| Real Sto Return     | 608      |
| Reward Loss         | -102     |
| Running Env Steps   | 456500   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 6.57     |
| Running Update Time | 913      |
----------------------------------
2025-02-01 15:35:48.439544 Eastern Standard Time
| Itration            | 914      |
| Real Det Return     | 597      |
| Real Sto Return     | 578      |
| Reward Loss         | -152     |
| Running Env Steps   | 457000   |
| Running Forward KL  | -2.64    |
| Running Reverse KL  | 6.78     |
| Running Update Time | 914      |
----------------------------------
2025-02-01 15:36:04.156922 Eastern Standard Time
| Itration            | 915      |
| Real Det Return     | 616      |
| Real Sto Return     | 588      |
| Reward Loss         | -84.9    |
| Running Env Steps   | 457500   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 5.68     |
| Running Update Time | 915      |
----------------------------------
2025-02-01 15:36:19.916876 Eastern Standard Time
| Itration            | 916      |
| Real Det Return     | 667      |
| Real Sto Return     | 613      |
| Reward Loss         | -85.9    |
| Running Env Steps   | 458000   |
| Running Forward KL  | -3.32    |
| Running Reverse KL  | 6.67     |
| Running Update Time | 916      |
----------------------------------
2025-02-01 15:36:35.692365 Eastern Standard Time
| Itration            | 917      |
| Real Det Return     | 676      |
| Real Sto Return     | 629      |
| Reward Loss         | -105     |
| Running Env Steps   | 458500   |
| Running Forward KL  | -3.59    |
| Running Reverse KL  | 6.01     |
| Running Update Time | 917      |
----------------------------------
2025-02-01 15:36:51.505177 Eastern Standard Time
| Itration            | 918      |
| Real Det Return     | 634      |
| Real Sto Return     | 601      |
| Reward Loss         | -113     |
| Running Env Steps   | 459000   |
| Running Forward KL  | -2.79    |
| Running Reverse KL  | 6.21     |
| Running Update Time | 918      |
----------------------------------
2025-02-01 15:37:07.238381 Eastern Standard Time
| Itration            | 919      |
| Real Det Return     | 650      |
| Real Sto Return     | 609      |
| Reward Loss         | -81.5    |
| Running Env Steps   | 459500   |
| Running Forward KL  | -3.47    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 919      |
----------------------------------
2025-02-01 15:37:23.185224 Eastern Standard Time
| Itration            | 920      |
| Real Det Return     | 651      |
| Real Sto Return     | 625      |
| Reward Loss         | -101     |
| Running Env Steps   | 460000   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 5.87     |
| Running Update Time | 920      |
----------------------------------
2025-02-01 15:37:38.929067 Eastern Standard Time
| Itration            | 921      |
| Real Det Return     | 634      |
| Real Sto Return     | 617      |
| Reward Loss         | -119     |
| Running Env Steps   | 460500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 6.38     |
| Running Update Time | 921      |
----------------------------------
2025-02-01 15:37:54.646678 Eastern Standard Time
| Itration            | 922      |
| Real Det Return     | 631      |
| Real Sto Return     | 601      |
| Reward Loss         | -74.3    |
| Running Env Steps   | 461000   |
| Running Forward KL  | -3.32    |
| Running Reverse KL  | 6.71     |
| Running Update Time | 922      |
----------------------------------
2025-02-01 15:38:10.473505 Eastern Standard Time
| Itration            | 923      |
| Real Det Return     | 664      |
| Real Sto Return     | 626      |
| Reward Loss         | -76.5    |
| Running Env Steps   | 461500   |
| Running Forward KL  | -3.91    |
| Running Reverse KL  | 6.51     |
| Running Update Time | 923      |
----------------------------------
2025-02-01 15:38:26.270236 Eastern Standard Time
| Itration            | 924      |
| Real Det Return     | 649      |
| Real Sto Return     | 636      |
| Reward Loss         | -67.3    |
| Running Env Steps   | 462000   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 5.87     |
| Running Update Time | 924      |
----------------------------------
2025-02-01 15:38:42.039081 Eastern Standard Time
| Itration            | 925      |
| Real Det Return     | 651      |
| Real Sto Return     | 620      |
| Reward Loss         | -76.1    |
| Running Env Steps   | 462500   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 6        |
| Running Update Time | 925      |
----------------------------------
2025-02-01 15:38:57.761680 Eastern Standard Time
| Itration            | 926      |
| Real Det Return     | 644      |
| Real Sto Return     | 605      |
| Reward Loss         | -92.7    |
| Running Env Steps   | 463000   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 5.71     |
| Running Update Time | 926      |
----------------------------------
2025-02-01 15:39:13.570844 Eastern Standard Time
| Itration            | 927      |
| Real Det Return     | 624      |
| Real Sto Return     | 616      |
| Reward Loss         | -70.6    |
| Running Env Steps   | 463500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 6.63     |
| Running Update Time | 927      |
----------------------------------
2025-02-01 15:39:29.304018 Eastern Standard Time
| Itration            | 928      |
| Real Det Return     | 646      |
| Real Sto Return     | 622      |
| Reward Loss         | -89.2    |
| Running Env Steps   | 464000   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 928      |
----------------------------------
2025-02-01 15:39:45.055754 Eastern Standard Time
| Itration            | 929      |
| Real Det Return     | 615      |
| Real Sto Return     | 600      |
| Reward Loss         | -173     |
| Running Env Steps   | 464500   |
| Running Forward KL  | -3.03    |
| Running Reverse KL  | 5.85     |
| Running Update Time | 929      |
----------------------------------
2025-02-01 15:40:00.745570 Eastern Standard Time
| Itration            | 930      |
| Real Det Return     | 582      |
| Real Sto Return     | 569      |
| Reward Loss         | -236     |
| Running Env Steps   | 465000   |
| Running Forward KL  | -1.62    |
| Running Reverse KL  | 5.42     |
| Running Update Time | 930      |
----------------------------------
2025-02-01 15:40:16.715640 Eastern Standard Time
| Itration            | 931      |
| Real Det Return     | 675      |
| Real Sto Return     | 655      |
| Reward Loss         | -72.6    |
| Running Env Steps   | 465500   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 6.25     |
| Running Update Time | 931      |
----------------------------------
2025-02-01 15:40:32.787789 Eastern Standard Time
| Itration            | 932      |
| Real Det Return     | 648      |
| Real Sto Return     | 623      |
| Reward Loss         | -71.4    |
| Running Env Steps   | 466000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 6.04     |
| Running Update Time | 932      |
----------------------------------
2025-02-01 15:40:48.873153 Eastern Standard Time
| Itration            | 933      |
| Real Det Return     | 658      |
| Real Sto Return     | 621      |
| Reward Loss         | -109     |
| Running Env Steps   | 466500   |
| Running Forward KL  | -2.57    |
| Running Reverse KL  | 6.79     |
| Running Update Time | 933      |
----------------------------------
2025-02-01 15:41:04.909543 Eastern Standard Time
| Itration            | 934      |
| Real Det Return     | 669      |
| Real Sto Return     | 624      |
| Reward Loss         | -107     |
| Running Env Steps   | 467000   |
| Running Forward KL  | -3.4     |
| Running Reverse KL  | 6.97     |
| Running Update Time | 934      |
----------------------------------
2025-02-01 15:41:21.014768 Eastern Standard Time
| Itration            | 935      |
| Real Det Return     | 664      |
| Real Sto Return     | 620      |
| Reward Loss         | -70.5    |
| Running Env Steps   | 467500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 6.68     |
| Running Update Time | 935      |
----------------------------------
2025-02-01 15:41:37.141028 Eastern Standard Time
| Itration            | 936      |
| Real Det Return     | 682      |
| Real Sto Return     | 625      |
| Reward Loss         | -162     |
| Running Env Steps   | 468000   |
| Running Forward KL  | -3.64    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 936      |
----------------------------------
2025-02-01 15:41:53.257071 Eastern Standard Time
| Itration            | 937      |
| Real Det Return     | 646      |
| Real Sto Return     | 623      |
| Reward Loss         | -65.3    |
| Running Env Steps   | 468500   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 937      |
----------------------------------
2025-02-01 15:42:09.061698 Eastern Standard Time
| Itration            | 938      |
| Real Det Return     | 645      |
| Real Sto Return     | 613      |
| Reward Loss         | -87.9    |
| Running Env Steps   | 469000   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 5.68     |
| Running Update Time | 938      |
----------------------------------
2025-02-01 15:42:24.808681 Eastern Standard Time
| Itration            | 939      |
| Real Det Return     | 591      |
| Real Sto Return     | 531      |
| Reward Loss         | -131     |
| Running Env Steps   | 469500   |
| Running Forward KL  | -3.14    |
| Running Reverse KL  | 6.3      |
| Running Update Time | 939      |
----------------------------------
2025-02-01 15:42:40.615675 Eastern Standard Time
| Itration            | 940      |
| Real Det Return     | 686      |
| Real Sto Return     | 634      |
| Reward Loss         | -46.9    |
| Running Env Steps   | 470000   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 6.56     |
| Running Update Time | 940      |
----------------------------------
2025-02-01 15:42:56.560172 Eastern Standard Time
| Itration            | 941      |
| Real Det Return     | 669      |
| Real Sto Return     | 617      |
| Reward Loss         | -79      |
| Running Env Steps   | 470500   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 6.39     |
| Running Update Time | 941      |
----------------------------------
2025-02-01 15:43:12.326567 Eastern Standard Time
| Itration            | 942      |
| Real Det Return     | 684      |
| Real Sto Return     | 629      |
| Reward Loss         | -132     |
| Running Env Steps   | 471000   |
| Running Forward KL  | -3.08    |
| Running Reverse KL  | 6.36     |
| Running Update Time | 942      |
----------------------------------
2025-02-01 15:43:28.104927 Eastern Standard Time
| Itration            | 943      |
| Real Det Return     | 662      |
| Real Sto Return     | 620      |
| Reward Loss         | -111     |
| Running Env Steps   | 471500   |
| Running Forward KL  | -4.1     |
| Running Reverse KL  | 6.46     |
| Running Update Time | 943      |
----------------------------------
2025-02-01 15:43:43.776636 Eastern Standard Time
| Itration            | 944      |
| Real Det Return     | 638      |
| Real Sto Return     | 630      |
| Reward Loss         | -75.5    |
| Running Env Steps   | 472000   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 7.03     |
| Running Update Time | 944      |
----------------------------------
2025-02-01 15:43:59.502490 Eastern Standard Time
| Itration            | 945      |
| Real Det Return     | 629      |
| Real Sto Return     | 606      |
| Reward Loss         | -77      |
| Running Env Steps   | 472500   |
| Running Forward KL  | -3.56    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 945      |
----------------------------------
2025-02-01 15:44:15.253373 Eastern Standard Time
| Itration            | 946      |
| Real Det Return     | 646      |
| Real Sto Return     | 620      |
| Reward Loss         | -83.4    |
| Running Env Steps   | 473000   |
| Running Forward KL  | -3.69    |
| Running Reverse KL  | 7        |
| Running Update Time | 946      |
----------------------------------
2025-02-01 15:44:31.009056 Eastern Standard Time
| Itration            | 947      |
| Real Det Return     | 644      |
| Real Sto Return     | 633      |
| Reward Loss         | -74.2    |
| Running Env Steps   | 473500   |
| Running Forward KL  | -3.31    |
| Running Reverse KL  | 6.71     |
| Running Update Time | 947      |
----------------------------------
2025-02-01 15:44:46.815321 Eastern Standard Time
| Itration            | 948      |
| Real Det Return     | 633      |
| Real Sto Return     | 579      |
| Reward Loss         | -124     |
| Running Env Steps   | 474000   |
| Running Forward KL  | -3.02    |
| Running Reverse KL  | 6.17     |
| Running Update Time | 948      |
----------------------------------
2025-02-01 15:45:02.615151 Eastern Standard Time
| Itration            | 949      |
| Real Det Return     | 673      |
| Real Sto Return     | 626      |
| Reward Loss         | -74.1    |
| Running Env Steps   | 474500   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 6.84     |
| Running Update Time | 949      |
----------------------------------
2025-02-01 15:45:18.448205 Eastern Standard Time
| Itration            | 950      |
| Real Det Return     | 664      |
| Real Sto Return     | 622      |
| Reward Loss         | -93.1    |
| Running Env Steps   | 475000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 5.9      |
| Running Update Time | 950      |
----------------------------------
2025-02-01 15:45:34.307946 Eastern Standard Time
| Itration            | 951      |
| Real Det Return     | 623      |
| Real Sto Return     | 590      |
| Reward Loss         | -82.8    |
| Running Env Steps   | 475500   |
| Running Forward KL  | -3.11    |
| Running Reverse KL  | 7.81     |
| Running Update Time | 951      |
----------------------------------
2025-02-01 15:45:50.080961 Eastern Standard Time
| Itration            | 952      |
| Real Det Return     | 685      |
| Real Sto Return     | 649      |
| Reward Loss         | -42.9    |
| Running Env Steps   | 476000   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 952      |
----------------------------------
2025-02-01 15:46:05.762383 Eastern Standard Time
| Itration            | 953      |
| Real Det Return     | 631      |
| Real Sto Return     | 609      |
| Reward Loss         | -132     |
| Running Env Steps   | 476500   |
| Running Forward KL  | -2.84    |
| Running Reverse KL  | 6.2      |
| Running Update Time | 953      |
----------------------------------
2025-02-01 15:46:22.522773 Eastern Standard Time
| Itration            | 954      |
| Real Det Return     | 652      |
| Real Sto Return     | 644      |
| Reward Loss         | -49.4    |
| Running Env Steps   | 477000   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 6.49     |
| Running Update Time | 954      |
----------------------------------
2025-02-01 15:46:40.054748 Eastern Standard Time
| Itration            | 955      |
| Real Det Return     | 646      |
| Real Sto Return     | 618      |
| Reward Loss         | -144     |
| Running Env Steps   | 477500   |
| Running Forward KL  | -3.08    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 955      |
----------------------------------
2025-02-01 15:46:55.954172 Eastern Standard Time
| Itration            | 956      |
| Real Det Return     | 665      |
| Real Sto Return     | 608      |
| Reward Loss         | -82.8    |
| Running Env Steps   | 478000   |
| Running Forward KL  | -3.4     |
| Running Reverse KL  | 7.08     |
| Running Update Time | 956      |
----------------------------------
2025-02-01 15:47:12.116890 Eastern Standard Time
| Itration            | 957      |
| Real Det Return     | 637      |
| Real Sto Return     | 623      |
| Reward Loss         | -90.3    |
| Running Env Steps   | 478500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 6.86     |
| Running Update Time | 957      |
----------------------------------
2025-02-01 15:47:28.290901 Eastern Standard Time
| Itration            | 958      |
| Real Det Return     | 656      |
| Real Sto Return     | 621      |
| Reward Loss         | -99.5    |
| Running Env Steps   | 479000   |
| Running Forward KL  | -3.47    |
| Running Reverse KL  | 6.35     |
| Running Update Time | 958      |
----------------------------------
2025-02-01 15:47:43.942535 Eastern Standard Time
| Itration            | 959      |
| Real Det Return     | 649      |
| Real Sto Return     | 612      |
| Reward Loss         | -77.5    |
| Running Env Steps   | 479500   |
| Running Forward KL  | -3.34    |
| Running Reverse KL  | 7.03     |
| Running Update Time | 959      |
----------------------------------
2025-02-01 15:47:59.596732 Eastern Standard Time
| Itration            | 960      |
| Real Det Return     | 653      |
| Real Sto Return     | 600      |
| Reward Loss         | -98.8    |
| Running Env Steps   | 480000   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 6.11     |
| Running Update Time | 960      |
----------------------------------
2025-02-01 15:48:15.468041 Eastern Standard Time
| Itration            | 961      |
| Real Det Return     | 679      |
| Real Sto Return     | 619      |
| Reward Loss         | -78.6    |
| Running Env Steps   | 480500   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 961      |
----------------------------------
2025-02-01 15:48:31.326389 Eastern Standard Time
| Itration            | 962      |
| Real Det Return     | 630      |
| Real Sto Return     | 617      |
| Reward Loss         | -135     |
| Running Env Steps   | 481000   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 6.23     |
| Running Update Time | 962      |
----------------------------------
2025-02-01 15:48:48.146181 Eastern Standard Time
| Itration            | 963      |
| Real Det Return     | 651      |
| Real Sto Return     | 625      |
| Reward Loss         | -52.1    |
| Running Env Steps   | 481500   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 6.73     |
| Running Update Time | 963      |
----------------------------------
2025-02-01 15:49:05.761515 Eastern Standard Time
| Itration            | 964      |
| Real Det Return     | 630      |
| Real Sto Return     | 612      |
| Reward Loss         | -114     |
| Running Env Steps   | 482000   |
| Running Forward KL  | -3.4     |
| Running Reverse KL  | 5.9      |
| Running Update Time | 964      |
----------------------------------
2025-02-01 15:49:22.037524 Eastern Standard Time
| Itration            | 965      |
| Real Det Return     | 646      |
| Real Sto Return     | 616      |
| Reward Loss         | -73.8    |
| Running Env Steps   | 482500   |
| Running Forward KL  | -3.08    |
| Running Reverse KL  | 7.14     |
| Running Update Time | 965      |
----------------------------------
2025-02-01 15:49:38.052497 Eastern Standard Time
| Itration            | 966      |
| Real Det Return     | 674      |
| Real Sto Return     | 641      |
| Reward Loss         | -66      |
| Running Env Steps   | 483000   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 6.21     |
| Running Update Time | 966      |
----------------------------------
2025-02-01 15:49:53.833040 Eastern Standard Time
| Itration            | 967      |
| Real Det Return     | 649      |
| Real Sto Return     | 633      |
| Reward Loss         | -79.1    |
| Running Env Steps   | 483500   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 7.13     |
| Running Update Time | 967      |
----------------------------------
2025-02-01 15:50:09.653055 Eastern Standard Time
| Itration            | 968      |
| Real Det Return     | 659      |
| Real Sto Return     | 616      |
| Reward Loss         | -105     |
| Running Env Steps   | 484000   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 968      |
----------------------------------
2025-02-01 15:50:25.485669 Eastern Standard Time
| Itration            | 969      |
| Real Det Return     | 623      |
| Real Sto Return     | 595      |
| Reward Loss         | -116     |
| Running Env Steps   | 484500   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 5.57     |
| Running Update Time | 969      |
----------------------------------
2025-02-01 15:50:41.311845 Eastern Standard Time
| Itration            | 970      |
| Real Det Return     | 665      |
| Real Sto Return     | 635      |
| Reward Loss         | -84.3    |
| Running Env Steps   | 485000   |
| Running Forward KL  | -3.4     |
| Running Reverse KL  | 6.88     |
| Running Update Time | 970      |
----------------------------------
2025-02-01 15:50:57.243485 Eastern Standard Time
| Itration            | 971      |
| Real Det Return     | 644      |
| Real Sto Return     | 616      |
| Reward Loss         | -71.8    |
| Running Env Steps   | 485500   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 7.53     |
| Running Update Time | 971      |
----------------------------------
2025-02-01 15:51:13.163488 Eastern Standard Time
| Itration            | 972      |
| Real Det Return     | 633      |
| Real Sto Return     | 610      |
| Reward Loss         | -83.3    |
| Running Env Steps   | 486000   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 6.31     |
| Running Update Time | 972      |
----------------------------------
2025-02-01 15:51:29.065111 Eastern Standard Time
| Itration            | 973      |
| Real Det Return     | 645      |
| Real Sto Return     | 606      |
| Reward Loss         | -77.1    |
| Running Env Steps   | 486500   |
| Running Forward KL  | -2.98    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 973      |
----------------------------------
2025-02-01 15:51:44.898832 Eastern Standard Time
| Itration            | 974      |
| Real Det Return     | 672      |
| Real Sto Return     | 614      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 487000   |
| Running Forward KL  | -3.77    |
| Running Reverse KL  | 7.88     |
| Running Update Time | 974      |
----------------------------------
2025-02-01 15:52:00.759408 Eastern Standard Time
| Itration            | 975      |
| Real Det Return     | 612      |
| Real Sto Return     | 598      |
| Reward Loss         | -112     |
| Running Env Steps   | 487500   |
| Running Forward KL  | -3.48    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 975      |
----------------------------------
2025-02-01 15:52:16.618752 Eastern Standard Time
| Itration            | 976      |
| Real Det Return     | 652      |
| Real Sto Return     | 635      |
| Reward Loss         | -54.9    |
| Running Env Steps   | 488000   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 6.35     |
| Running Update Time | 976      |
----------------------------------
2025-02-01 15:52:32.491025 Eastern Standard Time
| Itration            | 977      |
| Real Det Return     | 649      |
| Real Sto Return     | 610      |
| Reward Loss         | -75.8    |
| Running Env Steps   | 488500   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 977      |
----------------------------------
2025-02-01 15:52:48.383884 Eastern Standard Time
| Itration            | 978      |
| Real Det Return     | 657      |
| Real Sto Return     | 626      |
| Reward Loss         | -103     |
| Running Env Steps   | 489000   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 5.74     |
| Running Update Time | 978      |
----------------------------------
2025-02-01 15:53:04.260275 Eastern Standard Time
| Itration            | 979      |
| Real Det Return     | 611      |
| Real Sto Return     | 586      |
| Reward Loss         | -111     |
| Running Env Steps   | 489500   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 979      |
----------------------------------
2025-02-01 15:53:20.130937 Eastern Standard Time
| Itration            | 980      |
| Real Det Return     | 645      |
| Real Sto Return     | 641      |
| Reward Loss         | -66.8    |
| Running Env Steps   | 490000   |
| Running Forward KL  | -3.62    |
| Running Reverse KL  | 7.43     |
| Running Update Time | 980      |
----------------------------------
2025-02-01 15:53:36.066852 Eastern Standard Time
| Itration            | 981      |
| Real Det Return     | 650      |
| Real Sto Return     | 632      |
| Reward Loss         | -81.2    |
| Running Env Steps   | 490500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 7.31     |
| Running Update Time | 981      |
----------------------------------
2025-02-01 15:53:51.939296 Eastern Standard Time
| Itration            | 982      |
| Real Det Return     | 653      |
| Real Sto Return     | 609      |
| Reward Loss         | -87.9    |
| Running Env Steps   | 491000   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 6.08     |
| Running Update Time | 982      |
----------------------------------
2025-02-01 15:54:07.817353 Eastern Standard Time
| Itration            | 983      |
| Real Det Return     | 655      |
| Real Sto Return     | 620      |
| Reward Loss         | -101     |
| Running Env Steps   | 491500   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 983      |
----------------------------------
2025-02-01 15:54:23.834362 Eastern Standard Time
| Itration            | 984      |
| Real Det Return     | 663      |
| Real Sto Return     | 628      |
| Reward Loss         | -117     |
| Running Env Steps   | 492000   |
| Running Forward KL  | -2.76    |
| Running Reverse KL  | 6.67     |
| Running Update Time | 984      |
----------------------------------
2025-02-01 15:54:39.722498 Eastern Standard Time
| Itration            | 985      |
| Real Det Return     | 661      |
| Real Sto Return     | 612      |
| Reward Loss         | -50.4    |
| Running Env Steps   | 492500   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 985      |
----------------------------------
2025-02-01 15:54:55.607638 Eastern Standard Time
| Itration            | 986      |
| Real Det Return     | 622      |
| Real Sto Return     | 602      |
| Reward Loss         | -124     |
| Running Env Steps   | 493000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 5.75     |
| Running Update Time | 986      |
----------------------------------
2025-02-01 15:55:11.480037 Eastern Standard Time
| Itration            | 987      |
| Real Det Return     | 618      |
| Real Sto Return     | 605      |
| Reward Loss         | -130     |
| Running Env Steps   | 493500   |
| Running Forward KL  | -2.23    |
| Running Reverse KL  | 7.23     |
| Running Update Time | 987      |
----------------------------------
2025-02-01 15:55:27.352945 Eastern Standard Time
| Itration            | 988      |
| Real Det Return     | 669      |
| Real Sto Return     | 640      |
| Reward Loss         | -25.1    |
| Running Env Steps   | 494000   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 6.65     |
| Running Update Time | 988      |
----------------------------------
2025-02-01 15:55:43.068426 Eastern Standard Time
| Itration            | 989      |
| Real Det Return     | 588      |
| Real Sto Return     | 589      |
| Reward Loss         | -207     |
| Running Env Steps   | 494500   |
| Running Forward KL  | -2.46    |
| Running Reverse KL  | 5.75     |
| Running Update Time | 989      |
----------------------------------
2025-02-01 15:55:58.765797 Eastern Standard Time
| Itration            | 990      |
| Real Det Return     | 669      |
| Real Sto Return     | 641      |
| Reward Loss         | -99.9    |
| Running Env Steps   | 495000   |
| Running Forward KL  | -3.34    |
| Running Reverse KL  | 6.41     |
| Running Update Time | 990      |
----------------------------------
2025-02-01 15:56:14.449244 Eastern Standard Time
| Itration            | 991      |
| Real Det Return     | 685      |
| Real Sto Return     | 646      |
| Reward Loss         | -35.3    |
| Running Env Steps   | 495500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 6.63     |
| Running Update Time | 991      |
----------------------------------
2025-02-01 15:56:29.947115 Eastern Standard Time
| Itration            | 992      |
| Real Det Return     | 639      |
| Real Sto Return     | 619      |
| Reward Loss         | -109     |
| Running Env Steps   | 496000   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 992      |
----------------------------------
2025-02-01 15:56:45.671407 Eastern Standard Time
| Itration            | 993      |
| Real Det Return     | 628      |
| Real Sto Return     | 595      |
| Reward Loss         | -87.6    |
| Running Env Steps   | 496500   |
| Running Forward KL  | -3.51    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 993      |
----------------------------------
2025-02-01 15:57:01.267465 Eastern Standard Time
| Itration            | 994      |
| Real Det Return     | 661      |
| Real Sto Return     | 616      |
| Reward Loss         | -65.5    |
| Running Env Steps   | 497000   |
| Running Forward KL  | -2.89    |
| Running Reverse KL  | 7.46     |
| Running Update Time | 994      |
----------------------------------
2025-02-01 15:57:16.940331 Eastern Standard Time
| Itration            | 995      |
| Real Det Return     | 654      |
| Real Sto Return     | 623      |
| Reward Loss         | -134     |
| Running Env Steps   | 497500   |
| Running Forward KL  | -2.78    |
| Running Reverse KL  | 6.37     |
| Running Update Time | 995      |
----------------------------------
2025-02-01 15:57:32.453013 Eastern Standard Time
| Itration            | 996      |
| Real Det Return     | 643      |
| Real Sto Return     | 610      |
| Reward Loss         | -149     |
| Running Env Steps   | 498000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 996      |
----------------------------------
2025-02-01 15:57:48.043336 Eastern Standard Time
| Itration            | 997      |
| Real Det Return     | 663      |
| Real Sto Return     | 646      |
| Reward Loss         | -86.8    |
| Running Env Steps   | 498500   |
| Running Forward KL  | -3.64    |
| Running Reverse KL  | 7.48     |
| Running Update Time | 997      |
----------------------------------
2025-02-01 15:58:03.657803 Eastern Standard Time
| Itration            | 998      |
| Real Det Return     | 647      |
| Real Sto Return     | 608      |
| Reward Loss         | -122     |
| Running Env Steps   | 499000   |
| Running Forward KL  | -2.4     |
| Running Reverse KL  | 7.15     |
| Running Update Time | 998      |
----------------------------------
2025-02-01 15:58:19.276289 Eastern Standard Time
| Itration            | 999      |
| Real Det Return     | 668      |
| Real Sto Return     | 639      |
| Reward Loss         | -110     |
| Running Env Steps   | 499500   |
| Running Forward KL  | -2.2     |
| Running Reverse KL  | 6.78     |
| Running Update Time | 999      |
----------------------------------
2025-02-01 15:58:34.947299 Eastern Standard Time
| Itration            | 1000     |
| Real Det Return     | 627      |
| Real Sto Return     | 595      |
| Reward Loss         | -62.6    |
| Running Env Steps   | 500000   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1000     |
----------------------------------
2025-02-01 15:58:50.614712 Eastern Standard Time
| Itration            | 1001     |
| Real Det Return     | 648      |
| Real Sto Return     | 599      |
| Reward Loss         | -116     |
| Running Env Steps   | 500500   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1001     |
----------------------------------
2025-02-01 15:59:06.261420 Eastern Standard Time
| Itration            | 1002     |
| Real Det Return     | 645      |
| Real Sto Return     | 589      |
| Reward Loss         | -125     |
| Running Env Steps   | 501000   |
| Running Forward KL  | -3.47    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 1002     |
----------------------------------
2025-02-01 15:59:22.116921 Eastern Standard Time
| Itration            | 1003     |
| Real Det Return     | 647      |
| Real Sto Return     | 624      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 501500   |
| Running Forward KL  | -3.91    |
| Running Reverse KL  | 7.56     |
| Running Update Time | 1003     |
----------------------------------
2025-02-01 15:59:37.813648 Eastern Standard Time
| Itration            | 1004     |
| Real Det Return     | 667      |
| Real Sto Return     | 626      |
| Reward Loss         | -71.3    |
| Running Env Steps   | 502000   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 7.38     |
| Running Update Time | 1004     |
----------------------------------
2025-02-01 15:59:53.433747 Eastern Standard Time
| Itration            | 1005     |
| Real Det Return     | 642      |
| Real Sto Return     | 620      |
| Reward Loss         | -87      |
| Running Env Steps   | 502500   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1005     |
----------------------------------
2025-02-01 16:00:08.993733 Eastern Standard Time
| Itration            | 1006     |
| Real Det Return     | 673      |
| Real Sto Return     | 634      |
| Reward Loss         | -60.7    |
| Running Env Steps   | 503000   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 6.81     |
| Running Update Time | 1006     |
----------------------------------
2025-02-01 16:00:24.659037 Eastern Standard Time
| Itration            | 1007     |
| Real Det Return     | 671      |
| Real Sto Return     | 636      |
| Reward Loss         | -45.4    |
| Running Env Steps   | 503500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1007     |
----------------------------------
2025-02-01 16:00:40.341589 Eastern Standard Time
| Itration            | 1008     |
| Real Det Return     | 659      |
| Real Sto Return     | 613      |
| Reward Loss         | -107     |
| Running Env Steps   | 504000   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 6.38     |
| Running Update Time | 1008     |
----------------------------------
2025-02-01 16:00:56.057486 Eastern Standard Time
| Itration            | 1009     |
| Real Det Return     | 656      |
| Real Sto Return     | 635      |
| Reward Loss         | -82.4    |
| Running Env Steps   | 504500   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 6.28     |
| Running Update Time | 1009     |
----------------------------------
2025-02-01 16:01:11.628145 Eastern Standard Time
| Itration            | 1010     |
| Real Det Return     | 681      |
| Real Sto Return     | 655      |
| Reward Loss         | -60      |
| Running Env Steps   | 505000   |
| Running Forward KL  | -3.06    |
| Running Reverse KL  | 8.38     |
| Running Update Time | 1010     |
----------------------------------
2025-02-01 16:01:27.217055 Eastern Standard Time
| Itration            | 1011     |
| Real Det Return     | 625      |
| Real Sto Return     | 606      |
| Reward Loss         | -126     |
| Running Env Steps   | 505500   |
| Running Forward KL  | -2.35    |
| Running Reverse KL  | 7.36     |
| Running Update Time | 1011     |
----------------------------------
2025-02-01 16:01:42.813314 Eastern Standard Time
| Itration            | 1012     |
| Real Det Return     | 675      |
| Real Sto Return     | 619      |
| Reward Loss         | -172     |
| Running Env Steps   | 506000   |
| Running Forward KL  | -2.1     |
| Running Reverse KL  | 6.15     |
| Running Update Time | 1012     |
----------------------------------
2025-02-01 16:01:58.446085 Eastern Standard Time
| Itration            | 1013     |
| Real Det Return     | 661      |
| Real Sto Return     | 641      |
| Reward Loss         | -78.3    |
| Running Env Steps   | 506500   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 7.23     |
| Running Update Time | 1013     |
----------------------------------
2025-02-01 16:02:14.051327 Eastern Standard Time
| Itration            | 1014     |
| Real Det Return     | 637      |
| Real Sto Return     | 616      |
| Reward Loss         | -117     |
| Running Env Steps   | 507000   |
| Running Forward KL  | -2.97    |
| Running Reverse KL  | 6.83     |
| Running Update Time | 1014     |
----------------------------------
2025-02-01 16:02:29.763302 Eastern Standard Time
| Itration            | 1015     |
| Real Det Return     | 676      |
| Real Sto Return     | 638      |
| Reward Loss         | -97.6    |
| Running Env Steps   | 507500   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 6.82     |
| Running Update Time | 1015     |
----------------------------------
2025-02-01 16:02:45.428496 Eastern Standard Time
| Itration            | 1016     |
| Real Det Return     | 662      |
| Real Sto Return     | 636      |
| Reward Loss         | -72.7    |
| Running Env Steps   | 508000   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 6.17     |
| Running Update Time | 1016     |
----------------------------------
2025-02-01 16:03:01.066445 Eastern Standard Time
| Itration            | 1017     |
| Real Det Return     | 638      |
| Real Sto Return     | 614      |
| Reward Loss         | -114     |
| Running Env Steps   | 508500   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 7.89     |
| Running Update Time | 1017     |
----------------------------------
2025-02-01 16:03:16.795296 Eastern Standard Time
| Itration            | 1018     |
| Real Det Return     | 671      |
| Real Sto Return     | 635      |
| Reward Loss         | -89.1    |
| Running Env Steps   | 509000   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 7.12     |
| Running Update Time | 1018     |
----------------------------------
2025-02-01 16:03:32.388402 Eastern Standard Time
| Itration            | 1019     |
| Real Det Return     | 675      |
| Real Sto Return     | 657      |
| Reward Loss         | -53.4    |
| Running Env Steps   | 509500   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 7.08     |
| Running Update Time | 1019     |
----------------------------------
2025-02-01 16:03:48.035616 Eastern Standard Time
| Itration            | 1020     |
| Real Det Return     | 652      |
| Real Sto Return     | 626      |
| Reward Loss         | -53.5    |
| Running Env Steps   | 510000   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 6.71     |
| Running Update Time | 1020     |
----------------------------------
2025-02-01 16:04:03.601878 Eastern Standard Time
| Itration            | 1021     |
| Real Det Return     | 654      |
| Real Sto Return     | 641      |
| Reward Loss         | -52.5    |
| Running Env Steps   | 510500   |
| Running Forward KL  | -3.79    |
| Running Reverse KL  | 7.76     |
| Running Update Time | 1021     |
----------------------------------
2025-02-01 16:04:19.197449 Eastern Standard Time
| Itration            | 1022     |
| Real Det Return     | 616      |
| Real Sto Return     | 590      |
| Reward Loss         | -61.5    |
| Running Env Steps   | 511000   |
| Running Forward KL  | -3.01    |
| Running Reverse KL  | 7.88     |
| Running Update Time | 1022     |
----------------------------------
2025-02-01 16:04:34.940045 Eastern Standard Time
| Itration            | 1023     |
| Real Det Return     | 659      |
| Real Sto Return     | 601      |
| Reward Loss         | -139     |
| Running Env Steps   | 511500   |
| Running Forward KL  | -2.94    |
| Running Reverse KL  | 5.67     |
| Running Update Time | 1023     |
----------------------------------
2025-02-01 16:04:50.529008 Eastern Standard Time
| Itration            | 1024     |
| Real Det Return     | 649      |
| Real Sto Return     | 618      |
| Reward Loss         | -113     |
| Running Env Steps   | 512000   |
| Running Forward KL  | -2.69    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1024     |
----------------------------------
2025-02-01 16:05:06.179789 Eastern Standard Time
| Itration            | 1025     |
| Real Det Return     | 677      |
| Real Sto Return     | 655      |
| Reward Loss         | -67.7    |
| Running Env Steps   | 512500   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 7.41     |
| Running Update Time | 1025     |
----------------------------------
2025-02-01 16:05:21.888453 Eastern Standard Time
| Itration            | 1026     |
| Real Det Return     | 661      |
| Real Sto Return     | 635      |
| Reward Loss         | -120     |
| Running Env Steps   | 513000   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 6.46     |
| Running Update Time | 1026     |
----------------------------------
2025-02-01 16:05:37.509596 Eastern Standard Time
| Itration            | 1027     |
| Real Det Return     | 684      |
| Real Sto Return     | 638      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 513500   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1027     |
----------------------------------
2025-02-01 16:05:53.097540 Eastern Standard Time
| Itration            | 1028     |
| Real Det Return     | 673      |
| Real Sto Return     | 633      |
| Reward Loss         | -82.1    |
| Running Env Steps   | 514000   |
| Running Forward KL  | -3.45    |
| Running Reverse KL  | 6.82     |
| Running Update Time | 1028     |
----------------------------------
2025-02-01 16:06:08.731242 Eastern Standard Time
| Itration            | 1029     |
| Real Det Return     | 670      |
| Real Sto Return     | 651      |
| Reward Loss         | -59.4    |
| Running Env Steps   | 514500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 7.75     |
| Running Update Time | 1029     |
----------------------------------
2025-02-01 16:06:24.352795 Eastern Standard Time
| Itration            | 1030     |
| Real Det Return     | 660      |
| Real Sto Return     | 626      |
| Reward Loss         | -66.7    |
| Running Env Steps   | 515000   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 7.36     |
| Running Update Time | 1030     |
----------------------------------
2025-02-01 16:06:39.974835 Eastern Standard Time
| Itration            | 1031     |
| Real Det Return     | 644      |
| Real Sto Return     | 612      |
| Reward Loss         | -92.3    |
| Running Env Steps   | 515500   |
| Running Forward KL  | -3.48    |
| Running Reverse KL  | 6.94     |
| Running Update Time | 1031     |
----------------------------------
2025-02-01 16:06:55.625359 Eastern Standard Time
| Itration            | 1032     |
| Real Det Return     | 629      |
| Real Sto Return     | 581      |
| Reward Loss         | -135     |
| Running Env Steps   | 516000   |
| Running Forward KL  | -3.06    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1032     |
----------------------------------
2025-02-01 16:07:11.292221 Eastern Standard Time
| Itration            | 1033     |
| Real Det Return     | 592      |
| Real Sto Return     | 569      |
| Reward Loss         | -157     |
| Running Env Steps   | 516500   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1033     |
----------------------------------
2025-02-01 16:07:26.899156 Eastern Standard Time
| Itration            | 1034     |
| Real Det Return     | 646      |
| Real Sto Return     | 610      |
| Reward Loss         | -110     |
| Running Env Steps   | 517000   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1034     |
----------------------------------
2025-02-01 16:07:42.518879 Eastern Standard Time
| Itration            | 1035     |
| Real Det Return     | 625      |
| Real Sto Return     | 612      |
| Reward Loss         | -116     |
| Running Env Steps   | 517500   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 6.31     |
| Running Update Time | 1035     |
----------------------------------
2025-02-01 16:07:58.167326 Eastern Standard Time
| Itration            | 1036     |
| Real Det Return     | 669      |
| Real Sto Return     | 637      |
| Reward Loss         | -57      |
| Running Env Steps   | 518000   |
| Running Forward KL  | -3.09    |
| Running Reverse KL  | 7.4      |
| Running Update Time | 1036     |
----------------------------------
2025-02-01 16:08:13.817279 Eastern Standard Time
| Itration            | 1037     |
| Real Det Return     | 665      |
| Real Sto Return     | 621      |
| Reward Loss         | -103     |
| Running Env Steps   | 518500   |
| Running Forward KL  | -3.05    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1037     |
----------------------------------
2025-02-01 16:08:29.547719 Eastern Standard Time
| Itration            | 1038     |
| Real Det Return     | 670      |
| Real Sto Return     | 644      |
| Reward Loss         | -75      |
| Running Env Steps   | 519000   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 7        |
| Running Update Time | 1038     |
----------------------------------
2025-02-01 16:08:45.280572 Eastern Standard Time
| Itration            | 1039     |
| Real Det Return     | 682      |
| Real Sto Return     | 652      |
| Reward Loss         | -71.2    |
| Running Env Steps   | 519500   |
| Running Forward KL  | -3.51    |
| Running Reverse KL  | 6.94     |
| Running Update Time | 1039     |
----------------------------------
2025-02-01 16:09:00.942011 Eastern Standard Time
| Itration            | 1040     |
| Real Det Return     | 660      |
| Real Sto Return     | 623      |
| Reward Loss         | -111     |
| Running Env Steps   | 520000   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1040     |
----------------------------------
2025-02-01 16:09:16.589374 Eastern Standard Time
| Itration            | 1041     |
| Real Det Return     | 670      |
| Real Sto Return     | 637      |
| Reward Loss         | -101     |
| Running Env Steps   | 520500   |
| Running Forward KL  | -3.14    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1041     |
----------------------------------
2025-02-01 16:09:32.379881 Eastern Standard Time
| Itration            | 1042     |
| Real Det Return     | 657      |
| Real Sto Return     | 637      |
| Reward Loss         | -99.2    |
| Running Env Steps   | 521000   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 6.39     |
| Running Update Time | 1042     |
----------------------------------
2025-02-01 16:09:48.008960 Eastern Standard Time
| Itration            | 1043     |
| Real Det Return     | 686      |
| Real Sto Return     | 658      |
| Reward Loss         | -48.4    |
| Running Env Steps   | 521500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 6.82     |
| Running Update Time | 1043     |
----------------------------------
2025-02-01 16:10:03.613066 Eastern Standard Time
| Itration            | 1044     |
| Real Det Return     | 659      |
| Real Sto Return     | 629      |
| Reward Loss         | -123     |
| Running Env Steps   | 522000   |
| Running Forward KL  | -2.94    |
| Running Reverse KL  | 6.51     |
| Running Update Time | 1044     |
----------------------------------
2025-02-01 16:10:19.273909 Eastern Standard Time
| Itration            | 1045     |
| Real Det Return     | 663      |
| Real Sto Return     | 628      |
| Reward Loss         | -82      |
| Running Env Steps   | 522500   |
| Running Forward KL  | -3.27    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1045     |
----------------------------------
2025-02-01 16:10:34.920376 Eastern Standard Time
| Itration            | 1046     |
| Real Det Return     | 668      |
| Real Sto Return     | 639      |
| Reward Loss         | -39.3    |
| Running Env Steps   | 523000   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 7.92     |
| Running Update Time | 1046     |
----------------------------------
2025-02-01 16:10:50.588708 Eastern Standard Time
| Itration            | 1047     |
| Real Det Return     | 670      |
| Real Sto Return     | 651      |
| Reward Loss         | -76.8    |
| Running Env Steps   | 523500   |
| Running Forward KL  | -3.42    |
| Running Reverse KL  | 6.79     |
| Running Update Time | 1047     |
----------------------------------
2025-02-01 16:11:06.216083 Eastern Standard Time
| Itration            | 1048     |
| Real Det Return     | 680      |
| Real Sto Return     | 659      |
| Reward Loss         | -49.3    |
| Running Env Steps   | 524000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 6.72     |
| Running Update Time | 1048     |
----------------------------------
2025-02-01 16:11:21.902330 Eastern Standard Time
| Itration            | 1049     |
| Real Det Return     | 673      |
| Real Sto Return     | 650      |
| Reward Loss         | -82.1    |
| Running Env Steps   | 524500   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1049     |
----------------------------------
2025-02-01 16:11:37.497805 Eastern Standard Time
| Itration            | 1050     |
| Real Det Return     | 678      |
| Real Sto Return     | 626      |
| Reward Loss         | -179     |
| Running Env Steps   | 525000   |
| Running Forward KL  | -1.97    |
| Running Reverse KL  | 6.05     |
| Running Update Time | 1050     |
----------------------------------
2025-02-01 16:11:53.203115 Eastern Standard Time
| Itration            | 1051     |
| Real Det Return     | 623      |
| Real Sto Return     | 614      |
| Reward Loss         | -109     |
| Running Env Steps   | 525500   |
| Running Forward KL  | -3.78    |
| Running Reverse KL  | 7.27     |
| Running Update Time | 1051     |
----------------------------------
2025-02-01 16:12:08.914165 Eastern Standard Time
| Itration            | 1052     |
| Real Det Return     | 672      |
| Real Sto Return     | 646      |
| Reward Loss         | -70.5    |
| Running Env Steps   | 526000   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 6.91     |
| Running Update Time | 1052     |
----------------------------------
2025-02-01 16:12:24.536655 Eastern Standard Time
| Itration            | 1053     |
| Real Det Return     | 621      |
| Real Sto Return     | 601      |
| Reward Loss         | -94.7    |
| Running Env Steps   | 526500   |
| Running Forward KL  | -3.4     |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1053     |
----------------------------------
2025-02-01 16:12:40.185459 Eastern Standard Time
| Itration            | 1054     |
| Real Det Return     | 670      |
| Real Sto Return     | 643      |
| Reward Loss         | -70.8    |
| Running Env Steps   | 527000   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1054     |
----------------------------------
2025-02-01 16:12:55.898608 Eastern Standard Time
| Itration            | 1055     |
| Real Det Return     | 651      |
| Real Sto Return     | 632      |
| Reward Loss         | -91.9    |
| Running Env Steps   | 527500   |
| Running Forward KL  | -3.69    |
| Running Reverse KL  | 7.69     |
| Running Update Time | 1055     |
----------------------------------
2025-02-01 16:13:11.639771 Eastern Standard Time
| Itration            | 1056     |
| Real Det Return     | 676      |
| Real Sto Return     | 650      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 528000   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1056     |
----------------------------------
2025-02-01 16:13:27.278474 Eastern Standard Time
| Itration            | 1057     |
| Real Det Return     | 633      |
| Real Sto Return     | 602      |
| Reward Loss         | -90.3    |
| Running Env Steps   | 528500   |
| Running Forward KL  | -3.05    |
| Running Reverse KL  | 7.64     |
| Running Update Time | 1057     |
----------------------------------
2025-02-01 16:13:42.914501 Eastern Standard Time
| Itration            | 1058     |
| Real Det Return     | 654      |
| Real Sto Return     | 623      |
| Reward Loss         | -119     |
| Running Env Steps   | 529000   |
| Running Forward KL  | -3.06    |
| Running Reverse KL  | 7.27     |
| Running Update Time | 1058     |
----------------------------------
2025-02-01 16:13:58.518657 Eastern Standard Time
| Itration            | 1059     |
| Real Det Return     | 634      |
| Real Sto Return     | 639      |
| Reward Loss         | -128     |
| Running Env Steps   | 529500   |
| Running Forward KL  | -3.14    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1059     |
----------------------------------
2025-02-01 16:14:14.245164 Eastern Standard Time
| Itration            | 1060     |
| Real Det Return     | 687      |
| Real Sto Return     | 646      |
| Reward Loss         | -110     |
| Running Env Steps   | 530000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 6.44     |
| Running Update Time | 1060     |
----------------------------------
2025-02-01 16:14:29.917579 Eastern Standard Time
| Itration            | 1061     |
| Real Det Return     | 683      |
| Real Sto Return     | 639      |
| Reward Loss         | -75.7    |
| Running Env Steps   | 530500   |
| Running Forward KL  | -3.48    |
| Running Reverse KL  | 6.35     |
| Running Update Time | 1061     |
----------------------------------
2025-02-01 16:14:45.604012 Eastern Standard Time
| Itration            | 1062     |
| Real Det Return     | 658      |
| Real Sto Return     | 620      |
| Reward Loss         | -54.5    |
| Running Env Steps   | 531000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1062     |
----------------------------------
2025-02-01 16:15:01.273099 Eastern Standard Time
| Itration            | 1063     |
| Real Det Return     | 643      |
| Real Sto Return     | 627      |
| Reward Loss         | -92.7    |
| Running Env Steps   | 531500   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1063     |
----------------------------------
2025-02-01 16:15:16.933733 Eastern Standard Time
| Itration            | 1064     |
| Real Det Return     | 631      |
| Real Sto Return     | 609      |
| Reward Loss         | -103     |
| Running Env Steps   | 532000   |
| Running Forward KL  | -3.4     |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1064     |
----------------------------------
2025-02-01 16:15:32.556204 Eastern Standard Time
| Itration            | 1065     |
| Real Det Return     | 663      |
| Real Sto Return     | 634      |
| Reward Loss         | -42      |
| Running Env Steps   | 532500   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 8.07     |
| Running Update Time | 1065     |
----------------------------------
2025-02-01 16:15:48.212848 Eastern Standard Time
| Itration            | 1066     |
| Real Det Return     | 676      |
| Real Sto Return     | 646      |
| Reward Loss         | -94.6    |
| Running Env Steps   | 533000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1066     |
----------------------------------
2025-02-01 16:16:03.834188 Eastern Standard Time
| Itration            | 1067     |
| Real Det Return     | 682      |
| Real Sto Return     | 648      |
| Reward Loss         | -51.4    |
| Running Env Steps   | 533500   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 6.54     |
| Running Update Time | 1067     |
----------------------------------
2025-02-01 16:16:19.445644 Eastern Standard Time
| Itration            | 1068     |
| Real Det Return     | 663      |
| Real Sto Return     | 632      |
| Reward Loss         | -83.2    |
| Running Env Steps   | 534000   |
| Running Forward KL  | -2.78    |
| Running Reverse KL  | 7.86     |
| Running Update Time | 1068     |
----------------------------------
2025-02-01 16:16:35.090701 Eastern Standard Time
| Itration            | 1069     |
| Real Det Return     | 653      |
| Real Sto Return     | 623      |
| Reward Loss         | -51.4    |
| Running Env Steps   | 534500   |
| Running Forward KL  | -3.67    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1069     |
----------------------------------
2025-02-01 16:16:50.856544 Eastern Standard Time
| Itration            | 1070     |
| Real Det Return     | 682      |
| Real Sto Return     | 654      |
| Reward Loss         | -15.5    |
| Running Env Steps   | 535000   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 8.1      |
| Running Update Time | 1070     |
----------------------------------
2025-02-01 16:17:06.522673 Eastern Standard Time
| Itration            | 1071     |
| Real Det Return     | 664      |
| Real Sto Return     | 638      |
| Reward Loss         | -62.5    |
| Running Env Steps   | 535500   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 7.49     |
| Running Update Time | 1071     |
----------------------------------
2025-02-01 16:17:22.422982 Eastern Standard Time
| Itration            | 1072     |
| Real Det Return     | 636      |
| Real Sto Return     | 612      |
| Reward Loss         | -116     |
| Running Env Steps   | 536000   |
| Running Forward KL  | -3.77    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1072     |
----------------------------------
2025-02-01 16:17:38.229259 Eastern Standard Time
| Itration            | 1073     |
| Real Det Return     | 673      |
| Real Sto Return     | 638      |
| Reward Loss         | -54.4    |
| Running Env Steps   | 536500   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 7.87     |
| Running Update Time | 1073     |
----------------------------------
2025-02-01 16:17:53.928512 Eastern Standard Time
| Itration            | 1074     |
| Real Det Return     | 671      |
| Real Sto Return     | 641      |
| Reward Loss         | -93.5    |
| Running Env Steps   | 537000   |
| Running Forward KL  | -2.56    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1074     |
----------------------------------
2025-02-01 16:18:09.892181 Eastern Standard Time
| Itration            | 1075     |
| Real Det Return     | 677      |
| Real Sto Return     | 631      |
| Reward Loss         | -81.9    |
| Running Env Steps   | 537500   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 7.51     |
| Running Update Time | 1075     |
----------------------------------
2025-02-01 16:18:26.075140 Eastern Standard Time
| Itration            | 1076     |
| Real Det Return     | 659      |
| Real Sto Return     | 645      |
| Reward Loss         | -90.7    |
| Running Env Steps   | 538000   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 7.43     |
| Running Update Time | 1076     |
----------------------------------
2025-02-01 16:18:42.136888 Eastern Standard Time
| Itration            | 1077     |
| Real Det Return     | 680      |
| Real Sto Return     | 650      |
| Reward Loss         | -92.5    |
| Running Env Steps   | 538500   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 7.52     |
| Running Update Time | 1077     |
----------------------------------
2025-02-01 16:18:58.722553 Eastern Standard Time
| Itration            | 1078     |
| Real Det Return     | 682      |
| Real Sto Return     | 654      |
| Reward Loss         | -48.7    |
| Running Env Steps   | 539000   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 6.62     |
| Running Update Time | 1078     |
----------------------------------
2025-02-01 16:19:14.431533 Eastern Standard Time
| Itration            | 1079     |
| Real Det Return     | 662      |
| Real Sto Return     | 642      |
| Reward Loss         | -81      |
| Running Env Steps   | 539500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 6.72     |
| Running Update Time | 1079     |
----------------------------------
2025-02-01 16:19:30.243852 Eastern Standard Time
| Itration            | 1080     |
| Real Det Return     | 661      |
| Real Sto Return     | 627      |
| Reward Loss         | -93.9    |
| Running Env Steps   | 540000   |
| Running Forward KL  | -3.44    |
| Running Reverse KL  | 6.77     |
| Running Update Time | 1080     |
----------------------------------
2025-02-01 16:19:45.934382 Eastern Standard Time
| Itration            | 1081     |
| Real Det Return     | 677      |
| Real Sto Return     | 647      |
| Reward Loss         | -108     |
| Running Env Steps   | 540500   |
| Running Forward KL  | -3.13    |
| Running Reverse KL  | 6.59     |
| Running Update Time | 1081     |
----------------------------------
2025-02-01 16:20:01.966606 Eastern Standard Time
| Itration            | 1082     |
| Real Det Return     | 646      |
| Real Sto Return     | 618      |
| Reward Loss         | -91.9    |
| Running Env Steps   | 541000   |
| Running Forward KL  | -3.7     |
| Running Reverse KL  | 6.39     |
| Running Update Time | 1082     |
----------------------------------
2025-02-01 16:20:18.428276 Eastern Standard Time
| Itration            | 1083     |
| Real Det Return     | 637      |
| Real Sto Return     | 610      |
| Reward Loss         | -149     |
| Running Env Steps   | 541500   |
| Running Forward KL  | -2.59    |
| Running Reverse KL  | 5.95     |
| Running Update Time | 1083     |
----------------------------------
2025-02-01 16:20:34.254073 Eastern Standard Time
| Itration            | 1084     |
| Real Det Return     | 652      |
| Real Sto Return     | 636      |
| Reward Loss         | -95.9    |
| Running Env Steps   | 542000   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 6.08     |
| Running Update Time | 1084     |
----------------------------------
2025-02-01 16:20:50.627256 Eastern Standard Time
| Itration            | 1085     |
| Real Det Return     | 653      |
| Real Sto Return     | 634      |
| Reward Loss         | -91.1    |
| Running Env Steps   | 542500   |
| Running Forward KL  | -3.79    |
| Running Reverse KL  | 7.15     |
| Running Update Time | 1085     |
----------------------------------
2025-02-01 16:21:06.437185 Eastern Standard Time
| Itration            | 1086     |
| Real Det Return     | 662      |
| Real Sto Return     | 649      |
| Reward Loss         | -78.1    |
| Running Env Steps   | 543000   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1086     |
----------------------------------
2025-02-01 16:21:22.403438 Eastern Standard Time
| Itration            | 1087     |
| Real Det Return     | 662      |
| Real Sto Return     | 637      |
| Reward Loss         | -58.2    |
| Running Env Steps   | 543500   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1087     |
----------------------------------
2025-02-01 16:21:38.667886 Eastern Standard Time
| Itration            | 1088     |
| Real Det Return     | 655      |
| Real Sto Return     | 645      |
| Reward Loss         | -63.9    |
| Running Env Steps   | 544000   |
| Running Forward KL  | -3.76    |
| Running Reverse KL  | 7.93     |
| Running Update Time | 1088     |
----------------------------------
2025-02-01 16:21:54.962897 Eastern Standard Time
| Itration            | 1089     |
| Real Det Return     | 658      |
| Real Sto Return     | 631      |
| Reward Loss         | -86.1    |
| Running Env Steps   | 544500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1089     |
----------------------------------
2025-02-01 16:22:11.060292 Eastern Standard Time
| Itration            | 1090     |
| Real Det Return     | 670      |
| Real Sto Return     | 653      |
| Reward Loss         | -78.8    |
| Running Env Steps   | 545000   |
| Running Forward KL  | -3.25    |
| Running Reverse KL  | 7.78     |
| Running Update Time | 1090     |
----------------------------------
2025-02-01 16:22:27.221950 Eastern Standard Time
| Itration            | 1091     |
| Real Det Return     | 674      |
| Real Sto Return     | 649      |
| Reward Loss         | -94.3    |
| Running Env Steps   | 545500   |
| Running Forward KL  | -3.78    |
| Running Reverse KL  | 6.92     |
| Running Update Time | 1091     |
----------------------------------
2025-02-01 16:22:42.985158 Eastern Standard Time
| Itration            | 1092     |
| Real Det Return     | 671      |
| Real Sto Return     | 632      |
| Reward Loss         | -144     |
| Running Env Steps   | 546000   |
| Running Forward KL  | -3.02    |
| Running Reverse KL  | 6.54     |
| Running Update Time | 1092     |
----------------------------------
2025-02-01 16:22:58.758607 Eastern Standard Time
| Itration            | 1093     |
| Real Det Return     | 661      |
| Real Sto Return     | 629      |
| Reward Loss         | -46.1    |
| Running Env Steps   | 546500   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 6.92     |
| Running Update Time | 1093     |
----------------------------------
2025-02-01 16:23:14.636592 Eastern Standard Time
| Itration            | 1094     |
| Real Det Return     | 629      |
| Real Sto Return     | 607      |
| Reward Loss         | -99.8    |
| Running Env Steps   | 547000   |
| Running Forward KL  | -3.2     |
| Running Reverse KL  | 7.42     |
| Running Update Time | 1094     |
----------------------------------
2025-02-01 16:23:31.171378 Eastern Standard Time
| Itration            | 1095     |
| Real Det Return     | 673      |
| Real Sto Return     | 653      |
| Reward Loss         | -82.9    |
| Running Env Steps   | 547500   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1095     |
----------------------------------
2025-02-01 16:23:46.932925 Eastern Standard Time
| Itration            | 1096     |
| Real Det Return     | 689      |
| Real Sto Return     | 666      |
| Reward Loss         | -79.6    |
| Running Env Steps   | 548000   |
| Running Forward KL  | -3.72    |
| Running Reverse KL  | 7.62     |
| Running Update Time | 1096     |
----------------------------------
2025-02-01 16:24:02.713791 Eastern Standard Time
| Itration            | 1097     |
| Real Det Return     | 676      |
| Real Sto Return     | 640      |
| Reward Loss         | -111     |
| Running Env Steps   | 548500   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 6.71     |
| Running Update Time | 1097     |
----------------------------------
2025-02-01 16:24:18.448412 Eastern Standard Time
| Itration            | 1098     |
| Real Det Return     | 671      |
| Real Sto Return     | 642      |
| Reward Loss         | -68.9    |
| Running Env Steps   | 549000   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1098     |
----------------------------------
2025-02-01 16:24:34.208729 Eastern Standard Time
| Itration            | 1099     |
| Real Det Return     | 641      |
| Real Sto Return     | 624      |
| Reward Loss         | -59.9    |
| Running Env Steps   | 549500   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 6.67     |
| Running Update Time | 1099     |
----------------------------------
2025-02-01 16:24:49.962901 Eastern Standard Time
| Itration            | 1100     |
| Real Det Return     | 658      |
| Real Sto Return     | 653      |
| Reward Loss         | -78      |
| Running Env Steps   | 550000   |
| Running Forward KL  | -3.62    |
| Running Reverse KL  | 7.27     |
| Running Update Time | 1100     |
----------------------------------
2025-02-01 16:25:05.697138 Eastern Standard Time
| Itration            | 1101     |
| Real Det Return     | 668      |
| Real Sto Return     | 634      |
| Reward Loss         | -95.4    |
| Running Env Steps   | 550500   |
| Running Forward KL  | -2.92    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1101     |
----------------------------------
2025-02-01 16:25:21.407489 Eastern Standard Time
| Itration            | 1102     |
| Real Det Return     | 673      |
| Real Sto Return     | 652      |
| Reward Loss         | -95      |
| Running Env Steps   | 551000   |
| Running Forward KL  | -3.77    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1102     |
----------------------------------
2025-02-01 16:25:37.182159 Eastern Standard Time
| Itration            | 1103     |
| Real Det Return     | 668      |
| Real Sto Return     | 640      |
| Reward Loss         | -194     |
| Running Env Steps   | 551500   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 6.02     |
| Running Update Time | 1103     |
----------------------------------
2025-02-01 16:25:53.212081 Eastern Standard Time
| Itration            | 1104     |
| Real Det Return     | 639      |
| Real Sto Return     | 609      |
| Reward Loss         | -98.3    |
| Running Env Steps   | 552000   |
| Running Forward KL  | -3.64    |
| Running Reverse KL  | 7.41     |
| Running Update Time | 1104     |
----------------------------------
2025-02-01 16:26:09.177758 Eastern Standard Time
| Itration            | 1105     |
| Real Det Return     | 687      |
| Real Sto Return     | 653      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 552500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 7.03     |
| Running Update Time | 1105     |
----------------------------------
2025-02-01 16:26:24.915045 Eastern Standard Time
| Itration            | 1106     |
| Real Det Return     | 669      |
| Real Sto Return     | 625      |
| Reward Loss         | -95.4    |
| Running Env Steps   | 553000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 6.54     |
| Running Update Time | 1106     |
----------------------------------
2025-02-01 16:26:40.651706 Eastern Standard Time
| Itration            | 1107     |
| Real Det Return     | 683      |
| Real Sto Return     | 637      |
| Reward Loss         | -72.1    |
| Running Env Steps   | 553500   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 7.96     |
| Running Update Time | 1107     |
----------------------------------
2025-02-01 16:26:56.436874 Eastern Standard Time
| Itration            | 1108     |
| Real Det Return     | 687      |
| Real Sto Return     | 651      |
| Reward Loss         | -82.5    |
| Running Env Steps   | 554000   |
| Running Forward KL  | -3.79    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1108     |
----------------------------------
2025-02-01 16:27:12.594060 Eastern Standard Time
| Itration            | 1109     |
| Real Det Return     | 655      |
| Real Sto Return     | 635      |
| Reward Loss         | -81.2    |
| Running Env Steps   | 554500   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1109     |
----------------------------------
2025-02-01 16:27:28.729138 Eastern Standard Time
| Itration            | 1110     |
| Real Det Return     | 682      |
| Real Sto Return     | 644      |
| Reward Loss         | -56.3    |
| Running Env Steps   | 555000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 7.35     |
| Running Update Time | 1110     |
----------------------------------
2025-02-01 16:27:44.880991 Eastern Standard Time
| Itration            | 1111     |
| Real Det Return     | 655      |
| Real Sto Return     | 628      |
| Reward Loss         | -108     |
| Running Env Steps   | 555500   |
| Running Forward KL  | -2.85    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1111     |
----------------------------------
2025-02-01 16:28:01.067751 Eastern Standard Time
| Itration            | 1112     |
| Real Det Return     | 659      |
| Real Sto Return     | 625      |
| Reward Loss         | -114     |
| Running Env Steps   | 556000   |
| Running Forward KL  | -3.7     |
| Running Reverse KL  | 6.96     |
| Running Update Time | 1112     |
----------------------------------
2025-02-01 16:28:17.354236 Eastern Standard Time
| Itration            | 1113     |
| Real Det Return     | 684      |
| Real Sto Return     | 639      |
| Reward Loss         | -73.5    |
| Running Env Steps   | 556500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 6.51     |
| Running Update Time | 1113     |
----------------------------------
2025-02-01 16:28:33.272009 Eastern Standard Time
| Itration            | 1114     |
| Real Det Return     | 659      |
| Real Sto Return     | 644      |
| Reward Loss         | -73.4    |
| Running Env Steps   | 557000   |
| Running Forward KL  | -3.96    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1114     |
----------------------------------
2025-02-01 16:28:49.119202 Eastern Standard Time
| Itration            | 1115     |
| Real Det Return     | 619      |
| Real Sto Return     | 605      |
| Reward Loss         | -114     |
| Running Env Steps   | 557500   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1115     |
----------------------------------
2025-02-01 16:29:04.817220 Eastern Standard Time
| Itration            | 1116     |
| Real Det Return     | 658      |
| Real Sto Return     | 634      |
| Reward Loss         | -89.9    |
| Running Env Steps   | 558000   |
| Running Forward KL  | -3.51    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1116     |
----------------------------------
2025-02-01 16:29:20.710327 Eastern Standard Time
| Itration            | 1117     |
| Real Det Return     | 616      |
| Real Sto Return     | 599      |
| Reward Loss         | -122     |
| Running Env Steps   | 558500   |
| Running Forward KL  | -3.78    |
| Running Reverse KL  | 6.62     |
| Running Update Time | 1117     |
----------------------------------
2025-02-01 16:29:36.294503 Eastern Standard Time
| Itration            | 1118     |
| Real Det Return     | 660      |
| Real Sto Return     | 638      |
| Reward Loss         | -87.7    |
| Running Env Steps   | 559000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 6.31     |
| Running Update Time | 1118     |
----------------------------------
2025-02-01 16:29:52.034085 Eastern Standard Time
| Itration            | 1119     |
| Real Det Return     | 669      |
| Real Sto Return     | 648      |
| Reward Loss         | -56.5    |
| Running Env Steps   | 559500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1119     |
----------------------------------
2025-02-01 16:30:07.701788 Eastern Standard Time
| Itration            | 1120     |
| Real Det Return     | 683      |
| Real Sto Return     | 661      |
| Reward Loss         | -67.9    |
| Running Env Steps   | 560000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1120     |
----------------------------------
2025-02-01 16:30:23.470977 Eastern Standard Time
| Itration            | 1121     |
| Real Det Return     | 609      |
| Real Sto Return     | 591      |
| Reward Loss         | -160     |
| Running Env Steps   | 560500   |
| Running Forward KL  | -2.93    |
| Running Reverse KL  | 7.36     |
| Running Update Time | 1121     |
----------------------------------
2025-02-01 16:30:39.276341 Eastern Standard Time
| Itration            | 1122     |
| Real Det Return     | 624      |
| Real Sto Return     | 606      |
| Reward Loss         | -108     |
| Running Env Steps   | 561000   |
| Running Forward KL  | -3.09    |
| Running Reverse KL  | 8.23     |
| Running Update Time | 1122     |
----------------------------------
2025-02-01 16:30:55.197166 Eastern Standard Time
| Itration            | 1123     |
| Real Det Return     | 672      |
| Real Sto Return     | 641      |
| Reward Loss         | -58.9    |
| Running Env Steps   | 561500   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 7.48     |
| Running Update Time | 1123     |
----------------------------------
2025-02-01 16:31:10.934281 Eastern Standard Time
| Itration            | 1124     |
| Real Det Return     | 671      |
| Real Sto Return     | 648      |
| Reward Loss         | -66.8    |
| Running Env Steps   | 562000   |
| Running Forward KL  | -3.79    |
| Running Reverse KL  | 7.29     |
| Running Update Time | 1124     |
----------------------------------
2025-02-01 16:31:29.601473 Eastern Standard Time
| Itration            | 1125     |
| Real Det Return     | 688      |
| Real Sto Return     | 638      |
| Reward Loss         | -84.3    |
| Running Env Steps   | 562500   |
| Running Forward KL  | -2.93    |
| Running Reverse KL  | 8.71     |
| Running Update Time | 1125     |
----------------------------------
2025-02-01 16:31:46.618853 Eastern Standard Time
| Itration            | 1126     |
| Real Det Return     | 689      |
| Real Sto Return     | 640      |
| Reward Loss         | -83.2    |
| Running Env Steps   | 563000   |
| Running Forward KL  | -3.71    |
| Running Reverse KL  | 7.44     |
| Running Update Time | 1126     |
----------------------------------
2025-02-01 16:32:02.959160 Eastern Standard Time
| Itration            | 1127     |
| Real Det Return     | 677      |
| Real Sto Return     | 651      |
| Reward Loss         | -82.4    |
| Running Env Steps   | 563500   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 7.13     |
| Running Update Time | 1127     |
----------------------------------
2025-02-01 16:32:19.332204 Eastern Standard Time
| Itration            | 1128     |
| Real Det Return     | 677      |
| Real Sto Return     | 641      |
| Reward Loss         | -80      |
| Running Env Steps   | 564000   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1128     |
----------------------------------
2025-02-01 16:32:35.082067 Eastern Standard Time
| Itration            | 1129     |
| Real Det Return     | 660      |
| Real Sto Return     | 642      |
| Reward Loss         | -96.1    |
| Running Env Steps   | 564500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1129     |
----------------------------------
2025-02-01 16:32:51.137285 Eastern Standard Time
| Itration            | 1130     |
| Real Det Return     | 658      |
| Real Sto Return     | 635      |
| Reward Loss         | -68.9    |
| Running Env Steps   | 565000   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1130     |
----------------------------------
2025-02-01 16:33:07.185596 Eastern Standard Time
| Itration            | 1131     |
| Real Det Return     | 682      |
| Real Sto Return     | 654      |
| Reward Loss         | -63.8    |
| Running Env Steps   | 565500   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 7.06     |
| Running Update Time | 1131     |
----------------------------------
2025-02-01 16:33:23.058683 Eastern Standard Time
| Itration            | 1132     |
| Real Det Return     | 653      |
| Real Sto Return     | 623      |
| Reward Loss         | -38.3    |
| Running Env Steps   | 566000   |
| Running Forward KL  | -3.62    |
| Running Reverse KL  | 7.72     |
| Running Update Time | 1132     |
----------------------------------
2025-02-01 16:33:38.963703 Eastern Standard Time
| Itration            | 1133     |
| Real Det Return     | 644      |
| Real Sto Return     | 618      |
| Reward Loss         | -96.1    |
| Running Env Steps   | 566500   |
| Running Forward KL  | -3.27    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1133     |
----------------------------------
2025-02-01 16:33:55.198429 Eastern Standard Time
| Itration            | 1134     |
| Real Det Return     | 684      |
| Real Sto Return     | 665      |
| Reward Loss         | -56.9    |
| Running Env Steps   | 567000   |
| Running Forward KL  | -3.64    |
| Running Reverse KL  | 8.02     |
| Running Update Time | 1134     |
----------------------------------
2025-02-01 16:34:11.154468 Eastern Standard Time
| Itration            | 1135     |
| Real Det Return     | 661      |
| Real Sto Return     | 659      |
| Reward Loss         | -58.8    |
| Running Env Steps   | 567500   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1135     |
----------------------------------
2025-02-01 16:34:27.384590 Eastern Standard Time
| Itration            | 1136     |
| Real Det Return     | 679      |
| Real Sto Return     | 652      |
| Reward Loss         | -95.7    |
| Running Env Steps   | 568000   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 6.77     |
| Running Update Time | 1136     |
----------------------------------
2025-02-01 16:34:43.253779 Eastern Standard Time
| Itration            | 1137     |
| Real Det Return     | 660      |
| Real Sto Return     | 633      |
| Reward Loss         | -61.3    |
| Running Env Steps   | 568500   |
| Running Forward KL  | -3.48    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1137     |
----------------------------------
2025-02-01 16:34:59.615707 Eastern Standard Time
| Itration            | 1138     |
| Real Det Return     | 664      |
| Real Sto Return     | 638      |
| Reward Loss         | -61.2    |
| Running Env Steps   | 569000   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1138     |
----------------------------------
2025-02-01 16:35:16.011819 Eastern Standard Time
| Itration            | 1139     |
| Real Det Return     | 691      |
| Real Sto Return     | 648      |
| Reward Loss         | -70      |
| Running Env Steps   | 569500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 8.58     |
| Running Update Time | 1139     |
----------------------------------
2025-02-01 16:35:32.277207 Eastern Standard Time
| Itration            | 1140     |
| Real Det Return     | 646      |
| Real Sto Return     | 615      |
| Reward Loss         | -88.6    |
| Running Env Steps   | 570000   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 7.77     |
| Running Update Time | 1140     |
----------------------------------
2025-02-01 16:35:48.569093 Eastern Standard Time
| Itration            | 1141     |
| Real Det Return     | 668      |
| Real Sto Return     | 638      |
| Reward Loss         | -89.7    |
| Running Env Steps   | 570500   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1141     |
----------------------------------
2025-02-01 16:36:04.405701 Eastern Standard Time
| Itration            | 1142     |
| Real Det Return     | 671      |
| Real Sto Return     | 628      |
| Reward Loss         | -152     |
| Running Env Steps   | 571000   |
| Running Forward KL  | -2.93    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1142     |
----------------------------------
2025-02-01 16:36:20.305888 Eastern Standard Time
| Itration            | 1143     |
| Real Det Return     | 637      |
| Real Sto Return     | 617      |
| Reward Loss         | -97.4    |
| Running Env Steps   | 571500   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 6.3      |
| Running Update Time | 1143     |
----------------------------------
2025-02-01 16:36:36.185418 Eastern Standard Time
| Itration            | 1144     |
| Real Det Return     | 666      |
| Real Sto Return     | 664      |
| Reward Loss         | -85.9    |
| Running Env Steps   | 572000   |
| Running Forward KL  | -3.67    |
| Running Reverse KL  | 7.58     |
| Running Update Time | 1144     |
----------------------------------
2025-02-01 16:36:52.196047 Eastern Standard Time
| Itration            | 1145     |
| Real Det Return     | 665      |
| Real Sto Return     | 641      |
| Reward Loss         | -95.1    |
| Running Env Steps   | 572500   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 6.27     |
| Running Update Time | 1145     |
----------------------------------
2025-02-01 16:37:08.040339 Eastern Standard Time
| Itration            | 1146     |
| Real Det Return     | 691      |
| Real Sto Return     | 654      |
| Reward Loss         | -67.3    |
| Running Env Steps   | 573000   |
| Running Forward KL  | -3.7     |
| Running Reverse KL  | 7.79     |
| Running Update Time | 1146     |
----------------------------------
2025-02-01 16:37:23.877912 Eastern Standard Time
| Itration            | 1147     |
| Real Det Return     | 636      |
| Real Sto Return     | 617      |
| Reward Loss         | -91.8    |
| Running Env Steps   | 573500   |
| Running Forward KL  | -3.45    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1147     |
----------------------------------
2025-02-01 16:37:39.661654 Eastern Standard Time
| Itration            | 1148     |
| Real Det Return     | 666      |
| Real Sto Return     | 634      |
| Reward Loss         | -77.6    |
| Running Env Steps   | 574000   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 7.23     |
| Running Update Time | 1148     |
----------------------------------
2025-02-01 16:37:55.487792 Eastern Standard Time
| Itration            | 1149     |
| Real Det Return     | 668      |
| Real Sto Return     | 653      |
| Reward Loss         | -111     |
| Running Env Steps   | 574500   |
| Running Forward KL  | -2.97    |
| Running Reverse KL  | 7.66     |
| Running Update Time | 1149     |
----------------------------------
2025-02-01 16:38:11.387804 Eastern Standard Time
| Itration            | 1150     |
| Real Det Return     | 686      |
| Real Sto Return     | 651      |
| Reward Loss         | -43.8    |
| Running Env Steps   | 575000   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 7.68     |
| Running Update Time | 1150     |
----------------------------------
2025-02-01 16:38:27.222333 Eastern Standard Time
| Itration            | 1151     |
| Real Det Return     | 649      |
| Real Sto Return     | 628      |
| Reward Loss         | -86      |
| Running Env Steps   | 575500   |
| Running Forward KL  | -3.55    |
| Running Reverse KL  | 6.62     |
| Running Update Time | 1151     |
----------------------------------
2025-02-01 16:38:42.920171 Eastern Standard Time
| Itration            | 1152     |
| Real Det Return     | 668      |
| Real Sto Return     | 646      |
| Reward Loss         | -90.4    |
| Running Env Steps   | 576000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 5.96     |
| Running Update Time | 1152     |
----------------------------------
2025-02-01 16:38:58.738748 Eastern Standard Time
| Itration            | 1153     |
| Real Det Return     | 681      |
| Real Sto Return     | 646      |
| Reward Loss         | -76      |
| Running Env Steps   | 576500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 6.51     |
| Running Update Time | 1153     |
----------------------------------
2025-02-01 16:39:14.578263 Eastern Standard Time
| Itration            | 1154     |
| Real Det Return     | 695      |
| Real Sto Return     | 661      |
| Reward Loss         | -52      |
| Running Env Steps   | 577000   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 7.12     |
| Running Update Time | 1154     |
----------------------------------
2025-02-01 16:39:30.369657 Eastern Standard Time
| Itration            | 1155     |
| Real Det Return     | 673      |
| Real Sto Return     | 638      |
| Reward Loss         | -76.8    |
| Running Env Steps   | 577500   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 8.16     |
| Running Update Time | 1155     |
----------------------------------
2025-02-01 16:39:46.149454 Eastern Standard Time
| Itration            | 1156     |
| Real Det Return     | 673      |
| Real Sto Return     | 642      |
| Reward Loss         | -86.2    |
| Running Env Steps   | 578000   |
| Running Forward KL  | -3.58    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1156     |
----------------------------------
2025-02-01 16:40:01.901772 Eastern Standard Time
| Itration            | 1157     |
| Real Det Return     | 667      |
| Real Sto Return     | 635      |
| Reward Loss         | -107     |
| Running Env Steps   | 578500   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1157     |
----------------------------------
2025-02-01 16:40:17.806260 Eastern Standard Time
| Itration            | 1158     |
| Real Det Return     | 687      |
| Real Sto Return     | 657      |
| Reward Loss         | -83.8    |
| Running Env Steps   | 579000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1158     |
----------------------------------
2025-02-01 16:40:33.674942 Eastern Standard Time
| Itration            | 1159     |
| Real Det Return     | 684      |
| Real Sto Return     | 648      |
| Reward Loss         | -52.7    |
| Running Env Steps   | 579500   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1159     |
----------------------------------
2025-02-01 16:40:49.868569 Eastern Standard Time
| Itration            | 1160     |
| Real Det Return     | 670      |
| Real Sto Return     | 638      |
| Reward Loss         | -107     |
| Running Env Steps   | 580000   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 7.36     |
| Running Update Time | 1160     |
----------------------------------
2025-02-01 16:41:06.054645 Eastern Standard Time
| Itration            | 1161     |
| Real Det Return     | 678      |
| Real Sto Return     | 661      |
| Reward Loss         | -73.9    |
| Running Env Steps   | 580500   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 7.53     |
| Running Update Time | 1161     |
----------------------------------
2025-02-01 16:41:21.946733 Eastern Standard Time
| Itration            | 1162     |
| Real Det Return     | 676      |
| Real Sto Return     | 648      |
| Reward Loss         | -88.8    |
| Running Env Steps   | 581000   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 6.48     |
| Running Update Time | 1162     |
----------------------------------
2025-02-01 16:41:38.645961 Eastern Standard Time
| Itration            | 1163     |
| Real Det Return     | 681      |
| Real Sto Return     | 658      |
| Reward Loss         | -65.7    |
| Running Env Steps   | 581500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 7.03     |
| Running Update Time | 1163     |
----------------------------------
2025-02-01 16:41:54.588026 Eastern Standard Time
| Itration            | 1164     |
| Real Det Return     | 698      |
| Real Sto Return     | 652      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 582000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1164     |
----------------------------------
2025-02-01 16:42:10.571054 Eastern Standard Time
| Itration            | 1165     |
| Real Det Return     | 656      |
| Real Sto Return     | 623      |
| Reward Loss         | -86.5    |
| Running Env Steps   | 582500   |
| Running Forward KL  | -3.58    |
| Running Reverse KL  | 8.54     |
| Running Update Time | 1165     |
----------------------------------
2025-02-01 16:42:26.513493 Eastern Standard Time
| Itration            | 1166     |
| Real Det Return     | 654      |
| Real Sto Return     | 629      |
| Reward Loss         | -108     |
| Running Env Steps   | 583000   |
| Running Forward KL  | -2.24    |
| Running Reverse KL  | 7.15     |
| Running Update Time | 1166     |
----------------------------------
2025-02-01 16:42:42.446355 Eastern Standard Time
| Itration            | 1167     |
| Real Det Return     | 668      |
| Real Sto Return     | 644      |
| Reward Loss         | -69.1    |
| Running Env Steps   | 583500   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 8.14     |
| Running Update Time | 1167     |
----------------------------------
2025-02-01 16:42:59.006171 Eastern Standard Time
| Itration            | 1168     |
| Real Det Return     | 632      |
| Real Sto Return     | 618      |
| Reward Loss         | -95.3    |
| Running Env Steps   | 584000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 6.82     |
| Running Update Time | 1168     |
----------------------------------
2025-02-01 16:43:15.260661 Eastern Standard Time
| Itration            | 1169     |
| Real Det Return     | 693      |
| Real Sto Return     | 652      |
| Reward Loss         | -84.1    |
| Running Env Steps   | 584500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 6.89     |
| Running Update Time | 1169     |
----------------------------------
2025-02-01 16:43:31.343157 Eastern Standard Time
| Itration            | 1170     |
| Real Det Return     | 677      |
| Real Sto Return     | 646      |
| Reward Loss         | -86.6    |
| Running Env Steps   | 585000   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 6.91     |
| Running Update Time | 1170     |
----------------------------------
2025-02-01 16:43:47.468019 Eastern Standard Time
| Itration            | 1171     |
| Real Det Return     | 677      |
| Real Sto Return     | 646      |
| Reward Loss         | -51.5    |
| Running Env Steps   | 585500   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1171     |
----------------------------------
2025-02-01 16:44:03.537655 Eastern Standard Time
| Itration            | 1172     |
| Real Det Return     | 658      |
| Real Sto Return     | 632      |
| Reward Loss         | -115     |
| Running Env Steps   | 586000   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 5.52     |
| Running Update Time | 1172     |
----------------------------------
2025-02-01 16:44:19.368737 Eastern Standard Time
| Itration            | 1173     |
| Real Det Return     | 698      |
| Real Sto Return     | 680      |
| Reward Loss         | -60.7    |
| Running Env Steps   | 586500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1173     |
----------------------------------
2025-02-01 16:44:35.420400 Eastern Standard Time
| Itration            | 1174     |
| Real Det Return     | 660      |
| Real Sto Return     | 646      |
| Reward Loss         | -59.4    |
| Running Env Steps   | 587000   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 7.59     |
| Running Update Time | 1174     |
----------------------------------
2025-02-01 16:44:52.022324 Eastern Standard Time
| Itration            | 1175     |
| Real Det Return     | 674      |
| Real Sto Return     | 648      |
| Reward Loss         | -119     |
| Running Env Steps   | 587500   |
| Running Forward KL  | -2.86    |
| Running Reverse KL  | 7.05     |
| Running Update Time | 1175     |
----------------------------------
2025-02-01 16:45:07.917911 Eastern Standard Time
| Itration            | 1176     |
| Real Det Return     | 678      |
| Real Sto Return     | 643      |
| Reward Loss         | -56.9    |
| Running Env Steps   | 588000   |
| Running Forward KL  | -3.07    |
| Running Reverse KL  | 8.02     |
| Running Update Time | 1176     |
----------------------------------
2025-02-01 16:45:23.958680 Eastern Standard Time
| Itration            | 1177     |
| Real Det Return     | 679      |
| Real Sto Return     | 638      |
| Reward Loss         | -110     |
| Running Env Steps   | 588500   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 6.77     |
| Running Update Time | 1177     |
----------------------------------
2025-02-01 16:45:39.916999 Eastern Standard Time
| Itration            | 1178     |
| Real Det Return     | 689      |
| Real Sto Return     | 649      |
| Reward Loss         | -67.6    |
| Running Env Steps   | 589000   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 8.24     |
| Running Update Time | 1178     |
----------------------------------
2025-02-01 16:45:56.023678 Eastern Standard Time
| Itration            | 1179     |
| Real Det Return     | 666      |
| Real Sto Return     | 637      |
| Reward Loss         | -92.7    |
| Running Env Steps   | 589500   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 8.02     |
| Running Update Time | 1179     |
----------------------------------
2025-02-01 16:46:11.856693 Eastern Standard Time
| Itration            | 1180     |
| Real Det Return     | 629      |
| Real Sto Return     | 600      |
| Reward Loss         | -134     |
| Running Env Steps   | 590000   |
| Running Forward KL  | -3.67    |
| Running Reverse KL  | 6.2      |
| Running Update Time | 1180     |
----------------------------------
2025-02-01 16:46:27.729321 Eastern Standard Time
| Itration            | 1181     |
| Real Det Return     | 643      |
| Real Sto Return     | 609      |
| Reward Loss         | -138     |
| Running Env Steps   | 590500   |
| Running Forward KL  | -3.39    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1181     |
----------------------------------
2025-02-01 16:46:43.606688 Eastern Standard Time
| Itration            | 1182     |
| Real Det Return     | 652      |
| Real Sto Return     | 641      |
| Reward Loss         | -97.4    |
| Running Env Steps   | 591000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 6.25     |
| Running Update Time | 1182     |
----------------------------------
2025-02-01 16:46:59.607865 Eastern Standard Time
| Itration            | 1183     |
| Real Det Return     | 681      |
| Real Sto Return     | 639      |
| Reward Loss         | -73.7    |
| Running Env Steps   | 591500   |
| Running Forward KL  | -3.36    |
| Running Reverse KL  | 7.49     |
| Running Update Time | 1183     |
----------------------------------
2025-02-01 16:47:15.661903 Eastern Standard Time
| Itration            | 1184     |
| Real Det Return     | 683      |
| Real Sto Return     | 659      |
| Reward Loss         | -53      |
| Running Env Steps   | 592000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 7.74     |
| Running Update Time | 1184     |
----------------------------------
2025-02-01 16:47:32.026981 Eastern Standard Time
| Itration            | 1185     |
| Real Det Return     | 672      |
| Real Sto Return     | 647      |
| Reward Loss         | -99.6    |
| Running Env Steps   | 592500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1185     |
----------------------------------
2025-02-01 16:47:48.108422 Eastern Standard Time
| Itration            | 1186     |
| Real Det Return     | 660      |
| Real Sto Return     | 633      |
| Reward Loss         | -85.6    |
| Running Env Steps   | 593000   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1186     |
----------------------------------
2025-02-01 16:48:04.104705 Eastern Standard Time
| Itration            | 1187     |
| Real Det Return     | 676      |
| Real Sto Return     | 646      |
| Reward Loss         | -60.2    |
| Running Env Steps   | 593500   |
| Running Forward KL  | -3.66    |
| Running Reverse KL  | 7.44     |
| Running Update Time | 1187     |
----------------------------------
2025-02-01 16:48:20.151451 Eastern Standard Time
| Itration            | 1188     |
| Real Det Return     | 685      |
| Real Sto Return     | 660      |
| Reward Loss         | -72.9    |
| Running Env Steps   | 594000   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1188     |
----------------------------------
2025-02-01 16:48:36.087226 Eastern Standard Time
| Itration            | 1189     |
| Real Det Return     | 677      |
| Real Sto Return     | 646      |
| Reward Loss         | -50.7    |
| Running Env Steps   | 594500   |
| Running Forward KL  | -3.31    |
| Running Reverse KL  | 8        |
| Running Update Time | 1189     |
----------------------------------
2025-02-01 16:48:52.091669 Eastern Standard Time
| Itration            | 1190     |
| Real Det Return     | 657      |
| Real Sto Return     | 661      |
| Reward Loss         | -183     |
| Running Env Steps   | 595000   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 5.98     |
| Running Update Time | 1190     |
----------------------------------
2025-02-01 16:49:08.265082 Eastern Standard Time
| Itration            | 1191     |
| Real Det Return     | 674      |
| Real Sto Return     | 641      |
| Reward Loss         | -80.3    |
| Running Env Steps   | 595500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 7.59     |
| Running Update Time | 1191     |
----------------------------------
2025-02-01 16:49:24.305692 Eastern Standard Time
| Itration            | 1192     |
| Real Det Return     | 657      |
| Real Sto Return     | 626      |
| Reward Loss         | -104     |
| Running Env Steps   | 596000   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 6.59     |
| Running Update Time | 1192     |
----------------------------------
2025-02-01 16:49:40.205148 Eastern Standard Time
| Itration            | 1193     |
| Real Det Return     | 676      |
| Real Sto Return     | 646      |
| Reward Loss         | -86.5    |
| Running Env Steps   | 596500   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1193     |
----------------------------------
2025-02-01 16:49:56.155735 Eastern Standard Time
| Itration            | 1194     |
| Real Det Return     | 667      |
| Real Sto Return     | 639      |
| Reward Loss         | -76.6    |
| Running Env Steps   | 597000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 6.34     |
| Running Update Time | 1194     |
----------------------------------
2025-02-01 16:50:12.616751 Eastern Standard Time
| Itration            | 1195     |
| Real Det Return     | 686      |
| Real Sto Return     | 640      |
| Reward Loss         | -59.7    |
| Running Env Steps   | 597500   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 7.82     |
| Running Update Time | 1195     |
----------------------------------
2025-02-01 16:50:28.449639 Eastern Standard Time
| Itration            | 1196     |
| Real Det Return     | 636      |
| Real Sto Return     | 575      |
| Reward Loss         | -121     |
| Running Env Steps   | 598000   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 7.41     |
| Running Update Time | 1196     |
----------------------------------
2025-02-01 16:50:44.680030 Eastern Standard Time
| Itration            | 1197     |
| Real Det Return     | 686      |
| Real Sto Return     | 647      |
| Reward Loss         | -96.6    |
| Running Env Steps   | 598500   |
| Running Forward KL  | -3.05    |
| Running Reverse KL  | 8.28     |
| Running Update Time | 1197     |
----------------------------------
2025-02-01 16:51:01.156916 Eastern Standard Time
| Itration            | 1198     |
| Real Det Return     | 689      |
| Real Sto Return     | 647      |
| Reward Loss         | -84.6    |
| Running Env Steps   | 599000   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 8.32     |
| Running Update Time | 1198     |
----------------------------------
2025-02-01 16:51:17.214584 Eastern Standard Time
| Itration            | 1199     |
| Real Det Return     | 679      |
| Real Sto Return     | 641      |
| Reward Loss         | -153     |
| Running Env Steps   | 599500   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 6.49     |
| Running Update Time | 1199     |
----------------------------------
2025-02-01 16:51:33.087330 Eastern Standard Time
| Itration            | 1200     |
| Real Det Return     | 638      |
| Real Sto Return     | 611      |
| Reward Loss         | -109     |
| Running Env Steps   | 600000   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 7.51     |
| Running Update Time | 1200     |
----------------------------------
2025-02-01 16:51:48.904569 Eastern Standard Time
| Itration            | 1201     |
| Real Det Return     | 673      |
| Real Sto Return     | 635      |
| Reward Loss         | -96.6    |
| Running Env Steps   | 600500   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 7.1      |
| Running Update Time | 1201     |
----------------------------------
2025-02-01 16:52:04.539238 Eastern Standard Time
| Itration            | 1202     |
| Real Det Return     | 647      |
| Real Sto Return     | 622      |
| Reward Loss         | -95.6    |
| Running Env Steps   | 601000   |
| Running Forward KL  | -3.45    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1202     |
----------------------------------
2025-02-01 16:52:20.416130 Eastern Standard Time
| Itration            | 1203     |
| Real Det Return     | 678      |
| Real Sto Return     | 640      |
| Reward Loss         | -132     |
| Running Env Steps   | 601500   |
| Running Forward KL  | -2.18    |
| Running Reverse KL  | 6.86     |
| Running Update Time | 1203     |
----------------------------------
2025-02-01 16:52:37.024515 Eastern Standard Time
| Itration            | 1204     |
| Real Det Return     | 642      |
| Real Sto Return     | 619      |
| Reward Loss         | -99.6    |
| Running Env Steps   | 602000   |
| Running Forward KL  | -3.18    |
| Running Reverse KL  | 8.33     |
| Running Update Time | 1204     |
----------------------------------
2025-02-01 16:52:53.567001 Eastern Standard Time
| Itration            | 1205     |
| Real Det Return     | 672      |
| Real Sto Return     | 639      |
| Reward Loss         | -79.5    |
| Running Env Steps   | 602500   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 8.43     |
| Running Update Time | 1205     |
----------------------------------
2025-02-01 16:53:09.654813 Eastern Standard Time
| Itration            | 1206     |
| Real Det Return     | 667      |
| Real Sto Return     | 659      |
| Reward Loss         | -92.2    |
| Running Env Steps   | 603000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1206     |
----------------------------------
2025-02-01 16:53:26.148413 Eastern Standard Time
| Itration            | 1207     |
| Real Det Return     | 669      |
| Real Sto Return     | 633      |
| Reward Loss         | -103     |
| Running Env Steps   | 603500   |
| Running Forward KL  | -3.23    |
| Running Reverse KL  | 7.73     |
| Running Update Time | 1207     |
----------------------------------
2025-02-01 16:53:41.979472 Eastern Standard Time
| Itration            | 1208     |
| Real Det Return     | 676      |
| Real Sto Return     | 650      |
| Reward Loss         | -114     |
| Running Env Steps   | 604000   |
| Running Forward KL  | -3.12    |
| Running Reverse KL  | 8.04     |
| Running Update Time | 1208     |
----------------------------------
2025-02-01 16:53:58.130159 Eastern Standard Time
| Itration            | 1209     |
| Real Det Return     | 590      |
| Real Sto Return     | 547      |
| Reward Loss         | -166     |
| Running Env Steps   | 604500   |
| Running Forward KL  | -3.23    |
| Running Reverse KL  | 7.36     |
| Running Update Time | 1209     |
----------------------------------
2025-02-01 16:54:14.614719 Eastern Standard Time
| Itration            | 1210     |
| Real Det Return     | 673      |
| Real Sto Return     | 649      |
| Reward Loss         | -49      |
| Running Env Steps   | 605000   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1210     |
----------------------------------
2025-02-01 16:54:30.798754 Eastern Standard Time
| Itration            | 1211     |
| Real Det Return     | 650      |
| Real Sto Return     | 620      |
| Reward Loss         | -89.1    |
| Running Env Steps   | 605500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 6.67     |
| Running Update Time | 1211     |
----------------------------------
2025-02-01 16:54:46.620022 Eastern Standard Time
| Itration            | 1212     |
| Real Det Return     | 647      |
| Real Sto Return     | 568      |
| Reward Loss         | -135     |
| Running Env Steps   | 606000   |
| Running Forward KL  | -1.87    |
| Running Reverse KL  | 6.85     |
| Running Update Time | 1212     |
----------------------------------
2025-02-01 16:55:02.424408 Eastern Standard Time
| Itration            | 1213     |
| Real Det Return     | 634      |
| Real Sto Return     | 620      |
| Reward Loss         | -97.7    |
| Running Env Steps   | 606500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 6.56     |
| Running Update Time | 1213     |
----------------------------------
2025-02-01 16:55:18.885302 Eastern Standard Time
| Itration            | 1214     |
| Real Det Return     | 673      |
| Real Sto Return     | 648      |
| Reward Loss         | -68.9    |
| Running Env Steps   | 607000   |
| Running Forward KL  | -3.21    |
| Running Reverse KL  | 8.32     |
| Running Update Time | 1214     |
----------------------------------
2025-02-01 16:55:34.706426 Eastern Standard Time
| Itration            | 1215     |
| Real Det Return     | 675      |
| Real Sto Return     | 642      |
| Reward Loss         | -75.8    |
| Running Env Steps   | 607500   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 7.22     |
| Running Update Time | 1215     |
----------------------------------
2025-02-01 16:55:50.688433 Eastern Standard Time
| Itration            | 1216     |
| Real Det Return     | 555      |
| Real Sto Return     | 525      |
| Reward Loss         | -200     |
| Running Env Steps   | 608000   |
| Running Forward KL  | -2.3     |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1216     |
----------------------------------
2025-02-01 16:56:07.385826 Eastern Standard Time
| Itration            | 1217     |
| Real Det Return     | 661      |
| Real Sto Return     | 639      |
| Reward Loss         | -76.9    |
| Running Env Steps   | 608500   |
| Running Forward KL  | -3.38    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1217     |
----------------------------------
2025-02-01 16:56:23.311396 Eastern Standard Time
| Itration            | 1218     |
| Real Det Return     | 684      |
| Real Sto Return     | 661      |
| Reward Loss         | -51.7    |
| Running Env Steps   | 609000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 7.91     |
| Running Update Time | 1218     |
----------------------------------
2025-02-01 16:56:39.133613 Eastern Standard Time
| Itration            | 1219     |
| Real Det Return     | 687      |
| Real Sto Return     | 651      |
| Reward Loss         | -62.8    |
| Running Env Steps   | 609500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 7.22     |
| Running Update Time | 1219     |
----------------------------------
2025-02-01 16:56:55.782304 Eastern Standard Time
| Itration            | 1220     |
| Real Det Return     | 672      |
| Real Sto Return     | 651      |
| Reward Loss         | -77.4    |
| Running Env Steps   | 610000   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 7        |
| Running Update Time | 1220     |
----------------------------------
2025-02-01 16:57:11.776133 Eastern Standard Time
| Itration            | 1221     |
| Real Det Return     | 678      |
| Real Sto Return     | 644      |
| Reward Loss         | -65.1    |
| Running Env Steps   | 610500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 7.73     |
| Running Update Time | 1221     |
----------------------------------
2025-02-01 16:57:28.125183 Eastern Standard Time
| Itration            | 1222     |
| Real Det Return     | 683      |
| Real Sto Return     | 656      |
| Reward Loss         | -65.1    |
| Running Env Steps   | 611000   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 7.13     |
| Running Update Time | 1222     |
----------------------------------
2025-02-01 16:57:44.495526 Eastern Standard Time
| Itration            | 1223     |
| Real Det Return     | 685      |
| Real Sto Return     | 650      |
| Reward Loss         | -65.3    |
| Running Env Steps   | 611500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 8.01     |
| Running Update Time | 1223     |
----------------------------------
2025-02-01 16:58:00.357638 Eastern Standard Time
| Itration            | 1224     |
| Real Det Return     | 630      |
| Real Sto Return     | 611      |
| Reward Loss         | -160     |
| Running Env Steps   | 612000   |
| Running Forward KL  | -2.92    |
| Running Reverse KL  | 6.43     |
| Running Update Time | 1224     |
----------------------------------
2025-02-01 16:58:16.261140 Eastern Standard Time
| Itration            | 1225     |
| Real Det Return     | 666      |
| Real Sto Return     | 630      |
| Reward Loss         | -121     |
| Running Env Steps   | 612500   |
| Running Forward KL  | -3.64    |
| Running Reverse KL  | 7.6      |
| Running Update Time | 1225     |
----------------------------------
2025-02-01 16:58:32.339123 Eastern Standard Time
| Itration            | 1226     |
| Real Det Return     | 676      |
| Real Sto Return     | 644      |
| Reward Loss         | -104     |
| Running Env Steps   | 613000   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 6.77     |
| Running Update Time | 1226     |
----------------------------------
2025-02-01 16:58:48.745162 Eastern Standard Time
| Itration            | 1227     |
| Real Det Return     | 665      |
| Real Sto Return     | 631      |
| Reward Loss         | -114     |
| Running Env Steps   | 613500   |
| Running Forward KL  | -2.82    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1227     |
----------------------------------
2025-02-01 16:59:04.679923 Eastern Standard Time
| Itration            | 1228     |
| Real Det Return     | 672      |
| Real Sto Return     | 650      |
| Reward Loss         | -81.9    |
| Running Env Steps   | 614000   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 8.08     |
| Running Update Time | 1228     |
----------------------------------
2025-02-01 16:59:20.653692 Eastern Standard Time
| Itration            | 1229     |
| Real Det Return     | 662      |
| Real Sto Return     | 636      |
| Reward Loss         | -101     |
| Running Env Steps   | 614500   |
| Running Forward KL  | -2.67    |
| Running Reverse KL  | 8.45     |
| Running Update Time | 1229     |
----------------------------------
2025-02-01 16:59:37.011712 Eastern Standard Time
| Itration            | 1230     |
| Real Det Return     | 688      |
| Real Sto Return     | 662      |
| Reward Loss         | -107     |
| Running Env Steps   | 615000   |
| Running Forward KL  | -2.6     |
| Running Reverse KL  | 8.2      |
| Running Update Time | 1230     |
----------------------------------
2025-02-01 16:59:52.732140 Eastern Standard Time
| Itration            | 1231     |
| Real Det Return     | 673      |
| Real Sto Return     | 654      |
| Reward Loss         | -89.2    |
| Running Env Steps   | 615500   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1231     |
----------------------------------
2025-02-01 17:00:08.448661 Eastern Standard Time
| Itration            | 1232     |
| Real Det Return     | 633      |
| Real Sto Return     | 601      |
| Reward Loss         | -92.3    |
| Running Env Steps   | 616000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 7.14     |
| Running Update Time | 1232     |
----------------------------------
2025-02-01 17:00:24.285564 Eastern Standard Time
| Itration            | 1233     |
| Real Det Return     | 681      |
| Real Sto Return     | 647      |
| Reward Loss         | -115     |
| Running Env Steps   | 616500   |
| Running Forward KL  | -3.49    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1233     |
----------------------------------
2025-02-01 17:00:40.054381 Eastern Standard Time
| Itration            | 1234     |
| Real Det Return     | 681      |
| Real Sto Return     | 646      |
| Reward Loss         | -72.8    |
| Running Env Steps   | 617000   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1234     |
----------------------------------
2025-02-01 17:00:55.821741 Eastern Standard Time
| Itration            | 1235     |
| Real Det Return     | 688      |
| Real Sto Return     | 650      |
| Reward Loss         | -58.9    |
| Running Env Steps   | 617500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 7.4      |
| Running Update Time | 1235     |
----------------------------------
2025-02-01 17:01:11.654949 Eastern Standard Time
| Itration            | 1236     |
| Real Det Return     | 689      |
| Real Sto Return     | 643      |
| Reward Loss         | -101     |
| Running Env Steps   | 618000   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 7.35     |
| Running Update Time | 1236     |
----------------------------------
2025-02-01 17:01:27.357408 Eastern Standard Time
| Itration            | 1237     |
| Real Det Return     | 669      |
| Real Sto Return     | 639      |
| Reward Loss         | -69.7    |
| Running Env Steps   | 618500   |
| Running Forward KL  | -3.11    |
| Running Reverse KL  | 8.43     |
| Running Update Time | 1237     |
----------------------------------
2025-02-01 17:01:43.057254 Eastern Standard Time
| Itration            | 1238     |
| Real Det Return     | 689      |
| Real Sto Return     | 666      |
| Reward Loss         | -60      |
| Running Env Steps   | 619000   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 7.65     |
| Running Update Time | 1238     |
----------------------------------
2025-02-01 17:01:58.824883 Eastern Standard Time
| Itration            | 1239     |
| Real Det Return     | 703      |
| Real Sto Return     | 662      |
| Reward Loss         | -82      |
| Running Env Steps   | 619500   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 7.65     |
| Running Update Time | 1239     |
----------------------------------
2025-02-01 17:02:14.775066 Eastern Standard Time
| Itration            | 1240     |
| Real Det Return     | 650      |
| Real Sto Return     | 611      |
| Reward Loss         | -138     |
| Running Env Steps   | 620000   |
| Running Forward KL  | -3.5     |
| Running Reverse KL  | 7.31     |
| Running Update Time | 1240     |
----------------------------------
2025-02-01 17:02:30.601839 Eastern Standard Time
| Itration            | 1241     |
| Real Det Return     | 684      |
| Real Sto Return     | 653      |
| Reward Loss         | -82.2    |
| Running Env Steps   | 620500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 7.35     |
| Running Update Time | 1241     |
----------------------------------
2025-02-01 17:02:46.368398 Eastern Standard Time
| Itration            | 1242     |
| Real Det Return     | 654      |
| Real Sto Return     | 618      |
| Reward Loss         | -115     |
| Running Env Steps   | 621000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 5.86     |
| Running Update Time | 1242     |
----------------------------------
2025-02-01 17:03:02.129459 Eastern Standard Time
| Itration            | 1243     |
| Real Det Return     | 677      |
| Real Sto Return     | 661      |
| Reward Loss         | -61.8    |
| Running Env Steps   | 621500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 7.96     |
| Running Update Time | 1243     |
----------------------------------
2025-02-01 17:03:18.387204 Eastern Standard Time
| Itration            | 1244     |
| Real Det Return     | 610      |
| Real Sto Return     | 611      |
| Reward Loss         | -127     |
| Running Env Steps   | 622000   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 6.99     |
| Running Update Time | 1244     |
----------------------------------
2025-02-01 17:03:34.165844 Eastern Standard Time
| Itration            | 1245     |
| Real Det Return     | 680      |
| Real Sto Return     | 654      |
| Reward Loss         | -70.8    |
| Running Env Steps   | 622500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 6.43     |
| Running Update Time | 1245     |
----------------------------------
2025-02-01 17:03:49.916209 Eastern Standard Time
| Itration            | 1246     |
| Real Det Return     | 694      |
| Real Sto Return     | 667      |
| Reward Loss         | -63.2    |
| Running Env Steps   | 623000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 6.84     |
| Running Update Time | 1246     |
----------------------------------
2025-02-01 17:04:05.745805 Eastern Standard Time
| Itration            | 1247     |
| Real Det Return     | 666      |
| Real Sto Return     | 627      |
| Reward Loss         | -99      |
| Running Env Steps   | 623500   |
| Running Forward KL  | -3.58    |
| Running Reverse KL  | 6.73     |
| Running Update Time | 1247     |
----------------------------------
2025-02-01 17:04:21.565395 Eastern Standard Time
| Itration            | 1248     |
| Real Det Return     | 631      |
| Real Sto Return     | 611      |
| Reward Loss         | -138     |
| Running Env Steps   | 624000   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1248     |
----------------------------------
2025-02-01 17:04:37.344653 Eastern Standard Time
| Itration            | 1249     |
| Real Det Return     | 612      |
| Real Sto Return     | 595      |
| Reward Loss         | -126     |
| Running Env Steps   | 624500   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 7.18     |
| Running Update Time | 1249     |
----------------------------------
2025-02-01 17:04:53.282498 Eastern Standard Time
| Itration            | 1250     |
| Real Det Return     | 675      |
| Real Sto Return     | 643      |
| Reward Loss         | -79.7    |
| Running Env Steps   | 625000   |
| Running Forward KL  | -3.2     |
| Running Reverse KL  | 8.06     |
| Running Update Time | 1250     |
----------------------------------
2025-02-01 17:05:09.733483 Eastern Standard Time
| Itration            | 1251     |
| Real Det Return     | 676      |
| Real Sto Return     | 652      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 625500   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 7.78     |
| Running Update Time | 1251     |
----------------------------------
2025-02-01 17:05:25.815666 Eastern Standard Time
| Itration            | 1252     |
| Real Det Return     | 662      |
| Real Sto Return     | 645      |
| Reward Loss         | -83.7    |
| Running Env Steps   | 626000   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 6.79     |
| Running Update Time | 1252     |
----------------------------------
2025-02-01 17:05:41.531078 Eastern Standard Time
| Itration            | 1253     |
| Real Det Return     | 693      |
| Real Sto Return     | 658      |
| Reward Loss         | -60.8    |
| Running Env Steps   | 626500   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 8.09     |
| Running Update Time | 1253     |
----------------------------------
2025-02-01 17:05:57.197934 Eastern Standard Time
| Itration            | 1254     |
| Real Det Return     | 637      |
| Real Sto Return     | 620      |
| Reward Loss         | -131     |
| Running Env Steps   | 627000   |
| Running Forward KL  | -3.49    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1254     |
----------------------------------
2025-02-01 17:06:12.873446 Eastern Standard Time
| Itration            | 1255     |
| Real Det Return     | 669      |
| Real Sto Return     | 646      |
| Reward Loss         | -77.1    |
| Running Env Steps   | 627500   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 7.75     |
| Running Update Time | 1255     |
----------------------------------
2025-02-01 17:06:28.572494 Eastern Standard Time
| Itration            | 1256     |
| Real Det Return     | 646      |
| Real Sto Return     | 629      |
| Reward Loss         | -115     |
| Running Env Steps   | 628000   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1256     |
----------------------------------
2025-02-01 17:06:44.224299 Eastern Standard Time
| Itration            | 1257     |
| Real Det Return     | 642      |
| Real Sto Return     | 596      |
| Reward Loss         | -135     |
| Running Env Steps   | 628500   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 6.1      |
| Running Update Time | 1257     |
----------------------------------
2025-02-01 17:06:59.869586 Eastern Standard Time
| Itration            | 1258     |
| Real Det Return     | 688      |
| Real Sto Return     | 655      |
| Reward Loss         | -57.1    |
| Running Env Steps   | 629000   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1258     |
----------------------------------
2025-02-01 17:07:15.557254 Eastern Standard Time
| Itration            | 1259     |
| Real Det Return     | 648      |
| Real Sto Return     | 626      |
| Reward Loss         | -130     |
| Running Env Steps   | 629500   |
| Running Forward KL  | -3.58    |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1259     |
----------------------------------
2025-02-01 17:07:31.640812 Eastern Standard Time
| Itration            | 1260     |
| Real Det Return     | 675      |
| Real Sto Return     | 653      |
| Reward Loss         | -109     |
| Running Env Steps   | 630000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 6.41     |
| Running Update Time | 1260     |
----------------------------------
2025-02-01 17:07:47.340964 Eastern Standard Time
| Itration            | 1261     |
| Real Det Return     | 634      |
| Real Sto Return     | 618      |
| Reward Loss         | -117     |
| Running Env Steps   | 630500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 6.34     |
| Running Update Time | 1261     |
----------------------------------
2025-02-01 17:08:02.982766 Eastern Standard Time
| Itration            | 1262     |
| Real Det Return     | 676      |
| Real Sto Return     | 651      |
| Reward Loss         | -85.7    |
| Running Env Steps   | 631000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 6.7      |
| Running Update Time | 1262     |
----------------------------------
2025-02-01 17:08:18.778934 Eastern Standard Time
| Itration            | 1263     |
| Real Det Return     | 635      |
| Real Sto Return     | 618      |
| Reward Loss         | -94      |
| Running Env Steps   | 631500   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 8.68     |
| Running Update Time | 1263     |
----------------------------------
2025-02-01 17:08:34.480788 Eastern Standard Time
| Itration            | 1264     |
| Real Det Return     | 671      |
| Real Sto Return     | 655      |
| Reward Loss         | -95.6    |
| Running Env Steps   | 632000   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1264     |
----------------------------------
2025-02-01 17:08:50.178679 Eastern Standard Time
| Itration            | 1265     |
| Real Det Return     | 673      |
| Real Sto Return     | 638      |
| Reward Loss         | -99      |
| Running Env Steps   | 632500   |
| Running Forward KL  | -3.53    |
| Running Reverse KL  | 7.61     |
| Running Update Time | 1265     |
----------------------------------
2025-02-01 17:09:05.884778 Eastern Standard Time
| Itration            | 1266     |
| Real Det Return     | 665      |
| Real Sto Return     | 634      |
| Reward Loss         | -78.9    |
| Running Env Steps   | 633000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1266     |
----------------------------------
2025-02-01 17:09:21.564629 Eastern Standard Time
| Itration            | 1267     |
| Real Det Return     | 673      |
| Real Sto Return     | 643      |
| Reward Loss         | -82.2    |
| Running Env Steps   | 633500   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1267     |
----------------------------------
2025-02-01 17:09:37.311443 Eastern Standard Time
| Itration            | 1268     |
| Real Det Return     | 659      |
| Real Sto Return     | 643      |
| Reward Loss         | -55      |
| Running Env Steps   | 634000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 8.25     |
| Running Update Time | 1268     |
----------------------------------
2025-02-01 17:09:53.207508 Eastern Standard Time
| Itration            | 1269     |
| Real Det Return     | 678      |
| Real Sto Return     | 648      |
| Reward Loss         | -76.5    |
| Running Env Steps   | 634500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 6.46     |
| Running Update Time | 1269     |
----------------------------------
2025-02-01 17:10:08.917840 Eastern Standard Time
| Itration            | 1270     |
| Real Det Return     | 618      |
| Real Sto Return     | 599      |
| Reward Loss         | -154     |
| Running Env Steps   | 635000   |
| Running Forward KL  | -3.28    |
| Running Reverse KL  | 6.55     |
| Running Update Time | 1270     |
----------------------------------
2025-02-01 17:10:24.600282 Eastern Standard Time
| Itration            | 1271     |
| Real Det Return     | 700      |
| Real Sto Return     | 653      |
| Reward Loss         | -73.5    |
| Running Env Steps   | 635500   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 7.42     |
| Running Update Time | 1271     |
----------------------------------
2025-02-01 17:10:40.284300 Eastern Standard Time
| Itration            | 1272     |
| Real Det Return     | 678      |
| Real Sto Return     | 639      |
| Reward Loss         | -79.8    |
| Running Env Steps   | 636000   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 7.26     |
| Running Update Time | 1272     |
----------------------------------
2025-02-01 17:10:56.059008 Eastern Standard Time
| Itration            | 1273     |
| Real Det Return     | 673      |
| Real Sto Return     | 653      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 636500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 7.52     |
| Running Update Time | 1273     |
----------------------------------
2025-02-01 17:11:11.755870 Eastern Standard Time
| Itration            | 1274     |
| Real Det Return     | 671      |
| Real Sto Return     | 630      |
| Reward Loss         | -103     |
| Running Env Steps   | 637000   |
| Running Forward KL  | -2.46    |
| Running Reverse KL  | 8.16     |
| Running Update Time | 1274     |
----------------------------------
2025-02-01 17:11:27.467987 Eastern Standard Time
| Itration            | 1275     |
| Real Det Return     | 679      |
| Real Sto Return     | 642      |
| Reward Loss         | -75.7    |
| Running Env Steps   | 637500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 7.4      |
| Running Update Time | 1275     |
----------------------------------
2025-02-01 17:11:43.073666 Eastern Standard Time
| Itration            | 1276     |
| Real Det Return     | 667      |
| Real Sto Return     | 643      |
| Reward Loss         | -105     |
| Running Env Steps   | 638000   |
| Running Forward KL  | -2.87    |
| Running Reverse KL  | 7.6      |
| Running Update Time | 1276     |
----------------------------------
2025-02-01 17:11:58.702827 Eastern Standard Time
| Itration            | 1277     |
| Real Det Return     | 667      |
| Real Sto Return     | 638      |
| Reward Loss         | -105     |
| Running Env Steps   | 638500   |
| Running Forward KL  | -3.32    |
| Running Reverse KL  | 7.55     |
| Running Update Time | 1277     |
----------------------------------
2025-02-01 17:12:14.336240 Eastern Standard Time
| Itration            | 1278     |
| Real Det Return     | 679      |
| Real Sto Return     | 660      |
| Reward Loss         | -82.7    |
| Running Env Steps   | 639000   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 7.51     |
| Running Update Time | 1278     |
----------------------------------
2025-02-01 17:12:29.987018 Eastern Standard Time
| Itration            | 1279     |
| Real Det Return     | 658      |
| Real Sto Return     | 633      |
| Reward Loss         | -103     |
| Running Env Steps   | 639500   |
| Running Forward KL  | -2.9     |
| Running Reverse KL  | 7.73     |
| Running Update Time | 1279     |
----------------------------------
2025-02-01 17:12:45.602481 Eastern Standard Time
| Itration            | 1280     |
| Real Det Return     | 682      |
| Real Sto Return     | 661      |
| Reward Loss         | -46.1    |
| Running Env Steps   | 640000   |
| Running Forward KL  | -4       |
| Running Reverse KL  | 7.73     |
| Running Update Time | 1280     |
----------------------------------
2025-02-01 17:13:01.252945 Eastern Standard Time
| Itration            | 1281     |
| Real Det Return     | 681      |
| Real Sto Return     | 656      |
| Reward Loss         | -67.4    |
| Running Env Steps   | 640500   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 6.81     |
| Running Update Time | 1281     |
----------------------------------
2025-02-01 17:13:16.903829 Eastern Standard Time
| Itration            | 1282     |
| Real Det Return     | 636      |
| Real Sto Return     | 613      |
| Reward Loss         | -89      |
| Running Env Steps   | 641000   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1282     |
----------------------------------
2025-02-01 17:13:32.524923 Eastern Standard Time
| Itration            | 1283     |
| Real Det Return     | 680      |
| Real Sto Return     | 644      |
| Reward Loss         | -80.5    |
| Running Env Steps   | 641500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 7.99     |
| Running Update Time | 1283     |
----------------------------------
2025-02-01 17:13:48.214005 Eastern Standard Time
| Itration            | 1284     |
| Real Det Return     | 682      |
| Real Sto Return     | 659      |
| Reward Loss         | -64.7    |
| Running Env Steps   | 642000   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 7.1      |
| Running Update Time | 1284     |
----------------------------------
2025-02-01 17:14:03.843435 Eastern Standard Time
| Itration            | 1285     |
| Real Det Return     | 650      |
| Real Sto Return     | 622      |
| Reward Loss         | -127     |
| Running Env Steps   | 642500   |
| Running Forward KL  | -3.05    |
| Running Reverse KL  | 7.93     |
| Running Update Time | 1285     |
----------------------------------
2025-02-01 17:14:19.541457 Eastern Standard Time
| Itration            | 1286     |
| Real Det Return     | 674      |
| Real Sto Return     | 652      |
| Reward Loss         | -81.7    |
| Running Env Steps   | 643000   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 7.62     |
| Running Update Time | 1286     |
----------------------------------
2025-02-01 17:14:35.222111 Eastern Standard Time
| Itration            | 1287     |
| Real Det Return     | 671      |
| Real Sto Return     | 647      |
| Reward Loss         | -81.9    |
| Running Env Steps   | 643500   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 7.02     |
| Running Update Time | 1287     |
----------------------------------
2025-02-01 17:14:50.906016 Eastern Standard Time
| Itration            | 1288     |
| Real Det Return     | 665      |
| Real Sto Return     | 621      |
| Reward Loss         | -87      |
| Running Env Steps   | 644000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 7.89     |
| Running Update Time | 1288     |
----------------------------------
2025-02-01 17:15:06.746157 Eastern Standard Time
| Itration            | 1289     |
| Real Det Return     | 650      |
| Real Sto Return     | 632      |
| Reward Loss         | -80.2    |
| Running Env Steps   | 644500   |
| Running Forward KL  | -3.25    |
| Running Reverse KL  | 7.05     |
| Running Update Time | 1289     |
----------------------------------
2025-02-01 17:15:22.391081 Eastern Standard Time
| Itration            | 1290     |
| Real Det Return     | 687      |
| Real Sto Return     | 651      |
| Reward Loss         | -74      |
| Running Env Steps   | 645000   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 8.07     |
| Running Update Time | 1290     |
----------------------------------
2025-02-01 17:15:38.042898 Eastern Standard Time
| Itration            | 1291     |
| Real Det Return     | 667      |
| Real Sto Return     | 649      |
| Reward Loss         | -101     |
| Running Env Steps   | 645500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1291     |
----------------------------------
2025-02-01 17:15:53.638811 Eastern Standard Time
| Itration            | 1292     |
| Real Det Return     | 661      |
| Real Sto Return     | 639      |
| Reward Loss         | -109     |
| Running Env Steps   | 646000   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 6.78     |
| Running Update Time | 1292     |
----------------------------------
2025-02-01 17:16:09.259775 Eastern Standard Time
| Itration            | 1293     |
| Real Det Return     | 694      |
| Real Sto Return     | 661      |
| Reward Loss         | -68.5    |
| Running Env Steps   | 646500   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1293     |
----------------------------------
2025-02-01 17:16:24.887877 Eastern Standard Time
| Itration            | 1294     |
| Real Det Return     | 672      |
| Real Sto Return     | 650      |
| Reward Loss         | -93.8    |
| Running Env Steps   | 647000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 6.72     |
| Running Update Time | 1294     |
----------------------------------
2025-02-01 17:16:40.583330 Eastern Standard Time
| Itration            | 1295     |
| Real Det Return     | 655      |
| Real Sto Return     | 652      |
| Reward Loss         | -119     |
| Running Env Steps   | 647500   |
| Running Forward KL  | -2.65    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1295     |
----------------------------------
2025-02-01 17:16:56.282825 Eastern Standard Time
| Itration            | 1296     |
| Real Det Return     | 674      |
| Real Sto Return     | 648      |
| Reward Loss         | -60.3    |
| Running Env Steps   | 648000   |
| Running Forward KL  | -3.47    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1296     |
----------------------------------
2025-02-01 17:17:11.964692 Eastern Standard Time
| Itration            | 1297     |
| Real Det Return     | 661      |
| Real Sto Return     | 623      |
| Reward Loss         | -87.9    |
| Running Env Steps   | 648500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 7.35     |
| Running Update Time | 1297     |
----------------------------------
2025-02-01 17:17:27.659799 Eastern Standard Time
| Itration            | 1298     |
| Real Det Return     | 678      |
| Real Sto Return     | 652      |
| Reward Loss         | -58.9    |
| Running Env Steps   | 649000   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 7.71     |
| Running Update Time | 1298     |
----------------------------------
2025-02-01 17:17:43.302052 Eastern Standard Time
| Itration            | 1299     |
| Real Det Return     | 668      |
| Real Sto Return     | 635      |
| Reward Loss         | -63.3    |
| Running Env Steps   | 649500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 6.71     |
| Running Update Time | 1299     |
----------------------------------
2025-02-01 17:17:58.917982 Eastern Standard Time
| Itration            | 1300     |
| Real Det Return     | 674      |
| Real Sto Return     | 650      |
| Reward Loss         | -63.8    |
| Running Env Steps   | 650000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1300     |
----------------------------------
2025-02-01 17:18:14.614499 Eastern Standard Time
| Itration            | 1301     |
| Real Det Return     | 700      |
| Real Sto Return     | 664      |
| Reward Loss         | -55.6    |
| Running Env Steps   | 650500   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 7.82     |
| Running Update Time | 1301     |
----------------------------------
2025-02-01 17:18:30.368885 Eastern Standard Time
| Itration            | 1302     |
| Real Det Return     | 674      |
| Real Sto Return     | 640      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 651000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1302     |
----------------------------------
2025-02-01 17:18:45.941598 Eastern Standard Time
| Itration            | 1303     |
| Real Det Return     | 674      |
| Real Sto Return     | 644      |
| Reward Loss         | -71.5    |
| Running Env Steps   | 651500   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 6.38     |
| Running Update Time | 1303     |
----------------------------------
2025-02-01 17:19:01.566924 Eastern Standard Time
| Itration            | 1304     |
| Real Det Return     | 688      |
| Real Sto Return     | 655      |
| Reward Loss         | -59.8    |
| Running Env Steps   | 652000   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1304     |
----------------------------------
2025-02-01 17:19:17.247691 Eastern Standard Time
| Itration            | 1305     |
| Real Det Return     | 673      |
| Real Sto Return     | 650      |
| Reward Loss         | -83.5    |
| Running Env Steps   | 652500   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 7.52     |
| Running Update Time | 1305     |
----------------------------------
2025-02-01 17:19:32.880409 Eastern Standard Time
| Itration            | 1306     |
| Real Det Return     | 690      |
| Real Sto Return     | 660      |
| Reward Loss         | -33.7    |
| Running Env Steps   | 653000   |
| Running Forward KL  | -3.58    |
| Running Reverse KL  | 8.15     |
| Running Update Time | 1306     |
----------------------------------
2025-02-01 17:19:48.512335 Eastern Standard Time
| Itration            | 1307     |
| Real Det Return     | 691      |
| Real Sto Return     | 660      |
| Reward Loss         | -101     |
| Running Env Steps   | 653500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1307     |
----------------------------------
2025-02-01 17:20:04.080508 Eastern Standard Time
| Itration            | 1308     |
| Real Det Return     | 688      |
| Real Sto Return     | 657      |
| Reward Loss         | -78.5    |
| Running Env Steps   | 654000   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 8.18     |
| Running Update Time | 1308     |
----------------------------------
2025-02-01 17:20:19.745673 Eastern Standard Time
| Itration            | 1309     |
| Real Det Return     | 679      |
| Real Sto Return     | 658      |
| Reward Loss         | -80.8    |
| Running Env Steps   | 654500   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 7.96     |
| Running Update Time | 1309     |
----------------------------------
2025-02-01 17:20:35.414333 Eastern Standard Time
| Itration            | 1310     |
| Real Det Return     | 675      |
| Real Sto Return     | 665      |
| Reward Loss         | -76.4    |
| Running Env Steps   | 655000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1310     |
----------------------------------
2025-02-01 17:20:51.091836 Eastern Standard Time
| Itration            | 1311     |
| Real Det Return     | 653      |
| Real Sto Return     | 610      |
| Reward Loss         | -148     |
| Running Env Steps   | 655500   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 7.17     |
| Running Update Time | 1311     |
----------------------------------
2025-02-01 17:21:06.881784 Eastern Standard Time
| Itration            | 1312     |
| Real Det Return     | 663      |
| Real Sto Return     | 634      |
| Reward Loss         | -107     |
| Running Env Steps   | 656000   |
| Running Forward KL  | -2.72    |
| Running Reverse KL  | 7.63     |
| Running Update Time | 1312     |
----------------------------------
2025-02-01 17:21:22.560663 Eastern Standard Time
| Itration            | 1313     |
| Real Det Return     | 674      |
| Real Sto Return     | 652      |
| Reward Loss         | -92.6    |
| Running Env Steps   | 656500   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 7.67     |
| Running Update Time | 1313     |
----------------------------------
2025-02-01 17:21:38.110411 Eastern Standard Time
| Itration            | 1314     |
| Real Det Return     | 696      |
| Real Sto Return     | 674      |
| Reward Loss         | -47.7    |
| Running Env Steps   | 657000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 6.79     |
| Running Update Time | 1314     |
----------------------------------
2025-02-01 17:21:53.975859 Eastern Standard Time
| Itration            | 1315     |
| Real Det Return     | 697      |
| Real Sto Return     | 647      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 657500   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 8.13     |
| Running Update Time | 1315     |
----------------------------------
2025-02-01 17:22:09.824016 Eastern Standard Time
| Itration            | 1316     |
| Real Det Return     | 680      |
| Real Sto Return     | 654      |
| Reward Loss         | -116     |
| Running Env Steps   | 658000   |
| Running Forward KL  | -2.86    |
| Running Reverse KL  | 6.96     |
| Running Update Time | 1316     |
----------------------------------
2025-02-01 17:22:25.851887 Eastern Standard Time
| Itration            | 1317     |
| Real Det Return     | 684      |
| Real Sto Return     | 663      |
| Reward Loss         | -57.8    |
| Running Env Steps   | 658500   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 8.05     |
| Running Update Time | 1317     |
----------------------------------
2025-02-01 17:22:41.751994 Eastern Standard Time
| Itration            | 1318     |
| Real Det Return     | 652      |
| Real Sto Return     | 608      |
| Reward Loss         | -157     |
| Running Env Steps   | 659000   |
| Running Forward KL  | -3.57    |
| Running Reverse KL  | 6.53     |
| Running Update Time | 1318     |
----------------------------------
2025-02-01 17:22:57.617686 Eastern Standard Time
| Itration            | 1319     |
| Real Det Return     | 678      |
| Real Sto Return     | 625      |
| Reward Loss         | -109     |
| Running Env Steps   | 659500   |
| Running Forward KL  | -3.56    |
| Running Reverse KL  | 7.69     |
| Running Update Time | 1319     |
----------------------------------
2025-02-01 17:23:13.834481 Eastern Standard Time
| Itration            | 1320     |
| Real Det Return     | 688      |
| Real Sto Return     | 652      |
| Reward Loss         | -62.3    |
| Running Env Steps   | 660000   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 7.82     |
| Running Update Time | 1320     |
----------------------------------
2025-02-01 17:23:29.581987 Eastern Standard Time
| Itration            | 1321     |
| Real Det Return     | 686      |
| Real Sto Return     | 641      |
| Reward Loss         | -45.5    |
| Running Env Steps   | 660500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 7.44     |
| Running Update Time | 1321     |
----------------------------------
2025-02-01 17:23:45.287066 Eastern Standard Time
| Itration            | 1322     |
| Real Det Return     | 670      |
| Real Sto Return     | 641      |
| Reward Loss         | -54.6    |
| Running Env Steps   | 661000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 7.63     |
| Running Update Time | 1322     |
----------------------------------
2025-02-01 17:24:01.478114 Eastern Standard Time
| Itration            | 1323     |
| Real Det Return     | 660      |
| Real Sto Return     | 630      |
| Reward Loss         | -134     |
| Running Env Steps   | 661500   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 6.62     |
| Running Update Time | 1323     |
----------------------------------
2025-02-01 17:24:17.114191 Eastern Standard Time
| Itration            | 1324     |
| Real Det Return     | 684      |
| Real Sto Return     | 671      |
| Reward Loss         | -46.7    |
| Running Env Steps   | 662000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1324     |
----------------------------------
2025-02-01 17:24:32.916154 Eastern Standard Time
| Itration            | 1325     |
| Real Det Return     | 705      |
| Real Sto Return     | 671      |
| Reward Loss         | -96.4    |
| Running Env Steps   | 662500   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1325     |
----------------------------------
2025-02-01 17:24:48.776657 Eastern Standard Time
| Itration            | 1326     |
| Real Det Return     | 687      |
| Real Sto Return     | 661      |
| Reward Loss         | -87.2    |
| Running Env Steps   | 663000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 6.87     |
| Running Update Time | 1326     |
----------------------------------
2025-02-01 17:25:04.708339 Eastern Standard Time
| Itration            | 1327     |
| Real Det Return     | 642      |
| Real Sto Return     | 619      |
| Reward Loss         | -102     |
| Running Env Steps   | 663500   |
| Running Forward KL  | -3.58    |
| Running Reverse KL  | 7.66     |
| Running Update Time | 1327     |
----------------------------------
2025-02-01 17:25:20.542176 Eastern Standard Time
| Itration            | 1328     |
| Real Det Return     | 679      |
| Real Sto Return     | 655      |
| Reward Loss         | -89.7    |
| Running Env Steps   | 664000   |
| Running Forward KL  | -2.47    |
| Running Reverse KL  | 8.78     |
| Running Update Time | 1328     |
----------------------------------
2025-02-01 17:25:36.279574 Eastern Standard Time
| Itration            | 1329     |
| Real Det Return     | 679      |
| Real Sto Return     | 659      |
| Reward Loss         | -77.6    |
| Running Env Steps   | 664500   |
| Running Forward KL  | -3.37    |
| Running Reverse KL  | 7.82     |
| Running Update Time | 1329     |
----------------------------------
2025-02-01 17:25:52.235756 Eastern Standard Time
| Itration            | 1330     |
| Real Det Return     | 680      |
| Real Sto Return     | 654      |
| Reward Loss         | -78.2    |
| Running Env Steps   | 665000   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 7.77     |
| Running Update Time | 1330     |
----------------------------------
2025-02-01 17:26:08.414395 Eastern Standard Time
| Itration            | 1331     |
| Real Det Return     | 660      |
| Real Sto Return     | 640      |
| Reward Loss         | -96.1    |
| Running Env Steps   | 665500   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 6.67     |
| Running Update Time | 1331     |
----------------------------------
2025-02-01 17:26:24.343117 Eastern Standard Time
| Itration            | 1332     |
| Real Det Return     | 680      |
| Real Sto Return     | 643      |
| Reward Loss         | -52.9    |
| Running Env Steps   | 666000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 7.12     |
| Running Update Time | 1332     |
----------------------------------
2025-02-01 17:26:40.227465 Eastern Standard Time
| Itration            | 1333     |
| Real Det Return     | 686      |
| Real Sto Return     | 649      |
| Reward Loss         | -75.9    |
| Running Env Steps   | 666500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 6.79     |
| Running Update Time | 1333     |
----------------------------------
2025-02-01 17:26:56.181619 Eastern Standard Time
| Itration            | 1334     |
| Real Det Return     | 679      |
| Real Sto Return     | 655      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 667000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 7.62     |
| Running Update Time | 1334     |
----------------------------------
2025-02-01 17:27:12.137807 Eastern Standard Time
| Itration            | 1335     |
| Real Det Return     | 677      |
| Real Sto Return     | 658      |
| Reward Loss         | -63.6    |
| Running Env Steps   | 667500   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 7.27     |
| Running Update Time | 1335     |
----------------------------------
2025-02-01 17:27:28.541503 Eastern Standard Time
| Itration            | 1336     |
| Real Det Return     | 694      |
| Real Sto Return     | 669      |
| Reward Loss         | -86.2    |
| Running Env Steps   | 668000   |
| Running Forward KL  | -2.86    |
| Running Reverse KL  | 7.63     |
| Running Update Time | 1336     |
----------------------------------
2025-02-01 17:27:44.496465 Eastern Standard Time
| Itration            | 1337     |
| Real Det Return     | 698      |
| Real Sto Return     | 676      |
| Reward Loss         | -43.6    |
| Running Env Steps   | 668500   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 8.27     |
| Running Update Time | 1337     |
----------------------------------
2025-02-01 17:28:00.537943 Eastern Standard Time
| Itration            | 1338     |
| Real Det Return     | 688      |
| Real Sto Return     | 656      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 669000   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 7.87     |
| Running Update Time | 1338     |
----------------------------------
2025-02-01 17:28:16.961009 Eastern Standard Time
| Itration            | 1339     |
| Real Det Return     | 678      |
| Real Sto Return     | 649      |
| Reward Loss         | -78.1    |
| Running Env Steps   | 669500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1339     |
----------------------------------
2025-02-01 17:28:33.431480 Eastern Standard Time
| Itration            | 1340     |
| Real Det Return     | 691      |
| Real Sto Return     | 670      |
| Reward Loss         | -64.1    |
| Running Env Steps   | 670000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1340     |
----------------------------------
2025-02-01 17:28:49.543220 Eastern Standard Time
| Itration            | 1341     |
| Real Det Return     | 690      |
| Real Sto Return     | 652      |
| Reward Loss         | -80.9    |
| Running Env Steps   | 670500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 6.33     |
| Running Update Time | 1341     |
----------------------------------
2025-02-01 17:29:05.469015 Eastern Standard Time
| Itration            | 1342     |
| Real Det Return     | 647      |
| Real Sto Return     | 612      |
| Reward Loss         | -180     |
| Running Env Steps   | 671000   |
| Running Forward KL  | -1.34    |
| Running Reverse KL  | 6.63     |
| Running Update Time | 1342     |
----------------------------------
2025-02-01 17:29:21.699308 Eastern Standard Time
| Itration            | 1343     |
| Real Det Return     | 682      |
| Real Sto Return     | 668      |
| Reward Loss         | -85.7    |
| Running Env Steps   | 671500   |
| Running Forward KL  | -3.67    |
| Running Reverse KL  | 7.69     |
| Running Update Time | 1343     |
----------------------------------
2025-02-01 17:29:37.934937 Eastern Standard Time
| Itration            | 1344     |
| Real Det Return     | 691      |
| Real Sto Return     | 664      |
| Reward Loss         | -48.2    |
| Running Env Steps   | 672000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 8.13     |
| Running Update Time | 1344     |
----------------------------------
2025-02-01 17:29:53.938611 Eastern Standard Time
| Itration            | 1345     |
| Real Det Return     | 689      |
| Real Sto Return     | 661      |
| Reward Loss         | -60.2    |
| Running Env Steps   | 672500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1345     |
----------------------------------
2025-02-01 17:30:10.836183 Eastern Standard Time
| Itration            | 1346     |
| Real Det Return     | 698      |
| Real Sto Return     | 665      |
| Reward Loss         | -69.6    |
| Running Env Steps   | 673000   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1346     |
----------------------------------
2025-02-01 17:30:26.854086 Eastern Standard Time
| Itration            | 1347     |
| Real Det Return     | 690      |
| Real Sto Return     | 659      |
| Reward Loss         | -63.6    |
| Running Env Steps   | 673500   |
| Running Forward KL  | -3.52    |
| Running Reverse KL  | 8.43     |
| Running Update Time | 1347     |
----------------------------------
2025-02-01 17:30:42.961952 Eastern Standard Time
| Itration            | 1348     |
| Real Det Return     | 684      |
| Real Sto Return     | 657      |
| Reward Loss         | -56.9    |
| Running Env Steps   | 674000   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 7.03     |
| Running Update Time | 1348     |
----------------------------------
2025-02-01 17:30:59.678604 Eastern Standard Time
| Itration            | 1349     |
| Real Det Return     | 652      |
| Real Sto Return     | 635      |
| Reward Loss         | -126     |
| Running Env Steps   | 674500   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 6.87     |
| Running Update Time | 1349     |
----------------------------------
2025-02-01 17:31:15.509342 Eastern Standard Time
| Itration            | 1350     |
| Real Det Return     | 684      |
| Real Sto Return     | 656      |
| Reward Loss         | -88.5    |
| Running Env Steps   | 675000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 6.85     |
| Running Update Time | 1350     |
----------------------------------
2025-02-01 17:31:31.834968 Eastern Standard Time
| Itration            | 1351     |
| Real Det Return     | 626      |
| Real Sto Return     | 598      |
| Reward Loss         | -153     |
| Running Env Steps   | 675500   |
| Running Forward KL  | -3.21    |
| Running Reverse KL  | 6.49     |
| Running Update Time | 1351     |
----------------------------------
2025-02-01 17:31:47.794382 Eastern Standard Time
| Itration            | 1352     |
| Real Det Return     | 672      |
| Real Sto Return     | 642      |
| Reward Loss         | -92      |
| Running Env Steps   | 676000   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 7.57     |
| Running Update Time | 1352     |
----------------------------------
2025-02-01 17:32:04.009603 Eastern Standard Time
| Itration            | 1353     |
| Real Det Return     | 689      |
| Real Sto Return     | 673      |
| Reward Loss         | -85.4    |
| Running Env Steps   | 676500   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 6.99     |
| Running Update Time | 1353     |
----------------------------------
2025-02-01 17:32:19.748046 Eastern Standard Time
| Itration            | 1354     |
| Real Det Return     | 696      |
| Real Sto Return     | 670      |
| Reward Loss         | -72.8    |
| Running Env Steps   | 677000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 7        |
| Running Update Time | 1354     |
----------------------------------
2025-02-01 17:32:35.507705 Eastern Standard Time
| Itration            | 1355     |
| Real Det Return     | 654      |
| Real Sto Return     | 627      |
| Reward Loss         | -103     |
| Running Env Steps   | 677500   |
| Running Forward KL  | -3.5     |
| Running Reverse KL  | 7.51     |
| Running Update Time | 1355     |
----------------------------------
2025-02-01 17:32:51.150021 Eastern Standard Time
| Itration            | 1356     |
| Real Det Return     | 705      |
| Real Sto Return     | 670      |
| Reward Loss         | -68.6    |
| Running Env Steps   | 678000   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1356     |
----------------------------------
2025-02-01 17:33:06.831892 Eastern Standard Time
| Itration            | 1357     |
| Real Det Return     | 669      |
| Real Sto Return     | 625      |
| Reward Loss         | -80.2    |
| Running Env Steps   | 678500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 6.57     |
| Running Update Time | 1357     |
----------------------------------
2025-02-01 17:33:22.724471 Eastern Standard Time
| Itration            | 1358     |
| Real Det Return     | 672      |
| Real Sto Return     | 641      |
| Reward Loss         | -78.5    |
| Running Env Steps   | 679000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1358     |
----------------------------------
2025-02-01 17:33:39.001047 Eastern Standard Time
| Itration            | 1359     |
| Real Det Return     | 679      |
| Real Sto Return     | 644      |
| Reward Loss         | -57.6    |
| Running Env Steps   | 679500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 7.75     |
| Running Update Time | 1359     |
----------------------------------
2025-02-01 17:33:54.797713 Eastern Standard Time
| Itration            | 1360     |
| Real Det Return     | 689      |
| Real Sto Return     | 659      |
| Reward Loss         | -77.6    |
| Running Env Steps   | 680000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1360     |
----------------------------------
2025-02-01 17:34:11.282482 Eastern Standard Time
| Itration            | 1361     |
| Real Det Return     | 681      |
| Real Sto Return     | 646      |
| Reward Loss         | -51.1    |
| Running Env Steps   | 680500   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 7.88     |
| Running Update Time | 1361     |
----------------------------------
2025-02-01 17:34:27.105503 Eastern Standard Time
| Itration            | 1362     |
| Real Det Return     | 638      |
| Real Sto Return     | 624      |
| Reward Loss         | -128     |
| Running Env Steps   | 681000   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 6.19     |
| Running Update Time | 1362     |
----------------------------------
2025-02-01 17:34:43.206450 Eastern Standard Time
| Itration            | 1363     |
| Real Det Return     | 653      |
| Real Sto Return     | 629      |
| Reward Loss         | -91.5    |
| Running Env Steps   | 681500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 6.74     |
| Running Update Time | 1363     |
----------------------------------
2025-02-01 17:34:59.254982 Eastern Standard Time
| Itration            | 1364     |
| Real Det Return     | 688      |
| Real Sto Return     | 667      |
| Reward Loss         | -71.2    |
| Running Env Steps   | 682000   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1364     |
----------------------------------
2025-02-01 17:35:15.556189 Eastern Standard Time
| Itration            | 1365     |
| Real Det Return     | 698      |
| Real Sto Return     | 666      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 682500   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 7.63     |
| Running Update Time | 1365     |
----------------------------------
2025-02-01 17:35:31.384758 Eastern Standard Time
| Itration            | 1366     |
| Real Det Return     | 639      |
| Real Sto Return     | 618      |
| Reward Loss         | -126     |
| Running Env Steps   | 683000   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 7.43     |
| Running Update Time | 1366     |
----------------------------------
2025-02-01 17:35:47.161945 Eastern Standard Time
| Itration            | 1367     |
| Real Det Return     | 668      |
| Real Sto Return     | 640      |
| Reward Loss         | -96.5    |
| Running Env Steps   | 683500   |
| Running Forward KL  | -3.27    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1367     |
----------------------------------
2025-02-01 17:36:02.828604 Eastern Standard Time
| Itration            | 1368     |
| Real Det Return     | 667      |
| Real Sto Return     | 649      |
| Reward Loss         | -63.3    |
| Running Env Steps   | 684000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 7.1      |
| Running Update Time | 1368     |
----------------------------------
2025-02-01 17:36:18.665530 Eastern Standard Time
| Itration            | 1369     |
| Real Det Return     | 603      |
| Real Sto Return     | 593      |
| Reward Loss         | -154     |
| Running Env Steps   | 684500   |
| Running Forward KL  | -2.34    |
| Running Reverse KL  | 7.23     |
| Running Update Time | 1369     |
----------------------------------
2025-02-01 17:36:34.456719 Eastern Standard Time
| Itration            | 1370     |
| Real Det Return     | 669      |
| Real Sto Return     | 657      |
| Reward Loss         | -66.2    |
| Running Env Steps   | 685000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 7.29     |
| Running Update Time | 1370     |
----------------------------------
2025-02-01 17:36:50.214480 Eastern Standard Time
| Itration            | 1371     |
| Real Det Return     | 659      |
| Real Sto Return     | 641      |
| Reward Loss         | -78.9    |
| Running Env Steps   | 685500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1371     |
----------------------------------
2025-02-01 17:37:05.901649 Eastern Standard Time
| Itration            | 1372     |
| Real Det Return     | 676      |
| Real Sto Return     | 643      |
| Reward Loss         | -95.7    |
| Running Env Steps   | 686000   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 7.22     |
| Running Update Time | 1372     |
----------------------------------
2025-02-01 17:37:21.696735 Eastern Standard Time
| Itration            | 1373     |
| Real Det Return     | 680      |
| Real Sto Return     | 659      |
| Reward Loss         | -53.5    |
| Running Env Steps   | 686500   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1373     |
----------------------------------
2025-02-01 17:37:37.412166 Eastern Standard Time
| Itration            | 1374     |
| Real Det Return     | 702      |
| Real Sto Return     | 654      |
| Reward Loss         | -96.8    |
| Running Env Steps   | 687000   |
| Running Forward KL  | -3.69    |
| Running Reverse KL  | 6.55     |
| Running Update Time | 1374     |
----------------------------------
2025-02-01 17:37:53.214457 Eastern Standard Time
| Itration            | 1375     |
| Real Det Return     | 675      |
| Real Sto Return     | 653      |
| Reward Loss         | -99.1    |
| Running Env Steps   | 687500   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 6.63     |
| Running Update Time | 1375     |
----------------------------------
2025-02-01 17:38:09.056599 Eastern Standard Time
| Itration            | 1376     |
| Real Det Return     | 694      |
| Real Sto Return     | 669      |
| Reward Loss         | -61.3    |
| Running Env Steps   | 688000   |
| Running Forward KL  | -3.55    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1376     |
----------------------------------
2025-02-01 17:38:24.827661 Eastern Standard Time
| Itration            | 1377     |
| Real Det Return     | 660      |
| Real Sto Return     | 639      |
| Reward Loss         | -80.6    |
| Running Env Steps   | 688500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 6.53     |
| Running Update Time | 1377     |
----------------------------------
2025-02-01 17:38:40.641183 Eastern Standard Time
| Itration            | 1378     |
| Real Det Return     | 693      |
| Real Sto Return     | 669      |
| Reward Loss         | -80.8    |
| Running Env Steps   | 689000   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 7.58     |
| Running Update Time | 1378     |
----------------------------------
2025-02-01 17:38:56.504594 Eastern Standard Time
| Itration            | 1379     |
| Real Det Return     | 706      |
| Real Sto Return     | 686      |
| Reward Loss         | -49.5    |
| Running Env Steps   | 689500   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 6.94     |
| Running Update Time | 1379     |
----------------------------------
2025-02-01 17:39:12.660067 Eastern Standard Time
| Itration            | 1380     |
| Real Det Return     | 680      |
| Real Sto Return     | 662      |
| Reward Loss         | -73      |
| Running Env Steps   | 690000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 6.83     |
| Running Update Time | 1380     |
----------------------------------
2025-02-01 17:39:28.308710 Eastern Standard Time
| Itration            | 1381     |
| Real Det Return     | 674      |
| Real Sto Return     | 642      |
| Reward Loss         | -78.5    |
| Running Env Steps   | 690500   |
| Running Forward KL  | -3.23    |
| Running Reverse KL  | 8.59     |
| Running Update Time | 1381     |
----------------------------------
2025-02-01 17:39:43.918639 Eastern Standard Time
| Itration            | 1382     |
| Real Det Return     | 686      |
| Real Sto Return     | 678      |
| Reward Loss         | -50.5    |
| Running Env Steps   | 691000   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 8.2      |
| Running Update Time | 1382     |
----------------------------------
2025-02-01 17:39:59.704368 Eastern Standard Time
| Itration            | 1383     |
| Real Det Return     | 632      |
| Real Sto Return     | 618      |
| Reward Loss         | -125     |
| Running Env Steps   | 691500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 6.23     |
| Running Update Time | 1383     |
----------------------------------
2025-02-01 17:40:15.698446 Eastern Standard Time
| Itration            | 1384     |
| Real Det Return     | 635      |
| Real Sto Return     | 621      |
| Reward Loss         | -147     |
| Running Env Steps   | 692000   |
| Running Forward KL  | -1.6     |
| Running Reverse KL  | 8.03     |
| Running Update Time | 1384     |
----------------------------------
2025-02-01 17:40:31.682326 Eastern Standard Time
| Itration            | 1385     |
| Real Det Return     | 681      |
| Real Sto Return     | 632      |
| Reward Loss         | -114     |
| Running Env Steps   | 692500   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 7.15     |
| Running Update Time | 1385     |
----------------------------------
2025-02-01 17:40:47.483562 Eastern Standard Time
| Itration            | 1386     |
| Real Det Return     | 688      |
| Real Sto Return     | 660      |
| Reward Loss         | -55.5    |
| Running Env Steps   | 693000   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1386     |
----------------------------------
2025-02-01 17:41:03.228016 Eastern Standard Time
| Itration            | 1387     |
| Real Det Return     | 683      |
| Real Sto Return     | 652      |
| Reward Loss         | -55.4    |
| Running Env Steps   | 693500   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1387     |
----------------------------------
2025-02-01 17:41:19.427780 Eastern Standard Time
| Itration            | 1388     |
| Real Det Return     | 670      |
| Real Sto Return     | 635      |
| Reward Loss         | -96.4    |
| Running Env Steps   | 694000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 6.29     |
| Running Update Time | 1388     |
----------------------------------
2025-02-01 17:41:35.320067 Eastern Standard Time
| Itration            | 1389     |
| Real Det Return     | 673      |
| Real Sto Return     | 625      |
| Reward Loss         | -81.4    |
| Running Env Steps   | 694500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 6.89     |
| Running Update Time | 1389     |
----------------------------------
2025-02-01 17:41:51.368139 Eastern Standard Time
| Itration            | 1390     |
| Real Det Return     | 694      |
| Real Sto Return     | 677      |
| Reward Loss         | -39.2    |
| Running Env Steps   | 695000   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 8.53     |
| Running Update Time | 1390     |
----------------------------------
2025-02-01 17:42:07.446041 Eastern Standard Time
| Itration            | 1391     |
| Real Det Return     | 680      |
| Real Sto Return     | 641      |
| Reward Loss         | -67.3    |
| Running Env Steps   | 695500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 7.69     |
| Running Update Time | 1391     |
----------------------------------
2025-02-01 17:42:23.505782 Eastern Standard Time
| Itration            | 1392     |
| Real Det Return     | 657      |
| Real Sto Return     | 629      |
| Reward Loss         | -96.1    |
| Running Env Steps   | 696000   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 6.26     |
| Running Update Time | 1392     |
----------------------------------
2025-02-01 17:42:39.498657 Eastern Standard Time
| Itration            | 1393     |
| Real Det Return     | 667      |
| Real Sto Return     | 648      |
| Reward Loss         | -72.2    |
| Running Env Steps   | 696500   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 6.44     |
| Running Update Time | 1393     |
----------------------------------
2025-02-01 17:42:55.354410 Eastern Standard Time
| Itration            | 1394     |
| Real Det Return     | 684      |
| Real Sto Return     | 669      |
| Reward Loss         | -80.3    |
| Running Env Steps   | 697000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1394     |
----------------------------------
2025-02-01 17:43:11.274388 Eastern Standard Time
| Itration            | 1395     |
| Real Det Return     | 677      |
| Real Sto Return     | 645      |
| Reward Loss         | -77.9    |
| Running Env Steps   | 697500   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1395     |
----------------------------------
2025-02-01 17:43:27.096429 Eastern Standard Time
| Itration            | 1396     |
| Real Det Return     | 689      |
| Real Sto Return     | 672      |
| Reward Loss         | -89.8    |
| Running Env Steps   | 698000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1396     |
----------------------------------
2025-02-01 17:43:42.933937 Eastern Standard Time
| Itration            | 1397     |
| Real Det Return     | 658      |
| Real Sto Return     | 630      |
| Reward Loss         | -93.8    |
| Running Env Steps   | 698500   |
| Running Forward KL  | -5.5     |
| Running Reverse KL  | 6.15     |
| Running Update Time | 1397     |
----------------------------------
2025-02-01 17:43:58.768600 Eastern Standard Time
| Itration            | 1398     |
| Real Det Return     | 681      |
| Real Sto Return     | 670      |
| Reward Loss         | -42.5    |
| Running Env Steps   | 699000   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 8.24     |
| Running Update Time | 1398     |
----------------------------------
2025-02-01 17:44:14.747137 Eastern Standard Time
| Itration            | 1399     |
| Real Det Return     | 672      |
| Real Sto Return     | 640      |
| Reward Loss         | -73.3    |
| Running Env Steps   | 699500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 6.54     |
| Running Update Time | 1399     |
----------------------------------
2025-02-01 17:44:31.098799 Eastern Standard Time
| Itration            | 1400     |
| Real Det Return     | 679      |
| Real Sto Return     | 647      |
| Reward Loss         | -121     |
| Running Env Steps   | 700000   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1400     |
----------------------------------
2025-02-01 17:44:46.816319 Eastern Standard Time
| Itration            | 1401     |
| Real Det Return     | 689      |
| Real Sto Return     | 650      |
| Reward Loss         | -120     |
| Running Env Steps   | 700500   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 7.27     |
| Running Update Time | 1401     |
----------------------------------
2025-02-01 17:45:02.485433 Eastern Standard Time
| Itration            | 1402     |
| Real Det Return     | 689      |
| Real Sto Return     | 660      |
| Reward Loss         | -52.7    |
| Running Env Steps   | 701000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1402     |
----------------------------------
2025-02-01 17:45:18.180503 Eastern Standard Time
| Itration            | 1403     |
| Real Det Return     | 667      |
| Real Sto Return     | 646      |
| Reward Loss         | -80      |
| Running Env Steps   | 701500   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 7.68     |
| Running Update Time | 1403     |
----------------------------------
2025-02-01 17:45:33.842310 Eastern Standard Time
| Itration            | 1404     |
| Real Det Return     | 694      |
| Real Sto Return     | 673      |
| Reward Loss         | -68.7    |
| Running Env Steps   | 702000   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 8.06     |
| Running Update Time | 1404     |
----------------------------------
2025-02-01 17:45:49.560499 Eastern Standard Time
| Itration            | 1405     |
| Real Det Return     | 703      |
| Real Sto Return     | 672      |
| Reward Loss         | -63.4    |
| Running Env Steps   | 702500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 7.56     |
| Running Update Time | 1405     |
----------------------------------
2025-02-01 17:46:05.574915 Eastern Standard Time
| Itration            | 1406     |
| Real Det Return     | 685      |
| Real Sto Return     | 660      |
| Reward Loss         | -91.8    |
| Running Env Steps   | 703000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 7        |
| Running Update Time | 1406     |
----------------------------------
2025-02-01 17:46:21.543565 Eastern Standard Time
| Itration            | 1407     |
| Real Det Return     | 704      |
| Real Sto Return     | 670      |
| Reward Loss         | -119     |
| Running Env Steps   | 703500   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 6.44     |
| Running Update Time | 1407     |
----------------------------------
2025-02-01 17:46:37.638930 Eastern Standard Time
| Itration            | 1408     |
| Real Det Return     | 686      |
| Real Sto Return     | 659      |
| Reward Loss         | -36.3    |
| Running Env Steps   | 704000   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 6.74     |
| Running Update Time | 1408     |
----------------------------------
2025-02-01 17:46:53.372158 Eastern Standard Time
| Itration            | 1409     |
| Real Det Return     | 646      |
| Real Sto Return     | 637      |
| Reward Loss         | -89.4    |
| Running Env Steps   | 704500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 6.4      |
| Running Update Time | 1409     |
----------------------------------
2025-02-01 17:47:09.125070 Eastern Standard Time
| Itration            | 1410     |
| Real Det Return     | 677      |
| Real Sto Return     | 665      |
| Reward Loss         | -55.4    |
| Running Env Steps   | 705000   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 6.69     |
| Running Update Time | 1410     |
----------------------------------
2025-02-01 17:47:24.940231 Eastern Standard Time
| Itration            | 1411     |
| Real Det Return     | 678      |
| Real Sto Return     | 662      |
| Reward Loss         | -110     |
| Running Env Steps   | 705500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1411     |
----------------------------------
2025-02-01 17:47:40.717006 Eastern Standard Time
| Itration            | 1412     |
| Real Det Return     | 681      |
| Real Sto Return     | 643      |
| Reward Loss         | -50.1    |
| Running Env Steps   | 706000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 7.64     |
| Running Update Time | 1412     |
----------------------------------
2025-02-01 17:47:56.738248 Eastern Standard Time
| Itration            | 1413     |
| Real Det Return     | 707      |
| Real Sto Return     | 678      |
| Reward Loss         | -96.4    |
| Running Env Steps   | 706500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1413     |
----------------------------------
2025-02-01 17:48:12.633518 Eastern Standard Time
| Itration            | 1414     |
| Real Det Return     | 703      |
| Real Sto Return     | 660      |
| Reward Loss         | -83.7    |
| Running Env Steps   | 707000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 6.71     |
| Running Update Time | 1414     |
----------------------------------
2025-02-01 17:48:28.636190 Eastern Standard Time
| Itration            | 1415     |
| Real Det Return     | 683      |
| Real Sto Return     | 664      |
| Reward Loss         | -89      |
| Running Env Steps   | 707500   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 7.52     |
| Running Update Time | 1415     |
----------------------------------
2025-02-01 17:48:44.842042 Eastern Standard Time
| Itration            | 1416     |
| Real Det Return     | 697      |
| Real Sto Return     | 675      |
| Reward Loss         | -57.3    |
| Running Env Steps   | 708000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 7.86     |
| Running Update Time | 1416     |
----------------------------------
2025-02-01 17:49:00.575838 Eastern Standard Time
| Itration            | 1417     |
| Real Det Return     | 686      |
| Real Sto Return     | 660      |
| Reward Loss         | -73.4    |
| Running Env Steps   | 708500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 8.16     |
| Running Update Time | 1417     |
----------------------------------
2025-02-01 17:49:16.329081 Eastern Standard Time
| Itration            | 1418     |
| Real Det Return     | 696      |
| Real Sto Return     | 675      |
| Reward Loss         | -49.1    |
| Running Env Steps   | 709000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 8.71     |
| Running Update Time | 1418     |
----------------------------------
2025-02-01 17:49:32.072129 Eastern Standard Time
| Itration            | 1419     |
| Real Det Return     | 684      |
| Real Sto Return     | 657      |
| Reward Loss         | -61      |
| Running Env Steps   | 709500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1419     |
----------------------------------
2025-02-01 17:49:47.913294 Eastern Standard Time
| Itration            | 1420     |
| Real Det Return     | 661      |
| Real Sto Return     | 641      |
| Reward Loss         | -62.9    |
| Running Env Steps   | 710000   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1420     |
----------------------------------
2025-02-01 17:50:03.930604 Eastern Standard Time
| Itration            | 1421     |
| Real Det Return     | 696      |
| Real Sto Return     | 663      |
| Reward Loss         | -77      |
| Running Env Steps   | 710500   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1421     |
----------------------------------
2025-02-01 17:50:19.924575 Eastern Standard Time
| Itration            | 1422     |
| Real Det Return     | 706      |
| Real Sto Return     | 681      |
| Reward Loss         | -88.3    |
| Running Env Steps   | 711000   |
| Running Forward KL  | -3.76    |
| Running Reverse KL  | 7.74     |
| Running Update Time | 1422     |
----------------------------------
2025-02-01 17:50:35.981361 Eastern Standard Time
| Itration            | 1423     |
| Real Det Return     | 669      |
| Real Sto Return     | 656      |
| Reward Loss         | -85.5    |
| Running Env Steps   | 711500   |
| Running Forward KL  | -4.1     |
| Running Reverse KL  | 6.67     |
| Running Update Time | 1423     |
----------------------------------
2025-02-01 17:50:52.217140 Eastern Standard Time
| Itration            | 1424     |
| Real Det Return     | 666      |
| Real Sto Return     | 639      |
| Reward Loss         | -94.1    |
| Running Env Steps   | 712000   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 8.08     |
| Running Update Time | 1424     |
----------------------------------
2025-02-01 17:51:08.007860 Eastern Standard Time
| Itration            | 1425     |
| Real Det Return     | 684      |
| Real Sto Return     | 658      |
| Reward Loss         | -83.9    |
| Running Env Steps   | 712500   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 6.58     |
| Running Update Time | 1425     |
----------------------------------
2025-02-01 17:51:23.856459 Eastern Standard Time
| Itration            | 1426     |
| Real Det Return     | 679      |
| Real Sto Return     | 659      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 713000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 7.65     |
| Running Update Time | 1426     |
----------------------------------
2025-02-01 17:51:39.932770 Eastern Standard Time
| Itration            | 1427     |
| Real Det Return     | 686      |
| Real Sto Return     | 669      |
| Reward Loss         | -74.7    |
| Running Env Steps   | 713500   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 6.71     |
| Running Update Time | 1427     |
----------------------------------
2025-02-01 17:51:55.670033 Eastern Standard Time
| Itration            | 1428     |
| Real Det Return     | 683      |
| Real Sto Return     | 641      |
| Reward Loss         | -57.4    |
| Running Env Steps   | 714000   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 7.14     |
| Running Update Time | 1428     |
----------------------------------
2025-02-01 17:52:11.445030 Eastern Standard Time
| Itration            | 1429     |
| Real Det Return     | 675      |
| Real Sto Return     | 648      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 714500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1429     |
----------------------------------
2025-02-01 17:52:27.370752 Eastern Standard Time
| Itration            | 1430     |
| Real Det Return     | 683      |
| Real Sto Return     | 656      |
| Reward Loss         | -46.5    |
| Running Env Steps   | 715000   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 7.91     |
| Running Update Time | 1430     |
----------------------------------
2025-02-01 17:52:43.733662 Eastern Standard Time
| Itration            | 1431     |
| Real Det Return     | 689      |
| Real Sto Return     | 657      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 715500   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 7.88     |
| Running Update Time | 1431     |
----------------------------------
2025-02-01 17:52:59.719486 Eastern Standard Time
| Itration            | 1432     |
| Real Det Return     | 683      |
| Real Sto Return     | 662      |
| Reward Loss         | -103     |
| Running Env Steps   | 716000   |
| Running Forward KL  | -3.78    |
| Running Reverse KL  | 6.46     |
| Running Update Time | 1432     |
----------------------------------
2025-02-01 17:53:15.961231 Eastern Standard Time
| Itration            | 1433     |
| Real Det Return     | 690      |
| Real Sto Return     | 654      |
| Reward Loss         | -64.3    |
| Running Env Steps   | 716500   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1433     |
----------------------------------
2025-02-01 17:53:31.687370 Eastern Standard Time
| Itration            | 1434     |
| Real Det Return     | 688      |
| Real Sto Return     | 660      |
| Reward Loss         | -71.7    |
| Running Env Steps   | 717000   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1434     |
----------------------------------
2025-02-01 17:53:47.641539 Eastern Standard Time
| Itration            | 1435     |
| Real Det Return     | 680      |
| Real Sto Return     | 660      |
| Reward Loss         | -73.6    |
| Running Env Steps   | 717500   |
| Running Forward KL  | -3.57    |
| Running Reverse KL  | 7.79     |
| Running Update Time | 1435     |
----------------------------------
2025-02-01 17:54:03.545404 Eastern Standard Time
| Itration            | 1436     |
| Real Det Return     | 677      |
| Real Sto Return     | 674      |
| Reward Loss         | -77.6    |
| Running Env Steps   | 718000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 8.2      |
| Running Update Time | 1436     |
----------------------------------
2025-02-01 17:54:19.577277 Eastern Standard Time
| Itration            | 1437     |
| Real Det Return     | 688      |
| Real Sto Return     | 677      |
| Reward Loss         | -73.9    |
| Running Env Steps   | 718500   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1437     |
----------------------------------
2025-02-01 17:54:35.438847 Eastern Standard Time
| Itration            | 1438     |
| Real Det Return     | 679      |
| Real Sto Return     | 655      |
| Reward Loss         | -85.7    |
| Running Env Steps   | 719000   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 7.78     |
| Running Update Time | 1438     |
----------------------------------
2025-02-01 17:54:51.291590 Eastern Standard Time
| Itration            | 1439     |
| Real Det Return     | 685      |
| Real Sto Return     | 666      |
| Reward Loss         | -85.4    |
| Running Env Steps   | 719500   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 6.86     |
| Running Update Time | 1439     |
----------------------------------
2025-02-01 17:55:07.173135 Eastern Standard Time
| Itration            | 1440     |
| Real Det Return     | 674      |
| Real Sto Return     | 660      |
| Reward Loss         | -84.6    |
| Running Env Steps   | 720000   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1440     |
----------------------------------
2025-02-01 17:55:23.164318 Eastern Standard Time
| Itration            | 1441     |
| Real Det Return     | 684      |
| Real Sto Return     | 661      |
| Reward Loss         | -75.4    |
| Running Env Steps   | 720500   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1441     |
----------------------------------
2025-02-01 17:55:38.949334 Eastern Standard Time
| Itration            | 1442     |
| Real Det Return     | 690      |
| Real Sto Return     | 667      |
| Reward Loss         | -60.7    |
| Running Env Steps   | 721000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 7.83     |
| Running Update Time | 1442     |
----------------------------------
2025-02-01 17:55:54.894824 Eastern Standard Time
| Itration            | 1443     |
| Real Det Return     | 652      |
| Real Sto Return     | 626      |
| Reward Loss         | -75.7    |
| Running Env Steps   | 721500   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1443     |
----------------------------------
2025-02-01 17:56:10.920716 Eastern Standard Time
| Itration            | 1444     |
| Real Det Return     | 670      |
| Real Sto Return     | 647      |
| Reward Loss         | -121     |
| Running Env Steps   | 722000   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1444     |
----------------------------------
2025-02-01 17:56:27.103275 Eastern Standard Time
| Itration            | 1445     |
| Real Det Return     | 698      |
| Real Sto Return     | 673      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 722500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 7.77     |
| Running Update Time | 1445     |
----------------------------------
2025-02-01 17:56:42.916360 Eastern Standard Time
| Itration            | 1446     |
| Real Det Return     | 671      |
| Real Sto Return     | 646      |
| Reward Loss         | -106     |
| Running Env Steps   | 723000   |
| Running Forward KL  | -3.71    |
| Running Reverse KL  | 7.12     |
| Running Update Time | 1446     |
----------------------------------
2025-02-01 17:56:58.606428 Eastern Standard Time
| Itration            | 1447     |
| Real Det Return     | 685      |
| Real Sto Return     | 680      |
| Reward Loss         | -34      |
| Running Env Steps   | 723500   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 8.13     |
| Running Update Time | 1447     |
----------------------------------
2025-02-01 17:57:14.338544 Eastern Standard Time
| Itration            | 1448     |
| Real Det Return     | 687      |
| Real Sto Return     | 650      |
| Reward Loss         | -74      |
| Running Env Steps   | 724000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 6.23     |
| Running Update Time | 1448     |
----------------------------------
2025-02-01 17:57:29.994213 Eastern Standard Time
| Itration            | 1449     |
| Real Det Return     | 661      |
| Real Sto Return     | 608      |
| Reward Loss         | -120     |
| Running Env Steps   | 724500   |
| Running Forward KL  | -3.79    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1449     |
----------------------------------
2025-02-01 17:57:45.699015 Eastern Standard Time
| Itration            | 1450     |
| Real Det Return     | 701      |
| Real Sto Return     | 678      |
| Reward Loss         | -52.9    |
| Running Env Steps   | 725000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1450     |
----------------------------------
2025-02-01 17:58:01.441374 Eastern Standard Time
| Itration            | 1451     |
| Real Det Return     | 691      |
| Real Sto Return     | 661      |
| Reward Loss         | -55.9    |
| Running Env Steps   | 725500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1451     |
----------------------------------
2025-02-01 17:58:17.150516 Eastern Standard Time
| Itration            | 1452     |
| Real Det Return     | 692      |
| Real Sto Return     | 673      |
| Reward Loss         | -63.1    |
| Running Env Steps   | 726000   |
| Running Forward KL  | -5.53    |
| Running Reverse KL  | 7.69     |
| Running Update Time | 1452     |
----------------------------------
2025-02-01 17:58:33.067797 Eastern Standard Time
| Itration            | 1453     |
| Real Det Return     | 693      |
| Real Sto Return     | 663      |
| Reward Loss         | -87.2    |
| Running Env Steps   | 726500   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 8.56     |
| Running Update Time | 1453     |
----------------------------------
2025-02-01 17:58:49.168033 Eastern Standard Time
| Itration            | 1454     |
| Real Det Return     | 671      |
| Real Sto Return     | 642      |
| Reward Loss         | -104     |
| Running Env Steps   | 727000   |
| Running Forward KL  | -2.02    |
| Running Reverse KL  | 8.33     |
| Running Update Time | 1454     |
----------------------------------
2025-02-01 17:59:05.108472 Eastern Standard Time
| Itration            | 1455     |
| Real Det Return     | 703      |
| Real Sto Return     | 675      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 727500   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1455     |
----------------------------------
2025-02-01 17:59:21.234094 Eastern Standard Time
| Itration            | 1456     |
| Real Det Return     | 696      |
| Real Sto Return     | 671      |
| Reward Loss         | -76.7    |
| Running Env Steps   | 728000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 7.18     |
| Running Update Time | 1456     |
----------------------------------
2025-02-01 17:59:37.698148 Eastern Standard Time
| Itration            | 1457     |
| Real Det Return     | 676      |
| Real Sto Return     | 661      |
| Reward Loss         | -105     |
| Running Env Steps   | 728500   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 6.05     |
| Running Update Time | 1457     |
----------------------------------
2025-02-01 17:59:53.602974 Eastern Standard Time
| Itration            | 1458     |
| Real Det Return     | 689      |
| Real Sto Return     | 666      |
| Reward Loss         | -65.4    |
| Running Env Steps   | 729000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1458     |
----------------------------------
2025-02-01 18:00:09.561201 Eastern Standard Time
| Itration            | 1459     |
| Real Det Return     | 681      |
| Real Sto Return     | 648      |
| Reward Loss         | -66.4    |
| Running Env Steps   | 729500   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1459     |
----------------------------------
2025-02-01 18:00:25.584327 Eastern Standard Time
| Itration            | 1460     |
| Real Det Return     | 669      |
| Real Sto Return     | 652      |
| Reward Loss         | -50.2    |
| Running Env Steps   | 730000   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1460     |
----------------------------------
2025-02-01 18:00:41.546069 Eastern Standard Time
| Itration            | 1461     |
| Real Det Return     | 691      |
| Real Sto Return     | 666      |
| Reward Loss         | -68      |
| Running Env Steps   | 730500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1461     |
----------------------------------
2025-02-01 18:00:57.480000 Eastern Standard Time
| Itration            | 1462     |
| Real Det Return     | 672      |
| Real Sto Return     | 640      |
| Reward Loss         | -131     |
| Running Env Steps   | 731000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1462     |
----------------------------------
2025-02-01 18:01:13.431678 Eastern Standard Time
| Itration            | 1463     |
| Real Det Return     | 687      |
| Real Sto Return     | 659      |
| Reward Loss         | -54.4    |
| Running Env Steps   | 731500   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1463     |
----------------------------------
2025-02-01 18:01:29.385574 Eastern Standard Time
| Itration            | 1464     |
| Real Det Return     | 662      |
| Real Sto Return     | 638      |
| Reward Loss         | -98      |
| Running Env Steps   | 732000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 8.34     |
| Running Update Time | 1464     |
----------------------------------
2025-02-01 18:01:45.319605 Eastern Standard Time
| Itration            | 1465     |
| Real Det Return     | 623      |
| Real Sto Return     | 573      |
| Reward Loss         | -117     |
| Running Env Steps   | 732500   |
| Running Forward KL  | -3.01    |
| Running Reverse KL  | 7.29     |
| Running Update Time | 1465     |
----------------------------------
2025-02-01 18:02:01.279200 Eastern Standard Time
| Itration            | 1466     |
| Real Det Return     | 666      |
| Real Sto Return     | 648      |
| Reward Loss         | -97.9    |
| Running Env Steps   | 733000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 6.47     |
| Running Update Time | 1466     |
----------------------------------
2025-02-01 18:02:17.226601 Eastern Standard Time
| Itration            | 1467     |
| Real Det Return     | 669      |
| Real Sto Return     | 636      |
| Reward Loss         | -82.7    |
| Running Env Steps   | 733500   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 6.41     |
| Running Update Time | 1467     |
----------------------------------
2025-02-01 18:02:33.104339 Eastern Standard Time
| Itration            | 1468     |
| Real Det Return     | 694      |
| Real Sto Return     | 665      |
| Reward Loss         | -69.4    |
| Running Env Steps   | 734000   |
| Running Forward KL  | -3.6     |
| Running Reverse KL  | 8.62     |
| Running Update Time | 1468     |
----------------------------------
2025-02-01 18:02:49.027174 Eastern Standard Time
| Itration            | 1469     |
| Real Det Return     | 668      |
| Real Sto Return     | 637      |
| Reward Loss         | -109     |
| Running Env Steps   | 734500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 7.15     |
| Running Update Time | 1469     |
----------------------------------
2025-02-01 18:03:05.262590 Eastern Standard Time
| Itration            | 1470     |
| Real Det Return     | 683      |
| Real Sto Return     | 658      |
| Reward Loss         | -55.6    |
| Running Env Steps   | 735000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1470     |
----------------------------------
2025-02-01 18:03:21.196985 Eastern Standard Time
| Itration            | 1471     |
| Real Det Return     | 686      |
| Real Sto Return     | 646      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 735500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 6.53     |
| Running Update Time | 1471     |
----------------------------------
2025-02-01 18:03:37.137183 Eastern Standard Time
| Itration            | 1472     |
| Real Det Return     | 700      |
| Real Sto Return     | 664      |
| Reward Loss         | -93.2    |
| Running Env Steps   | 736000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1472     |
----------------------------------
2025-02-01 18:03:53.119420 Eastern Standard Time
| Itration            | 1473     |
| Real Det Return     | 659      |
| Real Sto Return     | 638      |
| Reward Loss         | -93.4    |
| Running Env Steps   | 736500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 6.23     |
| Running Update Time | 1473     |
----------------------------------
2025-02-01 18:04:09.026100 Eastern Standard Time
| Itration            | 1474     |
| Real Det Return     | 621      |
| Real Sto Return     | 577      |
| Reward Loss         | -176     |
| Running Env Steps   | 737000   |
| Running Forward KL  | -4       |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1474     |
----------------------------------
2025-02-01 18:04:24.935906 Eastern Standard Time
| Itration            | 1475     |
| Real Det Return     | 677      |
| Real Sto Return     | 663      |
| Reward Loss         | -89.1    |
| Running Env Steps   | 737500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 6.03     |
| Running Update Time | 1475     |
----------------------------------
2025-02-01 18:04:41.011298 Eastern Standard Time
| Itration            | 1476     |
| Real Det Return     | 657      |
| Real Sto Return     | 635      |
| Reward Loss         | -106     |
| Running Env Steps   | 738000   |
| Running Forward KL  | -3.71    |
| Running Reverse KL  | 7.61     |
| Running Update Time | 1476     |
----------------------------------
2025-02-01 18:04:57.056536 Eastern Standard Time
| Itration            | 1477     |
| Real Det Return     | 697      |
| Real Sto Return     | 661      |
| Reward Loss         | -88.9    |
| Running Env Steps   | 738500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 6.87     |
| Running Update Time | 1477     |
----------------------------------
2025-02-01 18:05:12.959848 Eastern Standard Time
| Itration            | 1478     |
| Real Det Return     | 686      |
| Real Sto Return     | 665      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 739000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 8.41     |
| Running Update Time | 1478     |
----------------------------------
2025-02-01 18:05:28.916709 Eastern Standard Time
| Itration            | 1479     |
| Real Det Return     | 679      |
| Real Sto Return     | 649      |
| Reward Loss         | -56.5    |
| Running Env Steps   | 739500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 7.92     |
| Running Update Time | 1479     |
----------------------------------
2025-02-01 18:05:44.806522 Eastern Standard Time
| Itration            | 1480     |
| Real Det Return     | 708      |
| Real Sto Return     | 692      |
| Reward Loss         | -71.4    |
| Running Env Steps   | 740000   |
| Running Forward KL  | -3.72    |
| Running Reverse KL  | 7.66     |
| Running Update Time | 1480     |
----------------------------------
2025-02-01 18:06:00.725522 Eastern Standard Time
| Itration            | 1481     |
| Real Det Return     | 690      |
| Real Sto Return     | 651      |
| Reward Loss         | -72.7    |
| Running Env Steps   | 740500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 6.4      |
| Running Update Time | 1481     |
----------------------------------
2025-02-01 18:06:16.563773 Eastern Standard Time
| Itration            | 1482     |
| Real Det Return     | 621      |
| Real Sto Return     | 605      |
| Reward Loss         | -151     |
| Running Env Steps   | 741000   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 6.31     |
| Running Update Time | 1482     |
----------------------------------
2025-02-01 18:06:32.461845 Eastern Standard Time
| Itration            | 1483     |
| Real Det Return     | 703      |
| Real Sto Return     | 672      |
| Reward Loss         | -64.2    |
| Running Env Steps   | 741500   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 6.81     |
| Running Update Time | 1483     |
----------------------------------
2025-02-01 18:06:48.338379 Eastern Standard Time
| Itration            | 1484     |
| Real Det Return     | 698      |
| Real Sto Return     | 677      |
| Reward Loss         | -56.4    |
| Running Env Steps   | 742000   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 7.46     |
| Running Update Time | 1484     |
----------------------------------
2025-02-01 18:07:04.210466 Eastern Standard Time
| Itration            | 1485     |
| Real Det Return     | 676      |
| Real Sto Return     | 656      |
| Reward Loss         | -78.6    |
| Running Env Steps   | 742500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 6.68     |
| Running Update Time | 1485     |
----------------------------------
2025-02-01 18:07:20.028501 Eastern Standard Time
| Itration            | 1486     |
| Real Det Return     | 691      |
| Real Sto Return     | 661      |
| Reward Loss         | -32.5    |
| Running Env Steps   | 743000   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 8.37     |
| Running Update Time | 1486     |
----------------------------------
2025-02-01 18:07:35.965451 Eastern Standard Time
| Itration            | 1487     |
| Real Det Return     | 666      |
| Real Sto Return     | 644      |
| Reward Loss         | -111     |
| Running Env Steps   | 743500   |
| Running Forward KL  | -3.66    |
| Running Reverse KL  | 6.54     |
| Running Update Time | 1487     |
----------------------------------
2025-02-01 18:07:51.990870 Eastern Standard Time
| Itration            | 1488     |
| Real Det Return     | 699      |
| Real Sto Return     | 670      |
| Reward Loss         | -88.5    |
| Running Env Steps   | 744000   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 7.9      |
| Running Update Time | 1488     |
----------------------------------
2025-02-01 18:08:07.994490 Eastern Standard Time
| Itration            | 1489     |
| Real Det Return     | 705      |
| Real Sto Return     | 679      |
| Reward Loss         | -67.1    |
| Running Env Steps   | 744500   |
| Running Forward KL  | -3.6     |
| Running Reverse KL  | 7.89     |
| Running Update Time | 1489     |
----------------------------------
2025-02-01 18:08:23.896984 Eastern Standard Time
| Itration            | 1490     |
| Real Det Return     | 676      |
| Real Sto Return     | 653      |
| Reward Loss         | -104     |
| Running Env Steps   | 745000   |
| Running Forward KL  | -3.6     |
| Running Reverse KL  | 7.23     |
| Running Update Time | 1490     |
----------------------------------
2025-02-01 18:08:41.723831 Eastern Standard Time
| Itration            | 1491     |
| Real Det Return     | 678      |
| Real Sto Return     | 644      |
| Reward Loss         | -98.2    |
| Running Env Steps   | 745500   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1491     |
----------------------------------
2025-02-01 18:08:57.989259 Eastern Standard Time
| Itration            | 1492     |
| Real Det Return     | 677      |
| Real Sto Return     | 642      |
| Reward Loss         | -81.8    |
| Running Env Steps   | 746000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 6.65     |
| Running Update Time | 1492     |
----------------------------------
2025-02-01 18:09:14.158772 Eastern Standard Time
| Itration            | 1493     |
| Real Det Return     | 662      |
| Real Sto Return     | 643      |
| Reward Loss         | -64.3    |
| Running Env Steps   | 746500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 6.79     |
| Running Update Time | 1493     |
----------------------------------
2025-02-01 18:09:30.444814 Eastern Standard Time
| Itration            | 1494     |
| Real Det Return     | 694      |
| Real Sto Return     | 666      |
| Reward Loss         | -66.6    |
| Running Env Steps   | 747000   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1494     |
----------------------------------
2025-02-01 18:09:46.364178 Eastern Standard Time
| Itration            | 1495     |
| Real Det Return     | 676      |
| Real Sto Return     | 647      |
| Reward Loss         | -89.4    |
| Running Env Steps   | 747500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1495     |
----------------------------------
2025-02-01 18:10:02.506634 Eastern Standard Time
| Itration            | 1496     |
| Real Det Return     | 683      |
| Real Sto Return     | 662      |
| Reward Loss         | -74.3    |
| Running Env Steps   | 748000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 6.94     |
| Running Update Time | 1496     |
----------------------------------
2025-02-01 18:10:18.402592 Eastern Standard Time
| Itration            | 1497     |
| Real Det Return     | 661      |
| Real Sto Return     | 647      |
| Reward Loss         | -133     |
| Running Env Steps   | 748500   |
| Running Forward KL  | -4.1     |
| Running Reverse KL  | 6.77     |
| Running Update Time | 1497     |
----------------------------------
2025-02-01 18:10:34.090420 Eastern Standard Time
| Itration            | 1498     |
| Real Det Return     | 688      |
| Real Sto Return     | 664      |
| Reward Loss         | -59.7    |
| Running Env Steps   | 749000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1498     |
----------------------------------
2025-02-01 18:10:49.829978 Eastern Standard Time
| Itration            | 1499     |
| Real Det Return     | 671      |
| Real Sto Return     | 656      |
| Reward Loss         | -81.8    |
| Running Env Steps   | 749500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 6.25     |
| Running Update Time | 1499     |
----------------------------------
2025-02-01 18:11:05.556880 Eastern Standard Time
| Itration            | 1500     |
| Real Det Return     | 690      |
| Real Sto Return     | 670      |
| Reward Loss         | -95.8    |
| Running Env Steps   | 750000   |
| Running Forward KL  | -3.62    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1500     |
----------------------------------
2025-02-01 18:11:21.278882 Eastern Standard Time
| Itration            | 1501     |
| Real Det Return     | 687      |
| Real Sto Return     | 656      |
| Reward Loss         | -129     |
| Running Env Steps   | 750500   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 7.36     |
| Running Update Time | 1501     |
----------------------------------
2025-02-01 18:11:37.006952 Eastern Standard Time
| Itration            | 1502     |
| Real Det Return     | 700      |
| Real Sto Return     | 664      |
| Reward Loss         | -69.3    |
| Running Env Steps   | 751000   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 7.61     |
| Running Update Time | 1502     |
----------------------------------
2025-02-01 18:11:52.713290 Eastern Standard Time
| Itration            | 1503     |
| Real Det Return     | 689      |
| Real Sto Return     | 668      |
| Reward Loss         | -82.2    |
| Running Env Steps   | 751500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 7.7      |
| Running Update Time | 1503     |
----------------------------------
2025-02-01 18:12:08.382927 Eastern Standard Time
| Itration            | 1504     |
| Real Det Return     | 692      |
| Real Sto Return     | 670      |
| Reward Loss         | -68.7    |
| Running Env Steps   | 752000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1504     |
----------------------------------
2025-02-01 18:12:24.009492 Eastern Standard Time
| Itration            | 1505     |
| Real Det Return     | 694      |
| Real Sto Return     | 672      |
| Reward Loss         | -77.7    |
| Running Env Steps   | 752500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 7.99     |
| Running Update Time | 1505     |
----------------------------------
2025-02-01 18:12:39.668510 Eastern Standard Time
| Itration            | 1506     |
| Real Det Return     | 692      |
| Real Sto Return     | 672      |
| Reward Loss         | -86.6    |
| Running Env Steps   | 753000   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 7.31     |
| Running Update Time | 1506     |
----------------------------------
2025-02-01 18:12:55.412855 Eastern Standard Time
| Itration            | 1507     |
| Real Det Return     | 689      |
| Real Sto Return     | 653      |
| Reward Loss         | -59.9    |
| Running Env Steps   | 753500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 6.84     |
| Running Update Time | 1507     |
----------------------------------
2025-02-01 18:13:11.113121 Eastern Standard Time
| Itration            | 1508     |
| Real Det Return     | 681      |
| Real Sto Return     | 659      |
| Reward Loss         | -41.7    |
| Running Env Steps   | 754000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 8.13     |
| Running Update Time | 1508     |
----------------------------------
2025-02-01 18:13:26.874532 Eastern Standard Time
| Itration            | 1509     |
| Real Det Return     | 690      |
| Real Sto Return     | 657      |
| Reward Loss         | -78.5    |
| Running Env Steps   | 754500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 7.44     |
| Running Update Time | 1509     |
----------------------------------
2025-02-01 18:13:42.617799 Eastern Standard Time
| Itration            | 1510     |
| Real Det Return     | 676      |
| Real Sto Return     | 652      |
| Reward Loss         | -117     |
| Running Env Steps   | 755000   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 6.61     |
| Running Update Time | 1510     |
----------------------------------
2025-02-01 18:13:58.417691 Eastern Standard Time
| Itration            | 1511     |
| Real Det Return     | 673      |
| Real Sto Return     | 650      |
| Reward Loss         | -80.5    |
| Running Env Steps   | 755500   |
| Running Forward KL  | -5.86    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1511     |
----------------------------------
2025-02-01 18:14:14.754550 Eastern Standard Time
| Itration            | 1512     |
| Real Det Return     | 690      |
| Real Sto Return     | 649      |
| Reward Loss         | -84.8    |
| Running Env Steps   | 756000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 7.61     |
| Running Update Time | 1512     |
----------------------------------
2025-02-01 18:14:30.549180 Eastern Standard Time
| Itration            | 1513     |
| Real Det Return     | 664      |
| Real Sto Return     | 642      |
| Reward Loss         | -83      |
| Running Env Steps   | 756500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1513     |
----------------------------------
2025-02-01 18:14:46.328081 Eastern Standard Time
| Itration            | 1514     |
| Real Det Return     | 675      |
| Real Sto Return     | 655      |
| Reward Loss         | -91.8    |
| Running Env Steps   | 757000   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1514     |
----------------------------------
2025-02-01 18:15:02.074365 Eastern Standard Time
| Itration            | 1515     |
| Real Det Return     | 695      |
| Real Sto Return     | 662      |
| Reward Loss         | -75      |
| Running Env Steps   | 757500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1515     |
----------------------------------
2025-02-01 18:15:17.765553 Eastern Standard Time
| Itration            | 1516     |
| Real Det Return     | 700      |
| Real Sto Return     | 683      |
| Reward Loss         | -65.9    |
| Running Env Steps   | 758000   |
| Running Forward KL  | -3.31    |
| Running Reverse KL  | 8.13     |
| Running Update Time | 1516     |
----------------------------------
2025-02-01 18:15:33.511426 Eastern Standard Time
| Itration            | 1517     |
| Real Det Return     | 692      |
| Real Sto Return     | 660      |
| Reward Loss         | -106     |
| Running Env Steps   | 758500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1517     |
----------------------------------
2025-02-01 18:15:49.225870 Eastern Standard Time
| Itration            | 1518     |
| Real Det Return     | 682      |
| Real Sto Return     | 655      |
| Reward Loss         | -113     |
| Running Env Steps   | 759000   |
| Running Forward KL  | -3.16    |
| Running Reverse KL  | 6.67     |
| Running Update Time | 1518     |
----------------------------------
2025-02-01 18:16:05.028609 Eastern Standard Time
| Itration            | 1519     |
| Real Det Return     | 681      |
| Real Sto Return     | 652      |
| Reward Loss         | -110     |
| Running Env Steps   | 759500   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 7.29     |
| Running Update Time | 1519     |
----------------------------------
2025-02-01 18:16:20.682402 Eastern Standard Time
| Itration            | 1520     |
| Real Det Return     | 659      |
| Real Sto Return     | 624      |
| Reward Loss         | -62      |
| Running Env Steps   | 760000   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1520     |
----------------------------------
2025-02-01 18:16:36.428830 Eastern Standard Time
| Itration            | 1521     |
| Real Det Return     | 688      |
| Real Sto Return     | 651      |
| Reward Loss         | -67.1    |
| Running Env Steps   | 760500   |
| Running Forward KL  | -3.56    |
| Running Reverse KL  | 7.36     |
| Running Update Time | 1521     |
----------------------------------
2025-02-01 18:16:52.059140 Eastern Standard Time
| Itration            | 1522     |
| Real Det Return     | 705      |
| Real Sto Return     | 672      |
| Reward Loss         | -48.6    |
| Running Env Steps   | 761000   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 7.83     |
| Running Update Time | 1522     |
----------------------------------
2025-02-01 18:17:07.803222 Eastern Standard Time
| Itration            | 1523     |
| Real Det Return     | 695      |
| Real Sto Return     | 663      |
| Reward Loss         | -52.8    |
| Running Env Steps   | 761500   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 8.05     |
| Running Update Time | 1523     |
----------------------------------
2025-02-01 18:17:23.512145 Eastern Standard Time
| Itration            | 1524     |
| Real Det Return     | 687      |
| Real Sto Return     | 671      |
| Reward Loss         | -55.8    |
| Running Env Steps   | 762000   |
| Running Forward KL  | -3.57    |
| Running Reverse KL  | 8.79     |
| Running Update Time | 1524     |
----------------------------------
2025-02-01 18:17:39.196481 Eastern Standard Time
| Itration            | 1525     |
| Real Det Return     | 695      |
| Real Sto Return     | 671      |
| Reward Loss         | -108     |
| Running Env Steps   | 762500   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 6.56     |
| Running Update Time | 1525     |
----------------------------------
2025-02-01 18:17:54.850039 Eastern Standard Time
| Itration            | 1526     |
| Real Det Return     | 688      |
| Real Sto Return     | 656      |
| Reward Loss         | -61.8    |
| Running Env Steps   | 763000   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 6.83     |
| Running Update Time | 1526     |
----------------------------------
2025-02-01 18:18:10.576940 Eastern Standard Time
| Itration            | 1527     |
| Real Det Return     | 699      |
| Real Sto Return     | 679      |
| Reward Loss         | -49.2    |
| Running Env Steps   | 763500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 7.33     |
| Running Update Time | 1527     |
----------------------------------
2025-02-01 18:18:26.310871 Eastern Standard Time
| Itration            | 1528     |
| Real Det Return     | 618      |
| Real Sto Return     | 602      |
| Reward Loss         | -199     |
| Running Env Steps   | 764000   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 6.2      |
| Running Update Time | 1528     |
----------------------------------
2025-02-01 18:18:41.996610 Eastern Standard Time
| Itration            | 1529     |
| Real Det Return     | 673      |
| Real Sto Return     | 625      |
| Reward Loss         | -126     |
| Running Env Steps   | 764500   |
| Running Forward KL  | -3.78    |
| Running Reverse KL  | 7.97     |
| Running Update Time | 1529     |
----------------------------------
2025-02-01 18:18:57.744729 Eastern Standard Time
| Itration            | 1530     |
| Real Det Return     | 644      |
| Real Sto Return     | 621      |
| Reward Loss         | -172     |
| Running Env Steps   | 765000   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1530     |
----------------------------------
2025-02-01 18:19:13.520232 Eastern Standard Time
| Itration            | 1531     |
| Real Det Return     | 687      |
| Real Sto Return     | 654      |
| Reward Loss         | -85.1    |
| Running Env Steps   | 765500   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 8.2      |
| Running Update Time | 1531     |
----------------------------------
2025-02-01 18:19:29.235214 Eastern Standard Time
| Itration            | 1532     |
| Real Det Return     | 672      |
| Real Sto Return     | 656      |
| Reward Loss         | -92.3    |
| Running Env Steps   | 766000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1532     |
----------------------------------
2025-02-01 18:19:44.974855 Eastern Standard Time
| Itration            | 1533     |
| Real Det Return     | 650      |
| Real Sto Return     | 624      |
| Reward Loss         | -123     |
| Running Env Steps   | 766500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 6.31     |
| Running Update Time | 1533     |
----------------------------------
2025-02-01 18:20:00.670969 Eastern Standard Time
| Itration            | 1534     |
| Real Det Return     | 682      |
| Real Sto Return     | 636      |
| Reward Loss         | -81.5    |
| Running Env Steps   | 767000   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 8.15     |
| Running Update Time | 1534     |
----------------------------------
2025-02-01 18:20:16.356487 Eastern Standard Time
| Itration            | 1535     |
| Real Det Return     | 683      |
| Real Sto Return     | 655      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 767500   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 8.3      |
| Running Update Time | 1535     |
----------------------------------
2025-02-01 18:20:32.151632 Eastern Standard Time
| Itration            | 1536     |
| Real Det Return     | 652      |
| Real Sto Return     | 627      |
| Reward Loss         | -93.3    |
| Running Env Steps   | 768000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 6.44     |
| Running Update Time | 1536     |
----------------------------------
2025-02-01 18:20:47.791891 Eastern Standard Time
| Itration            | 1537     |
| Real Det Return     | 683      |
| Real Sto Return     | 651      |
| Reward Loss         | -72      |
| Running Env Steps   | 768500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 7.33     |
| Running Update Time | 1537     |
----------------------------------
2025-02-01 18:21:03.490382 Eastern Standard Time
| Itration            | 1538     |
| Real Det Return     | 696      |
| Real Sto Return     | 660      |
| Reward Loss         | -60.5    |
| Running Env Steps   | 769000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 7.88     |
| Running Update Time | 1538     |
----------------------------------
2025-02-01 18:21:19.180407 Eastern Standard Time
| Itration            | 1539     |
| Real Det Return     | 648      |
| Real Sto Return     | 640      |
| Reward Loss         | -83.5    |
| Running Env Steps   | 769500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1539     |
----------------------------------
2025-02-01 18:21:34.865499 Eastern Standard Time
| Itration            | 1540     |
| Real Det Return     | 691      |
| Real Sto Return     | 670      |
| Reward Loss         | -36.7    |
| Running Env Steps   | 770000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 6.73     |
| Running Update Time | 1540     |
----------------------------------
2025-02-01 18:21:50.623139 Eastern Standard Time
| Itration            | 1541     |
| Real Det Return     | 668      |
| Real Sto Return     | 638      |
| Reward Loss         | -76.1    |
| Running Env Steps   | 770500   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 6.94     |
| Running Update Time | 1541     |
----------------------------------
2025-02-01 18:22:06.306843 Eastern Standard Time
| Itration            | 1542     |
| Real Det Return     | 657      |
| Real Sto Return     | 628      |
| Reward Loss         | -137     |
| Running Env Steps   | 771000   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 5.86     |
| Running Update Time | 1542     |
----------------------------------
2025-02-01 18:22:22.030537 Eastern Standard Time
| Itration            | 1543     |
| Real Det Return     | 701      |
| Real Sto Return     | 678      |
| Reward Loss         | -43.9    |
| Running Env Steps   | 771500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 6.66     |
| Running Update Time | 1543     |
----------------------------------
2025-02-01 18:22:37.676050 Eastern Standard Time
| Itration            | 1544     |
| Real Det Return     | 550      |
| Real Sto Return     | 558      |
| Reward Loss         | -305     |
| Running Env Steps   | 772000   |
| Running Forward KL  | -2.64    |
| Running Reverse KL  | 6.58     |
| Running Update Time | 1544     |
----------------------------------
2025-02-01 18:22:53.530494 Eastern Standard Time
| Itration            | 1545     |
| Real Det Return     | 709      |
| Real Sto Return     | 679      |
| Reward Loss         | -37.4    |
| Running Env Steps   | 772500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 7.74     |
| Running Update Time | 1545     |
----------------------------------
2025-02-01 18:23:09.243618 Eastern Standard Time
| Itration            | 1546     |
| Real Det Return     | 695      |
| Real Sto Return     | 663      |
| Reward Loss         | -77.3    |
| Running Env Steps   | 773000   |
| Running Forward KL  | -3.31    |
| Running Reverse KL  | 7.14     |
| Running Update Time | 1546     |
----------------------------------
2025-02-01 18:23:24.909939 Eastern Standard Time
| Itration            | 1547     |
| Real Det Return     | 690      |
| Real Sto Return     | 664      |
| Reward Loss         | -66.1    |
| Running Env Steps   | 773500   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1547     |
----------------------------------
2025-02-01 18:23:40.620883 Eastern Standard Time
| Itration            | 1548     |
| Real Det Return     | 702      |
| Real Sto Return     | 680      |
| Reward Loss         | -95.4    |
| Running Env Steps   | 774000   |
| Running Forward KL  | -3.69    |
| Running Reverse KL  | 7.71     |
| Running Update Time | 1548     |
----------------------------------
2025-02-01 18:23:56.422718 Eastern Standard Time
| Itration            | 1549     |
| Real Det Return     | 657      |
| Real Sto Return     | 639      |
| Reward Loss         | -105     |
| Running Env Steps   | 774500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1549     |
----------------------------------
2025-02-01 18:24:12.148365 Eastern Standard Time
| Itration            | 1550     |
| Real Det Return     | 696      |
| Real Sto Return     | 666      |
| Reward Loss         | -55.2    |
| Running Env Steps   | 775000   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 7.1      |
| Running Update Time | 1550     |
----------------------------------
2025-02-01 18:24:27.862076 Eastern Standard Time
| Itration            | 1551     |
| Real Det Return     | 696      |
| Real Sto Return     | 677      |
| Reward Loss         | -60.8    |
| Running Env Steps   | 775500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 7.96     |
| Running Update Time | 1551     |
----------------------------------
2025-02-01 18:24:43.537183 Eastern Standard Time
| Itration            | 1552     |
| Real Det Return     | 684      |
| Real Sto Return     | 660      |
| Reward Loss         | -57.7    |
| Running Env Steps   | 776000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 8.16     |
| Running Update Time | 1552     |
----------------------------------
2025-02-01 18:24:59.320076 Eastern Standard Time
| Itration            | 1553     |
| Real Det Return     | 704      |
| Real Sto Return     | 675      |
| Reward Loss         | -79      |
| Running Env Steps   | 776500   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 7.63     |
| Running Update Time | 1553     |
----------------------------------
2025-02-01 18:25:15.015513 Eastern Standard Time
| Itration            | 1554     |
| Real Det Return     | 656      |
| Real Sto Return     | 630      |
| Reward Loss         | -96.9    |
| Running Env Steps   | 777000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 6.65     |
| Running Update Time | 1554     |
----------------------------------
2025-02-01 18:25:30.699247 Eastern Standard Time
| Itration            | 1555     |
| Real Det Return     | 700      |
| Real Sto Return     | 673      |
| Reward Loss         | -61.4    |
| Running Env Steps   | 777500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 7.08     |
| Running Update Time | 1555     |
----------------------------------
2025-02-01 18:25:46.374013 Eastern Standard Time
| Itration            | 1556     |
| Real Det Return     | 681      |
| Real Sto Return     | 651      |
| Reward Loss         | -93.4    |
| Running Env Steps   | 778000   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 7.26     |
| Running Update Time | 1556     |
----------------------------------
2025-02-01 18:26:02.096878 Eastern Standard Time
| Itration            | 1557     |
| Real Det Return     | 693      |
| Real Sto Return     | 668      |
| Reward Loss         | -67      |
| Running Env Steps   | 778500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 6.91     |
| Running Update Time | 1557     |
----------------------------------
2025-02-01 18:26:17.725338 Eastern Standard Time
| Itration            | 1558     |
| Real Det Return     | 671      |
| Real Sto Return     | 595      |
| Reward Loss         | -127     |
| Running Env Steps   | 779000   |
| Running Forward KL  | -2.93    |
| Running Reverse KL  | 6.89     |
| Running Update Time | 1558     |
----------------------------------
2025-02-01 18:26:33.448113 Eastern Standard Time
| Itration            | 1559     |
| Real Det Return     | 592      |
| Real Sto Return     | 572      |
| Reward Loss         | -239     |
| Running Env Steps   | 779500   |
| Running Forward KL  | -1.98    |
| Running Reverse KL  | 6.41     |
| Running Update Time | 1559     |
----------------------------------
2025-02-01 18:26:49.118950 Eastern Standard Time
| Itration            | 1560     |
| Real Det Return     | 668      |
| Real Sto Return     | 647      |
| Reward Loss         | -118     |
| Running Env Steps   | 780000   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 7.53     |
| Running Update Time | 1560     |
----------------------------------
2025-02-01 18:27:04.782670 Eastern Standard Time
| Itration            | 1561     |
| Real Det Return     | 687      |
| Real Sto Return     | 659      |
| Reward Loss         | -99.3    |
| Running Env Steps   | 780500   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1561     |
----------------------------------
2025-02-01 18:27:20.447518 Eastern Standard Time
| Itration            | 1562     |
| Real Det Return     | 704      |
| Real Sto Return     | 678      |
| Reward Loss         | -30.7    |
| Running Env Steps   | 781000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1562     |
----------------------------------
2025-02-01 18:27:36.122619 Eastern Standard Time
| Itration            | 1563     |
| Real Det Return     | 694      |
| Real Sto Return     | 662      |
| Reward Loss         | -69      |
| Running Env Steps   | 781500   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 6.84     |
| Running Update Time | 1563     |
----------------------------------
2025-02-01 18:27:51.802428 Eastern Standard Time
| Itration            | 1564     |
| Real Det Return     | 699      |
| Real Sto Return     | 670      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 782000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1564     |
----------------------------------
2025-02-01 18:28:07.455019 Eastern Standard Time
| Itration            | 1565     |
| Real Det Return     | 684      |
| Real Sto Return     | 670      |
| Reward Loss         | -62.6    |
| Running Env Steps   | 782500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 7.41     |
| Running Update Time | 1565     |
----------------------------------
2025-02-01 18:28:23.148572 Eastern Standard Time
| Itration            | 1566     |
| Real Det Return     | 701      |
| Real Sto Return     | 657      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 783000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 8.62     |
| Running Update Time | 1566     |
----------------------------------
2025-02-01 18:28:38.885447 Eastern Standard Time
| Itration            | 1567     |
| Real Det Return     | 677      |
| Real Sto Return     | 657      |
| Reward Loss         | -91.8    |
| Running Env Steps   | 783500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1567     |
----------------------------------
2025-02-01 18:28:54.560055 Eastern Standard Time
| Itration            | 1568     |
| Real Det Return     | 680      |
| Real Sto Return     | 661      |
| Reward Loss         | -40.6    |
| Running Env Steps   | 784000   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 7.56     |
| Running Update Time | 1568     |
----------------------------------
2025-02-01 18:29:10.275043 Eastern Standard Time
| Itration            | 1569     |
| Real Det Return     | 683      |
| Real Sto Return     | 670      |
| Reward Loss         | -93.4    |
| Running Env Steps   | 784500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1569     |
----------------------------------
2025-02-01 18:29:26.142409 Eastern Standard Time
| Itration            | 1570     |
| Real Det Return     | 692      |
| Real Sto Return     | 671      |
| Reward Loss         | -74      |
| Running Env Steps   | 785000   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 7.95     |
| Running Update Time | 1570     |
----------------------------------
2025-02-01 18:29:41.872594 Eastern Standard Time
| Itration            | 1571     |
| Real Det Return     | 671      |
| Real Sto Return     | 649      |
| Reward Loss         | -110     |
| Running Env Steps   | 785500   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 7.59     |
| Running Update Time | 1571     |
----------------------------------
2025-02-01 18:29:57.615805 Eastern Standard Time
| Itration            | 1572     |
| Real Det Return     | 690      |
| Real Sto Return     | 662      |
| Reward Loss         | -95.2    |
| Running Env Steps   | 786000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1572     |
----------------------------------
2025-02-01 18:30:13.273957 Eastern Standard Time
| Itration            | 1573     |
| Real Det Return     | 700      |
| Real Sto Return     | 683      |
| Reward Loss         | -72.6    |
| Running Env Steps   | 786500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 7.9      |
| Running Update Time | 1573     |
----------------------------------
2025-02-01 18:30:28.900714 Eastern Standard Time
| Itration            | 1574     |
| Real Det Return     | 675      |
| Real Sto Return     | 658      |
| Reward Loss         | -73      |
| Running Env Steps   | 787000   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 7.17     |
| Running Update Time | 1574     |
----------------------------------
2025-02-01 18:30:44.602788 Eastern Standard Time
| Itration            | 1575     |
| Real Det Return     | 670      |
| Real Sto Return     | 654      |
| Reward Loss         | -87.1    |
| Running Env Steps   | 787500   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 8.03     |
| Running Update Time | 1575     |
----------------------------------
2025-02-01 18:31:00.258854 Eastern Standard Time
| Itration            | 1576     |
| Real Det Return     | 680      |
| Real Sto Return     | 662      |
| Reward Loss         | -59.9    |
| Running Env Steps   | 788000   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 8.46     |
| Running Update Time | 1576     |
----------------------------------
2025-02-01 18:31:15.958798 Eastern Standard Time
| Itration            | 1577     |
| Real Det Return     | 618      |
| Real Sto Return     | 599      |
| Reward Loss         | -146     |
| Running Env Steps   | 788500   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 8.67     |
| Running Update Time | 1577     |
----------------------------------
2025-02-01 18:31:31.698905 Eastern Standard Time
| Itration            | 1578     |
| Real Det Return     | 701      |
| Real Sto Return     | 674      |
| Reward Loss         | -52.9    |
| Running Env Steps   | 789000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 8.49     |
| Running Update Time | 1578     |
----------------------------------
2025-02-01 18:31:47.547729 Eastern Standard Time
| Itration            | 1579     |
| Real Det Return     | 708      |
| Real Sto Return     | 663      |
| Reward Loss         | -88.1    |
| Running Env Steps   | 789500   |
| Running Forward KL  | -3.31    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1579     |
----------------------------------
2025-02-01 18:32:03.285999 Eastern Standard Time
| Itration            | 1580     |
| Real Det Return     | 684      |
| Real Sto Return     | 662      |
| Reward Loss         | -106     |
| Running Env Steps   | 790000   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 6.43     |
| Running Update Time | 1580     |
----------------------------------
2025-02-01 18:32:18.970205 Eastern Standard Time
| Itration            | 1581     |
| Real Det Return     | 714      |
| Real Sto Return     | 685      |
| Reward Loss         | -62.5    |
| Running Env Steps   | 790500   |
| Running Forward KL  | -2.96    |
| Running Reverse KL  | 7.13     |
| Running Update Time | 1581     |
----------------------------------
2025-02-01 18:32:34.601522 Eastern Standard Time
| Itration            | 1582     |
| Real Det Return     | 691      |
| Real Sto Return     | 671      |
| Reward Loss         | -119     |
| Running Env Steps   | 791000   |
| Running Forward KL  | -3.6     |
| Running Reverse KL  | 7.92     |
| Running Update Time | 1582     |
----------------------------------
2025-02-01 18:32:50.262925 Eastern Standard Time
| Itration            | 1583     |
| Real Det Return     | 687      |
| Real Sto Return     | 653      |
| Reward Loss         | -62.9    |
| Running Env Steps   | 791500   |
| Running Forward KL  | -5.79    |
| Running Reverse KL  | 7.26     |
| Running Update Time | 1583     |
----------------------------------
2025-02-01 18:33:05.932027 Eastern Standard Time
| Itration            | 1584     |
| Real Det Return     | 695      |
| Real Sto Return     | 658      |
| Reward Loss         | -95.6    |
| Running Env Steps   | 792000   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 7.71     |
| Running Update Time | 1584     |
----------------------------------
2025-02-01 18:33:21.684253 Eastern Standard Time
| Itration            | 1585     |
| Real Det Return     | 698      |
| Real Sto Return     | 684      |
| Reward Loss         | -91.8    |
| Running Env Steps   | 792500   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1585     |
----------------------------------
2025-02-01 18:33:37.358827 Eastern Standard Time
| Itration            | 1586     |
| Real Det Return     | 685      |
| Real Sto Return     | 674      |
| Reward Loss         | -79      |
| Running Env Steps   | 793000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1586     |
----------------------------------
2025-02-01 18:33:52.999130 Eastern Standard Time
| Itration            | 1587     |
| Real Det Return     | 666      |
| Real Sto Return     | 646      |
| Reward Loss         | -87.1    |
| Running Env Steps   | 793500   |
| Running Forward KL  | -3.61    |
| Running Reverse KL  | 7.67     |
| Running Update Time | 1587     |
----------------------------------
2025-02-01 18:34:08.697068 Eastern Standard Time
| Itration            | 1588     |
| Real Det Return     | 678      |
| Real Sto Return     | 665      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 794000   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 7.52     |
| Running Update Time | 1588     |
----------------------------------
2025-02-01 18:34:24.371935 Eastern Standard Time
| Itration            | 1589     |
| Real Det Return     | 682      |
| Real Sto Return     | 661      |
| Reward Loss         | -68.6    |
| Running Env Steps   | 794500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 8        |
| Running Update Time | 1589     |
----------------------------------
2025-02-01 18:34:40.058677 Eastern Standard Time
| Itration            | 1590     |
| Real Det Return     | 693      |
| Real Sto Return     | 661      |
| Reward Loss         | -57      |
| Running Env Steps   | 795000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1590     |
----------------------------------
2025-02-01 18:34:55.787081 Eastern Standard Time
| Itration            | 1591     |
| Real Det Return     | 719      |
| Real Sto Return     | 682      |
| Reward Loss         | -69.5    |
| Running Env Steps   | 795500   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 7.97     |
| Running Update Time | 1591     |
----------------------------------
2025-02-01 18:35:11.468900 Eastern Standard Time
| Itration            | 1592     |
| Real Det Return     | 680      |
| Real Sto Return     | 649      |
| Reward Loss         | -63.5    |
| Running Env Steps   | 796000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 7.77     |
| Running Update Time | 1592     |
----------------------------------
2025-02-01 18:35:27.216528 Eastern Standard Time
| Itration            | 1593     |
| Real Det Return     | 681      |
| Real Sto Return     | 662      |
| Reward Loss         | -81.8    |
| Running Env Steps   | 796500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 7.13     |
| Running Update Time | 1593     |
----------------------------------
2025-02-01 18:35:42.963528 Eastern Standard Time
| Itration            | 1594     |
| Real Det Return     | 693      |
| Real Sto Return     | 673      |
| Reward Loss         | -34.8    |
| Running Env Steps   | 797000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 7.4      |
| Running Update Time | 1594     |
----------------------------------
2025-02-01 18:35:58.710311 Eastern Standard Time
| Itration            | 1595     |
| Real Det Return     | 667      |
| Real Sto Return     | 637      |
| Reward Loss         | -132     |
| Running Env Steps   | 797500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 6.2      |
| Running Update Time | 1595     |
----------------------------------
2025-02-01 18:36:14.442127 Eastern Standard Time
| Itration            | 1596     |
| Real Det Return     | 690      |
| Real Sto Return     | 677      |
| Reward Loss         | -64      |
| Running Env Steps   | 798000   |
| Running Forward KL  | -5.99    |
| Running Reverse KL  | 6.55     |
| Running Update Time | 1596     |
----------------------------------
2025-02-01 18:36:30.092095 Eastern Standard Time
| Itration            | 1597     |
| Real Det Return     | 699      |
| Real Sto Return     | 667      |
| Reward Loss         | -103     |
| Running Env Steps   | 798500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 6.45     |
| Running Update Time | 1597     |
----------------------------------
2025-02-01 18:36:45.746193 Eastern Standard Time
| Itration            | 1598     |
| Real Det Return     | 688      |
| Real Sto Return     | 670      |
| Reward Loss         | -91.2    |
| Running Env Steps   | 799000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1598     |
----------------------------------
2025-02-01 18:37:01.401239 Eastern Standard Time
| Itration            | 1599     |
| Real Det Return     | 681      |
| Real Sto Return     | 662      |
| Reward Loss         | -74.3    |
| Running Env Steps   | 799500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 7.69     |
| Running Update Time | 1599     |
----------------------------------
2025-02-01 18:37:17.048198 Eastern Standard Time
| Itration            | 1600     |
| Real Det Return     | 690      |
| Real Sto Return     | 652      |
| Reward Loss         | -67.9    |
| Running Env Steps   | 800000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1600     |
----------------------------------
2025-02-01 18:37:32.719725 Eastern Standard Time
| Itration            | 1601     |
| Real Det Return     | 683      |
| Real Sto Return     | 660      |
| Reward Loss         | -67.6    |
| Running Env Steps   | 800500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 7        |
| Running Update Time | 1601     |
----------------------------------
2025-02-01 18:37:48.351925 Eastern Standard Time
| Itration            | 1602     |
| Real Det Return     | 687      |
| Real Sto Return     | 659      |
| Reward Loss         | -72.7    |
| Running Env Steps   | 801000   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1602     |
----------------------------------
2025-02-01 18:38:04.120051 Eastern Standard Time
| Itration            | 1603     |
| Real Det Return     | 697      |
| Real Sto Return     | 656      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 801500   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 7.81     |
| Running Update Time | 1603     |
----------------------------------
2025-02-01 18:38:19.821685 Eastern Standard Time
| Itration            | 1604     |
| Real Det Return     | 678      |
| Real Sto Return     | 650      |
| Reward Loss         | -79.1    |
| Running Env Steps   | 802000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 8.07     |
| Running Update Time | 1604     |
----------------------------------
2025-02-01 18:38:35.524610 Eastern Standard Time
| Itration            | 1605     |
| Real Det Return     | 681      |
| Real Sto Return     | 659      |
| Reward Loss         | -84.6    |
| Running Env Steps   | 802500   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 6.61     |
| Running Update Time | 1605     |
----------------------------------
2025-02-01 18:38:51.248086 Eastern Standard Time
| Itration            | 1606     |
| Real Det Return     | 674      |
| Real Sto Return     | 648      |
| Reward Loss         | -112     |
| Running Env Steps   | 803000   |
| Running Forward KL  | -3.47    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1606     |
----------------------------------
2025-02-01 18:39:06.929552 Eastern Standard Time
| Itration            | 1607     |
| Real Det Return     | 694      |
| Real Sto Return     | 668      |
| Reward Loss         | -114     |
| Running Env Steps   | 803500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 6.86     |
| Running Update Time | 1607     |
----------------------------------
2025-02-01 18:39:22.630430 Eastern Standard Time
| Itration            | 1608     |
| Real Det Return     | 653      |
| Real Sto Return     | 626      |
| Reward Loss         | -102     |
| Running Env Steps   | 804000   |
| Running Forward KL  | -3.77    |
| Running Reverse KL  | 7.65     |
| Running Update Time | 1608     |
----------------------------------
2025-02-01 18:39:38.305204 Eastern Standard Time
| Itration            | 1609     |
| Real Det Return     | 669      |
| Real Sto Return     | 641      |
| Reward Loss         | -85.7    |
| Running Env Steps   | 804500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1609     |
----------------------------------
2025-02-01 18:39:53.985727 Eastern Standard Time
| Itration            | 1610     |
| Real Det Return     | 694      |
| Real Sto Return     | 651      |
| Reward Loss         | -75.4    |
| Running Env Steps   | 805000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 6.73     |
| Running Update Time | 1610     |
----------------------------------
2025-02-01 18:40:09.678389 Eastern Standard Time
| Itration            | 1611     |
| Real Det Return     | 695      |
| Real Sto Return     | 654      |
| Reward Loss         | -40.2    |
| Running Env Steps   | 805500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 7.67     |
| Running Update Time | 1611     |
----------------------------------
2025-02-01 18:40:25.371426 Eastern Standard Time
| Itration            | 1612     |
| Real Det Return     | 667      |
| Real Sto Return     | 657      |
| Reward Loss         | -108     |
| Running Env Steps   | 806000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1612     |
----------------------------------
2025-02-01 18:40:41.056038 Eastern Standard Time
| Itration            | 1613     |
| Real Det Return     | 718      |
| Real Sto Return     | 672      |
| Reward Loss         | -64.7    |
| Running Env Steps   | 806500   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 7.31     |
| Running Update Time | 1613     |
----------------------------------
2025-02-01 18:40:56.797809 Eastern Standard Time
| Itration            | 1614     |
| Real Det Return     | 700      |
| Real Sto Return     | 671      |
| Reward Loss         | -71.6    |
| Running Env Steps   | 807000   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 7.9      |
| Running Update Time | 1614     |
----------------------------------
2025-02-01 18:41:12.498281 Eastern Standard Time
| Itration            | 1615     |
| Real Det Return     | 687      |
| Real Sto Return     | 682      |
| Reward Loss         | -66.1    |
| Running Env Steps   | 807500   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1615     |
----------------------------------
2025-02-01 18:41:28.214059 Eastern Standard Time
| Itration            | 1616     |
| Real Det Return     | 674      |
| Real Sto Return     | 648      |
| Reward Loss         | -92.9    |
| Running Env Steps   | 808000   |
| Running Forward KL  | -5.58    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1616     |
----------------------------------
2025-02-01 18:41:43.852254 Eastern Standard Time
| Itration            | 1617     |
| Real Det Return     | 676      |
| Real Sto Return     | 655      |
| Reward Loss         | -78      |
| Running Env Steps   | 808500   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1617     |
----------------------------------
2025-02-01 18:41:59.533051 Eastern Standard Time
| Itration            | 1618     |
| Real Det Return     | 704      |
| Real Sto Return     | 664      |
| Reward Loss         | -71      |
| Running Env Steps   | 809000   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1618     |
----------------------------------
2025-02-01 18:42:15.228730 Eastern Standard Time
| Itration            | 1619     |
| Real Det Return     | 691      |
| Real Sto Return     | 661      |
| Reward Loss         | -55.8    |
| Running Env Steps   | 809500   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1619     |
----------------------------------
2025-02-01 18:42:30.920000 Eastern Standard Time
| Itration            | 1620     |
| Real Det Return     | 703      |
| Real Sto Return     | 671      |
| Reward Loss         | -55.3    |
| Running Env Steps   | 810000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1620     |
----------------------------------
2025-02-01 18:42:46.749476 Eastern Standard Time
| Itration            | 1621     |
| Real Det Return     | 687      |
| Real Sto Return     | 670      |
| Reward Loss         | -60.5    |
| Running Env Steps   | 810500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 7.43     |
| Running Update Time | 1621     |
----------------------------------
2025-02-01 18:43:02.447684 Eastern Standard Time
| Itration            | 1622     |
| Real Det Return     | 674      |
| Real Sto Return     | 651      |
| Reward Loss         | -83.1    |
| Running Env Steps   | 811000   |
| Running Forward KL  | -5.46    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1622     |
----------------------------------
2025-02-01 18:43:18.130972 Eastern Standard Time
| Itration            | 1623     |
| Real Det Return     | 704      |
| Real Sto Return     | 674      |
| Reward Loss         | -87.4    |
| Running Env Steps   | 811500   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 7.33     |
| Running Update Time | 1623     |
----------------------------------
2025-02-01 18:43:33.856219 Eastern Standard Time
| Itration            | 1624     |
| Real Det Return     | 681      |
| Real Sto Return     | 644      |
| Reward Loss         | -68.9    |
| Running Env Steps   | 812000   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 7.84     |
| Running Update Time | 1624     |
----------------------------------
2025-02-01 18:43:49.572519 Eastern Standard Time
| Itration            | 1625     |
| Real Det Return     | 699      |
| Real Sto Return     | 658      |
| Reward Loss         | -52.4    |
| Running Env Steps   | 812500   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 8.51     |
| Running Update Time | 1625     |
----------------------------------
2025-02-01 18:44:05.324690 Eastern Standard Time
| Itration            | 1626     |
| Real Det Return     | 703      |
| Real Sto Return     | 688      |
| Reward Loss         | -49.7    |
| Running Env Steps   | 813000   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1626     |
----------------------------------
2025-02-01 18:44:21.273779 Eastern Standard Time
| Itration            | 1627     |
| Real Det Return     | 672      |
| Real Sto Return     | 653      |
| Reward Loss         | -51.1    |
| Running Env Steps   | 813500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 7.95     |
| Running Update Time | 1627     |
----------------------------------
2025-02-01 18:44:37.005956 Eastern Standard Time
| Itration            | 1628     |
| Real Det Return     | 681      |
| Real Sto Return     | 658      |
| Reward Loss         | -150     |
| Running Env Steps   | 814000   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 5.96     |
| Running Update Time | 1628     |
----------------------------------
2025-02-01 18:44:52.725793 Eastern Standard Time
| Itration            | 1629     |
| Real Det Return     | 695      |
| Real Sto Return     | 678      |
| Reward Loss         | -37.2    |
| Running Env Steps   | 814500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1629     |
----------------------------------
2025-02-01 18:45:08.467494 Eastern Standard Time
| Itration            | 1630     |
| Real Det Return     | 686      |
| Real Sto Return     | 669      |
| Reward Loss         | -77.8    |
| Running Env Steps   | 815000   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 7.04     |
| Running Update Time | 1630     |
----------------------------------
2025-02-01 18:45:24.227825 Eastern Standard Time
| Itration            | 1631     |
| Real Det Return     | 694      |
| Real Sto Return     | 667      |
| Reward Loss         | -65.6    |
| Running Env Steps   | 815500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 1631     |
----------------------------------
2025-02-01 18:45:39.997786 Eastern Standard Time
| Itration            | 1632     |
| Real Det Return     | 651      |
| Real Sto Return     | 594      |
| Reward Loss         | -151     |
| Running Env Steps   | 816000   |
| Running Forward KL  | -3.51    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1632     |
----------------------------------
2025-02-01 18:45:55.669411 Eastern Standard Time
| Itration            | 1633     |
| Real Det Return     | 694      |
| Real Sto Return     | 673      |
| Reward Loss         | -52.8    |
| Running Env Steps   | 816500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1633     |
----------------------------------
2025-02-01 18:46:11.333800 Eastern Standard Time
| Itration            | 1634     |
| Real Det Return     | 708      |
| Real Sto Return     | 684      |
| Reward Loss         | -74      |
| Running Env Steps   | 817000   |
| Running Forward KL  | -3.93    |
| Running Reverse KL  | 7.97     |
| Running Update Time | 1634     |
----------------------------------
2025-02-01 18:46:27.011094 Eastern Standard Time
| Itration            | 1635     |
| Real Det Return     | 707      |
| Real Sto Return     | 674      |
| Reward Loss         | -76      |
| Running Env Steps   | 817500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 8.02     |
| Running Update Time | 1635     |
----------------------------------
2025-02-01 18:46:42.646609 Eastern Standard Time
| Itration            | 1636     |
| Real Det Return     | 682      |
| Real Sto Return     | 665      |
| Reward Loss         | -85.6    |
| Running Env Steps   | 818000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 5.68     |
| Running Update Time | 1636     |
----------------------------------
2025-02-01 18:46:58.338467 Eastern Standard Time
| Itration            | 1637     |
| Real Det Return     | 614      |
| Real Sto Return     | 590      |
| Reward Loss         | -194     |
| Running Env Steps   | 818500   |
| Running Forward KL  | -2.85    |
| Running Reverse KL  | 6.16     |
| Running Update Time | 1637     |
----------------------------------
2025-02-01 18:47:14.100448 Eastern Standard Time
| Itration            | 1638     |
| Real Det Return     | 714      |
| Real Sto Return     | 683      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 819000   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 7.9      |
| Running Update Time | 1638     |
----------------------------------
2025-02-01 18:47:29.742978 Eastern Standard Time
| Itration            | 1639     |
| Real Det Return     | 648      |
| Real Sto Return     | 610      |
| Reward Loss         | -185     |
| Running Env Steps   | 819500   |
| Running Forward KL  | -3.27    |
| Running Reverse KL  | 6.18     |
| Running Update Time | 1639     |
----------------------------------
2025-02-01 18:47:45.436289 Eastern Standard Time
| Itration            | 1640     |
| Real Det Return     | 690      |
| Real Sto Return     | 659      |
| Reward Loss         | -76.7    |
| Running Env Steps   | 820000   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 8.29     |
| Running Update Time | 1640     |
----------------------------------
2025-02-01 18:48:01.189396 Eastern Standard Time
| Itration            | 1641     |
| Real Det Return     | 683      |
| Real Sto Return     | 669      |
| Reward Loss         | -119     |
| Running Env Steps   | 820500   |
| Running Forward KL  | -3.44    |
| Running Reverse KL  | 8        |
| Running Update Time | 1641     |
----------------------------------
2025-02-01 18:48:16.904046 Eastern Standard Time
| Itration            | 1642     |
| Real Det Return     | 683      |
| Real Sto Return     | 669      |
| Reward Loss         | -74.9    |
| Running Env Steps   | 821000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1642     |
----------------------------------
2025-02-01 18:48:32.645621 Eastern Standard Time
| Itration            | 1643     |
| Real Det Return     | 720      |
| Real Sto Return     | 684      |
| Reward Loss         | -63      |
| Running Env Steps   | 821500   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 8.85     |
| Running Update Time | 1643     |
----------------------------------
2025-02-01 18:48:48.406095 Eastern Standard Time
| Itration            | 1644     |
| Real Det Return     | 692      |
| Real Sto Return     | 673      |
| Reward Loss         | -63.3    |
| Running Env Steps   | 822000   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1644     |
----------------------------------
2025-02-01 18:49:04.163785 Eastern Standard Time
| Itration            | 1645     |
| Real Det Return     | 702      |
| Real Sto Return     | 675      |
| Reward Loss         | -79.4    |
| Running Env Steps   | 822500   |
| Running Forward KL  | -5.66    |
| Running Reverse KL  | 6.21     |
| Running Update Time | 1645     |
----------------------------------
2025-02-01 18:49:19.826600 Eastern Standard Time
| Itration            | 1646     |
| Real Det Return     | 680      |
| Real Sto Return     | 633      |
| Reward Loss         | -60.1    |
| Running Env Steps   | 823000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1646     |
----------------------------------
2025-02-01 18:49:35.527891 Eastern Standard Time
| Itration            | 1647     |
| Real Det Return     | 674      |
| Real Sto Return     | 651      |
| Reward Loss         | -92      |
| Running Env Steps   | 823500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1647     |
----------------------------------
2025-02-01 18:49:51.184711 Eastern Standard Time
| Itration            | 1648     |
| Real Det Return     | 689      |
| Real Sto Return     | 660      |
| Reward Loss         | -76.8    |
| Running Env Steps   | 824000   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 7.55     |
| Running Update Time | 1648     |
----------------------------------
2025-02-01 18:50:06.871527 Eastern Standard Time
| Itration            | 1649     |
| Real Det Return     | 653      |
| Real Sto Return     | 634      |
| Reward Loss         | -94      |
| Running Env Steps   | 824500   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 6.35     |
| Running Update Time | 1649     |
----------------------------------
2025-02-01 18:50:22.572370 Eastern Standard Time
| Itration            | 1650     |
| Real Det Return     | 696      |
| Real Sto Return     | 665      |
| Reward Loss         | -78.5    |
| Running Env Steps   | 825000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 7.73     |
| Running Update Time | 1650     |
----------------------------------
2025-02-01 18:50:38.348938 Eastern Standard Time
| Itration            | 1651     |
| Real Det Return     | 700      |
| Real Sto Return     | 659      |
| Reward Loss         | -93.1    |
| Running Env Steps   | 825500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 6.27     |
| Running Update Time | 1651     |
----------------------------------
2025-02-01 18:50:54.055951 Eastern Standard Time
| Itration            | 1652     |
| Real Det Return     | 699      |
| Real Sto Return     | 659      |
| Reward Loss         | -86.6    |
| Running Env Steps   | 826000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 7.61     |
| Running Update Time | 1652     |
----------------------------------
2025-02-01 18:51:09.779464 Eastern Standard Time
| Itration            | 1653     |
| Real Det Return     | 685      |
| Real Sto Return     | 666      |
| Reward Loss         | -93.4    |
| Running Env Steps   | 826500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 6.61     |
| Running Update Time | 1653     |
----------------------------------
2025-02-01 18:51:25.449689 Eastern Standard Time
| Itration            | 1654     |
| Real Det Return     | 677      |
| Real Sto Return     | 652      |
| Reward Loss         | -84.6    |
| Running Env Steps   | 827000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 7.87     |
| Running Update Time | 1654     |
----------------------------------
2025-02-01 18:51:41.212991 Eastern Standard Time
| Itration            | 1655     |
| Real Det Return     | 689      |
| Real Sto Return     | 671      |
| Reward Loss         | -50.2    |
| Running Env Steps   | 827500   |
| Running Forward KL  | -3.72    |
| Running Reverse KL  | 7.1      |
| Running Update Time | 1655     |
----------------------------------
2025-02-01 18:51:56.945613 Eastern Standard Time
| Itration            | 1656     |
| Real Det Return     | 678      |
| Real Sto Return     | 655      |
| Reward Loss         | -87      |
| Running Env Steps   | 828000   |
| Running Forward KL  | -5.53    |
| Running Reverse KL  | 7.05     |
| Running Update Time | 1656     |
----------------------------------
2025-02-01 18:52:12.733439 Eastern Standard Time
| Itration            | 1657     |
| Real Det Return     | 700      |
| Real Sto Return     | 683      |
| Reward Loss         | -29.8    |
| Running Env Steps   | 828500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 7.97     |
| Running Update Time | 1657     |
----------------------------------
2025-02-01 18:52:28.387646 Eastern Standard Time
| Itration            | 1658     |
| Real Det Return     | 699      |
| Real Sto Return     | 669      |
| Reward Loss         | -63.1    |
| Running Env Steps   | 829000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1658     |
----------------------------------
2025-02-01 18:52:44.142587 Eastern Standard Time
| Itration            | 1659     |
| Real Det Return     | 704      |
| Real Sto Return     | 683      |
| Reward Loss         | -44      |
| Running Env Steps   | 829500   |
| Running Forward KL  | -5.46    |
| Running Reverse KL  | 7.57     |
| Running Update Time | 1659     |
----------------------------------
2025-02-01 18:52:59.854806 Eastern Standard Time
| Itration            | 1660     |
| Real Det Return     | 718      |
| Real Sto Return     | 683      |
| Reward Loss         | -75.6    |
| Running Env Steps   | 830000   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1660     |
----------------------------------
2025-02-01 18:53:15.623066 Eastern Standard Time
| Itration            | 1661     |
| Real Det Return     | 715      |
| Real Sto Return     | 667      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 830500   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 7.94     |
| Running Update Time | 1661     |
----------------------------------
2025-02-01 18:53:31.273813 Eastern Standard Time
| Itration            | 1662     |
| Real Det Return     | 682      |
| Real Sto Return     | 661      |
| Reward Loss         | -47.3    |
| Running Env Steps   | 831000   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 7.51     |
| Running Update Time | 1662     |
----------------------------------
2025-02-01 18:53:47.012605 Eastern Standard Time
| Itration            | 1663     |
| Real Det Return     | 704      |
| Real Sto Return     | 663      |
| Reward Loss         | -101     |
| Running Env Steps   | 831500   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 7.72     |
| Running Update Time | 1663     |
----------------------------------
2025-02-01 18:54:02.599456 Eastern Standard Time
| Itration            | 1664     |
| Real Det Return     | 681      |
| Real Sto Return     | 659      |
| Reward Loss         | -88.6    |
| Running Env Steps   | 832000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 7.15     |
| Running Update Time | 1664     |
----------------------------------
2025-02-01 18:54:18.311496 Eastern Standard Time
| Itration            | 1665     |
| Real Det Return     | 695      |
| Real Sto Return     | 666      |
| Reward Loss         | -52.8    |
| Running Env Steps   | 832500   |
| Running Forward KL  | -3.61    |
| Running Reverse KL  | 8.37     |
| Running Update Time | 1665     |
----------------------------------
2025-02-01 18:54:34.006802 Eastern Standard Time
| Itration            | 1666     |
| Real Det Return     | 689      |
| Real Sto Return     | 660      |
| Reward Loss         | -100     |
| Running Env Steps   | 833000   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 7.97     |
| Running Update Time | 1666     |
----------------------------------
2025-02-01 18:54:49.744898 Eastern Standard Time
| Itration            | 1667     |
| Real Det Return     | 688      |
| Real Sto Return     | 657      |
| Reward Loss         | -92.7    |
| Running Env Steps   | 833500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 7.38     |
| Running Update Time | 1667     |
----------------------------------
2025-02-01 18:55:05.425154 Eastern Standard Time
| Itration            | 1668     |
| Real Det Return     | 696      |
| Real Sto Return     | 675      |
| Reward Loss         | -90.8    |
| Running Env Steps   | 834000   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 6.74     |
| Running Update Time | 1668     |
----------------------------------
2025-02-01 18:55:21.073506 Eastern Standard Time
| Itration            | 1669     |
| Real Det Return     | 694      |
| Real Sto Return     | 676      |
| Reward Loss         | -79.1    |
| Running Env Steps   | 834500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1669     |
----------------------------------
2025-02-01 18:55:36.740193 Eastern Standard Time
| Itration            | 1670     |
| Real Det Return     | 715      |
| Real Sto Return     | 688      |
| Reward Loss         | -95.5    |
| Running Env Steps   | 835000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1670     |
----------------------------------
2025-02-01 18:55:52.425896 Eastern Standard Time
| Itration            | 1671     |
| Real Det Return     | 691      |
| Real Sto Return     | 671      |
| Reward Loss         | -76.5    |
| Running Env Steps   | 835500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 6.78     |
| Running Update Time | 1671     |
----------------------------------
2025-02-01 18:56:08.115737 Eastern Standard Time
| Itration            | 1672     |
| Real Det Return     | 698      |
| Real Sto Return     | 678      |
| Reward Loss         | -73.5    |
| Running Env Steps   | 836000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1672     |
----------------------------------
2025-02-01 18:56:23.808905 Eastern Standard Time
| Itration            | 1673     |
| Real Det Return     | 634      |
| Real Sto Return     | 602      |
| Reward Loss         | -178     |
| Running Env Steps   | 836500   |
| Running Forward KL  | -3.39    |
| Running Reverse KL  | 6.21     |
| Running Update Time | 1673     |
----------------------------------
2025-02-01 18:56:39.447445 Eastern Standard Time
| Itration            | 1674     |
| Real Det Return     | 702      |
| Real Sto Return     | 677      |
| Reward Loss         | -38.5    |
| Running Env Steps   | 837000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 7.27     |
| Running Update Time | 1674     |
----------------------------------
2025-02-01 18:56:55.097848 Eastern Standard Time
| Itration            | 1675     |
| Real Det Return     | 707      |
| Real Sto Return     | 675      |
| Reward Loss         | -64.1    |
| Running Env Steps   | 837500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 8.38     |
| Running Update Time | 1675     |
----------------------------------
2025-02-01 18:57:10.781318 Eastern Standard Time
| Itration            | 1676     |
| Real Det Return     | 651      |
| Real Sto Return     | 656      |
| Reward Loss         | -111     |
| Running Env Steps   | 838000   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 6.73     |
| Running Update Time | 1676     |
----------------------------------
2025-02-01 18:57:26.512933 Eastern Standard Time
| Itration            | 1677     |
| Real Det Return     | 705      |
| Real Sto Return     | 678      |
| Reward Loss         | -68.6    |
| Running Env Steps   | 838500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 7.26     |
| Running Update Time | 1677     |
----------------------------------
2025-02-01 18:57:42.324975 Eastern Standard Time
| Itration            | 1678     |
| Real Det Return     | 697      |
| Real Sto Return     | 669      |
| Reward Loss         | -78.1    |
| Running Env Steps   | 839000   |
| Running Forward KL  | -5.61    |
| Running Reverse KL  | 7.65     |
| Running Update Time | 1678     |
----------------------------------
2025-02-01 18:57:57.988544 Eastern Standard Time
| Itration            | 1679     |
| Real Det Return     | 700      |
| Real Sto Return     | 679      |
| Reward Loss         | -107     |
| Running Env Steps   | 839500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 7.18     |
| Running Update Time | 1679     |
----------------------------------
2025-02-01 18:58:13.680633 Eastern Standard Time
| Itration            | 1680     |
| Real Det Return     | 689      |
| Real Sto Return     | 654      |
| Reward Loss         | -87.5    |
| Running Env Steps   | 840000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 7.49     |
| Running Update Time | 1680     |
----------------------------------
2025-02-01 18:58:29.452288 Eastern Standard Time
| Itration            | 1681     |
| Real Det Return     | 713      |
| Real Sto Return     | 679      |
| Reward Loss         | -65.4    |
| Running Env Steps   | 840500   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 8.35     |
| Running Update Time | 1681     |
----------------------------------
2025-02-01 18:58:45.260615 Eastern Standard Time
| Itration            | 1682     |
| Real Det Return     | 692      |
| Real Sto Return     | 665      |
| Reward Loss         | -78.1    |
| Running Env Steps   | 841000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 8.08     |
| Running Update Time | 1682     |
----------------------------------
2025-02-01 18:59:01.074922 Eastern Standard Time
| Itration            | 1683     |
| Real Det Return     | 695      |
| Real Sto Return     | 670      |
| Reward Loss         | -117     |
| Running Env Steps   | 841500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 6.66     |
| Running Update Time | 1683     |
----------------------------------
2025-02-01 18:59:16.857615 Eastern Standard Time
| Itration            | 1684     |
| Real Det Return     | 707      |
| Real Sto Return     | 680      |
| Reward Loss         | -47.4    |
| Running Env Steps   | 842000   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 7.63     |
| Running Update Time | 1684     |
----------------------------------
2025-02-01 18:59:32.825270 Eastern Standard Time
| Itration            | 1685     |
| Real Det Return     | 663      |
| Real Sto Return     | 634      |
| Reward Loss         | -184     |
| Running Env Steps   | 842500   |
| Running Forward KL  | -2.29    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1685     |
----------------------------------
2025-02-01 18:59:48.534778 Eastern Standard Time
| Itration            | 1686     |
| Real Det Return     | 693      |
| Real Sto Return     | 672      |
| Reward Loss         | -84.5    |
| Running Env Steps   | 843000   |
| Running Forward KL  | -3.69    |
| Running Reverse KL  | 7.85     |
| Running Update Time | 1686     |
----------------------------------
2025-02-01 19:00:04.231513 Eastern Standard Time
| Itration            | 1687     |
| Real Det Return     | 687      |
| Real Sto Return     | 668      |
| Reward Loss         | -99.2    |
| Running Env Steps   | 843500   |
| Running Forward KL  | -2.95    |
| Running Reverse KL  | 8.42     |
| Running Update Time | 1687     |
----------------------------------
2025-02-01 19:00:19.914919 Eastern Standard Time
| Itration            | 1688     |
| Real Det Return     | 671      |
| Real Sto Return     | 647      |
| Reward Loss         | -76.9    |
| Running Env Steps   | 844000   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1688     |
----------------------------------
2025-02-01 19:00:35.690034 Eastern Standard Time
| Itration            | 1689     |
| Real Det Return     | 705      |
| Real Sto Return     | 679      |
| Reward Loss         | -89.8    |
| Running Env Steps   | 844500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 6.89     |
| Running Update Time | 1689     |
----------------------------------
2025-02-01 19:00:51.422987 Eastern Standard Time
| Itration            | 1690     |
| Real Det Return     | 699      |
| Real Sto Return     | 683      |
| Reward Loss         | -90.5    |
| Running Env Steps   | 845000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1690     |
----------------------------------
2025-02-01 19:01:07.106055 Eastern Standard Time
| Itration            | 1691     |
| Real Det Return     | 687      |
| Real Sto Return     | 670      |
| Reward Loss         | -119     |
| Running Env Steps   | 845500   |
| Running Forward KL  | -2.88    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1691     |
----------------------------------
2025-02-01 19:01:22.826531 Eastern Standard Time
| Itration            | 1692     |
| Real Det Return     | 712      |
| Real Sto Return     | 673      |
| Reward Loss         | -54.1    |
| Running Env Steps   | 846000   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 7.87     |
| Running Update Time | 1692     |
----------------------------------
2025-02-01 19:01:38.577619 Eastern Standard Time
| Itration            | 1693     |
| Real Det Return     | 702      |
| Real Sto Return     | 677      |
| Reward Loss         | -41.6    |
| Running Env Steps   | 846500   |
| Running Forward KL  | -5.49    |
| Running Reverse KL  | 7.17     |
| Running Update Time | 1693     |
----------------------------------
2025-02-01 19:01:54.282914 Eastern Standard Time
| Itration            | 1694     |
| Real Det Return     | 669      |
| Real Sto Return     | 641      |
| Reward Loss         | -104     |
| Running Env Steps   | 847000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 5.96     |
| Running Update Time | 1694     |
----------------------------------
2025-02-01 19:02:10.009158 Eastern Standard Time
| Itration            | 1695     |
| Real Det Return     | 697      |
| Real Sto Return     | 677      |
| Reward Loss         | -56      |
| Running Env Steps   | 847500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 7.57     |
| Running Update Time | 1695     |
----------------------------------
2025-02-01 19:02:25.753140 Eastern Standard Time
| Itration            | 1696     |
| Real Det Return     | 725      |
| Real Sto Return     | 698      |
| Reward Loss         | -49.1    |
| Running Env Steps   | 848000   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 8.09     |
| Running Update Time | 1696     |
----------------------------------
2025-02-01 19:02:41.557614 Eastern Standard Time
| Itration            | 1697     |
| Real Det Return     | 701      |
| Real Sto Return     | 663      |
| Reward Loss         | -59.4    |
| Running Env Steps   | 848500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 6.96     |
| Running Update Time | 1697     |
----------------------------------
2025-02-01 19:02:57.227811 Eastern Standard Time
| Itration            | 1698     |
| Real Det Return     | 670      |
| Real Sto Return     | 647      |
| Reward Loss         | -93.7    |
| Running Env Steps   | 849000   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1698     |
----------------------------------
2025-02-01 19:03:12.974958 Eastern Standard Time
| Itration            | 1699     |
| Real Det Return     | 699      |
| Real Sto Return     | 655      |
| Reward Loss         | -122     |
| Running Env Steps   | 849500   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 7.71     |
| Running Update Time | 1699     |
----------------------------------
2025-02-01 19:03:28.686968 Eastern Standard Time
| Itration            | 1700     |
| Real Det Return     | 670      |
| Real Sto Return     | 646      |
| Reward Loss         | -111     |
| Running Env Steps   | 850000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 7.73     |
| Running Update Time | 1700     |
----------------------------------
2025-02-01 19:03:44.367604 Eastern Standard Time
| Itration            | 1701     |
| Real Det Return     | 718      |
| Real Sto Return     | 687      |
| Reward Loss         | -66.6    |
| Running Env Steps   | 850500   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 6.86     |
| Running Update Time | 1701     |
----------------------------------
2025-02-01 19:04:00.123920 Eastern Standard Time
| Itration            | 1702     |
| Real Det Return     | 651      |
| Real Sto Return     | 632      |
| Reward Loss         | -106     |
| Running Env Steps   | 851000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 7.76     |
| Running Update Time | 1702     |
----------------------------------
2025-02-01 19:04:15.836862 Eastern Standard Time
| Itration            | 1703     |
| Real Det Return     | 693      |
| Real Sto Return     | 674      |
| Reward Loss         | -71.1    |
| Running Env Steps   | 851500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 7.59     |
| Running Update Time | 1703     |
----------------------------------
2025-02-01 19:04:31.544541 Eastern Standard Time
| Itration            | 1704     |
| Real Det Return     | 694      |
| Real Sto Return     | 665      |
| Reward Loss         | -74.9    |
| Running Env Steps   | 852000   |
| Running Forward KL  | -5.57    |
| Running Reverse KL  | 7.57     |
| Running Update Time | 1704     |
----------------------------------
2025-02-01 19:04:47.260027 Eastern Standard Time
| Itration            | 1705     |
| Real Det Return     | 701      |
| Real Sto Return     | 679      |
| Reward Loss         | -73      |
| Running Env Steps   | 852500   |
| Running Forward KL  | -3.91    |
| Running Reverse KL  | 7.94     |
| Running Update Time | 1705     |
----------------------------------
2025-02-01 19:05:02.999035 Eastern Standard Time
| Itration            | 1706     |
| Real Det Return     | 693      |
| Real Sto Return     | 670      |
| Reward Loss         | -82.8    |
| Running Env Steps   | 853000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 7.89     |
| Running Update Time | 1706     |
----------------------------------
2025-02-01 19:05:18.676356 Eastern Standard Time
| Itration            | 1707     |
| Real Det Return     | 725      |
| Real Sto Return     | 676      |
| Reward Loss         | -42.9    |
| Running Env Steps   | 853500   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1707     |
----------------------------------
2025-02-01 19:05:34.452271 Eastern Standard Time
| Itration            | 1708     |
| Real Det Return     | 671      |
| Real Sto Return     | 638      |
| Reward Loss         | -83.5    |
| Running Env Steps   | 854000   |
| Running Forward KL  | -5.89    |
| Running Reverse KL  | 6.87     |
| Running Update Time | 1708     |
----------------------------------
2025-02-01 19:05:50.179956 Eastern Standard Time
| Itration            | 1709     |
| Real Det Return     | 688      |
| Real Sto Return     | 666      |
| Reward Loss         | -66.6    |
| Running Env Steps   | 854500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1709     |
----------------------------------
2025-02-01 19:06:05.900637 Eastern Standard Time
| Itration            | 1710     |
| Real Det Return     | 700      |
| Real Sto Return     | 673      |
| Reward Loss         | -77      |
| Running Env Steps   | 855000   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 1710     |
----------------------------------
2025-02-01 19:06:21.631087 Eastern Standard Time
| Itration            | 1711     |
| Real Det Return     | 702      |
| Real Sto Return     | 681      |
| Reward Loss         | -84.1    |
| Running Env Steps   | 855500   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1711     |
----------------------------------
2025-02-01 19:06:37.294693 Eastern Standard Time
| Itration            | 1712     |
| Real Det Return     | 712      |
| Real Sto Return     | 677      |
| Reward Loss         | -80.9    |
| Running Env Steps   | 856000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 7.1      |
| Running Update Time | 1712     |
----------------------------------
2025-02-01 19:06:53.027464 Eastern Standard Time
| Itration            | 1713     |
| Real Det Return     | 694      |
| Real Sto Return     | 659      |
| Reward Loss         | -49.2    |
| Running Env Steps   | 856500   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 7.67     |
| Running Update Time | 1713     |
----------------------------------
2025-02-01 19:07:08.758304 Eastern Standard Time
| Itration            | 1714     |
| Real Det Return     | 672      |
| Real Sto Return     | 634      |
| Reward Loss         | -68.1    |
| Running Env Steps   | 857000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 7.96     |
| Running Update Time | 1714     |
----------------------------------
2025-02-01 19:07:24.644486 Eastern Standard Time
| Itration            | 1715     |
| Real Det Return     | 683      |
| Real Sto Return     | 677      |
| Reward Loss         | -56.3    |
| Running Env Steps   | 857500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 7.31     |
| Running Update Time | 1715     |
----------------------------------
2025-02-01 19:07:40.370647 Eastern Standard Time
| Itration            | 1716     |
| Real Det Return     | 708      |
| Real Sto Return     | 691      |
| Reward Loss         | -77.9    |
| Running Env Steps   | 858000   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 8.37     |
| Running Update Time | 1716     |
----------------------------------
2025-02-01 19:07:56.101223 Eastern Standard Time
| Itration            | 1717     |
| Real Det Return     | 693      |
| Real Sto Return     | 673      |
| Reward Loss         | -59.6    |
| Running Env Steps   | 858500   |
| Running Forward KL  | -5.6     |
| Running Reverse KL  | 6.46     |
| Running Update Time | 1717     |
----------------------------------
2025-02-01 19:08:11.823960 Eastern Standard Time
| Itration            | 1718     |
| Real Det Return     | 704      |
| Real Sto Return     | 673      |
| Reward Loss         | -70.1    |
| Running Env Steps   | 859000   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 6.65     |
| Running Update Time | 1718     |
----------------------------------
2025-02-01 19:08:27.515234 Eastern Standard Time
| Itration            | 1719     |
| Real Det Return     | 703      |
| Real Sto Return     | 678      |
| Reward Loss         | -51.8    |
| Running Env Steps   | 859500   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1719     |
----------------------------------
2025-02-01 19:08:43.169507 Eastern Standard Time
| Itration            | 1720     |
| Real Det Return     | 668      |
| Real Sto Return     | 661      |
| Reward Loss         | -54.7    |
| Running Env Steps   | 860000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1720     |
----------------------------------
2025-02-01 19:08:58.954669 Eastern Standard Time
| Itration            | 1721     |
| Real Det Return     | 691      |
| Real Sto Return     | 667      |
| Reward Loss         | -65.9    |
| Running Env Steps   | 860500   |
| Running Forward KL  | -5.51    |
| Running Reverse KL  | 7.48     |
| Running Update Time | 1721     |
----------------------------------
2025-02-01 19:09:14.696972 Eastern Standard Time
| Itration            | 1722     |
| Real Det Return     | 671      |
| Real Sto Return     | 648      |
| Reward Loss         | -70.3    |
| Running Env Steps   | 861000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 6.91     |
| Running Update Time | 1722     |
----------------------------------
2025-02-01 19:09:30.461089 Eastern Standard Time
| Itration            | 1723     |
| Real Det Return     | 686      |
| Real Sto Return     | 652      |
| Reward Loss         | -71.9    |
| Running Env Steps   | 861500   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 6.92     |
| Running Update Time | 1723     |
----------------------------------
2025-02-01 19:09:46.181828 Eastern Standard Time
| Itration            | 1724     |
| Real Det Return     | 698      |
| Real Sto Return     | 666      |
| Reward Loss         | -106     |
| Running Env Steps   | 862000   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 6.49     |
| Running Update Time | 1724     |
----------------------------------
2025-02-01 19:10:01.863467 Eastern Standard Time
| Itration            | 1725     |
| Real Det Return     | 671      |
| Real Sto Return     | 663      |
| Reward Loss         | -74.5    |
| Running Env Steps   | 862500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 6.86     |
| Running Update Time | 1725     |
----------------------------------
2025-02-01 19:10:17.572928 Eastern Standard Time
| Itration            | 1726     |
| Real Det Return     | 656      |
| Real Sto Return     | 639      |
| Reward Loss         | -133     |
| Running Env Steps   | 863000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 5.9      |
| Running Update Time | 1726     |
----------------------------------
2025-02-01 19:10:33.273745 Eastern Standard Time
| Itration            | 1727     |
| Real Det Return     | 661      |
| Real Sto Return     | 628      |
| Reward Loss         | -162     |
| Running Env Steps   | 863500   |
| Running Forward KL  | -3.77    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 1727     |
----------------------------------
2025-02-01 19:10:49.022490 Eastern Standard Time
| Itration            | 1728     |
| Real Det Return     | 667      |
| Real Sto Return     | 655      |
| Reward Loss         | -119     |
| Running Env Steps   | 864000   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 7        |
| Running Update Time | 1728     |
----------------------------------
2025-02-01 19:11:04.744914 Eastern Standard Time
| Itration            | 1729     |
| Real Det Return     | 706      |
| Real Sto Return     | 675      |
| Reward Loss         | -55.4    |
| Running Env Steps   | 864500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 7.08     |
| Running Update Time | 1729     |
----------------------------------
2025-02-01 19:11:20.497262 Eastern Standard Time
| Itration            | 1730     |
| Real Det Return     | 700      |
| Real Sto Return     | 664      |
| Reward Loss         | -80.6    |
| Running Env Steps   | 865000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1730     |
----------------------------------
2025-02-01 19:11:36.200586 Eastern Standard Time
| Itration            | 1731     |
| Real Det Return     | 706      |
| Real Sto Return     | 666      |
| Reward Loss         | -75.4    |
| Running Env Steps   | 865500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1731     |
----------------------------------
2025-02-01 19:11:51.945672 Eastern Standard Time
| Itration            | 1732     |
| Real Det Return     | 702      |
| Real Sto Return     | 689      |
| Reward Loss         | -89.7    |
| Running Env Steps   | 866000   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 8.23     |
| Running Update Time | 1732     |
----------------------------------
2025-02-01 19:12:07.702547 Eastern Standard Time
| Itration            | 1733     |
| Real Det Return     | 692      |
| Real Sto Return     | 671      |
| Reward Loss         | -91.5    |
| Running Env Steps   | 866500   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 6.69     |
| Running Update Time | 1733     |
----------------------------------
2025-02-01 19:12:23.415451 Eastern Standard Time
| Itration            | 1734     |
| Real Det Return     | 689      |
| Real Sto Return     | 671      |
| Reward Loss         | -93.8    |
| Running Env Steps   | 867000   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 7.08     |
| Running Update Time | 1734     |
----------------------------------
2025-02-01 19:12:39.146887 Eastern Standard Time
| Itration            | 1735     |
| Real Det Return     | 645      |
| Real Sto Return     | 612      |
| Reward Loss         | -144     |
| Running Env Steps   | 867500   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 5.81     |
| Running Update Time | 1735     |
----------------------------------
2025-02-01 19:12:54.730385 Eastern Standard Time
| Itration            | 1736     |
| Real Det Return     | 700      |
| Real Sto Return     | 672      |
| Reward Loss         | -70.5    |
| Running Env Steps   | 868000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 7.96     |
| Running Update Time | 1736     |
----------------------------------
2025-02-01 19:13:10.526933 Eastern Standard Time
| Itration            | 1737     |
| Real Det Return     | 693      |
| Real Sto Return     | 670      |
| Reward Loss         | -85.5    |
| Running Env Steps   | 868500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1737     |
----------------------------------
2025-02-01 19:13:26.276978 Eastern Standard Time
| Itration            | 1738     |
| Real Det Return     | 676      |
| Real Sto Return     | 664      |
| Reward Loss         | -67.9    |
| Running Env Steps   | 869000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 7.97     |
| Running Update Time | 1738     |
----------------------------------
2025-02-01 19:13:42.012665 Eastern Standard Time
| Itration            | 1739     |
| Real Det Return     | 682      |
| Real Sto Return     | 656      |
| Reward Loss         | -95.7    |
| Running Env Steps   | 869500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1739     |
----------------------------------
2025-02-01 19:13:57.789795 Eastern Standard Time
| Itration            | 1740     |
| Real Det Return     | 704      |
| Real Sto Return     | 676      |
| Reward Loss         | -88.6    |
| Running Env Steps   | 870000   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 7.38     |
| Running Update Time | 1740     |
----------------------------------
2025-02-01 19:14:13.904958 Eastern Standard Time
| Itration            | 1741     |
| Real Det Return     | 708      |
| Real Sto Return     | 679      |
| Reward Loss         | -53.6    |
| Running Env Steps   | 870500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 7.04     |
| Running Update Time | 1741     |
----------------------------------
2025-02-01 19:14:29.820037 Eastern Standard Time
| Itration            | 1742     |
| Real Det Return     | 680      |
| Real Sto Return     | 654      |
| Reward Loss         | -89.7    |
| Running Env Steps   | 871000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1742     |
----------------------------------
2025-02-01 19:14:45.557250 Eastern Standard Time
| Itration            | 1743     |
| Real Det Return     | 696      |
| Real Sto Return     | 683      |
| Reward Loss         | -84.9    |
| Running Env Steps   | 871500   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 7.25     |
| Running Update Time | 1743     |
----------------------------------
2025-02-01 19:15:01.275999 Eastern Standard Time
| Itration            | 1744     |
| Real Det Return     | 704      |
| Real Sto Return     | 662      |
| Reward Loss         | -55.2    |
| Running Env Steps   | 872000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 8.46     |
| Running Update Time | 1744     |
----------------------------------
2025-02-01 19:15:16.932510 Eastern Standard Time
| Itration            | 1745     |
| Real Det Return     | 667      |
| Real Sto Return     | 648      |
| Reward Loss         | -76      |
| Running Env Steps   | 872500   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 6.96     |
| Running Update Time | 1745     |
----------------------------------
2025-02-01 19:15:32.651494 Eastern Standard Time
| Itration            | 1746     |
| Real Det Return     | 683      |
| Real Sto Return     | 636      |
| Reward Loss         | -89.4    |
| Running Env Steps   | 873000   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 6.23     |
| Running Update Time | 1746     |
----------------------------------
2025-02-01 19:15:48.372830 Eastern Standard Time
| Itration            | 1747     |
| Real Det Return     | 710      |
| Real Sto Return     | 666      |
| Reward Loss         | -47.8    |
| Running Env Steps   | 873500   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 7.61     |
| Running Update Time | 1747     |
----------------------------------
2025-02-01 19:16:04.053892 Eastern Standard Time
| Itration            | 1748     |
| Real Det Return     | 676      |
| Real Sto Return     | 644      |
| Reward Loss         | -169     |
| Running Env Steps   | 874000   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 6.6      |
| Running Update Time | 1748     |
----------------------------------
2025-02-01 19:16:19.807299 Eastern Standard Time
| Itration            | 1749     |
| Real Det Return     | 704      |
| Real Sto Return     | 670      |
| Reward Loss         | -61      |
| Running Env Steps   | 874500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 8        |
| Running Update Time | 1749     |
----------------------------------
2025-02-01 19:16:35.558954 Eastern Standard Time
| Itration            | 1750     |
| Real Det Return     | 651      |
| Real Sto Return     | 607      |
| Reward Loss         | -120     |
| Running Env Steps   | 875000   |
| Running Forward KL  | -3.02    |
| Running Reverse KL  | 7.87     |
| Running Update Time | 1750     |
----------------------------------
2025-02-01 19:16:51.406817 Eastern Standard Time
| Itration            | 1751     |
| Real Det Return     | 689      |
| Real Sto Return     | 658      |
| Reward Loss         | -81      |
| Running Env Steps   | 875500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 6.44     |
| Running Update Time | 1751     |
----------------------------------
2025-02-01 19:17:07.245034 Eastern Standard Time
| Itration            | 1752     |
| Real Det Return     | 707      |
| Real Sto Return     | 676      |
| Reward Loss         | -38.4    |
| Running Env Steps   | 876000   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1752     |
----------------------------------
2025-02-01 19:17:22.961564 Eastern Standard Time
| Itration            | 1753     |
| Real Det Return     | 704      |
| Real Sto Return     | 684      |
| Reward Loss         | -79.5    |
| Running Env Steps   | 876500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 7.1      |
| Running Update Time | 1753     |
----------------------------------
2025-02-01 19:17:38.779964 Eastern Standard Time
| Itration            | 1754     |
| Real Det Return     | 703      |
| Real Sto Return     | 687      |
| Reward Loss         | -44.5    |
| Running Env Steps   | 877000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 7.01     |
| Running Update Time | 1754     |
----------------------------------
2025-02-01 19:17:54.478653 Eastern Standard Time
| Itration            | 1755     |
| Real Det Return     | 714      |
| Real Sto Return     | 682      |
| Reward Loss         | -124     |
| Running Env Steps   | 877500   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1755     |
----------------------------------
2025-02-01 19:18:10.212940 Eastern Standard Time
| Itration            | 1756     |
| Real Det Return     | 694      |
| Real Sto Return     | 672      |
| Reward Loss         | -120     |
| Running Env Steps   | 878000   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 6.48     |
| Running Update Time | 1756     |
----------------------------------
2025-02-01 19:18:25.926975 Eastern Standard Time
| Itration            | 1757     |
| Real Det Return     | 686      |
| Real Sto Return     | 664      |
| Reward Loss         | -82      |
| Running Env Steps   | 878500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 6.83     |
| Running Update Time | 1757     |
----------------------------------
2025-02-01 19:18:41.690393 Eastern Standard Time
| Itration            | 1758     |
| Real Det Return     | 696      |
| Real Sto Return     | 685      |
| Reward Loss         | -50.9    |
| Running Env Steps   | 879000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 8.26     |
| Running Update Time | 1758     |
----------------------------------
2025-02-01 19:18:57.461838 Eastern Standard Time
| Itration            | 1759     |
| Real Det Return     | 670      |
| Real Sto Return     | 662      |
| Reward Loss         | -73.6    |
| Running Env Steps   | 879500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 7.8      |
| Running Update Time | 1759     |
----------------------------------
2025-02-01 19:19:13.188982 Eastern Standard Time
| Itration            | 1760     |
| Real Det Return     | 695      |
| Real Sto Return     | 661      |
| Reward Loss         | -69      |
| Running Env Steps   | 880000   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 7.7      |
| Running Update Time | 1760     |
----------------------------------
2025-02-01 19:19:28.968251 Eastern Standard Time
| Itration            | 1761     |
| Real Det Return     | 715      |
| Real Sto Return     | 679      |
| Reward Loss         | -50.3    |
| Running Env Steps   | 880500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 7.74     |
| Running Update Time | 1761     |
----------------------------------
2025-02-01 19:19:44.682659 Eastern Standard Time
| Itration            | 1762     |
| Real Det Return     | 709      |
| Real Sto Return     | 701      |
| Reward Loss         | -56      |
| Running Env Steps   | 881000   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 7.87     |
| Running Update Time | 1762     |
----------------------------------
2025-02-01 19:20:00.454961 Eastern Standard Time
| Itration            | 1763     |
| Real Det Return     | 674      |
| Real Sto Return     | 646      |
| Reward Loss         | -101     |
| Running Env Steps   | 881500   |
| Running Forward KL  | -4.87    |
| Running Reverse KL  | 6.55     |
| Running Update Time | 1763     |
----------------------------------
2025-02-01 19:20:16.325731 Eastern Standard Time
| Itration            | 1764     |
| Real Det Return     | 701      |
| Real Sto Return     | 668      |
| Reward Loss         | -100     |
| Running Env Steps   | 882000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1764     |
----------------------------------
2025-02-01 19:20:32.197833 Eastern Standard Time
| Itration            | 1765     |
| Real Det Return     | 703      |
| Real Sto Return     | 670      |
| Reward Loss         | -105     |
| Running Env Steps   | 882500   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1765     |
----------------------------------
2025-02-01 19:20:47.933126 Eastern Standard Time
| Itration            | 1766     |
| Real Det Return     | 712      |
| Real Sto Return     | 687      |
| Reward Loss         | -68.3    |
| Running Env Steps   | 883000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 6.86     |
| Running Update Time | 1766     |
----------------------------------
2025-02-01 19:21:03.685770 Eastern Standard Time
| Itration            | 1767     |
| Real Det Return     | 659      |
| Real Sto Return     | 635      |
| Reward Loss         | -120     |
| Running Env Steps   | 883500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 6.65     |
| Running Update Time | 1767     |
----------------------------------
2025-02-01 19:21:19.492261 Eastern Standard Time
| Itration            | 1768     |
| Real Det Return     | 706      |
| Real Sto Return     | 681      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 884000   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 7.15     |
| Running Update Time | 1768     |
----------------------------------
2025-02-01 19:21:35.281208 Eastern Standard Time
| Itration            | 1769     |
| Real Det Return     | 686      |
| Real Sto Return     | 678      |
| Reward Loss         | -96.7    |
| Running Env Steps   | 884500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1769     |
----------------------------------
2025-02-01 19:21:50.953958 Eastern Standard Time
| Itration            | 1770     |
| Real Det Return     | 679      |
| Real Sto Return     | 654      |
| Reward Loss         | -74.4    |
| Running Env Steps   | 885000   |
| Running Forward KL  | -3.69    |
| Running Reverse KL  | 7.44     |
| Running Update Time | 1770     |
----------------------------------
2025-02-01 19:22:06.684573 Eastern Standard Time
| Itration            | 1771     |
| Real Det Return     | 706      |
| Real Sto Return     | 685      |
| Reward Loss         | -49.8    |
| Running Env Steps   | 885500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 6.85     |
| Running Update Time | 1771     |
----------------------------------
2025-02-01 19:22:22.474217 Eastern Standard Time
| Itration            | 1772     |
| Real Det Return     | 702      |
| Real Sto Return     | 676      |
| Reward Loss         | -108     |
| Running Env Steps   | 886000   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 8.64     |
| Running Update Time | 1772     |
----------------------------------
2025-02-01 19:22:38.155937 Eastern Standard Time
| Itration            | 1773     |
| Real Det Return     | 687      |
| Real Sto Return     | 667      |
| Reward Loss         | -84.7    |
| Running Env Steps   | 886500   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 1773     |
----------------------------------
2025-02-01 19:22:53.884945 Eastern Standard Time
| Itration            | 1774     |
| Real Det Return     | 693      |
| Real Sto Return     | 677      |
| Reward Loss         | -70.8    |
| Running Env Steps   | 887000   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 8        |
| Running Update Time | 1774     |
----------------------------------
2025-02-01 19:23:09.648608 Eastern Standard Time
| Itration            | 1775     |
| Real Det Return     | 661      |
| Real Sto Return     | 656      |
| Reward Loss         | -117     |
| Running Env Steps   | 887500   |
| Running Forward KL  | -5.64    |
| Running Reverse KL  | 6.18     |
| Running Update Time | 1775     |
----------------------------------
2025-02-01 19:23:25.313787 Eastern Standard Time
| Itration            | 1776     |
| Real Det Return     | 702      |
| Real Sto Return     | 677      |
| Reward Loss         | -74.6    |
| Running Env Steps   | 888000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 6.35     |
| Running Update Time | 1776     |
----------------------------------
2025-02-01 19:23:41.071863 Eastern Standard Time
| Itration            | 1777     |
| Real Det Return     | 646      |
| Real Sto Return     | 607      |
| Reward Loss         | -162     |
| Running Env Steps   | 888500   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 6.68     |
| Running Update Time | 1777     |
----------------------------------
2025-02-01 19:23:56.820054 Eastern Standard Time
| Itration            | 1778     |
| Real Det Return     | 672      |
| Real Sto Return     | 644      |
| Reward Loss         | -85.6    |
| Running Env Steps   | 889000   |
| Running Forward KL  | -5.55    |
| Running Reverse KL  | 7.58     |
| Running Update Time | 1778     |
----------------------------------
2025-02-01 19:24:12.574715 Eastern Standard Time
| Itration            | 1779     |
| Real Det Return     | 700      |
| Real Sto Return     | 674      |
| Reward Loss         | -51.2    |
| Running Env Steps   | 889500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 7.26     |
| Running Update Time | 1779     |
----------------------------------
2025-02-01 19:24:28.340267 Eastern Standard Time
| Itration            | 1780     |
| Real Det Return     | 692      |
| Real Sto Return     | 659      |
| Reward Loss         | -94.5    |
| Running Env Steps   | 890000   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1780     |
----------------------------------
2025-02-01 19:24:44.071841 Eastern Standard Time
| Itration            | 1781     |
| Real Det Return     | 720      |
| Real Sto Return     | 677      |
| Reward Loss         | -68.1    |
| Running Env Steps   | 890500   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 7.78     |
| Running Update Time | 1781     |
----------------------------------
2025-02-01 19:24:59.743628 Eastern Standard Time
| Itration            | 1782     |
| Real Det Return     | 717      |
| Real Sto Return     | 690      |
| Reward Loss         | -58      |
| Running Env Steps   | 891000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1782     |
----------------------------------
2025-02-01 19:25:15.421343 Eastern Standard Time
| Itration            | 1783     |
| Real Det Return     | 694      |
| Real Sto Return     | 671      |
| Reward Loss         | -52.9    |
| Running Env Steps   | 891500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 6.56     |
| Running Update Time | 1783     |
----------------------------------
2025-02-01 19:25:31.152249 Eastern Standard Time
| Itration            | 1784     |
| Real Det Return     | 638      |
| Real Sto Return     | 602      |
| Reward Loss         | -141     |
| Running Env Steps   | 892000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 6.99     |
| Running Update Time | 1784     |
----------------------------------
2025-02-01 19:25:46.915378 Eastern Standard Time
| Itration            | 1785     |
| Real Det Return     | 655      |
| Real Sto Return     | 619      |
| Reward Loss         | -131     |
| Running Env Steps   | 892500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 7.47     |
| Running Update Time | 1785     |
----------------------------------
2025-02-01 19:26:02.708181 Eastern Standard Time
| Itration            | 1786     |
| Real Det Return     | 703      |
| Real Sto Return     | 682      |
| Reward Loss         | -70.1    |
| Running Env Steps   | 893000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1786     |
----------------------------------
2025-02-01 19:26:18.392124 Eastern Standard Time
| Itration            | 1787     |
| Real Det Return     | 670      |
| Real Sto Return     | 642      |
| Reward Loss         | -95.6    |
| Running Env Steps   | 893500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 6.19     |
| Running Update Time | 1787     |
----------------------------------
2025-02-01 19:26:34.139073 Eastern Standard Time
| Itration            | 1788     |
| Real Det Return     | 707      |
| Real Sto Return     | 679      |
| Reward Loss         | -91.8    |
| Running Env Steps   | 894000   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1788     |
----------------------------------
2025-02-01 19:26:49.917809 Eastern Standard Time
| Itration            | 1789     |
| Real Det Return     | 686      |
| Real Sto Return     | 666      |
| Reward Loss         | -73.5    |
| Running Env Steps   | 894500   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 8.52     |
| Running Update Time | 1789     |
----------------------------------
2025-02-01 19:27:05.696420 Eastern Standard Time
| Itration            | 1790     |
| Real Det Return     | 720      |
| Real Sto Return     | 690      |
| Reward Loss         | -87.7    |
| Running Env Steps   | 895000   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 7.53     |
| Running Update Time | 1790     |
----------------------------------
2025-02-01 19:27:21.457669 Eastern Standard Time
| Itration            | 1791     |
| Real Det Return     | 689      |
| Real Sto Return     | 659      |
| Reward Loss         | -181     |
| Running Env Steps   | 895500   |
| Running Forward KL  | -2.4     |
| Running Reverse KL  | 7.12     |
| Running Update Time | 1791     |
----------------------------------
2025-02-01 19:27:37.140284 Eastern Standard Time
| Itration            | 1792     |
| Real Det Return     | 701      |
| Real Sto Return     | 660      |
| Reward Loss         | -92.2    |
| Running Env Steps   | 896000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 7.46     |
| Running Update Time | 1792     |
----------------------------------
2025-02-01 19:27:52.824738 Eastern Standard Time
| Itration            | 1793     |
| Real Det Return     | 701      |
| Real Sto Return     | 671      |
| Reward Loss         | -81.1    |
| Running Env Steps   | 896500   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1793     |
----------------------------------
2025-02-01 19:28:08.560303 Eastern Standard Time
| Itration            | 1794     |
| Real Det Return     | 699      |
| Real Sto Return     | 665      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 897000   |
| Running Forward KL  | -5.46    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1794     |
----------------------------------
2025-02-01 19:28:24.340094 Eastern Standard Time
| Itration            | 1795     |
| Real Det Return     | 713      |
| Real Sto Return     | 689      |
| Reward Loss         | -41.7    |
| Running Env Steps   | 897500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 7.99     |
| Running Update Time | 1795     |
----------------------------------
2025-02-01 19:28:40.132513 Eastern Standard Time
| Itration            | 1796     |
| Real Det Return     | 688      |
| Real Sto Return     | 661      |
| Reward Loss         | -86.1    |
| Running Env Steps   | 898000   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 7.56     |
| Running Update Time | 1796     |
----------------------------------
2025-02-01 19:28:55.852688 Eastern Standard Time
| Itration            | 1797     |
| Real Det Return     | 700      |
| Real Sto Return     | 675      |
| Reward Loss         | -71.8    |
| Running Env Steps   | 898500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1797     |
----------------------------------
2025-02-01 19:29:11.560699 Eastern Standard Time
| Itration            | 1798     |
| Real Det Return     | 696      |
| Real Sto Return     | 670      |
| Reward Loss         | -85.3    |
| Running Env Steps   | 899000   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 7.43     |
| Running Update Time | 1798     |
----------------------------------
2025-02-01 19:29:27.464735 Eastern Standard Time
| Itration            | 1799     |
| Real Det Return     | 682      |
| Real Sto Return     | 676      |
| Reward Loss         | -84.1    |
| Running Env Steps   | 899500   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 7        |
| Running Update Time | 1799     |
----------------------------------
2025-02-01 19:29:43.180417 Eastern Standard Time
| Itration            | 1800     |
| Real Det Return     | 693      |
| Real Sto Return     | 660      |
| Reward Loss         | -76.6    |
| Running Env Steps   | 900000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 6.91     |
| Running Update Time | 1800     |
----------------------------------
2025-02-01 19:29:58.896860 Eastern Standard Time
| Itration            | 1801     |
| Real Det Return     | 699      |
| Real Sto Return     | 686      |
| Reward Loss         | -65.6    |
| Running Env Steps   | 900500   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1801     |
----------------------------------
2025-02-01 19:30:14.607995 Eastern Standard Time
| Itration            | 1802     |
| Real Det Return     | 696      |
| Real Sto Return     | 681      |
| Reward Loss         | -67.7    |
| Running Env Steps   | 901000   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 6.42     |
| Running Update Time | 1802     |
----------------------------------
2025-02-01 19:30:30.285084 Eastern Standard Time
| Itration            | 1803     |
| Real Det Return     | 691      |
| Real Sto Return     | 670      |
| Reward Loss         | -72.9    |
| Running Env Steps   | 901500   |
| Running Forward KL  | -5.82    |
| Running Reverse KL  | 6.83     |
| Running Update Time | 1803     |
----------------------------------
2025-02-01 19:30:45.964399 Eastern Standard Time
| Itration            | 1804     |
| Real Det Return     | 697      |
| Real Sto Return     | 669      |
| Reward Loss         | -73.2    |
| Running Env Steps   | 902000   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 8.61     |
| Running Update Time | 1804     |
----------------------------------
2025-02-01 19:31:01.713289 Eastern Standard Time
| Itration            | 1805     |
| Real Det Return     | 709      |
| Real Sto Return     | 684      |
| Reward Loss         | -70.5    |
| Running Env Steps   | 902500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 8.17     |
| Running Update Time | 1805     |
----------------------------------
2025-02-01 19:31:17.509152 Eastern Standard Time
| Itration            | 1806     |
| Real Det Return     | 702      |
| Real Sto Return     | 681      |
| Reward Loss         | -60.7    |
| Running Env Steps   | 903000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 6.84     |
| Running Update Time | 1806     |
----------------------------------
2025-02-01 19:31:33.180577 Eastern Standard Time
| Itration            | 1807     |
| Real Det Return     | 698      |
| Real Sto Return     | 682      |
| Reward Loss         | -46.6    |
| Running Env Steps   | 903500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 7.13     |
| Running Update Time | 1807     |
----------------------------------
2025-02-01 19:31:48.897050 Eastern Standard Time
| Itration            | 1808     |
| Real Det Return     | 699      |
| Real Sto Return     | 679      |
| Reward Loss         | -64.3    |
| Running Env Steps   | 904000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 7.88     |
| Running Update Time | 1808     |
----------------------------------
2025-02-01 19:32:04.663647 Eastern Standard Time
| Itration            | 1809     |
| Real Det Return     | 708      |
| Real Sto Return     | 684      |
| Reward Loss         | -48.7    |
| Running Env Steps   | 904500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1809     |
----------------------------------
2025-02-01 19:32:20.464348 Eastern Standard Time
| Itration            | 1810     |
| Real Det Return     | 686      |
| Real Sto Return     | 670      |
| Reward Loss         | -62.9    |
| Running Env Steps   | 905000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 6.84     |
| Running Update Time | 1810     |
----------------------------------
2025-02-01 19:32:36.213908 Eastern Standard Time
| Itration            | 1811     |
| Real Det Return     | 675      |
| Real Sto Return     | 620      |
| Reward Loss         | -108     |
| Running Env Steps   | 905500   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 8.38     |
| Running Update Time | 1811     |
----------------------------------
2025-02-01 19:32:51.942663 Eastern Standard Time
| Itration            | 1812     |
| Real Det Return     | 704      |
| Real Sto Return     | 678      |
| Reward Loss         | -81      |
| Running Env Steps   | 906000   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 6.89     |
| Running Update Time | 1812     |
----------------------------------
2025-02-01 19:33:07.663531 Eastern Standard Time
| Itration            | 1813     |
| Real Det Return     | 674      |
| Real Sto Return     | 648      |
| Reward Loss         | -142     |
| Running Env Steps   | 906500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 6.68     |
| Running Update Time | 1813     |
----------------------------------
2025-02-01 19:33:23.358673 Eastern Standard Time
| Itration            | 1814     |
| Real Det Return     | 678      |
| Real Sto Return     | 668      |
| Reward Loss         | -80      |
| Running Env Steps   | 907000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 7.56     |
| Running Update Time | 1814     |
----------------------------------
2025-02-01 19:33:39.067099 Eastern Standard Time
| Itration            | 1815     |
| Real Det Return     | 711      |
| Real Sto Return     | 675      |
| Reward Loss         | -61.8    |
| Running Env Steps   | 907500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 7.55     |
| Running Update Time | 1815     |
----------------------------------
2025-02-01 19:33:54.819833 Eastern Standard Time
| Itration            | 1816     |
| Real Det Return     | 703      |
| Real Sto Return     | 682      |
| Reward Loss         | -59      |
| Running Env Steps   | 908000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 7.97     |
| Running Update Time | 1816     |
----------------------------------
2025-02-01 19:34:10.612725 Eastern Standard Time
| Itration            | 1817     |
| Real Det Return     | 683      |
| Real Sto Return     | 669      |
| Reward Loss         | -70.6    |
| Running Env Steps   | 908500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 7.74     |
| Running Update Time | 1817     |
----------------------------------
2025-02-01 19:34:26.437900 Eastern Standard Time
| Itration            | 1818     |
| Real Det Return     | 693      |
| Real Sto Return     | 654      |
| Reward Loss         | -59.7    |
| Running Env Steps   | 909000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1818     |
----------------------------------
2025-02-01 19:34:42.182683 Eastern Standard Time
| Itration            | 1819     |
| Real Det Return     | 699      |
| Real Sto Return     | 665      |
| Reward Loss         | -65.4    |
| Running Env Steps   | 909500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 7.64     |
| Running Update Time | 1819     |
----------------------------------
2025-02-01 19:34:57.923136 Eastern Standard Time
| Itration            | 1820     |
| Real Det Return     | 700      |
| Real Sto Return     | 654      |
| Reward Loss         | -108     |
| Running Env Steps   | 910000   |
| Running Forward KL  | -6.2     |
| Running Reverse KL  | 6.89     |
| Running Update Time | 1820     |
----------------------------------
2025-02-01 19:35:13.712945 Eastern Standard Time
| Itration            | 1821     |
| Real Det Return     | 680      |
| Real Sto Return     | 652      |
| Reward Loss         | -80.3    |
| Running Env Steps   | 910500   |
| Running Forward KL  | -5.71    |
| Running Reverse KL  | 6.58     |
| Running Update Time | 1821     |
----------------------------------
2025-02-01 19:35:29.371632 Eastern Standard Time
| Itration            | 1822     |
| Real Det Return     | 712      |
| Real Sto Return     | 684      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 911000   |
| Running Forward KL  | -5.43    |
| Running Reverse KL  | 6.7      |
| Running Update Time | 1822     |
----------------------------------
2025-02-01 19:35:45.163164 Eastern Standard Time
| Itration            | 1823     |
| Real Det Return     | 698      |
| Real Sto Return     | 670      |
| Reward Loss         | -88.2    |
| Running Env Steps   | 911500   |
| Running Forward KL  | -5.39    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1823     |
----------------------------------
2025-02-01 19:36:00.921151 Eastern Standard Time
| Itration            | 1824     |
| Real Det Return     | 689      |
| Real Sto Return     | 675      |
| Reward Loss         | -89.2    |
| Running Env Steps   | 912000   |
| Running Forward KL  | -5.58    |
| Running Reverse KL  | 5.67     |
| Running Update Time | 1824     |
----------------------------------
2025-02-01 19:36:16.677101 Eastern Standard Time
| Itration            | 1825     |
| Real Det Return     | 692      |
| Real Sto Return     | 679      |
| Reward Loss         | -56.2    |
| Running Env Steps   | 912500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 7.88     |
| Running Update Time | 1825     |
----------------------------------
2025-02-01 19:36:32.394287 Eastern Standard Time
| Itration            | 1826     |
| Real Det Return     | 674      |
| Real Sto Return     | 664      |
| Reward Loss         | -75.8    |
| Running Env Steps   | 913000   |
| Running Forward KL  | -5.55    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1826     |
----------------------------------
2025-02-01 19:36:48.062010 Eastern Standard Time
| Itration            | 1827     |
| Real Det Return     | 669      |
| Real Sto Return     | 646      |
| Reward Loss         | -125     |
| Running Env Steps   | 913500   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 7.03     |
| Running Update Time | 1827     |
----------------------------------
2025-02-01 19:37:03.746082 Eastern Standard Time
| Itration            | 1828     |
| Real Det Return     | 696      |
| Real Sto Return     | 675      |
| Reward Loss         | -82.5    |
| Running Env Steps   | 914000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 7.28     |
| Running Update Time | 1828     |
----------------------------------
2025-02-01 19:37:19.433677 Eastern Standard Time
| Itration            | 1829     |
| Real Det Return     | 671      |
| Real Sto Return     | 657      |
| Reward Loss         | -109     |
| Running Env Steps   | 914500   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 7.39     |
| Running Update Time | 1829     |
----------------------------------
2025-02-01 19:37:35.136198 Eastern Standard Time
| Itration            | 1830     |
| Real Det Return     | 694      |
| Real Sto Return     | 668      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 915000   |
| Running Forward KL  | -5.52    |
| Running Reverse KL  | 7        |
| Running Update Time | 1830     |
----------------------------------
2025-02-01 19:37:50.840422 Eastern Standard Time
| Itration            | 1831     |
| Real Det Return     | 666      |
| Real Sto Return     | 641      |
| Reward Loss         | -72.7    |
| Running Env Steps   | 915500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 7.04     |
| Running Update Time | 1831     |
----------------------------------
2025-02-01 19:38:06.507771 Eastern Standard Time
| Itration            | 1832     |
| Real Det Return     | 668      |
| Real Sto Return     | 620      |
| Reward Loss         | -110     |
| Running Env Steps   | 916000   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 8.28     |
| Running Update Time | 1832     |
----------------------------------
2025-02-01 19:38:22.254836 Eastern Standard Time
| Itration            | 1833     |
| Real Det Return     | 698      |
| Real Sto Return     | 683      |
| Reward Loss         | -54.5    |
| Running Env Steps   | 916500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 7.8      |
| Running Update Time | 1833     |
----------------------------------
2025-02-01 19:38:38.086856 Eastern Standard Time
| Itration            | 1834     |
| Real Det Return     | 689      |
| Real Sto Return     | 659      |
| Reward Loss         | -59.3    |
| Running Env Steps   | 917000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 6.7      |
| Running Update Time | 1834     |
----------------------------------
2025-02-01 19:38:53.773925 Eastern Standard Time
| Itration            | 1835     |
| Real Det Return     | 680      |
| Real Sto Return     | 654      |
| Reward Loss         | -109     |
| Running Env Steps   | 917500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1835     |
----------------------------------
2025-02-01 19:39:09.539651 Eastern Standard Time
| Itration            | 1836     |
| Real Det Return     | 710      |
| Real Sto Return     | 700      |
| Reward Loss         | -72      |
| Running Env Steps   | 918000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 7.79     |
| Running Update Time | 1836     |
----------------------------------
2025-02-01 19:39:25.217419 Eastern Standard Time
| Itration            | 1837     |
| Real Det Return     | 688      |
| Real Sto Return     | 667      |
| Reward Loss         | -68.3    |
| Running Env Steps   | 918500   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 7.77     |
| Running Update Time | 1837     |
----------------------------------
2025-02-01 19:39:40.933357 Eastern Standard Time
| Itration            | 1838     |
| Real Det Return     | 702      |
| Real Sto Return     | 656      |
| Reward Loss         | -88.2    |
| Running Env Steps   | 919000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 7.49     |
| Running Update Time | 1838     |
----------------------------------
2025-02-01 19:39:56.697115 Eastern Standard Time
| Itration            | 1839     |
| Real Det Return     | 722      |
| Real Sto Return     | 689      |
| Reward Loss         | -73.7    |
| Running Env Steps   | 919500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 7.29     |
| Running Update Time | 1839     |
----------------------------------
2025-02-01 19:40:12.371850 Eastern Standard Time
| Itration            | 1840     |
| Real Det Return     | 695      |
| Real Sto Return     | 676      |
| Reward Loss         | -80.4    |
| Running Env Steps   | 920000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 6.92     |
| Running Update Time | 1840     |
----------------------------------
2025-02-01 19:40:28.027925 Eastern Standard Time
| Itration            | 1841     |
| Real Det Return     | 712      |
| Real Sto Return     | 677      |
| Reward Loss         | -50.1    |
| Running Env Steps   | 920500   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 7.93     |
| Running Update Time | 1841     |
----------------------------------
2025-02-01 19:40:43.696069 Eastern Standard Time
| Itration            | 1842     |
| Real Det Return     | 699      |
| Real Sto Return     | 663      |
| Reward Loss         | -93.5    |
| Running Env Steps   | 921000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 7.94     |
| Running Update Time | 1842     |
----------------------------------
2025-02-01 19:41:00.579243 Eastern Standard Time
| Itration            | 1843     |
| Real Det Return     | 692      |
| Real Sto Return     | 667      |
| Reward Loss         | -66.7    |
| Running Env Steps   | 921500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1843     |
----------------------------------
2025-02-01 19:41:16.287553 Eastern Standard Time
| Itration            | 1844     |
| Real Det Return     | 673      |
| Real Sto Return     | 652      |
| Reward Loss         | -83.6    |
| Running Env Steps   | 922000   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 7.22     |
| Running Update Time | 1844     |
----------------------------------
2025-02-01 19:41:32.009843 Eastern Standard Time
| Itration            | 1845     |
| Real Det Return     | 723      |
| Real Sto Return     | 683      |
| Reward Loss         | -117     |
| Running Env Steps   | 922500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 6.39     |
| Running Update Time | 1845     |
----------------------------------
2025-02-01 19:41:47.729743 Eastern Standard Time
| Itration            | 1846     |
| Real Det Return     | 698      |
| Real Sto Return     | 683      |
| Reward Loss         | -61.2    |
| Running Env Steps   | 923000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 8.32     |
| Running Update Time | 1846     |
----------------------------------
2025-02-01 19:42:03.442409 Eastern Standard Time
| Itration            | 1847     |
| Real Det Return     | 689      |
| Real Sto Return     | 671      |
| Reward Loss         | -64      |
| Running Env Steps   | 923500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 7.18     |
| Running Update Time | 1847     |
----------------------------------
2025-02-01 19:42:19.104831 Eastern Standard Time
| Itration            | 1848     |
| Real Det Return     | 695      |
| Real Sto Return     | 667      |
| Reward Loss         | -61.1    |
| Running Env Steps   | 924000   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1848     |
----------------------------------
2025-02-01 19:42:34.804902 Eastern Standard Time
| Itration            | 1849     |
| Real Det Return     | 706      |
| Real Sto Return     | 678      |
| Reward Loss         | -75.8    |
| Running Env Steps   | 924500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 8.51     |
| Running Update Time | 1849     |
----------------------------------
2025-02-01 19:42:50.481645 Eastern Standard Time
| Itration            | 1850     |
| Real Det Return     | 708      |
| Real Sto Return     | 665      |
| Reward Loss         | -54      |
| Running Env Steps   | 925000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 8.09     |
| Running Update Time | 1850     |
----------------------------------
2025-02-01 19:43:06.224002 Eastern Standard Time
| Itration            | 1851     |
| Real Det Return     | 637      |
| Real Sto Return     | 617      |
| Reward Loss         | -153     |
| Running Env Steps   | 925500   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1851     |
----------------------------------
2025-02-01 19:43:21.954112 Eastern Standard Time
| Itration            | 1852     |
| Real Det Return     | 705      |
| Real Sto Return     | 684      |
| Reward Loss         | -70.2    |
| Running Env Steps   | 926000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1852     |
----------------------------------
2025-02-01 19:43:37.620893 Eastern Standard Time
| Itration            | 1853     |
| Real Det Return     | 722      |
| Real Sto Return     | 698      |
| Reward Loss         | -54.3    |
| Running Env Steps   | 926500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 6.78     |
| Running Update Time | 1853     |
----------------------------------
2025-02-01 19:43:53.316515 Eastern Standard Time
| Itration            | 1854     |
| Real Det Return     | 696      |
| Real Sto Return     | 673      |
| Reward Loss         | -58.5    |
| Running Env Steps   | 927000   |
| Running Forward KL  | -5.63    |
| Running Reverse KL  | 7.12     |
| Running Update Time | 1854     |
----------------------------------
2025-02-01 19:44:09.011805 Eastern Standard Time
| Itration            | 1855     |
| Real Det Return     | 717      |
| Real Sto Return     | 698      |
| Reward Loss         | -105     |
| Running Env Steps   | 927500   |
| Running Forward KL  | -3.79    |
| Running Reverse KL  | 7.29     |
| Running Update Time | 1855     |
----------------------------------
2025-02-01 19:44:24.739409 Eastern Standard Time
| Itration            | 1856     |
| Real Det Return     | 668      |
| Real Sto Return     | 656      |
| Reward Loss         | -113     |
| Running Env Steps   | 928000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 7.77     |
| Running Update Time | 1856     |
----------------------------------
2025-02-01 19:44:40.477296 Eastern Standard Time
| Itration            | 1857     |
| Real Det Return     | 694      |
| Real Sto Return     | 680      |
| Reward Loss         | -106     |
| Running Env Steps   | 928500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 6.59     |
| Running Update Time | 1857     |
----------------------------------
2025-02-01 19:44:56.222814 Eastern Standard Time
| Itration            | 1858     |
| Real Det Return     | 715      |
| Real Sto Return     | 685      |
| Reward Loss         | -43.1    |
| Running Env Steps   | 929000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 8.39     |
| Running Update Time | 1858     |
----------------------------------
2025-02-01 19:45:11.976471 Eastern Standard Time
| Itration            | 1859     |
| Real Det Return     | 708      |
| Real Sto Return     | 690      |
| Reward Loss         | -91.6    |
| Running Env Steps   | 929500   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 6.77     |
| Running Update Time | 1859     |
----------------------------------
2025-02-01 19:45:27.667687 Eastern Standard Time
| Itration            | 1860     |
| Real Det Return     | 684      |
| Real Sto Return     | 674      |
| Reward Loss         | -43.4    |
| Running Env Steps   | 930000   |
| Running Forward KL  | -5.61    |
| Running Reverse KL  | 7.95     |
| Running Update Time | 1860     |
----------------------------------
2025-02-01 19:45:43.556516 Eastern Standard Time
| Itration            | 1861     |
| Real Det Return     | 715      |
| Real Sto Return     | 691      |
| Reward Loss         | -78.1    |
| Running Env Steps   | 930500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 7.31     |
| Running Update Time | 1861     |
----------------------------------
2025-02-01 19:45:59.319191 Eastern Standard Time
| Itration            | 1862     |
| Real Det Return     | 711      |
| Real Sto Return     | 680      |
| Reward Loss         | -102     |
| Running Env Steps   | 931000   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 7.84     |
| Running Update Time | 1862     |
----------------------------------
2025-02-01 19:46:15.043055 Eastern Standard Time
| Itration            | 1863     |
| Real Det Return     | 687      |
| Real Sto Return     | 676      |
| Reward Loss         | -98.1    |
| Running Env Steps   | 931500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 6.94     |
| Running Update Time | 1863     |
----------------------------------
2025-02-01 19:46:30.707264 Eastern Standard Time
| Itration            | 1864     |
| Real Det Return     | 723      |
| Real Sto Return     | 698      |
| Reward Loss         | -60.6    |
| Running Env Steps   | 932000   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 7.23     |
| Running Update Time | 1864     |
----------------------------------
2025-02-01 19:46:46.604750 Eastern Standard Time
| Itration            | 1865     |
| Real Det Return     | 704      |
| Real Sto Return     | 689      |
| Reward Loss         | -41.3    |
| Running Env Steps   | 932500   |
| Running Forward KL  | -5.84    |
| Running Reverse KL  | 7.48     |
| Running Update Time | 1865     |
----------------------------------
2025-02-01 19:47:02.321753 Eastern Standard Time
| Itration            | 1866     |
| Real Det Return     | 696      |
| Real Sto Return     | 657      |
| Reward Loss         | -49      |
| Running Env Steps   | 933000   |
| Running Forward KL  | -6.04    |
| Running Reverse KL  | 6.85     |
| Running Update Time | 1866     |
----------------------------------
2025-02-01 19:47:18.132039 Eastern Standard Time
| Itration            | 1867     |
| Real Det Return     | 701      |
| Real Sto Return     | 688      |
| Reward Loss         | -97.5    |
| Running Env Steps   | 933500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 7.7      |
| Running Update Time | 1867     |
----------------------------------
2025-02-01 19:47:33.873863 Eastern Standard Time
| Itration            | 1868     |
| Real Det Return     | 702      |
| Real Sto Return     | 678      |
| Reward Loss         | -71.1    |
| Running Env Steps   | 934000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 7.91     |
| Running Update Time | 1868     |
----------------------------------
2025-02-01 19:47:49.593014 Eastern Standard Time
| Itration            | 1869     |
| Real Det Return     | 718      |
| Real Sto Return     | 682      |
| Reward Loss         | -48.3    |
| Running Env Steps   | 934500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 7.86     |
| Running Update Time | 1869     |
----------------------------------
2025-02-01 19:48:05.384590 Eastern Standard Time
| Itration            | 1870     |
| Real Det Return     | 714      |
| Real Sto Return     | 677      |
| Reward Loss         | -83.7    |
| Running Env Steps   | 935000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 7.08     |
| Running Update Time | 1870     |
----------------------------------
2025-02-01 19:48:21.120662 Eastern Standard Time
| Itration            | 1871     |
| Real Det Return     | 680      |
| Real Sto Return     | 660      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 935500   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 6.92     |
| Running Update Time | 1871     |
----------------------------------
2025-02-01 19:48:36.867443 Eastern Standard Time
| Itration            | 1872     |
| Real Det Return     | 680      |
| Real Sto Return     | 662      |
| Reward Loss         | -69.6    |
| Running Env Steps   | 936000   |
| Running Forward KL  | -5.83    |
| Running Reverse KL  | 7.19     |
| Running Update Time | 1872     |
----------------------------------
2025-02-01 19:48:52.622227 Eastern Standard Time
| Itration            | 1873     |
| Real Det Return     | 708      |
| Real Sto Return     | 676      |
| Reward Loss         | -70.5    |
| Running Env Steps   | 936500   |
| Running Forward KL  | -5.52    |
| Running Reverse KL  | 6.86     |
| Running Update Time | 1873     |
----------------------------------
2025-02-01 19:49:08.308194 Eastern Standard Time
| Itration            | 1874     |
| Real Det Return     | 717      |
| Real Sto Return     | 704      |
| Reward Loss         | -67.2    |
| Running Env Steps   | 937000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 7.48     |
| Running Update Time | 1874     |
----------------------------------
2025-02-01 19:49:24.049428 Eastern Standard Time
| Itration            | 1875     |
| Real Det Return     | 705      |
| Real Sto Return     | 675      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 937500   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 7.61     |
| Running Update Time | 1875     |
----------------------------------
2025-02-01 19:49:39.782196 Eastern Standard Time
| Itration            | 1876     |
| Real Det Return     | 702      |
| Real Sto Return     | 688      |
| Reward Loss         | -66      |
| Running Env Steps   | 938000   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 6.47     |
| Running Update Time | 1876     |
----------------------------------
2025-02-01 19:49:55.416972 Eastern Standard Time
| Itration            | 1877     |
| Real Det Return     | 697      |
| Real Sto Return     | 688      |
| Reward Loss         | -65.7    |
| Running Env Steps   | 938500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 7.81     |
| Running Update Time | 1877     |
----------------------------------
2025-02-01 19:50:11.192321 Eastern Standard Time
| Itration            | 1878     |
| Real Det Return     | 710      |
| Real Sto Return     | 663      |
| Reward Loss         | -63.2    |
| Running Env Steps   | 939000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1878     |
----------------------------------
2025-02-01 19:50:26.892302 Eastern Standard Time
| Itration            | 1879     |
| Real Det Return     | 648      |
| Real Sto Return     | 625      |
| Reward Loss         | -146     |
| Running Env Steps   | 939500   |
| Running Forward KL  | -3.64    |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1879     |
----------------------------------
2025-02-01 19:50:42.653667 Eastern Standard Time
| Itration            | 1880     |
| Real Det Return     | 705      |
| Real Sto Return     | 674      |
| Reward Loss         | -78.3    |
| Running Env Steps   | 940000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 7.59     |
| Running Update Time | 1880     |
----------------------------------
2025-02-01 19:50:58.371365 Eastern Standard Time
| Itration            | 1881     |
| Real Det Return     | 678      |
| Real Sto Return     | 660      |
| Reward Loss         | -119     |
| Running Env Steps   | 940500   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 6.81     |
| Running Update Time | 1881     |
----------------------------------
2025-02-01 19:51:14.057720 Eastern Standard Time
| Itration            | 1882     |
| Real Det Return     | 708      |
| Real Sto Return     | 673      |
| Reward Loss         | -66.2    |
| Running Env Steps   | 941000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 6.89     |
| Running Update Time | 1882     |
----------------------------------
2025-02-01 19:51:29.878315 Eastern Standard Time
| Itration            | 1883     |
| Real Det Return     | 692      |
| Real Sto Return     | 677      |
| Reward Loss         | -59.8    |
| Running Env Steps   | 941500   |
| Running Forward KL  | -6.13    |
| Running Reverse KL  | 6.36     |
| Running Update Time | 1883     |
----------------------------------
2025-02-01 19:51:45.624385 Eastern Standard Time
| Itration            | 1884     |
| Real Det Return     | 620      |
| Real Sto Return     | 614      |
| Reward Loss         | -134     |
| Running Env Steps   | 942000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 6.38     |
| Running Update Time | 1884     |
----------------------------------
2025-02-01 19:52:01.369803 Eastern Standard Time
| Itration            | 1885     |
| Real Det Return     | 690      |
| Real Sto Return     | 657      |
| Reward Loss         | -103     |
| Running Env Steps   | 942500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 7.17     |
| Running Update Time | 1885     |
----------------------------------
2025-02-01 19:52:17.120164 Eastern Standard Time
| Itration            | 1886     |
| Real Det Return     | 707      |
| Real Sto Return     | 673      |
| Reward Loss         | -94.7    |
| Running Env Steps   | 943000   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 6.99     |
| Running Update Time | 1886     |
----------------------------------
2025-02-01 19:52:32.826314 Eastern Standard Time
| Itration            | 1887     |
| Real Det Return     | 670      |
| Real Sto Return     | 650      |
| Reward Loss         | -92.8    |
| Running Env Steps   | 943500   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 7.09     |
| Running Update Time | 1887     |
----------------------------------
2025-02-01 19:52:48.576126 Eastern Standard Time
| Itration            | 1888     |
| Real Det Return     | 712      |
| Real Sto Return     | 686      |
| Reward Loss         | -70.1    |
| Running Env Steps   | 944000   |
| Running Forward KL  | -5.68    |
| Running Reverse KL  | 6.64     |
| Running Update Time | 1888     |
----------------------------------
2025-02-01 19:53:04.347205 Eastern Standard Time
| Itration            | 1889     |
| Real Det Return     | 709      |
| Real Sto Return     | 679      |
| Reward Loss         | -95.5    |
| Running Env Steps   | 944500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 7.05     |
| Running Update Time | 1889     |
----------------------------------
2025-02-01 19:53:19.994988 Eastern Standard Time
| Itration            | 1890     |
| Real Det Return     | 704      |
| Real Sto Return     | 681      |
| Reward Loss         | -67.7    |
| Running Env Steps   | 945000   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 6.23     |
| Running Update Time | 1890     |
----------------------------------
2025-02-01 19:53:35.718802 Eastern Standard Time
| Itration            | 1891     |
| Real Det Return     | 695      |
| Real Sto Return     | 673      |
| Reward Loss         | -55.3    |
| Running Env Steps   | 945500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 6.42     |
| Running Update Time | 1891     |
----------------------------------
2025-02-01 19:53:51.447713 Eastern Standard Time
| Itration            | 1892     |
| Real Det Return     | 683      |
| Real Sto Return     | 657      |
| Reward Loss         | -106     |
| Running Env Steps   | 946000   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 6.67     |
| Running Update Time | 1892     |
----------------------------------
2025-02-01 19:54:07.228676 Eastern Standard Time
| Itration            | 1893     |
| Real Det Return     | 707      |
| Real Sto Return     | 682      |
| Reward Loss         | -85.6    |
| Running Env Steps   | 946500   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 6.46     |
| Running Update Time | 1893     |
----------------------------------
2025-02-01 19:54:22.897475 Eastern Standard Time
| Itration            | 1894     |
| Real Det Return     | 683      |
| Real Sto Return     | 661      |
| Reward Loss         | -73.8    |
| Running Env Steps   | 947000   |
| Running Forward KL  | -5.88    |
| Running Reverse KL  | 6.07     |
| Running Update Time | 1894     |
----------------------------------
2025-02-01 19:54:38.568539 Eastern Standard Time
| Itration            | 1895     |
| Real Det Return     | 706      |
| Real Sto Return     | 665      |
| Reward Loss         | -61.2    |
| Running Env Steps   | 947500   |
| Running Forward KL  | -6       |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1895     |
----------------------------------
2025-02-01 19:54:54.239473 Eastern Standard Time
| Itration            | 1896     |
| Real Det Return     | 720      |
| Real Sto Return     | 692      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 948000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 7.67     |
| Running Update Time | 1896     |
----------------------------------
2025-02-01 19:55:09.981584 Eastern Standard Time
| Itration            | 1897     |
| Real Det Return     | 674      |
| Real Sto Return     | 652      |
| Reward Loss         | -74.3    |
| Running Env Steps   | 948500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 5.8      |
| Running Update Time | 1897     |
----------------------------------
2025-02-01 19:55:25.687025 Eastern Standard Time
| Itration            | 1898     |
| Real Det Return     | 703      |
| Real Sto Return     | 680      |
| Reward Loss         | -59.5    |
| Running Env Steps   | 949000   |
| Running Forward KL  | -5.67    |
| Running Reverse KL  | 6.75     |
| Running Update Time | 1898     |
----------------------------------
2025-02-01 19:55:41.399817 Eastern Standard Time
| Itration            | 1899     |
| Real Det Return     | 661      |
| Real Sto Return     | 614      |
| Reward Loss         | -122     |
| Running Env Steps   | 949500   |
| Running Forward KL  | -3.76    |
| Running Reverse KL  | 8.7      |
| Running Update Time | 1899     |
----------------------------------
2025-02-01 19:55:57.082545 Eastern Standard Time
| Itration            | 1900     |
| Real Det Return     | 694      |
| Real Sto Return     | 655      |
| Reward Loss         | -107     |
| Running Env Steps   | 950000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 1900     |
----------------------------------
2025-02-01 19:56:12.774225 Eastern Standard Time
| Itration            | 1901     |
| Real Det Return     | 687      |
| Real Sto Return     | 678      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 950500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 7.46     |
| Running Update Time | 1901     |
----------------------------------
2025-02-01 19:56:28.449191 Eastern Standard Time
| Itration            | 1902     |
| Real Det Return     | 697      |
| Real Sto Return     | 693      |
| Reward Loss         | -30.9    |
| Running Env Steps   | 951000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 8.15     |
| Running Update Time | 1902     |
----------------------------------
2025-02-01 19:56:44.153451 Eastern Standard Time
| Itration            | 1903     |
| Real Det Return     | 707      |
| Real Sto Return     | 686      |
| Reward Loss         | -83.8    |
| Running Env Steps   | 951500   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 6.59     |
| Running Update Time | 1903     |
----------------------------------
2025-02-01 19:56:59.855462 Eastern Standard Time
| Itration            | 1904     |
| Real Det Return     | 704      |
| Real Sto Return     | 675      |
| Reward Loss         | -73.7    |
| Running Env Steps   | 952000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 7.35     |
| Running Update Time | 1904     |
----------------------------------
2025-02-01 19:57:15.632207 Eastern Standard Time
| Itration            | 1905     |
| Real Det Return     | 657      |
| Real Sto Return     | 607      |
| Reward Loss         | -117     |
| Running Env Steps   | 952500   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1905     |
----------------------------------
2025-02-01 19:57:31.379796 Eastern Standard Time
| Itration            | 1906     |
| Real Det Return     | 700      |
| Real Sto Return     | 663      |
| Reward Loss         | -73.1    |
| Running Env Steps   | 953000   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 7.72     |
| Running Update Time | 1906     |
----------------------------------
2025-02-01 19:57:47.128242 Eastern Standard Time
| Itration            | 1907     |
| Real Det Return     | 697      |
| Real Sto Return     | 652      |
| Reward Loss         | -125     |
| Running Env Steps   | 953500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 6.92     |
| Running Update Time | 1907     |
----------------------------------
2025-02-01 19:58:02.970420 Eastern Standard Time
| Itration            | 1908     |
| Real Det Return     | 671      |
| Real Sto Return     | 675      |
| Reward Loss         | -83.8    |
| Running Env Steps   | 954000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 6.84     |
| Running Update Time | 1908     |
----------------------------------
2025-02-01 19:58:18.763681 Eastern Standard Time
| Itration            | 1909     |
| Real Det Return     | 689      |
| Real Sto Return     | 662      |
| Reward Loss         | -65.2    |
| Running Env Steps   | 954500   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1909     |
----------------------------------
2025-02-01 19:58:34.539327 Eastern Standard Time
| Itration            | 1910     |
| Real Det Return     | 714      |
| Real Sto Return     | 682      |
| Reward Loss         | -65.6    |
| Running Env Steps   | 955000   |
| Running Forward KL  | -5.66    |
| Running Reverse KL  | 7.21     |
| Running Update Time | 1910     |
----------------------------------
2025-02-01 19:58:50.300933 Eastern Standard Time
| Itration            | 1911     |
| Real Det Return     | 716      |
| Real Sto Return     | 687      |
| Reward Loss         | -87.2    |
| Running Env Steps   | 955500   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 7.31     |
| Running Update Time | 1911     |
----------------------------------
2025-02-01 19:59:06.001500 Eastern Standard Time
| Itration            | 1912     |
| Real Det Return     | 694      |
| Real Sto Return     | 664      |
| Reward Loss         | -66.2    |
| Running Env Steps   | 956000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 6.54     |
| Running Update Time | 1912     |
----------------------------------
2025-02-01 19:59:21.895769 Eastern Standard Time
| Itration            | 1913     |
| Real Det Return     | 713      |
| Real Sto Return     | 695      |
| Reward Loss         | -74.1    |
| Running Env Steps   | 956500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 7.64     |
| Running Update Time | 1913     |
----------------------------------
2025-02-01 19:59:37.691121 Eastern Standard Time
| Itration            | 1914     |
| Real Det Return     | 705      |
| Real Sto Return     | 683      |
| Reward Loss         | -89.1    |
| Running Env Steps   | 957000   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 6.13     |
| Running Update Time | 1914     |
----------------------------------
2025-02-01 19:59:53.416182 Eastern Standard Time
| Itration            | 1915     |
| Real Det Return     | 702      |
| Real Sto Return     | 677      |
| Reward Loss         | -69.3    |
| Running Env Steps   | 957500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 6.99     |
| Running Update Time | 1915     |
----------------------------------
2025-02-01 20:00:09.121341 Eastern Standard Time
| Itration            | 1916     |
| Real Det Return     | 706      |
| Real Sto Return     | 679      |
| Reward Loss         | -76      |
| Running Env Steps   | 958000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1916     |
----------------------------------
2025-02-01 20:00:24.892832 Eastern Standard Time
| Itration            | 1917     |
| Real Det Return     | 707      |
| Real Sto Return     | 674      |
| Reward Loss         | -106     |
| Running Env Steps   | 958500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 6.99     |
| Running Update Time | 1917     |
----------------------------------
2025-02-01 20:00:40.639609 Eastern Standard Time
| Itration            | 1918     |
| Real Det Return     | 711      |
| Real Sto Return     | 691      |
| Reward Loss         | -68.2    |
| Running Env Steps   | 959000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 7.8      |
| Running Update Time | 1918     |
----------------------------------
2025-02-01 20:00:56.394951 Eastern Standard Time
| Itration            | 1919     |
| Real Det Return     | 692      |
| Real Sto Return     | 679      |
| Reward Loss         | -92.3    |
| Running Env Steps   | 959500   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 6.97     |
| Running Update Time | 1919     |
----------------------------------
2025-02-01 20:01:12.167774 Eastern Standard Time
| Itration            | 1920     |
| Real Det Return     | 700      |
| Real Sto Return     | 680      |
| Reward Loss         | -89.2    |
| Running Env Steps   | 960000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1920     |
----------------------------------
2025-02-01 20:01:27.898554 Eastern Standard Time
| Itration            | 1921     |
| Real Det Return     | 700      |
| Real Sto Return     | 658      |
| Reward Loss         | -89.9    |
| Running Env Steps   | 960500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1921     |
----------------------------------
2025-02-01 20:01:43.616077 Eastern Standard Time
| Itration            | 1922     |
| Real Det Return     | 716      |
| Real Sto Return     | 692      |
| Reward Loss         | -78.5    |
| Running Env Steps   | 961000   |
| Running Forward KL  | -5.43    |
| Running Reverse KL  | 7.58     |
| Running Update Time | 1922     |
----------------------------------
2025-02-01 20:01:59.287013 Eastern Standard Time
| Itration            | 1923     |
| Real Det Return     | 691      |
| Real Sto Return     | 672      |
| Reward Loss         | -79      |
| Running Env Steps   | 961500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 7.43     |
| Running Update Time | 1923     |
----------------------------------
2025-02-01 20:02:15.259516 Eastern Standard Time
| Itration            | 1924     |
| Real Det Return     | 683      |
| Real Sto Return     | 662      |
| Reward Loss         | -55      |
| Running Env Steps   | 962000   |
| Running Forward KL  | -5.66    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1924     |
----------------------------------
2025-02-01 20:02:31.063143 Eastern Standard Time
| Itration            | 1925     |
| Real Det Return     | 670      |
| Real Sto Return     | 650      |
| Reward Loss         | -51.8    |
| Running Env Steps   | 962500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 6.39     |
| Running Update Time | 1925     |
----------------------------------
2025-02-01 20:02:46.977970 Eastern Standard Time
| Itration            | 1926     |
| Real Det Return     | 663      |
| Real Sto Return     | 643      |
| Reward Loss         | -66.7    |
| Running Env Steps   | 963000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 7.07     |
| Running Update Time | 1926     |
----------------------------------
2025-02-01 20:03:02.697159 Eastern Standard Time
| Itration            | 1927     |
| Real Det Return     | 689      |
| Real Sto Return     | 661      |
| Reward Loss         | -86.2    |
| Running Env Steps   | 963500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 7.16     |
| Running Update Time | 1927     |
----------------------------------
2025-02-01 20:03:18.428357 Eastern Standard Time
| Itration            | 1928     |
| Real Det Return     | 696      |
| Real Sto Return     | 664      |
| Reward Loss         | -77.7    |
| Running Env Steps   | 964000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 6.62     |
| Running Update Time | 1928     |
----------------------------------
2025-02-01 20:03:34.233689 Eastern Standard Time
| Itration            | 1929     |
| Real Det Return     | 726      |
| Real Sto Return     | 696      |
| Reward Loss         | -76      |
| Running Env Steps   | 964500   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 7.65     |
| Running Update Time | 1929     |
----------------------------------
2025-02-01 20:03:49.952611 Eastern Standard Time
| Itration            | 1930     |
| Real Det Return     | 705      |
| Real Sto Return     | 672      |
| Reward Loss         | -79.3    |
| Running Env Steps   | 965000   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 7.12     |
| Running Update Time | 1930     |
----------------------------------
2025-02-01 20:04:05.714907 Eastern Standard Time
| Itration            | 1931     |
| Real Det Return     | 691      |
| Real Sto Return     | 666      |
| Reward Loss         | -80.1    |
| Running Env Steps   | 965500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 7.7      |
| Running Update Time | 1931     |
----------------------------------
2025-02-01 20:04:21.460784 Eastern Standard Time
| Itration            | 1932     |
| Real Det Return     | 712      |
| Real Sto Return     | 682      |
| Reward Loss         | -69.9    |
| Running Env Steps   | 966000   |
| Running Forward KL  | -5.39    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1932     |
----------------------------------
2025-02-01 20:04:37.155257 Eastern Standard Time
| Itration            | 1933     |
| Real Det Return     | 686      |
| Real Sto Return     | 658      |
| Reward Loss         | -103     |
| Running Env Steps   | 966500   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 6.5      |
| Running Update Time | 1933     |
----------------------------------
2025-02-01 20:04:52.854043 Eastern Standard Time
| Itration            | 1934     |
| Real Det Return     | 684      |
| Real Sto Return     | 674      |
| Reward Loss         | -85      |
| Running Env Steps   | 967000   |
| Running Forward KL  | -5.89    |
| Running Reverse KL  | 7.94     |
| Running Update Time | 1934     |
----------------------------------
2025-02-01 20:05:08.541161 Eastern Standard Time
| Itration            | 1935     |
| Real Det Return     | 700      |
| Real Sto Return     | 680      |
| Reward Loss         | -56.9    |
| Running Env Steps   | 967500   |
| Running Forward KL  | -5.97    |
| Running Reverse KL  | 7.54     |
| Running Update Time | 1935     |
----------------------------------
2025-02-01 20:05:24.179391 Eastern Standard Time
| Itration            | 1936     |
| Real Det Return     | 713      |
| Real Sto Return     | 687      |
| Reward Loss         | -34.2    |
| Running Env Steps   | 968000   |
| Running Forward KL  | -5.77    |
| Running Reverse KL  | 7.45     |
| Running Update Time | 1936     |
----------------------------------
2025-02-01 20:05:39.912424 Eastern Standard Time
| Itration            | 1937     |
| Real Det Return     | 698      |
| Real Sto Return     | 663      |
| Reward Loss         | -114     |
| Running Env Steps   | 968500   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 7.76     |
| Running Update Time | 1937     |
----------------------------------
2025-02-01 20:05:55.684550 Eastern Standard Time
| Itration            | 1938     |
| Real Det Return     | 709      |
| Real Sto Return     | 687      |
| Reward Loss         | -108     |
| Running Env Steps   | 969000   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 6.57     |
| Running Update Time | 1938     |
----------------------------------
2025-02-01 20:06:11.523486 Eastern Standard Time
| Itration            | 1939     |
| Real Det Return     | 718      |
| Real Sto Return     | 682      |
| Reward Loss         | -49.7    |
| Running Env Steps   | 969500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 6.8      |
| Running Update Time | 1939     |
----------------------------------
2025-02-01 20:06:27.280082 Eastern Standard Time
| Itration            | 1940     |
| Real Det Return     | 673      |
| Real Sto Return     | 645      |
| Reward Loss         | -94.6    |
| Running Env Steps   | 970000   |
| Running Forward KL  | -5.41    |
| Running Reverse KL  | 5.88     |
| Running Update Time | 1940     |
----------------------------------
2025-02-01 20:06:43.152378 Eastern Standard Time
| Itration            | 1941     |
| Real Det Return     | 702      |
| Real Sto Return     | 688      |
| Reward Loss         | -82.6    |
| Running Env Steps   | 970500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 7.53     |
| Running Update Time | 1941     |
----------------------------------
2025-02-01 20:06:58.962055 Eastern Standard Time
| Itration            | 1942     |
| Real Det Return     | 711      |
| Real Sto Return     | 685      |
| Reward Loss         | -68.2    |
| Running Env Steps   | 971000   |
| Running Forward KL  | -5.8     |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1942     |
----------------------------------
2025-02-01 20:07:14.768454 Eastern Standard Time
| Itration            | 1943     |
| Real Det Return     | 695      |
| Real Sto Return     | 672      |
| Reward Loss         | -86.6    |
| Running Env Steps   | 971500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 6.49     |
| Running Update Time | 1943     |
----------------------------------
2025-02-01 20:07:30.621130 Eastern Standard Time
| Itration            | 1944     |
| Real Det Return     | 704      |
| Real Sto Return     | 677      |
| Reward Loss         | -63.1    |
| Running Env Steps   | 972000   |
| Running Forward KL  | -5.59    |
| Running Reverse KL  | 6.66     |
| Running Update Time | 1944     |
----------------------------------
2025-02-01 20:07:46.462689 Eastern Standard Time
| Itration            | 1945     |
| Real Det Return     | 701      |
| Real Sto Return     | 671      |
| Reward Loss         | -62.2    |
| Running Env Steps   | 972500   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 7.59     |
| Running Update Time | 1945     |
----------------------------------
2025-02-01 20:08:02.251491 Eastern Standard Time
| Itration            | 1946     |
| Real Det Return     | 684      |
| Real Sto Return     | 667      |
| Reward Loss         | -87.7    |
| Running Env Steps   | 973000   |
| Running Forward KL  | -5.97    |
| Running Reverse KL  | 6.53     |
| Running Update Time | 1946     |
----------------------------------
2025-02-01 20:08:18.151637 Eastern Standard Time
| Itration            | 1947     |
| Real Det Return     | 706      |
| Real Sto Return     | 676      |
| Reward Loss         | -123     |
| Running Env Steps   | 973500   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 7.41     |
| Running Update Time | 1947     |
----------------------------------
2025-02-01 20:08:33.996514 Eastern Standard Time
| Itration            | 1948     |
| Real Det Return     | 676      |
| Real Sto Return     | 633      |
| Reward Loss         | -122     |
| Running Env Steps   | 974000   |
| Running Forward KL  | -3       |
| Running Reverse KL  | 7.15     |
| Running Update Time | 1948     |
----------------------------------
2025-02-01 20:08:49.825308 Eastern Standard Time
| Itration            | 1949     |
| Real Det Return     | 698      |
| Real Sto Return     | 679      |
| Reward Loss         | -44.4    |
| Running Env Steps   | 974500   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 8.03     |
| Running Update Time | 1949     |
----------------------------------
2025-02-01 20:09:05.767680 Eastern Standard Time
| Itration            | 1950     |
| Real Det Return     | 711      |
| Real Sto Return     | 676      |
| Reward Loss         | -91.6    |
| Running Env Steps   | 975000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 7.58     |
| Running Update Time | 1950     |
----------------------------------
2025-02-01 20:09:21.671393 Eastern Standard Time
| Itration            | 1951     |
| Real Det Return     | 707      |
| Real Sto Return     | 687      |
| Reward Loss         | -51.5    |
| Running Env Steps   | 975500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 7.75     |
| Running Update Time | 1951     |
----------------------------------
2025-02-01 20:09:37.465020 Eastern Standard Time
| Itration            | 1952     |
| Real Det Return     | 689      |
| Real Sto Return     | 661      |
| Reward Loss         | -52.5    |
| Running Env Steps   | 976000   |
| Running Forward KL  | -5.5     |
| Running Reverse KL  | 6.87     |
| Running Update Time | 1952     |
----------------------------------
2025-02-01 20:09:53.270735 Eastern Standard Time
| Itration            | 1953     |
| Real Det Return     | 680      |
| Real Sto Return     | 677      |
| Reward Loss         | -86.4    |
| Running Env Steps   | 976500   |
| Running Forward KL  | -5.73    |
| Running Reverse KL  | 5.94     |
| Running Update Time | 1953     |
----------------------------------
2025-02-01 20:10:09.082491 Eastern Standard Time
| Itration            | 1954     |
| Real Det Return     | 707      |
| Real Sto Return     | 686      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 977000   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 7.31     |
| Running Update Time | 1954     |
----------------------------------
2025-02-01 20:10:24.974004 Eastern Standard Time
| Itration            | 1955     |
| Real Det Return     | 697      |
| Real Sto Return     | 670      |
| Reward Loss         | -53.4    |
| Running Env Steps   | 977500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 6.83     |
| Running Update Time | 1955     |
----------------------------------
2025-02-01 20:10:40.743210 Eastern Standard Time
| Itration            | 1956     |
| Real Det Return     | 709      |
| Real Sto Return     | 683      |
| Reward Loss         | -89      |
| Running Env Steps   | 978000   |
| Running Forward KL  | -2.85    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1956     |
----------------------------------
2025-02-01 20:10:56.555729 Eastern Standard Time
| Itration            | 1957     |
| Real Det Return     | 704      |
| Real Sto Return     | 670      |
| Reward Loss         | -73.9    |
| Running Env Steps   | 978500   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 1957     |
----------------------------------
2025-02-01 20:11:12.336562 Eastern Standard Time
| Itration            | 1958     |
| Real Det Return     | 662      |
| Real Sto Return     | 650      |
| Reward Loss         | -146     |
| Running Env Steps   | 979000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 5.95     |
| Running Update Time | 1958     |
----------------------------------
2025-02-01 20:11:28.126065 Eastern Standard Time
| Itration            | 1959     |
| Real Det Return     | 691      |
| Real Sto Return     | 681      |
| Reward Loss         | -55.6    |
| Running Env Steps   | 979500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 8.04     |
| Running Update Time | 1959     |
----------------------------------
2025-02-01 20:11:43.854379 Eastern Standard Time
| Itration            | 1960     |
| Real Det Return     | 725      |
| Real Sto Return     | 695      |
| Reward Loss         | -86.6    |
| Running Env Steps   | 980000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 8.18     |
| Running Update Time | 1960     |
----------------------------------
2025-02-01 20:11:59.643516 Eastern Standard Time
| Itration            | 1961     |
| Real Det Return     | 668      |
| Real Sto Return     | 641      |
| Reward Loss         | -104     |
| Running Env Steps   | 980500   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 7.32     |
| Running Update Time | 1961     |
----------------------------------
2025-02-01 20:12:15.416811 Eastern Standard Time
| Itration            | 1962     |
| Real Det Return     | 710      |
| Real Sto Return     | 692      |
| Reward Loss         | -95.7    |
| Running Env Steps   | 981000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 7.37     |
| Running Update Time | 1962     |
----------------------------------
2025-02-01 20:12:31.327351 Eastern Standard Time
| Itration            | 1963     |
| Real Det Return     | 695      |
| Real Sto Return     | 686      |
| Reward Loss         | -86.7    |
| Running Env Steps   | 981500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 7.46     |
| Running Update Time | 1963     |
----------------------------------
2025-02-01 20:12:47.056047 Eastern Standard Time
| Itration            | 1964     |
| Real Det Return     | 689      |
| Real Sto Return     | 683      |
| Reward Loss         | -54.1    |
| Running Env Steps   | 982000   |
| Running Forward KL  | -5.64    |
| Running Reverse KL  | 6.92     |
| Running Update Time | 1964     |
----------------------------------
2025-02-01 20:13:02.931587 Eastern Standard Time
| Itration            | 1965     |
| Real Det Return     | 702      |
| Real Sto Return     | 677      |
| Reward Loss         | -61.9    |
| Running Env Steps   | 982500   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 6.13     |
| Running Update Time | 1965     |
----------------------------------
2025-02-01 20:13:18.742004 Eastern Standard Time
| Itration            | 1966     |
| Real Det Return     | 703      |
| Real Sto Return     | 678      |
| Reward Loss         | -47.6    |
| Running Env Steps   | 983000   |
| Running Forward KL  | -5.65    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1966     |
----------------------------------
2025-02-01 20:13:34.528543 Eastern Standard Time
| Itration            | 1967     |
| Real Det Return     | 621      |
| Real Sto Return     | 607      |
| Reward Loss         | -200     |
| Running Env Steps   | 983500   |
| Running Forward KL  | -3.59    |
| Running Reverse KL  | 6.62     |
| Running Update Time | 1967     |
----------------------------------
2025-02-01 20:13:50.275774 Eastern Standard Time
| Itration            | 1968     |
| Real Det Return     | 690      |
| Real Sto Return     | 663      |
| Reward Loss         | -86      |
| Running Env Steps   | 984000   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1968     |
----------------------------------
2025-02-01 20:14:06.138600 Eastern Standard Time
| Itration            | 1969     |
| Real Det Return     | 717      |
| Real Sto Return     | 696      |
| Reward Loss         | -80.5    |
| Running Env Steps   | 984500   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 6.79     |
| Running Update Time | 1969     |
----------------------------------
2025-02-01 20:14:22.002697 Eastern Standard Time
| Itration            | 1970     |
| Real Det Return     | 721      |
| Real Sto Return     | 681      |
| Reward Loss         | -68.5    |
| Running Env Steps   | 985000   |
| Running Forward KL  | -5.53    |
| Running Reverse KL  | 7.53     |
| Running Update Time | 1970     |
----------------------------------
2025-02-01 20:14:37.839847 Eastern Standard Time
| Itration            | 1971     |
| Real Det Return     | 685      |
| Real Sto Return     | 683      |
| Reward Loss         | -74.7    |
| Running Env Steps   | 985500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 6.93     |
| Running Update Time | 1971     |
----------------------------------
2025-02-01 20:14:53.625674 Eastern Standard Time
| Itration            | 1972     |
| Real Det Return     | 685      |
| Real Sto Return     | 662      |
| Reward Loss         | -90.5    |
| Running Env Steps   | 986000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 7.11     |
| Running Update Time | 1972     |
----------------------------------
2025-02-01 20:15:09.507682 Eastern Standard Time
| Itration            | 1973     |
| Real Det Return     | 693      |
| Real Sto Return     | 664      |
| Reward Loss         | -36.1    |
| Running Env Steps   | 986500   |
| Running Forward KL  | -5.71    |
| Running Reverse KL  | 7.22     |
| Running Update Time | 1973     |
----------------------------------
2025-02-01 20:15:25.330935 Eastern Standard Time
| Itration            | 1974     |
| Real Det Return     | 693      |
| Real Sto Return     | 676      |
| Reward Loss         | -78.9    |
| Running Env Steps   | 987000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 6.62     |
| Running Update Time | 1974     |
----------------------------------
2025-02-01 20:15:41.096853 Eastern Standard Time
| Itration            | 1975     |
| Real Det Return     | 689      |
| Real Sto Return     | 658      |
| Reward Loss         | -73      |
| Running Env Steps   | 987500   |
| Running Forward KL  | -5.51    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1975     |
----------------------------------
2025-02-01 20:15:56.959628 Eastern Standard Time
| Itration            | 1976     |
| Real Det Return     | 704      |
| Real Sto Return     | 655      |
| Reward Loss         | -63.1    |
| Running Env Steps   | 988000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 7.78     |
| Running Update Time | 1976     |
----------------------------------
2025-02-01 20:16:12.798051 Eastern Standard Time
| Itration            | 1977     |
| Real Det Return     | 710      |
| Real Sto Return     | 680      |
| Reward Loss         | -83.3    |
| Running Env Steps   | 988500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 7.14     |
| Running Update Time | 1977     |
----------------------------------
2025-02-01 20:16:28.575172 Eastern Standard Time
| Itration            | 1978     |
| Real Det Return     | 715      |
| Real Sto Return     | 685      |
| Reward Loss         | -39.2    |
| Running Env Steps   | 989000   |
| Running Forward KL  | -5.62    |
| Running Reverse KL  | 7.63     |
| Running Update Time | 1978     |
----------------------------------
2025-02-01 20:16:44.411594 Eastern Standard Time
| Itration            | 1979     |
| Real Det Return     | 699      |
| Real Sto Return     | 668      |
| Reward Loss         | -99.6    |
| Running Env Steps   | 989500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 6.25     |
| Running Update Time | 1979     |
----------------------------------
2025-02-01 20:17:00.265590 Eastern Standard Time
| Itration            | 1980     |
| Real Det Return     | 708      |
| Real Sto Return     | 687      |
| Reward Loss         | -68.3    |
| Running Env Steps   | 990000   |
| Running Forward KL  | -5.52    |
| Running Reverse KL  | 6.66     |
| Running Update Time | 1980     |
----------------------------------
2025-02-01 20:17:16.109526 Eastern Standard Time
| Itration            | 1981     |
| Real Det Return     | 715      |
| Real Sto Return     | 694      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 990500   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 9.17     |
| Running Update Time | 1981     |
----------------------------------
2025-02-01 20:17:31.961639 Eastern Standard Time
| Itration            | 1982     |
| Real Det Return     | 706      |
| Real Sto Return     | 681      |
| Reward Loss         | -65.8    |
| Running Env Steps   | 991000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 7.5      |
| Running Update Time | 1982     |
----------------------------------
2025-02-01 20:17:47.945204 Eastern Standard Time
| Itration            | 1983     |
| Real Det Return     | 726      |
| Real Sto Return     | 676      |
| Reward Loss         | -42.1    |
| Running Env Steps   | 991500   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 1983     |
----------------------------------
2025-02-01 20:18:03.873541 Eastern Standard Time
| Itration            | 1984     |
| Real Det Return     | 687      |
| Real Sto Return     | 665      |
| Reward Loss         | -44.1    |
| Running Env Steps   | 992000   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1984     |
----------------------------------
2025-02-01 20:18:19.718024 Eastern Standard Time
| Itration            | 1985     |
| Real Det Return     | 675      |
| Real Sto Return     | 653      |
| Reward Loss         | -99.7    |
| Running Env Steps   | 992500   |
| Running Forward KL  | -5.63    |
| Running Reverse KL  | 5.94     |
| Running Update Time | 1985     |
----------------------------------
2025-02-01 20:18:35.280082 Eastern Standard Time
| Itration            | 1986     |
| Real Det Return     | 685      |
| Real Sto Return     | 660      |
| Reward Loss         | -89.8    |
| Running Env Steps   | 993000   |
| Running Forward KL  | -5.66    |
| Running Reverse KL  | 7.34     |
| Running Update Time | 1986     |
----------------------------------
2025-02-01 20:18:50.714172 Eastern Standard Time
| Itration            | 1987     |
| Real Det Return     | 711      |
| Real Sto Return     | 680      |
| Reward Loss         | -85.5    |
| Running Env Steps   | 993500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 6.37     |
| Running Update Time | 1987     |
----------------------------------
2025-02-01 20:19:06.246522 Eastern Standard Time
| Itration            | 1988     |
| Real Det Return     | 676      |
| Real Sto Return     | 657      |
| Reward Loss         | -94.1    |
| Running Env Steps   | 994000   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 8.32     |
| Running Update Time | 1988     |
----------------------------------
2025-02-01 20:19:21.644773 Eastern Standard Time
| Itration            | 1989     |
| Real Det Return     | 683      |
| Real Sto Return     | 682      |
| Reward Loss         | -64      |
| Running Env Steps   | 994500   |
| Running Forward KL  | -5.56    |
| Running Reverse KL  | 8.53     |
| Running Update Time | 1989     |
----------------------------------
2025-02-01 20:19:37.041062 Eastern Standard Time
| Itration            | 1990     |
| Real Det Return     | 698      |
| Real Sto Return     | 659      |
| Reward Loss         | -82.2    |
| Running Env Steps   | 995000   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 6.98     |
| Running Update Time | 1990     |
----------------------------------
2025-02-01 20:19:52.462943 Eastern Standard Time
| Itration            | 1991     |
| Real Det Return     | 710      |
| Real Sto Return     | 691      |
| Reward Loss         | -45.5    |
| Running Env Steps   | 995500   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 7.35     |
| Running Update Time | 1991     |
----------------------------------
2025-02-01 20:20:07.850721 Eastern Standard Time
| Itration            | 1992     |
| Real Det Return     | 693      |
| Real Sto Return     | 669      |
| Reward Loss         | -89.4    |
| Running Env Steps   | 996000   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 7.2      |
| Running Update Time | 1992     |
----------------------------------
2025-02-01 20:20:23.310797 Eastern Standard Time
| Itration            | 1993     |
| Real Det Return     | 708      |
| Real Sto Return     | 673      |
| Reward Loss         | -99.6    |
| Running Env Steps   | 996500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1993     |
----------------------------------
2025-02-01 20:20:38.736175 Eastern Standard Time
| Itration            | 1994     |
| Real Det Return     | 696      |
| Real Sto Return     | 670      |
| Reward Loss         | -92.4    |
| Running Env Steps   | 997000   |
| Running Forward KL  | -5.69    |
| Running Reverse KL  | 6.5      |
| Running Update Time | 1994     |
----------------------------------
2025-02-01 20:20:54.212368 Eastern Standard Time
| Itration            | 1995     |
| Real Det Return     | 679      |
| Real Sto Return     | 656      |
| Reward Loss         | -92.3    |
| Running Env Steps   | 997500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 6.88     |
| Running Update Time | 1995     |
----------------------------------
2025-02-01 20:21:09.691811 Eastern Standard Time
| Itration            | 1996     |
| Real Det Return     | 702      |
| Real Sto Return     | 674      |
| Reward Loss         | -25.9    |
| Running Env Steps   | 998000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 6.99     |
| Running Update Time | 1996     |
----------------------------------
2025-02-01 20:21:25.133320 Eastern Standard Time
| Itration            | 1997     |
| Real Det Return     | 710      |
| Real Sto Return     | 678      |
| Reward Loss         | -71.3    |
| Running Env Steps   | 998500   |
| Running Forward KL  | -5.9     |
| Running Reverse KL  | 7.24     |
| Running Update Time | 1997     |
----------------------------------
2025-02-01 20:21:40.562127 Eastern Standard Time
| Itration            | 1998     |
| Real Det Return     | 704      |
| Real Sto Return     | 671      |
| Reward Loss         | -87.9    |
| Running Env Steps   | 999000   |
| Running Forward KL  | -5.97    |
| Running Reverse KL  | 6.02     |
| Running Update Time | 1998     |
----------------------------------
2025-02-01 20:21:55.979052 Eastern Standard Time
| Itration            | 1999     |
| Real Det Return     | 713      |
| Real Sto Return     | 681      |
| Reward Loss         | -81.7    |
| Running Env Steps   | 999500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 6.9      |
| Running Update Time | 1999     |
----------------------------------
