Logging to logs/HalfCheetah-v4/IRL/2025_02_01_11_36_20
2025-02-01 11:36:34.280851 Eastern Standard Time
| Itration            | 0        |
| Real Det Return     | -0.52    |
| Real Sto Return     | -25      |
| Reward Loss         | 8.75     |
| Running Env Steps   | 0        |
| Running Forward KL  | 3.7      |
| Running Reverse KL  | 5.97     |
| Running Update Time | 0        |
----------------------------------
2025-02-01 11:36:48.289840 Eastern Standard Time
| Itration            | 1        |
| Real Det Return     | -1.03    |
| Real Sto Return     | -35.9    |
| Reward Loss         | 8.83     |
| Running Env Steps   | 500      |
| Running Forward KL  | 3.05     |
| Running Reverse KL  | 6.23     |
| Running Update Time | 1        |
----------------------------------
2025-02-01 11:37:02.536991 Eastern Standard Time
| Itration            | 2        |
| Real Det Return     | -1.13    |
| Real Sto Return     | -32.9    |
| Reward Loss         | 8.84     |
| Running Env Steps   | 1000     |
| Running Forward KL  | 3.06     |
| Running Reverse KL  | 6.15     |
| Running Update Time | 2        |
----------------------------------
2025-02-01 11:37:16.689821 Eastern Standard Time
| Itration            | 3        |
| Real Det Return     | -1.69    |
| Real Sto Return     | -25.6    |
| Reward Loss         | 6.17     |
| Running Env Steps   | 1500     |
| Running Forward KL  | 3.1      |
| Running Reverse KL  | 5.89     |
| Running Update Time | 3        |
----------------------------------
2025-02-01 11:37:30.935225 Eastern Standard Time
| Itration            | 4        |
| Real Det Return     | -2.19    |
| Real Sto Return     | -24.1    |
| Reward Loss         | 4.68     |
| Running Env Steps   | 2000     |
| Running Forward KL  | 3.36     |
| Running Reverse KL  | 6.25     |
| Running Update Time | 4        |
----------------------------------
2025-02-01 11:37:45.088029 Eastern Standard Time
| Itration            | 5        |
| Real Det Return     | -2.31    |
| Real Sto Return     | -19.1    |
| Reward Loss         | 4.47     |
| Running Env Steps   | 2500     |
| Running Forward KL  | 3.35     |
| Running Reverse KL  | 6.83     |
| Running Update Time | 5        |
----------------------------------
2025-02-01 11:37:59.331225 Eastern Standard Time
| Itration            | 6        |
| Real Det Return     | 1.07     |
| Real Sto Return     | -19.4    |
| Reward Loss         | 9.77     |
| Running Env Steps   | 3000     |
| Running Forward KL  | 3.16     |
| Running Reverse KL  | 7.38     |
| Running Update Time | 6        |
----------------------------------
2025-02-01 11:38:13.446535 Eastern Standard Time
| Itration            | 7        |
| Real Det Return     | -0.49    |
| Real Sto Return     | -5.5     |
| Reward Loss         | 6.7      |
| Running Env Steps   | 3500     |
| Running Forward KL  | 3.33     |
| Running Reverse KL  | 6.91     |
| Running Update Time | 7        |
----------------------------------
2025-02-01 11:38:27.720746 Eastern Standard Time
| Itration            | 8        |
| Real Det Return     | 6.71     |
| Real Sto Return     | -3.05    |
| Reward Loss         | 7.42     |
| Running Env Steps   | 4000     |
| Running Forward KL  | 3.3      |
| Running Reverse KL  | 7.08     |
| Running Update Time | 8        |
----------------------------------
2025-02-01 11:38:41.917602 Eastern Standard Time
| Itration            | 9        |
| Real Det Return     | -1.92    |
| Real Sto Return     | -12.2    |
| Reward Loss         | 0.923    |
| Running Env Steps   | 4500     |
| Running Forward KL  | 2.93     |
| Running Reverse KL  | 6.65     |
| Running Update Time | 9        |
----------------------------------
2025-02-01 11:38:56.141820 Eastern Standard Time
| Itration            | 10       |
| Real Det Return     | 7.63     |
| Real Sto Return     | -5.27    |
| Reward Loss         | 6.17     |
| Running Env Steps   | 5000     |
| Running Forward KL  | 3.23     |
| Running Reverse KL  | 7.79     |
| Running Update Time | 10       |
----------------------------------
2025-02-01 11:39:10.371946 Eastern Standard Time
| Itration            | 11       |
| Real Det Return     | 7.22     |
| Real Sto Return     | -5.17    |
| Reward Loss         | 4.97     |
| Running Env Steps   | 5500     |
| Running Forward KL  | 3.02     |
| Running Reverse KL  | 7.61     |
| Running Update Time | 11       |
----------------------------------
2025-02-01 11:39:24.525036 Eastern Standard Time
| Itration            | 12       |
| Real Det Return     | 7.3      |
| Real Sto Return     | -7.2     |
| Reward Loss         | 1.46     |
| Running Env Steps   | 6000     |
| Running Forward KL  | 3.29     |
| Running Reverse KL  | 7.37     |
| Running Update Time | 12       |
----------------------------------
2025-02-01 11:39:38.692034 Eastern Standard Time
| Itration            | 13       |
| Real Det Return     | 7.5      |
| Real Sto Return     | -1.32    |
| Reward Loss         | -2.66    |
| Running Env Steps   | 6500     |
| Running Forward KL  | 3.07     |
| Running Reverse KL  | 7.3      |
| Running Update Time | 13       |
----------------------------------
2025-02-01 11:39:52.840333 Eastern Standard Time
| Itration            | 14       |
| Real Det Return     | 4.6      |
| Real Sto Return     | -3.3     |
| Reward Loss         | 1.43     |
| Running Env Steps   | 7000     |
| Running Forward KL  | 3.04     |
| Running Reverse KL  | 7.93     |
| Running Update Time | 14       |
----------------------------------
2025-02-01 11:40:07.024990 Eastern Standard Time
| Itration            | 15       |
| Real Det Return     | 3.46     |
| Real Sto Return     | -2.84    |
| Reward Loss         | -1.04    |
| Running Env Steps   | 7500     |
| Running Forward KL  | 3.44     |
| Running Reverse KL  | 8.26     |
| Running Update Time | 15       |
----------------------------------
2025-02-01 11:40:21.128654 Eastern Standard Time
| Itration            | 16       |
| Real Det Return     | 10.8     |
| Real Sto Return     | 0.08     |
| Reward Loss         | -3.65    |
| Running Env Steps   | 8000     |
| Running Forward KL  | 3.12     |
| Running Reverse KL  | 7.89     |
| Running Update Time | 16       |
----------------------------------
2025-02-01 11:40:35.295539 Eastern Standard Time
| Itration            | 17       |
| Real Det Return     | 12.2     |
| Real Sto Return     | 3.73     |
| Reward Loss         | -5.77    |
| Running Env Steps   | 8500     |
| Running Forward KL  | 2.83     |
| Running Reverse KL  | 7.33     |
| Running Update Time | 17       |
----------------------------------
2025-02-01 11:40:49.725539 Eastern Standard Time
| Itration            | 18       |
| Real Det Return     | 1.21     |
| Real Sto Return     | -3.79    |
| Reward Loss         | -9.42    |
| Running Env Steps   | 9000     |
| Running Forward KL  | 3.11     |
| Running Reverse KL  | 7.24     |
| Running Update Time | 18       |
----------------------------------
2025-02-01 11:41:04.378843 Eastern Standard Time
| Itration            | 19       |
| Real Det Return     | 11.6     |
| Real Sto Return     | 4        |
| Reward Loss         | -11.4    |
| Running Env Steps   | 9500     |
| Running Forward KL  | 3.03     |
| Running Reverse KL  | 7.88     |
| Running Update Time | 19       |
----------------------------------
2025-02-01 11:41:18.855093 Eastern Standard Time
| Itration            | 20       |
| Real Det Return     | 9.19     |
| Real Sto Return     | -4.3     |
| Reward Loss         | -8.15    |
| Running Env Steps   | 10000    |
| Running Forward KL  | 2.95     |
| Running Reverse KL  | 8.36     |
| Running Update Time | 20       |
----------------------------------
2025-02-01 11:41:32.829043 Eastern Standard Time
| Itration            | 21       |
| Real Det Return     | 11.8     |
| Real Sto Return     | -2.44    |
| Reward Loss         | -12.4    |
| Running Env Steps   | 10500    |
| Running Forward KL  | 3.34     |
| Running Reverse KL  | 8.25     |
| Running Update Time | 21       |
----------------------------------
2025-02-01 11:41:47.076134 Eastern Standard Time
| Itration            | 22       |
| Real Det Return     | 9.51     |
| Real Sto Return     | -2.04    |
| Reward Loss         | -13.5    |
| Running Env Steps   | 11000    |
| Running Forward KL  | 2.54     |
| Running Reverse KL  | 7.7      |
| Running Update Time | 22       |
----------------------------------
2025-02-01 11:42:01.335355 Eastern Standard Time
| Itration            | 23       |
| Real Det Return     | 10.6     |
| Real Sto Return     | -2.52    |
| Reward Loss         | -13.7    |
| Running Env Steps   | 11500    |
| Running Forward KL  | 2.74     |
| Running Reverse KL  | 8.04     |
| Running Update Time | 23       |
----------------------------------
2025-02-01 11:42:15.481148 Eastern Standard Time
| Itration            | 24       |
| Real Det Return     | 6.78     |
| Real Sto Return     | 0.76     |
| Reward Loss         | -18.8    |
| Running Env Steps   | 12000    |
| Running Forward KL  | 2.9      |
| Running Reverse KL  | 7.42     |
| Running Update Time | 24       |
----------------------------------
2025-02-01 11:42:29.496471 Eastern Standard Time
| Itration            | 25       |
| Real Det Return     | 10.5     |
| Real Sto Return     | -0.97    |
| Reward Loss         | -20.5    |
| Running Env Steps   | 12500    |
| Running Forward KL  | 2.84     |
| Running Reverse KL  | 7.76     |
| Running Update Time | 25       |
----------------------------------
2025-02-01 11:42:43.572527 Eastern Standard Time
| Itration            | 26       |
| Real Det Return     | 9.6      |
| Real Sto Return     | 4.3      |
| Reward Loss         | -20      |
| Running Env Steps   | 13000    |
| Running Forward KL  | 2.68     |
| Running Reverse KL  | 7.83     |
| Running Update Time | 26       |
----------------------------------
2025-02-01 11:42:57.697630 Eastern Standard Time
| Itration            | 27       |
| Real Det Return     | 9.63     |
| Real Sto Return     | 2.22     |
| Reward Loss         | -23.5    |
| Running Env Steps   | 13500    |
| Running Forward KL  | 3.07     |
| Running Reverse KL  | 8.23     |
| Running Update Time | 27       |
----------------------------------
2025-02-01 11:43:11.734342 Eastern Standard Time
| Itration            | 28       |
| Real Det Return     | 10.2     |
| Real Sto Return     | 1.16     |
| Reward Loss         | -24.3    |
| Running Env Steps   | 14000    |
| Running Forward KL  | 2.87     |
| Running Reverse KL  | 8.07     |
| Running Update Time | 28       |
----------------------------------
2025-02-01 11:43:25.842730 Eastern Standard Time
| Itration            | 29       |
| Real Det Return     | 6.89     |
| Real Sto Return     | 0.53     |
| Reward Loss         | -25.2    |
| Running Env Steps   | 14500    |
| Running Forward KL  | 3.05     |
| Running Reverse KL  | 8.32     |
| Running Update Time | 29       |
----------------------------------
2025-02-01 11:43:39.887979 Eastern Standard Time
| Itration            | 30       |
| Real Det Return     | 9.26     |
| Real Sto Return     | -0.63    |
| Reward Loss         | -26.9    |
| Running Env Steps   | 15000    |
| Running Forward KL  | 2.71     |
| Running Reverse KL  | 8.42     |
| Running Update Time | 30       |
----------------------------------
2025-02-01 11:43:53.800403 Eastern Standard Time
| Itration            | 31       |
| Real Det Return     | 8.28     |
| Real Sto Return     | -0.03    |
| Reward Loss         | -30.5    |
| Running Env Steps   | 15500    |
| Running Forward KL  | 2.29     |
| Running Reverse KL  | 7.63     |
| Running Update Time | 31       |
----------------------------------
2025-02-01 11:44:07.745375 Eastern Standard Time
| Itration            | 32       |
| Real Det Return     | 8.3      |
| Real Sto Return     | -1.2     |
| Reward Loss         | -32.1    |
| Running Env Steps   | 16000    |
| Running Forward KL  | 2.53     |
| Running Reverse KL  | 7.5      |
| Running Update Time | 32       |
----------------------------------
2025-02-01 11:44:21.750545 Eastern Standard Time
| Itration            | 33       |
| Real Det Return     | 7.91     |
| Real Sto Return     | -1.54    |
| Reward Loss         | -33.4    |
| Running Env Steps   | 16500    |
| Running Forward KL  | 2.53     |
| Running Reverse KL  | 7.62     |
| Running Update Time | 33       |
----------------------------------
2025-02-01 11:44:35.712605 Eastern Standard Time
| Itration            | 34       |
| Real Det Return     | 8.6      |
| Real Sto Return     | 1.5      |
| Reward Loss         | -33.4    |
| Running Env Steps   | 17000    |
| Running Forward KL  | 2.64     |
| Running Reverse KL  | 8.35     |
| Running Update Time | 34       |
----------------------------------
2025-02-01 11:44:49.749961 Eastern Standard Time
| Itration            | 35       |
| Real Det Return     | 10.6     |
| Real Sto Return     | 4.41     |
| Reward Loss         | -36.8    |
| Running Env Steps   | 17500    |
| Running Forward KL  | 2.61     |
| Running Reverse KL  | 7.89     |
| Running Update Time | 35       |
----------------------------------
2025-02-01 11:45:03.783952 Eastern Standard Time
| Itration            | 36       |
| Real Det Return     | 10.4     |
| Real Sto Return     | 0.9      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 18000    |
| Running Forward KL  | 3.24     |
| Running Reverse KL  | 8.27     |
| Running Update Time | 36       |
----------------------------------
2025-02-01 11:45:17.763009 Eastern Standard Time
| Itration            | 37       |
| Real Det Return     | 7.14     |
| Real Sto Return     | 1.84     |
| Reward Loss         | -38.8    |
| Running Env Steps   | 18500    |
| Running Forward KL  | 2.02     |
| Running Reverse KL  | 7.51     |
| Running Update Time | 37       |
----------------------------------
2025-02-01 11:45:31.736312 Eastern Standard Time
| Itration            | 38       |
| Real Det Return     | 7.4      |
| Real Sto Return     | 0.53     |
| Reward Loss         | -41.3    |
| Running Env Steps   | 19000    |
| Running Forward KL  | 2.61     |
| Running Reverse KL  | 7.73     |
| Running Update Time | 38       |
----------------------------------
2025-02-01 11:45:45.737867 Eastern Standard Time
| Itration            | 39       |
| Real Det Return     | 7.39     |
| Real Sto Return     | 0.59     |
| Reward Loss         | -42.6    |
| Running Env Steps   | 19500    |
| Running Forward KL  | 2.37     |
| Running Reverse KL  | 7.87     |
| Running Update Time | 39       |
----------------------------------
2025-02-01 11:45:59.721726 Eastern Standard Time
| Itration            | 40       |
| Real Det Return     | 8.56     |
| Real Sto Return     | 3        |
| Reward Loss         | -45      |
| Running Env Steps   | 20000    |
| Running Forward KL  | 2.86     |
| Running Reverse KL  | 8.45     |
| Running Update Time | 40       |
----------------------------------
2025-02-01 11:46:13.819311 Eastern Standard Time
| Itration            | 41       |
| Real Det Return     | 7.45     |
| Real Sto Return     | 2.17     |
| Reward Loss         | -45.2    |
| Running Env Steps   | 20500    |
| Running Forward KL  | 2.29     |
| Running Reverse KL  | 7.78     |
| Running Update Time | 41       |
----------------------------------
2025-02-01 11:46:27.912714 Eastern Standard Time
| Itration            | 42       |
| Real Det Return     | 8.07     |
| Real Sto Return     | 1.05     |
| Reward Loss         | -47.3    |
| Running Env Steps   | 21000    |
| Running Forward KL  | 2.28     |
| Running Reverse KL  | 7.43     |
| Running Update Time | 42       |
----------------------------------
2025-02-01 11:46:41.939093 Eastern Standard Time
| Itration            | 43       |
| Real Det Return     | 9.74     |
| Real Sto Return     | 0.59     |
| Reward Loss         | -48.9    |
| Running Env Steps   | 21500    |
| Running Forward KL  | 2.38     |
| Running Reverse KL  | 7.52     |
| Running Update Time | 43       |
----------------------------------
2025-02-01 11:46:56.062935 Eastern Standard Time
| Itration            | 44       |
| Real Det Return     | 9.34     |
| Real Sto Return     | 4.02     |
| Reward Loss         | -48.8    |
| Running Env Steps   | 22000    |
| Running Forward KL  | 1.72     |
| Running Reverse KL  | 7.85     |
| Running Update Time | 44       |
----------------------------------
2025-02-01 11:47:10.268885 Eastern Standard Time
| Itration            | 45       |
| Real Det Return     | 10.4     |
| Real Sto Return     | 3.33     |
| Reward Loss         | -53.8    |
| Running Env Steps   | 22500    |
| Running Forward KL  | 2.17     |
| Running Reverse KL  | 7.58     |
| Running Update Time | 45       |
----------------------------------
2025-02-01 11:47:24.236770 Eastern Standard Time
| Itration            | 46       |
| Real Det Return     | 8.17     |
| Real Sto Return     | 3.68     |
| Reward Loss         | -49      |
| Running Env Steps   | 23000    |
| Running Forward KL  | 1.53     |
| Running Reverse KL  | 7.59     |
| Running Update Time | 46       |
----------------------------------
2025-02-01 11:47:38.224894 Eastern Standard Time
| Itration            | 47       |
| Real Det Return     | 12       |
| Real Sto Return     | 13.1     |
| Reward Loss         | -57.3    |
| Running Env Steps   | 23500    |
| Running Forward KL  | 1.85     |
| Running Reverse KL  | 7.17     |
| Running Update Time | 47       |
----------------------------------
2025-02-01 11:47:52.249162 Eastern Standard Time
| Itration            | 48       |
| Real Det Return     | 9.35     |
| Real Sto Return     | 3.59     |
| Reward Loss         | -54      |
| Running Env Steps   | 24000    |
| Running Forward KL  | 1.7      |
| Running Reverse KL  | 7.57     |
| Running Update Time | 48       |
----------------------------------
2025-02-01 11:48:06.238676 Eastern Standard Time
| Itration            | 49       |
| Real Det Return     | 9.48     |
| Real Sto Return     | 5.66     |
| Reward Loss         | -56      |
| Running Env Steps   | 24500    |
| Running Forward KL  | 1.74     |
| Running Reverse KL  | 7.4      |
| Running Update Time | 49       |
----------------------------------
2025-02-01 11:48:20.182172 Eastern Standard Time
| Itration            | 50       |
| Real Det Return     | 8.49     |
| Real Sto Return     | 6.43     |
| Reward Loss         | -58      |
| Running Env Steps   | 25000    |
| Running Forward KL  | 1.74     |
| Running Reverse KL  | 7.83     |
| Running Update Time | 50       |
----------------------------------
2025-02-01 11:48:34.147386 Eastern Standard Time
| Itration            | 51       |
| Real Det Return     | 6.01     |
| Real Sto Return     | 3.17     |
| Reward Loss         | -53.6    |
| Running Env Steps   | 25500    |
| Running Forward KL  | 1.32     |
| Running Reverse KL  | 7.74     |
| Running Update Time | 51       |
----------------------------------
2025-02-01 11:48:48.212638 Eastern Standard Time
| Itration            | 52       |
| Real Det Return     | 6.65     |
| Real Sto Return     | 4.25     |
| Reward Loss         | -56.7    |
| Running Env Steps   | 26000    |
| Running Forward KL  | 1.64     |
| Running Reverse KL  | 8.11     |
| Running Update Time | 52       |
----------------------------------
2025-02-01 11:49:02.249166 Eastern Standard Time
| Itration            | 53       |
| Real Det Return     | 7.7      |
| Real Sto Return     | 5.71     |
| Reward Loss         | -60.6    |
| Running Env Steps   | 26500    |
| Running Forward KL  | 1.4      |
| Running Reverse KL  | 7.43     |
| Running Update Time | 53       |
----------------------------------
2025-02-01 11:49:16.262196 Eastern Standard Time
| Itration            | 54       |
| Real Det Return     | 5.19     |
| Real Sto Return     | 6.25     |
| Reward Loss         | -60      |
| Running Env Steps   | 27000    |
| Running Forward KL  | 1.34     |
| Running Reverse KL  | 7.17     |
| Running Update Time | 54       |
----------------------------------
2025-02-01 11:49:30.152178 Eastern Standard Time
| Itration            | 55       |
| Real Det Return     | 7.28     |
| Real Sto Return     | 6.02     |
| Reward Loss         | -59.2    |
| Running Env Steps   | 27500    |
| Running Forward KL  | 1.09     |
| Running Reverse KL  | 7.21     |
| Running Update Time | 55       |
----------------------------------
2025-02-01 11:49:44.165828 Eastern Standard Time
| Itration            | 56       |
| Real Det Return     | 12       |
| Real Sto Return     | 15.7     |
| Reward Loss         | -53.9    |
| Running Env Steps   | 28000    |
| Running Forward KL  | 1.06     |
| Running Reverse KL  | 7.66     |
| Running Update Time | 56       |
----------------------------------
2025-02-01 11:49:58.117236 Eastern Standard Time
| Itration            | 57       |
| Real Det Return     | 26.2     |
| Real Sto Return     | 12.8     |
| Reward Loss         | -58.6    |
| Running Env Steps   | 28500    |
| Running Forward KL  | 0.746    |
| Running Reverse KL  | 6.49     |
| Running Update Time | 57       |
----------------------------------
2025-02-01 11:50:12.110233 Eastern Standard Time
| Itration            | 58       |
| Real Det Return     | 19.5     |
| Real Sto Return     | 14.4     |
| Reward Loss         | -60.6    |
| Running Env Steps   | 29000    |
| Running Forward KL  | 0.906    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 58       |
----------------------------------
2025-02-01 11:50:26.046754 Eastern Standard Time
| Itration            | 59       |
| Real Det Return     | 22.4     |
| Real Sto Return     | 20.6     |
| Reward Loss         | -58.4    |
| Running Env Steps   | 29500    |
| Running Forward KL  | 0.737    |
| Running Reverse KL  | 6.94     |
| Running Update Time | 59       |
----------------------------------
2025-02-01 11:50:40.022380 Eastern Standard Time
| Itration            | 60       |
| Real Det Return     | 24.1     |
| Real Sto Return     | 19.7     |
| Reward Loss         | -58.4    |
| Running Env Steps   | 30000    |
| Running Forward KL  | 0.453    |
| Running Reverse KL  | 6.46     |
| Running Update Time | 60       |
----------------------------------
2025-02-01 11:50:53.942029 Eastern Standard Time
| Itration            | 61       |
| Real Det Return     | 32.9     |
| Real Sto Return     | 17.1     |
| Reward Loss         | -58.3    |
| Running Env Steps   | 30500    |
| Running Forward KL  | 0.239    |
| Running Reverse KL  | 7.3      |
| Running Update Time | 61       |
----------------------------------
2025-02-01 11:51:07.938125 Eastern Standard Time
| Itration            | 62       |
| Real Det Return     | 39.1     |
| Real Sto Return     | 18.4     |
| Reward Loss         | -55.1    |
| Running Env Steps   | 31000    |
| Running Forward KL  | 0.372    |
| Running Reverse KL  | 6.95     |
| Running Update Time | 62       |
----------------------------------
2025-02-01 11:51:21.916563 Eastern Standard Time
| Itration            | 63       |
| Real Det Return     | 34.6     |
| Real Sto Return     | 20       |
| Reward Loss         | -55.9    |
| Running Env Steps   | 31500    |
| Running Forward KL  | 0.304    |
| Running Reverse KL  | 6.32     |
| Running Update Time | 63       |
----------------------------------
2025-02-01 11:51:35.869818 Eastern Standard Time
| Itration            | 64       |
| Real Det Return     | 38.8     |
| Real Sto Return     | 20.3     |
| Reward Loss         | -63.9    |
| Running Env Steps   | 32000    |
| Running Forward KL  | 0.709    |
| Running Reverse KL  | 7.12     |
| Running Update Time | 64       |
----------------------------------
2025-02-01 11:51:49.752132 Eastern Standard Time
| Itration            | 65       |
| Real Det Return     | 35.1     |
| Real Sto Return     | 15.4     |
| Reward Loss         | -57.9    |
| Running Env Steps   | 32500    |
| Running Forward KL  | 0.0269   |
| Running Reverse KL  | 6.65     |
| Running Update Time | 65       |
----------------------------------
2025-02-01 11:52:03.675827 Eastern Standard Time
| Itration            | 66       |
| Real Det Return     | 47.1     |
| Real Sto Return     | 22.2     |
| Reward Loss         | -57.2    |
| Running Env Steps   | 33000    |
| Running Forward KL  | 0.198    |
| Running Reverse KL  | 6.76     |
| Running Update Time | 66       |
----------------------------------
2025-02-01 11:52:17.624384 Eastern Standard Time
| Itration            | 67       |
| Real Det Return     | 24.7     |
| Real Sto Return     | 17.4     |
| Reward Loss         | -65.3    |
| Running Env Steps   | 33500    |
| Running Forward KL  | 0.712    |
| Running Reverse KL  | 6.98     |
| Running Update Time | 67       |
----------------------------------
2025-02-01 11:52:31.496045 Eastern Standard Time
| Itration            | 68       |
| Real Det Return     | 53.9     |
| Real Sto Return     | 29.4     |
| Reward Loss         | -58.9    |
| Running Env Steps   | 34000    |
| Running Forward KL  | 0.127    |
| Running Reverse KL  | 6.69     |
| Running Update Time | 68       |
----------------------------------
2025-02-01 11:52:45.404526 Eastern Standard Time
| Itration            | 69       |
| Real Det Return     | 33.6     |
| Real Sto Return     | 12.3     |
| Reward Loss         | -58      |
| Running Env Steps   | 34500    |
| Running Forward KL  | -0.0523  |
| Running Reverse KL  | 6.32     |
| Running Update Time | 69       |
----------------------------------
2025-02-01 11:52:59.269886 Eastern Standard Time
| Itration            | 70       |
| Real Det Return     | 70.9     |
| Real Sto Return     | 35.5     |
| Reward Loss         | -59.6    |
| Running Env Steps   | 35000    |
| Running Forward KL  | 0.214    |
| Running Reverse KL  | 7.35     |
| Running Update Time | 70       |
----------------------------------
2025-02-01 11:53:13.213702 Eastern Standard Time
| Itration            | 71       |
| Real Det Return     | 50.5     |
| Real Sto Return     | 37.9     |
| Reward Loss         | -55.9    |
| Running Env Steps   | 35500    |
| Running Forward KL  | -0.221   |
| Running Reverse KL  | 6.71     |
| Running Update Time | 71       |
----------------------------------
2025-02-01 11:53:27.126871 Eastern Standard Time
| Itration            | 72       |
| Real Det Return     | 34.5     |
| Real Sto Return     | 17.8     |
| Reward Loss         | -65.2    |
| Running Env Steps   | 36000    |
| Running Forward KL  | 0.281    |
| Running Reverse KL  | 5.87     |
| Running Update Time | 72       |
----------------------------------
2025-02-01 11:53:40.981835 Eastern Standard Time
| Itration            | 73       |
| Real Det Return     | 37.2     |
| Real Sto Return     | 22.2     |
| Reward Loss         | -63.8    |
| Running Env Steps   | 36500    |
| Running Forward KL  | 0.223    |
| Running Reverse KL  | 6.4      |
| Running Update Time | 73       |
----------------------------------
2025-02-01 11:53:54.896963 Eastern Standard Time
| Itration            | 74       |
| Real Det Return     | 36.3     |
| Real Sto Return     | 31.2     |
| Reward Loss         | -55.3    |
| Running Env Steps   | 37000    |
| Running Forward KL  | -0.103   |
| Running Reverse KL  | 6.56     |
| Running Update Time | 74       |
----------------------------------
2025-02-01 11:54:08.800314 Eastern Standard Time
| Itration            | 75       |
| Real Det Return     | 54.1     |
| Real Sto Return     | 28.5     |
| Reward Loss         | -60.2    |
| Running Env Steps   | 37500    |
| Running Forward KL  | -0.109   |
| Running Reverse KL  | 6.87     |
| Running Update Time | 75       |
----------------------------------
2025-02-01 11:54:22.701703 Eastern Standard Time
| Itration            | 76       |
| Real Det Return     | 37       |
| Real Sto Return     | 31.7     |
| Reward Loss         | -58.6    |
| Running Env Steps   | 38000    |
| Running Forward KL  | -0.113   |
| Running Reverse KL  | 6.37     |
| Running Update Time | 76       |
----------------------------------
2025-02-01 11:54:36.649802 Eastern Standard Time
| Itration            | 77       |
| Real Det Return     | 69       |
| Real Sto Return     | 34.5     |
| Reward Loss         | -58.8    |
| Running Env Steps   | 38500    |
| Running Forward KL  | -0.332   |
| Running Reverse KL  | 5.96     |
| Running Update Time | 77       |
----------------------------------
2025-02-01 11:54:50.577766 Eastern Standard Time
| Itration            | 78       |
| Real Det Return     | 58.8     |
| Real Sto Return     | 37.6     |
| Reward Loss         | -59.2    |
| Running Env Steps   | 39000    |
| Running Forward KL  | -0.364   |
| Running Reverse KL  | 6.03     |
| Running Update Time | 78       |
----------------------------------
2025-02-01 11:55:04.552364 Eastern Standard Time
| Itration            | 79       |
| Real Det Return     | 66.1     |
| Real Sto Return     | 35       |
| Reward Loss         | -62.2    |
| Running Env Steps   | 39500    |
| Running Forward KL  | -0.391   |
| Running Reverse KL  | 6.42     |
| Running Update Time | 79       |
----------------------------------
2025-02-01 11:55:18.499945 Eastern Standard Time
| Itration            | 80       |
| Real Det Return     | 64.9     |
| Real Sto Return     | 39.8     |
| Reward Loss         | -56.2    |
| Running Env Steps   | 40000    |
| Running Forward KL  | -0.167   |
| Running Reverse KL  | 6.11     |
| Running Update Time | 80       |
----------------------------------
2025-02-01 11:55:32.380124 Eastern Standard Time
| Itration            | 81       |
| Real Det Return     | 49.9     |
| Real Sto Return     | 35.3     |
| Reward Loss         | -60.5    |
| Running Env Steps   | 40500    |
| Running Forward KL  | -0.47    |
| Running Reverse KL  | 6.83     |
| Running Update Time | 81       |
----------------------------------
2025-02-01 11:55:46.235598 Eastern Standard Time
| Itration            | 82       |
| Real Det Return     | 28.4     |
| Real Sto Return     | 29       |
| Reward Loss         | -57.9    |
| Running Env Steps   | 41000    |
| Running Forward KL  | -0.591   |
| Running Reverse KL  | 6.6      |
| Running Update Time | 82       |
----------------------------------
2025-02-01 11:56:00.049938 Eastern Standard Time
| Itration            | 83       |
| Real Det Return     | 57       |
| Real Sto Return     | 51       |
| Reward Loss         | -57.1    |
| Running Env Steps   | 41500    |
| Running Forward KL  | -0.194   |
| Running Reverse KL  | 7.31     |
| Running Update Time | 83       |
----------------------------------
2025-02-01 11:56:13.975245 Eastern Standard Time
| Itration            | 84       |
| Real Det Return     | 68.2     |
| Real Sto Return     | 38.9     |
| Reward Loss         | -65      |
| Running Env Steps   | 42000    |
| Running Forward KL  | -0.261   |
| Running Reverse KL  | 6.25     |
| Running Update Time | 84       |
----------------------------------
2025-02-01 11:56:27.879842 Eastern Standard Time
| Itration            | 85       |
| Real Det Return     | 40.7     |
| Real Sto Return     | 35.5     |
| Reward Loss         | -62.5    |
| Running Env Steps   | 42500    |
| Running Forward KL  | -0.415   |
| Running Reverse KL  | 6.53     |
| Running Update Time | 85       |
----------------------------------
2025-02-01 11:56:41.767360 Eastern Standard Time
| Itration            | 86       |
| Real Det Return     | 75.3     |
| Real Sto Return     | 32.2     |
| Reward Loss         | -69.7    |
| Running Env Steps   | 43000    |
| Running Forward KL  | -0.237   |
| Running Reverse KL  | 6.06     |
| Running Update Time | 86       |
----------------------------------
2025-02-01 11:56:55.633989 Eastern Standard Time
| Itration            | 87       |
| Real Det Return     | 55.4     |
| Real Sto Return     | 34.6     |
| Reward Loss         | -58.4    |
| Running Env Steps   | 43500    |
| Running Forward KL  | -0.484   |
| Running Reverse KL  | 6.46     |
| Running Update Time | 87       |
----------------------------------
2025-02-01 11:57:09.589014 Eastern Standard Time
| Itration            | 88       |
| Real Det Return     | 93.2     |
| Real Sto Return     | 41.1     |
| Reward Loss         | -62.8    |
| Running Env Steps   | 44000    |
| Running Forward KL  | -0.643   |
| Running Reverse KL  | 6.28     |
| Running Update Time | 88       |
----------------------------------
2025-02-01 11:57:23.502178 Eastern Standard Time
| Itration            | 89       |
| Real Det Return     | 77.4     |
| Real Sto Return     | 42.8     |
| Reward Loss         | -70.3    |
| Running Env Steps   | 44500    |
| Running Forward KL  | -0.224   |
| Running Reverse KL  | 6.38     |
| Running Update Time | 89       |
----------------------------------
2025-02-01 11:57:37.286933 Eastern Standard Time
| Itration            | 90       |
| Real Det Return     | 102      |
| Real Sto Return     | 43.4     |
| Reward Loss         | -72.2    |
| Running Env Steps   | 45000    |
| Running Forward KL  | 0.22     |
| Running Reverse KL  | 7.11     |
| Running Update Time | 90       |
----------------------------------
2025-02-01 11:57:51.185494 Eastern Standard Time
| Itration            | 91       |
| Real Det Return     | 125      |
| Real Sto Return     | 48.8     |
| Reward Loss         | -58.9    |
| Running Env Steps   | 45500    |
| Running Forward KL  | -0.951   |
| Running Reverse KL  | 6.36     |
| Running Update Time | 91       |
----------------------------------
2025-02-01 11:58:05.035510 Eastern Standard Time
| Itration            | 92       |
| Real Det Return     | 70.5     |
| Real Sto Return     | 35       |
| Reward Loss         | -69.8    |
| Running Env Steps   | 46000    |
| Running Forward KL  | -0.345   |
| Running Reverse KL  | 6.18     |
| Running Update Time | 92       |
----------------------------------
2025-02-01 11:58:18.953112 Eastern Standard Time
| Itration            | 93       |
| Real Det Return     | 121      |
| Real Sto Return     | 64       |
| Reward Loss         | -60.6    |
| Running Env Steps   | 46500    |
| Running Forward KL  | -1.15    |
| Running Reverse KL  | 6.31     |
| Running Update Time | 93       |
----------------------------------
2025-02-01 11:58:32.852300 Eastern Standard Time
| Itration            | 94       |
| Real Det Return     | 161      |
| Real Sto Return     | 61.7     |
| Reward Loss         | -62.9    |
| Running Env Steps   | 47000    |
| Running Forward KL  | -0.482   |
| Running Reverse KL  | 6.15     |
| Running Update Time | 94       |
----------------------------------
2025-02-01 11:58:46.765981 Eastern Standard Time
| Itration            | 95       |
| Real Det Return     | 78.6     |
| Real Sto Return     | 50.9     |
| Reward Loss         | -64.5    |
| Running Env Steps   | 47500    |
| Running Forward KL  | -0.764   |
| Running Reverse KL  | 6.23     |
| Running Update Time | 95       |
----------------------------------
2025-02-01 11:59:00.605203 Eastern Standard Time
| Itration            | 96       |
| Real Det Return     | 164      |
| Real Sto Return     | 63       |
| Reward Loss         | -66.4    |
| Running Env Steps   | 48000    |
| Running Forward KL  | -1.04    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 96       |
----------------------------------
2025-02-01 11:59:14.525195 Eastern Standard Time
| Itration            | 97       |
| Real Det Return     | 121      |
| Real Sto Return     | 40.6     |
| Reward Loss         | -68.5    |
| Running Env Steps   | 48500    |
| Running Forward KL  | -0.508   |
| Running Reverse KL  | 7.13     |
| Running Update Time | 97       |
----------------------------------
2025-02-01 11:59:28.432776 Eastern Standard Time
| Itration            | 98       |
| Real Det Return     | 130      |
| Real Sto Return     | 53.4     |
| Reward Loss         | -73.6    |
| Running Env Steps   | 49000    |
| Running Forward KL  | -0.461   |
| Running Reverse KL  | 6.49     |
| Running Update Time | 98       |
----------------------------------
2025-02-01 11:59:42.265419 Eastern Standard Time
| Itration            | 99       |
| Real Det Return     | 136      |
| Real Sto Return     | 69.1     |
| Reward Loss         | -54.6    |
| Running Env Steps   | 49500    |
| Running Forward KL  | -1.66    |
| Running Reverse KL  | 5.89     |
| Running Update Time | 99       |
----------------------------------
2025-02-01 11:59:56.128127 Eastern Standard Time
| Itration            | 100      |
| Real Det Return     | 118      |
| Real Sto Return     | 69.2     |
| Reward Loss         | -72.7    |
| Running Env Steps   | 50000    |
| Running Forward KL  | -0.749   |
| Running Reverse KL  | 6.36     |
| Running Update Time | 100      |
----------------------------------
2025-02-01 12:00:10.172330 Eastern Standard Time
| Itration            | 101      |
| Real Det Return     | 169      |
| Real Sto Return     | 74.7     |
| Reward Loss         | -72.7    |
| Running Env Steps   | 50500    |
| Running Forward KL  | -0.615   |
| Running Reverse KL  | 6.56     |
| Running Update Time | 101      |
----------------------------------
2025-02-01 12:00:24.072247 Eastern Standard Time
| Itration            | 102      |
| Real Det Return     | 165      |
| Real Sto Return     | 68.3     |
| Reward Loss         | -72.6    |
| Running Env Steps   | 51000    |
| Running Forward KL  | -0.712   |
| Running Reverse KL  | 5.83     |
| Running Update Time | 102      |
----------------------------------
2025-02-01 12:00:37.915886 Eastern Standard Time
| Itration            | 103      |
| Real Det Return     | 165      |
| Real Sto Return     | 70.7     |
| Reward Loss         | -66.5    |
| Running Env Steps   | 51500    |
| Running Forward KL  | -0.856   |
| Running Reverse KL  | 6.42     |
| Running Update Time | 103      |
----------------------------------
2025-02-01 12:00:51.753222 Eastern Standard Time
| Itration            | 104      |
| Real Det Return     | 180      |
| Real Sto Return     | 76.8     |
| Reward Loss         | -68.7    |
| Running Env Steps   | 52000    |
| Running Forward KL  | -0.757   |
| Running Reverse KL  | 5.7      |
| Running Update Time | 104      |
----------------------------------
2025-02-01 12:01:05.618554 Eastern Standard Time
| Itration            | 105      |
| Real Det Return     | 159      |
| Real Sto Return     | 83.3     |
| Reward Loss         | -64.9    |
| Running Env Steps   | 52500    |
| Running Forward KL  | -0.97    |
| Running Reverse KL  | 5.41     |
| Running Update Time | 105      |
----------------------------------
2025-02-01 12:01:19.511386 Eastern Standard Time
| Itration            | 106      |
| Real Det Return     | 173      |
| Real Sto Return     | 90.1     |
| Reward Loss         | -66.4    |
| Running Env Steps   | 53000    |
| Running Forward KL  | -1.23    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 106      |
----------------------------------
2025-02-01 12:01:33.393180 Eastern Standard Time
| Itration            | 107      |
| Real Det Return     | 139      |
| Real Sto Return     | 80.2     |
| Reward Loss         | -67.4    |
| Running Env Steps   | 53500    |
| Running Forward KL  | -0.906   |
| Running Reverse KL  | 5.23     |
| Running Update Time | 107      |
----------------------------------
2025-02-01 12:01:47.290696 Eastern Standard Time
| Itration            | 108      |
| Real Det Return     | 148      |
| Real Sto Return     | 89.1     |
| Reward Loss         | -65.4    |
| Running Env Steps   | 54000    |
| Running Forward KL  | -1.11    |
| Running Reverse KL  | 5.22     |
| Running Update Time | 108      |
----------------------------------
2025-02-01 12:02:01.458207 Eastern Standard Time
| Itration            | 109      |
| Real Det Return     | 164      |
| Real Sto Return     | 80.3     |
| Reward Loss         | -60.3    |
| Running Env Steps   | 54500    |
| Running Forward KL  | -1.23    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 109      |
----------------------------------
2025-02-01 12:02:15.391014 Eastern Standard Time
| Itration            | 110      |
| Real Det Return     | 166      |
| Real Sto Return     | 104      |
| Reward Loss         | -51      |
| Running Env Steps   | 55000    |
| Running Forward KL  | -1.75    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 110      |
----------------------------------
2025-02-01 12:02:29.233685 Eastern Standard Time
| Itration            | 111      |
| Real Det Return     | 158      |
| Real Sto Return     | 91.3     |
| Reward Loss         | -57.9    |
| Running Env Steps   | 55500    |
| Running Forward KL  | -1.69    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 111      |
----------------------------------
2025-02-01 12:02:43.080580 Eastern Standard Time
| Itration            | 112      |
| Real Det Return     | 154      |
| Real Sto Return     | 106      |
| Reward Loss         | -68.4    |
| Running Env Steps   | 56000    |
| Running Forward KL  | -1.17    |
| Running Reverse KL  | 4.84     |
| Running Update Time | 112      |
----------------------------------
2025-02-01 12:02:56.971148 Eastern Standard Time
| Itration            | 113      |
| Real Det Return     | 139      |
| Real Sto Return     | 103      |
| Reward Loss         | -58.5    |
| Running Env Steps   | 56500    |
| Running Forward KL  | -1.12    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 113      |
----------------------------------
2025-02-01 12:03:10.793679 Eastern Standard Time
| Itration            | 114      |
| Real Det Return     | 193      |
| Real Sto Return     | 112      |
| Reward Loss         | -65.6    |
| Running Env Steps   | 57000    |
| Running Forward KL  | -1.27    |
| Running Reverse KL  | 5.81     |
| Running Update Time | 114      |
----------------------------------
2025-02-01 12:03:24.658268 Eastern Standard Time
| Itration            | 115      |
| Real Det Return     | 169      |
| Real Sto Return     | 104      |
| Reward Loss         | -54      |
| Running Env Steps   | 57500    |
| Running Forward KL  | -1.49    |
| Running Reverse KL  | 5.52     |
| Running Update Time | 115      |
----------------------------------
2025-02-01 12:03:38.627560 Eastern Standard Time
| Itration            | 116      |
| Real Det Return     | 180      |
| Real Sto Return     | 113      |
| Reward Loss         | -68.3    |
| Running Env Steps   | 58000    |
| Running Forward KL  | -1.04    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 116      |
----------------------------------
2025-02-01 12:03:52.397202 Eastern Standard Time
| Itration            | 117      |
| Real Det Return     | 175      |
| Real Sto Return     | 105      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 58500    |
| Running Forward KL  | -1.45    |
| Running Reverse KL  | 5.22     |
| Running Update Time | 117      |
----------------------------------
2025-02-01 12:04:06.297478 Eastern Standard Time
| Itration            | 118      |
| Real Det Return     | 153      |
| Real Sto Return     | 104      |
| Reward Loss         | -54.8    |
| Running Env Steps   | 59000    |
| Running Forward KL  | -1.53    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 118      |
----------------------------------
2025-02-01 12:04:20.200412 Eastern Standard Time
| Itration            | 119      |
| Real Det Return     | 189      |
| Real Sto Return     | 115      |
| Reward Loss         | -57.3    |
| Running Env Steps   | 59500    |
| Running Forward KL  | -1.3     |
| Running Reverse KL  | 5.42     |
| Running Update Time | 119      |
----------------------------------
2025-02-01 12:04:34.106579 Eastern Standard Time
| Itration            | 120      |
| Real Det Return     | 170      |
| Real Sto Return     | 110      |
| Reward Loss         | -64      |
| Running Env Steps   | 60000    |
| Running Forward KL  | -1.31    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 120      |
----------------------------------
2025-02-01 12:04:48.069644 Eastern Standard Time
| Itration            | 121      |
| Real Det Return     | 174      |
| Real Sto Return     | 123      |
| Reward Loss         | -64      |
| Running Env Steps   | 60500    |
| Running Forward KL  | -1.63    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 121      |
----------------------------------
2025-02-01 12:05:01.903281 Eastern Standard Time
| Itration            | 122      |
| Real Det Return     | 170      |
| Real Sto Return     | 109      |
| Reward Loss         | -51.3    |
| Running Env Steps   | 61000    |
| Running Forward KL  | -1.72    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 122      |
----------------------------------
2025-02-01 12:05:15.740043 Eastern Standard Time
| Itration            | 123      |
| Real Det Return     | 183      |
| Real Sto Return     | 130      |
| Reward Loss         | -61.6    |
| Running Env Steps   | 61500    |
| Running Forward KL  | -1.38    |
| Running Reverse KL  | 5        |
| Running Update Time | 123      |
----------------------------------
2025-02-01 12:05:29.512248 Eastern Standard Time
| Itration            | 124      |
| Real Det Return     | 199      |
| Real Sto Return     | 145      |
| Reward Loss         | -55.1    |
| Running Env Steps   | 62000    |
| Running Forward KL  | -2.36    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 124      |
----------------------------------
2025-02-01 12:05:43.359081 Eastern Standard Time
| Itration            | 125      |
| Real Det Return     | 156      |
| Real Sto Return     | 117      |
| Reward Loss         | -71.8    |
| Running Env Steps   | 62500    |
| Running Forward KL  | -1.52    |
| Running Reverse KL  | 5.15     |
| Running Update Time | 125      |
----------------------------------
2025-02-01 12:05:57.242875 Eastern Standard Time
| Itration            | 126      |
| Real Det Return     | 182      |
| Real Sto Return     | 127      |
| Reward Loss         | -53.8    |
| Running Env Steps   | 63000    |
| Running Forward KL  | -1.42    |
| Running Reverse KL  | 5.51     |
| Running Update Time | 126      |
----------------------------------
2025-02-01 12:06:11.151490 Eastern Standard Time
| Itration            | 127      |
| Real Det Return     | 182      |
| Real Sto Return     | 127      |
| Reward Loss         | -62.3    |
| Running Env Steps   | 63500    |
| Running Forward KL  | -1.3     |
| Running Reverse KL  | 4.75     |
| Running Update Time | 127      |
----------------------------------
2025-02-01 12:06:25.008706 Eastern Standard Time
| Itration            | 128      |
| Real Det Return     | 186      |
| Real Sto Return     | 122      |
| Reward Loss         | -66.5    |
| Running Env Steps   | 64000    |
| Running Forward KL  | -1.8     |
| Running Reverse KL  | 4.75     |
| Running Update Time | 128      |
----------------------------------
2025-02-01 12:06:38.880065 Eastern Standard Time
| Itration            | 129      |
| Real Det Return     | 180      |
| Real Sto Return     | 136      |
| Reward Loss         | -61.5    |
| Running Env Steps   | 64500    |
| Running Forward KL  | -1.48    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 129      |
----------------------------------
2025-02-01 12:06:52.787373 Eastern Standard Time
| Itration            | 130      |
| Real Det Return     | 223      |
| Real Sto Return     | 144      |
| Reward Loss         | -67.9    |
| Running Env Steps   | 65000    |
| Running Forward KL  | -1.51    |
| Running Reverse KL  | 5.35     |
| Running Update Time | 130      |
----------------------------------
2025-02-01 12:07:06.674527 Eastern Standard Time
| Itration            | 131      |
| Real Det Return     | 214      |
| Real Sto Return     | 141      |
| Reward Loss         | -66.4    |
| Running Env Steps   | 65500    |
| Running Forward KL  | -1.48    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 131      |
----------------------------------
2025-02-01 12:07:20.496445 Eastern Standard Time
| Itration            | 132      |
| Real Det Return     | 221      |
| Real Sto Return     | 140      |
| Reward Loss         | -63.6    |
| Running Env Steps   | 66000    |
| Running Forward KL  | -1.88    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 132      |
----------------------------------
2025-02-01 12:07:34.414012 Eastern Standard Time
| Itration            | 133      |
| Real Det Return     | 210      |
| Real Sto Return     | 124      |
| Reward Loss         | -71.7    |
| Running Env Steps   | 66500    |
| Running Forward KL  | -1.41    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 133      |
----------------------------------
2025-02-01 12:07:48.337416 Eastern Standard Time
| Itration            | 134      |
| Real Det Return     | 220      |
| Real Sto Return     | 155      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 67000    |
| Running Forward KL  | -2       |
| Running Reverse KL  | 4.99     |
| Running Update Time | 134      |
----------------------------------
2025-02-01 12:08:02.277639 Eastern Standard Time
| Itration            | 135      |
| Real Det Return     | 218      |
| Real Sto Return     | 158      |
| Reward Loss         | -61      |
| Running Env Steps   | 67500    |
| Running Forward KL  | -2.04    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 135      |
----------------------------------
2025-02-01 12:08:16.243894 Eastern Standard Time
| Itration            | 136      |
| Real Det Return     | 216      |
| Real Sto Return     | 147      |
| Reward Loss         | -65.7    |
| Running Env Steps   | 68000    |
| Running Forward KL  | -1.65    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 136      |
----------------------------------
2025-02-01 12:08:30.271302 Eastern Standard Time
| Itration            | 137      |
| Real Det Return     | 223      |
| Real Sto Return     | 154      |
| Reward Loss         | -64.7    |
| Running Env Steps   | 68500    |
| Running Forward KL  | -1.77    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 137      |
----------------------------------
2025-02-01 12:08:44.682515 Eastern Standard Time
| Itration            | 138      |
| Real Det Return     | 241      |
| Real Sto Return     | 160      |
| Reward Loss         | -56.4    |
| Running Env Steps   | 69000    |
| Running Forward KL  | -2.04    |
| Running Reverse KL  | 5.15     |
| Running Update Time | 138      |
----------------------------------
2025-02-01 12:08:59.622641 Eastern Standard Time
| Itration            | 139      |
| Real Det Return     | 227      |
| Real Sto Return     | 155      |
| Reward Loss         | -58.2    |
| Running Env Steps   | 69500    |
| Running Forward KL  | -2.01    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 139      |
----------------------------------
2025-02-01 12:09:13.751923 Eastern Standard Time
| Itration            | 140      |
| Real Det Return     | 227      |
| Real Sto Return     | 174      |
| Reward Loss         | -57.3    |
| Running Env Steps   | 70000    |
| Running Forward KL  | -1.5     |
| Running Reverse KL  | 5.28     |
| Running Update Time | 140      |
----------------------------------
2025-02-01 12:09:27.920426 Eastern Standard Time
| Itration            | 141      |
| Real Det Return     | 232      |
| Real Sto Return     | 170      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 70500    |
| Running Forward KL  | -1.84    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 141      |
----------------------------------
2025-02-01 12:09:42.138162 Eastern Standard Time
| Itration            | 142      |
| Real Det Return     | 213      |
| Real Sto Return     | 162      |
| Reward Loss         | -68      |
| Running Env Steps   | 71000    |
| Running Forward KL  | -1.58    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 142      |
----------------------------------
2025-02-01 12:09:56.425892 Eastern Standard Time
| Itration            | 143      |
| Real Det Return     | 229      |
| Real Sto Return     | 154      |
| Reward Loss         | -58.5    |
| Running Env Steps   | 71500    |
| Running Forward KL  | -2.03    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 143      |
----------------------------------
2025-02-01 12:10:10.866669 Eastern Standard Time
| Itration            | 144      |
| Real Det Return     | 250      |
| Real Sto Return     | 176      |
| Reward Loss         | -63.5    |
| Running Env Steps   | 72000    |
| Running Forward KL  | -1.74    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 144      |
----------------------------------
2025-02-01 12:10:25.350786 Eastern Standard Time
| Itration            | 145      |
| Real Det Return     | 241      |
| Real Sto Return     | 174      |
| Reward Loss         | -53.6    |
| Running Env Steps   | 72500    |
| Running Forward KL  | -2.09    |
| Running Reverse KL  | 5        |
| Running Update Time | 145      |
----------------------------------
2025-02-01 12:10:39.885049 Eastern Standard Time
| Itration            | 146      |
| Real Det Return     | 255      |
| Real Sto Return     | 194      |
| Reward Loss         | -67      |
| Running Env Steps   | 73000    |
| Running Forward KL  | -2.11    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 146      |
----------------------------------
2025-02-01 12:10:54.452359 Eastern Standard Time
| Itration            | 147      |
| Real Det Return     | 247      |
| Real Sto Return     | 177      |
| Reward Loss         | -64.1    |
| Running Env Steps   | 73500    |
| Running Forward KL  | -2.03    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 147      |
----------------------------------
2025-02-01 12:11:09.221503 Eastern Standard Time
| Itration            | 148      |
| Real Det Return     | 260      |
| Real Sto Return     | 183      |
| Reward Loss         | -64.2    |
| Running Env Steps   | 74000    |
| Running Forward KL  | -1.96    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 148      |
----------------------------------
2025-02-01 12:11:24.003405 Eastern Standard Time
| Itration            | 149      |
| Real Det Return     | 261      |
| Real Sto Return     | 185      |
| Reward Loss         | -54.7    |
| Running Env Steps   | 74500    |
| Running Forward KL  | -2.6     |
| Running Reverse KL  | 5.39     |
| Running Update Time | 149      |
----------------------------------
2025-02-01 12:11:38.870753 Eastern Standard Time
| Itration            | 150      |
| Real Det Return     | 244      |
| Real Sto Return     | 175      |
| Reward Loss         | -62.5    |
| Running Env Steps   | 75000    |
| Running Forward KL  | -2.4     |
| Running Reverse KL  | 4.94     |
| Running Update Time | 150      |
----------------------------------
2025-02-01 12:11:53.866047 Eastern Standard Time
| Itration            | 151      |
| Real Det Return     | 252      |
| Real Sto Return     | 171      |
| Reward Loss         | -69.4    |
| Running Env Steps   | 75500    |
| Running Forward KL  | -2.03    |
| Running Reverse KL  | 4.52     |
| Running Update Time | 151      |
----------------------------------
2025-02-01 12:12:08.861479 Eastern Standard Time
| Itration            | 152      |
| Real Det Return     | 263      |
| Real Sto Return     | 180      |
| Reward Loss         | -68.7    |
| Running Env Steps   | 76000    |
| Running Forward KL  | -2.47    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 152      |
----------------------------------
2025-02-01 12:12:23.854866 Eastern Standard Time
| Itration            | 153      |
| Real Det Return     | 235      |
| Real Sto Return     | 177      |
| Reward Loss         | -65.3    |
| Running Env Steps   | 76500    |
| Running Forward KL  | -1.64    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 153      |
----------------------------------
2025-02-01 12:12:38.924801 Eastern Standard Time
| Itration            | 154      |
| Real Det Return     | 280      |
| Real Sto Return     | 207      |
| Reward Loss         | -65.7    |
| Running Env Steps   | 77000    |
| Running Forward KL  | -1.86    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 154      |
----------------------------------
2025-02-01 12:12:54.048015 Eastern Standard Time
| Itration            | 155      |
| Real Det Return     | 251      |
| Real Sto Return     | 189      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 77500    |
| Running Forward KL  | -1.8     |
| Running Reverse KL  | 4.64     |
| Running Update Time | 155      |
----------------------------------
2025-02-01 12:13:09.196282 Eastern Standard Time
| Itration            | 156      |
| Real Det Return     | 272      |
| Real Sto Return     | 193      |
| Reward Loss         | -60.6    |
| Running Env Steps   | 78000    |
| Running Forward KL  | -1.97    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 156      |
----------------------------------
2025-02-01 12:13:24.317180 Eastern Standard Time
| Itration            | 157      |
| Real Det Return     | 247      |
| Real Sto Return     | 169      |
| Reward Loss         | -69.2    |
| Running Env Steps   | 78500    |
| Running Forward KL  | -2.06    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 157      |
----------------------------------
2025-02-01 12:13:39.473665 Eastern Standard Time
| Itration            | 158      |
| Real Det Return     | 257      |
| Real Sto Return     | 198      |
| Reward Loss         | -64.4    |
| Running Env Steps   | 79000    |
| Running Forward KL  | -2.24    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 158      |
----------------------------------
2025-02-01 12:13:54.651716 Eastern Standard Time
| Itration            | 159      |
| Real Det Return     | 260      |
| Real Sto Return     | 205      |
| Reward Loss         | -56.2    |
| Running Env Steps   | 79500    |
| Running Forward KL  | -2.47    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 159      |
----------------------------------
2025-02-01 12:14:09.830908 Eastern Standard Time
| Itration            | 160      |
| Real Det Return     | 281      |
| Real Sto Return     | 204      |
| Reward Loss         | -68.9    |
| Running Env Steps   | 80000    |
| Running Forward KL  | -2.1     |
| Running Reverse KL  | 5.19     |
| Running Update Time | 160      |
----------------------------------
2025-02-01 12:14:25.072331 Eastern Standard Time
| Itration            | 161      |
| Real Det Return     | 270      |
| Real Sto Return     | 216      |
| Reward Loss         | -70.4    |
| Running Env Steps   | 80500    |
| Running Forward KL  | -1.94    |
| Running Reverse KL  | 5.52     |
| Running Update Time | 161      |
----------------------------------
2025-02-01 12:14:40.317471 Eastern Standard Time
| Itration            | 162      |
| Real Det Return     | 274      |
| Real Sto Return     | 205      |
| Reward Loss         | -64      |
| Running Env Steps   | 81000    |
| Running Forward KL  | -2.45    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 162      |
----------------------------------
2025-02-01 12:14:55.654338 Eastern Standard Time
| Itration            | 163      |
| Real Det Return     | 275      |
| Real Sto Return     | 196      |
| Reward Loss         | -67.9    |
| Running Env Steps   | 81500    |
| Running Forward KL  | -2.04    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 163      |
----------------------------------
2025-02-01 12:15:10.958846 Eastern Standard Time
| Itration            | 164      |
| Real Det Return     | 241      |
| Real Sto Return     | 190      |
| Reward Loss         | -68      |
| Running Env Steps   | 82000    |
| Running Forward KL  | -2.09    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 164      |
----------------------------------
2025-02-01 12:15:26.245648 Eastern Standard Time
| Itration            | 165      |
| Real Det Return     | 288      |
| Real Sto Return     | 226      |
| Reward Loss         | -55      |
| Running Env Steps   | 82500    |
| Running Forward KL  | -2.12    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 165      |
----------------------------------
2025-02-01 12:15:41.507893 Eastern Standard Time
| Itration            | 166      |
| Real Det Return     | 265      |
| Real Sto Return     | 215      |
| Reward Loss         | -56.3    |
| Running Env Steps   | 83000    |
| Running Forward KL  | -2.02    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 166      |
----------------------------------
2025-02-01 12:15:56.756303 Eastern Standard Time
| Itration            | 167      |
| Real Det Return     | 283      |
| Real Sto Return     | 227      |
| Reward Loss         | -66.3    |
| Running Env Steps   | 83500    |
| Running Forward KL  | -2.45    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 167      |
----------------------------------
2025-02-01 12:16:12.042440 Eastern Standard Time
| Itration            | 168      |
| Real Det Return     | 272      |
| Real Sto Return     | 213      |
| Reward Loss         | -57.3    |
| Running Env Steps   | 84000    |
| Running Forward KL  | -2.15    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 168      |
----------------------------------
2025-02-01 12:16:27.365075 Eastern Standard Time
| Itration            | 169      |
| Real Det Return     | 272      |
| Real Sto Return     | 203      |
| Reward Loss         | -63.6    |
| Running Env Steps   | 84500    |
| Running Forward KL  | -1.9     |
| Running Reverse KL  | 4.61     |
| Running Update Time | 169      |
----------------------------------
2025-02-01 12:16:42.701827 Eastern Standard Time
| Itration            | 170      |
| Real Det Return     | 287      |
| Real Sto Return     | 209      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 85000    |
| Running Forward KL  | -2.49    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 170      |
----------------------------------
2025-02-01 12:16:58.027115 Eastern Standard Time
| Itration            | 171      |
| Real Det Return     | 289      |
| Real Sto Return     | 211      |
| Reward Loss         | -64.7    |
| Running Env Steps   | 85500    |
| Running Forward KL  | -2.56    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 171      |
----------------------------------
2025-02-01 12:17:13.566943 Eastern Standard Time
| Itration            | 172      |
| Real Det Return     | 288      |
| Real Sto Return     | 213      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 86000    |
| Running Forward KL  | -2.61    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 172      |
----------------------------------
2025-02-01 12:17:28.968506 Eastern Standard Time
| Itration            | 173      |
| Real Det Return     | 292      |
| Real Sto Return     | 234      |
| Reward Loss         | -65.4    |
| Running Env Steps   | 86500    |
| Running Forward KL  | -2.15    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 173      |
----------------------------------
2025-02-01 12:17:44.381146 Eastern Standard Time
| Itration            | 174      |
| Real Det Return     | 300      |
| Real Sto Return     | 227      |
| Reward Loss         | -67.9    |
| Running Env Steps   | 87000    |
| Running Forward KL  | -2.37    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 174      |
----------------------------------
2025-02-01 12:17:59.719609 Eastern Standard Time
| Itration            | 175      |
| Real Det Return     | 289      |
| Real Sto Return     | 224      |
| Reward Loss         | -63      |
| Running Env Steps   | 87500    |
| Running Forward KL  | -2.25    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 175      |
----------------------------------
2025-02-01 12:18:15.172869 Eastern Standard Time
| Itration            | 176      |
| Real Det Return     | 304      |
| Real Sto Return     | 230      |
| Reward Loss         | -64.8    |
| Running Env Steps   | 88000    |
| Running Forward KL  | -2.76    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 176      |
----------------------------------
2025-02-01 12:18:30.576499 Eastern Standard Time
| Itration            | 177      |
| Real Det Return     | 306      |
| Real Sto Return     | 227      |
| Reward Loss         | -74.1    |
| Running Env Steps   | 88500    |
| Running Forward KL  | -2.02    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 177      |
----------------------------------
2025-02-01 12:18:45.936135 Eastern Standard Time
| Itration            | 178      |
| Real Det Return     | 310      |
| Real Sto Return     | 237      |
| Reward Loss         | -70      |
| Running Env Steps   | 89000    |
| Running Forward KL  | -2.04    |
| Running Reverse KL  | 5.2      |
| Running Update Time | 178      |
----------------------------------
2025-02-01 12:19:01.387162 Eastern Standard Time
| Itration            | 179      |
| Real Det Return     | 305      |
| Real Sto Return     | 237      |
| Reward Loss         | -51.2    |
| Running Env Steps   | 89500    |
| Running Forward KL  | -2.57    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 179      |
----------------------------------
2025-02-01 12:19:16.815039 Eastern Standard Time
| Itration            | 180      |
| Real Det Return     | 292      |
| Real Sto Return     | 212      |
| Reward Loss         | -80.6    |
| Running Env Steps   | 90000    |
| Running Forward KL  | -1.61    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 180      |
----------------------------------
2025-02-01 12:19:32.162628 Eastern Standard Time
| Itration            | 181      |
| Real Det Return     | 296      |
| Real Sto Return     | 221      |
| Reward Loss         | -66.8    |
| Running Env Steps   | 90500    |
| Running Forward KL  | -2.18    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 181      |
----------------------------------
2025-02-01 12:19:47.668640 Eastern Standard Time
| Itration            | 182      |
| Real Det Return     | 301      |
| Real Sto Return     | 244      |
| Reward Loss         | -62.9    |
| Running Env Steps   | 91000    |
| Running Forward KL  | -2.74    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 182      |
----------------------------------
2025-02-01 12:20:03.145956 Eastern Standard Time
| Itration            | 183      |
| Real Det Return     | 305      |
| Real Sto Return     | 242      |
| Reward Loss         | -66.6    |
| Running Env Steps   | 91500    |
| Running Forward KL  | -2.47    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 183      |
----------------------------------
2025-02-01 12:20:18.679585 Eastern Standard Time
| Itration            | 184      |
| Real Det Return     | 299      |
| Real Sto Return     | 230      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 92000    |
| Running Forward KL  | -2.44    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 184      |
----------------------------------
2025-02-01 12:20:34.139941 Eastern Standard Time
| Itration            | 185      |
| Real Det Return     | 310      |
| Real Sto Return     | 246      |
| Reward Loss         | -59.2    |
| Running Env Steps   | 92500    |
| Running Forward KL  | -2.96    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 185      |
----------------------------------
2025-02-01 12:20:49.621560 Eastern Standard Time
| Itration            | 186      |
| Real Det Return     | 298      |
| Real Sto Return     | 230      |
| Reward Loss         | -68.4    |
| Running Env Steps   | 93000    |
| Running Forward KL  | -2.54    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 186      |
----------------------------------
2025-02-01 12:21:05.159556 Eastern Standard Time
| Itration            | 187      |
| Real Det Return     | 305      |
| Real Sto Return     | 232      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 93500    |
| Running Forward KL  | -2.38    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 187      |
----------------------------------
2025-02-01 12:21:20.669594 Eastern Standard Time
| Itration            | 188      |
| Real Det Return     | 319      |
| Real Sto Return     | 236      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 94000    |
| Running Forward KL  | -2.63    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 188      |
----------------------------------
2025-02-01 12:21:36.162047 Eastern Standard Time
| Itration            | 189      |
| Real Det Return     | 300      |
| Real Sto Return     | 218      |
| Reward Loss         | -67.8    |
| Running Env Steps   | 94500    |
| Running Forward KL  | -2.37    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 189      |
----------------------------------
2025-02-01 12:21:51.698997 Eastern Standard Time
| Itration            | 190      |
| Real Det Return     | 327      |
| Real Sto Return     | 249      |
| Reward Loss         | -62.9    |
| Running Env Steps   | 95000    |
| Running Forward KL  | -2.65    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 190      |
----------------------------------
2025-02-01 12:22:07.175540 Eastern Standard Time
| Itration            | 191      |
| Real Det Return     | 337      |
| Real Sto Return     | 263      |
| Reward Loss         | -73      |
| Running Env Steps   | 95500    |
| Running Forward KL  | -2.36    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 191      |
----------------------------------
2025-02-01 12:22:22.660136 Eastern Standard Time
| Itration            | 192      |
| Real Det Return     | 323      |
| Real Sto Return     | 244      |
| Reward Loss         | -71      |
| Running Env Steps   | 96000    |
| Running Forward KL  | -2.16    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 192      |
----------------------------------
2025-02-01 12:22:38.126619 Eastern Standard Time
| Itration            | 193      |
| Real Det Return     | 332      |
| Real Sto Return     | 241      |
| Reward Loss         | -67.3    |
| Running Env Steps   | 96500    |
| Running Forward KL  | -2.43    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 193      |
----------------------------------
2025-02-01 12:22:53.645361 Eastern Standard Time
| Itration            | 194      |
| Real Det Return     | 299      |
| Real Sto Return     | 228      |
| Reward Loss         | -68      |
| Running Env Steps   | 97000    |
| Running Forward KL  | -2.56    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 194      |
----------------------------------
2025-02-01 12:23:09.164319 Eastern Standard Time
| Itration            | 195      |
| Real Det Return     | 328      |
| Real Sto Return     | 266      |
| Reward Loss         | -63.6    |
| Running Env Steps   | 97500    |
| Running Forward KL  | -2.77    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 195      |
----------------------------------
2025-02-01 12:23:25.344528 Eastern Standard Time
| Itration            | 196      |
| Real Det Return     | 326      |
| Real Sto Return     | 254      |
| Reward Loss         | -62.5    |
| Running Env Steps   | 98000    |
| Running Forward KL  | -2.36    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 196      |
----------------------------------
2025-02-01 12:23:41.223801 Eastern Standard Time
| Itration            | 197      |
| Real Det Return     | 317      |
| Real Sto Return     | 251      |
| Reward Loss         | -73.2    |
| Running Env Steps   | 98500    |
| Running Forward KL  | -2.71    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 197      |
----------------------------------
2025-02-01 12:23:57.484958 Eastern Standard Time
| Itration            | 198      |
| Real Det Return     | 325      |
| Real Sto Return     | 245      |
| Reward Loss         | -70.7    |
| Running Env Steps   | 99000    |
| Running Forward KL  | -3.26    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 198      |
----------------------------------
2025-02-01 12:24:13.871492 Eastern Standard Time
| Itration            | 199      |
| Real Det Return     | 334      |
| Real Sto Return     | 236      |
| Reward Loss         | -71.9    |
| Running Env Steps   | 99500    |
| Running Forward KL  | -2.41    |
| Running Reverse KL  | 4.52     |
| Running Update Time | 199      |
----------------------------------
2025-02-01 12:24:30.478088 Eastern Standard Time
| Itration            | 200      |
| Real Det Return     | 343      |
| Real Sto Return     | 268      |
| Reward Loss         | -71.4    |
| Running Env Steps   | 100000   |
| Running Forward KL  | -2.46    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 200      |
----------------------------------
2025-02-01 12:24:46.203493 Eastern Standard Time
| Itration            | 201      |
| Real Det Return     | 351      |
| Real Sto Return     | 271      |
| Reward Loss         | -72.8    |
| Running Env Steps   | 100500   |
| Running Forward KL  | -2.52    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 201      |
----------------------------------
2025-02-01 12:25:02.183038 Eastern Standard Time
| Itration            | 202      |
| Real Det Return     | 340      |
| Real Sto Return     | 266      |
| Reward Loss         | -66.1    |
| Running Env Steps   | 101000   |
| Running Forward KL  | -2.46    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 202      |
----------------------------------
2025-02-01 12:25:17.943511 Eastern Standard Time
| Itration            | 203      |
| Real Det Return     | 346      |
| Real Sto Return     | 278      |
| Reward Loss         | -62.2    |
| Running Env Steps   | 101500   |
| Running Forward KL  | -1.99    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 203      |
----------------------------------
2025-02-01 12:25:33.955007 Eastern Standard Time
| Itration            | 204      |
| Real Det Return     | 335      |
| Real Sto Return     | 258      |
| Reward Loss         | -67.8    |
| Running Env Steps   | 102000   |
| Running Forward KL  | -2.63    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 204      |
----------------------------------
2025-02-01 12:25:49.934533 Eastern Standard Time
| Itration            | 205      |
| Real Det Return     | 351      |
| Real Sto Return     | 273      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 102500   |
| Running Forward KL  | -2.8     |
| Running Reverse KL  | 4.47     |
| Running Update Time | 205      |
----------------------------------
2025-02-01 12:26:07.275597 Eastern Standard Time
| Itration            | 206      |
| Real Det Return     | 333      |
| Real Sto Return     | 261      |
| Reward Loss         | -56.6    |
| Running Env Steps   | 103000   |
| Running Forward KL  | -2.75    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 206      |
----------------------------------
2025-02-01 12:26:23.091515 Eastern Standard Time
| Itration            | 207      |
| Real Det Return     | 340      |
| Real Sto Return     | 270      |
| Reward Loss         | -63.1    |
| Running Env Steps   | 103500   |
| Running Forward KL  | -2.67    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 207      |
----------------------------------
2025-02-01 12:26:38.950225 Eastern Standard Time
| Itration            | 208      |
| Real Det Return     | 352      |
| Real Sto Return     | 267      |
| Reward Loss         | -72.6    |
| Running Env Steps   | 104000   |
| Running Forward KL  | -2.65    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 208      |
----------------------------------
2025-02-01 12:26:55.014691 Eastern Standard Time
| Itration            | 209      |
| Real Det Return     | 334      |
| Real Sto Return     | 268      |
| Reward Loss         | -71.5    |
| Running Env Steps   | 104500   |
| Running Forward KL  | -2.56    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 209      |
----------------------------------
2025-02-01 12:27:11.681120 Eastern Standard Time
| Itration            | 210      |
| Real Det Return     | 354      |
| Real Sto Return     | 276      |
| Reward Loss         | -70.1    |
| Running Env Steps   | 105000   |
| Running Forward KL  | -2.66    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 210      |
----------------------------------
2025-02-01 12:27:27.444239 Eastern Standard Time
| Itration            | 211      |
| Real Det Return     | 370      |
| Real Sto Return     | 275      |
| Reward Loss         | -71.7    |
| Running Env Steps   | 105500   |
| Running Forward KL  | -2.44    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 211      |
----------------------------------
2025-02-01 12:27:43.193968 Eastern Standard Time
| Itration            | 212      |
| Real Det Return     | 358      |
| Real Sto Return     | 279      |
| Reward Loss         | -69.6    |
| Running Env Steps   | 106000   |
| Running Forward KL  | -2.8     |
| Running Reverse KL  | 4.47     |
| Running Update Time | 212      |
----------------------------------
2025-02-01 12:27:58.700980 Eastern Standard Time
| Itration            | 213      |
| Real Det Return     | 348      |
| Real Sto Return     | 271      |
| Reward Loss         | -69.8    |
| Running Env Steps   | 106500   |
| Running Forward KL  | -2.87    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 213      |
----------------------------------
2025-02-01 12:28:14.260523 Eastern Standard Time
| Itration            | 214      |
| Real Det Return     | 373      |
| Real Sto Return     | 292      |
| Reward Loss         | -58.6    |
| Running Env Steps   | 107000   |
| Running Forward KL  | -2.83    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 214      |
----------------------------------
2025-02-01 12:28:29.848211 Eastern Standard Time
| Itration            | 215      |
| Real Det Return     | 361      |
| Real Sto Return     | 292      |
| Reward Loss         | -66.6    |
| Running Env Steps   | 107500   |
| Running Forward KL  | -2.56    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 215      |
----------------------------------
2025-02-01 12:28:45.480708 Eastern Standard Time
| Itration            | 216      |
| Real Det Return     | 370      |
| Real Sto Return     | 276      |
| Reward Loss         | -68      |
| Running Env Steps   | 108000   |
| Running Forward KL  | -2.63    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 216      |
----------------------------------
2025-02-01 12:29:02.143474 Eastern Standard Time
| Itration            | 217      |
| Real Det Return     | 358      |
| Real Sto Return     | 280      |
| Reward Loss         | -67.5    |
| Running Env Steps   | 108500   |
| Running Forward KL  | -3.31    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 217      |
----------------------------------
2025-02-01 12:29:25.539693 Eastern Standard Time
| Itration            | 218      |
| Real Det Return     | 352      |
| Real Sto Return     | 275      |
| Reward Loss         | -73.7    |
| Running Env Steps   | 109000   |
| Running Forward KL  | -2.78    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 218      |
----------------------------------
2025-02-01 12:29:46.114639 Eastern Standard Time
| Itration            | 219      |
| Real Det Return     | 364      |
| Real Sto Return     | 285      |
| Reward Loss         | -76.7    |
| Running Env Steps   | 109500   |
| Running Forward KL  | -2.4     |
| Running Reverse KL  | 4.46     |
| Running Update Time | 219      |
----------------------------------
2025-02-01 12:30:02.827319 Eastern Standard Time
| Itration            | 220      |
| Real Det Return     | 360      |
| Real Sto Return     | 307      |
| Reward Loss         | -67.7    |
| Running Env Steps   | 110000   |
| Running Forward KL  | -2.9     |
| Running Reverse KL  | 4.1      |
| Running Update Time | 220      |
----------------------------------
2025-02-01 12:30:18.838824 Eastern Standard Time
| Itration            | 221      |
| Real Det Return     | 377      |
| Real Sto Return     | 292      |
| Reward Loss         | -61.8    |
| Running Env Steps   | 110500   |
| Running Forward KL  | -2.97    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 221      |
----------------------------------
2025-02-01 12:30:35.125612 Eastern Standard Time
| Itration            | 222      |
| Real Det Return     | 362      |
| Real Sto Return     | 277      |
| Reward Loss         | -67.2    |
| Running Env Steps   | 111000   |
| Running Forward KL  | -2.7     |
| Running Reverse KL  | 4.64     |
| Running Update Time | 222      |
----------------------------------
2025-02-01 12:30:51.114181 Eastern Standard Time
| Itration            | 223      |
| Real Det Return     | 396      |
| Real Sto Return     | 305      |
| Reward Loss         | -67.2    |
| Running Env Steps   | 111500   |
| Running Forward KL  | -2.87    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 223      |
----------------------------------
2025-02-01 12:31:07.033660 Eastern Standard Time
| Itration            | 224      |
| Real Det Return     | 372      |
| Real Sto Return     | 277      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 112000   |
| Running Forward KL  | -2.86    |
| Running Reverse KL  | 4.45     |
| Running Update Time | 224      |
----------------------------------
2025-02-01 12:31:23.577023 Eastern Standard Time
| Itration            | 225      |
| Real Det Return     | 372      |
| Real Sto Return     | 288      |
| Reward Loss         | -61.5    |
| Running Env Steps   | 112500   |
| Running Forward KL  | -2.37    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 225      |
----------------------------------
2025-02-01 12:31:39.411151 Eastern Standard Time
| Itration            | 226      |
| Real Det Return     | 376      |
| Real Sto Return     | 294      |
| Reward Loss         | -59.9    |
| Running Env Steps   | 113000   |
| Running Forward KL  | -3.01    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 226      |
----------------------------------
2025-02-01 12:31:55.654187 Eastern Standard Time
| Itration            | 227      |
| Real Det Return     | 374      |
| Real Sto Return     | 296      |
| Reward Loss         | -66.1    |
| Running Env Steps   | 113500   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 227      |
----------------------------------
2025-02-01 12:32:11.707161 Eastern Standard Time
| Itration            | 228      |
| Real Det Return     | 378      |
| Real Sto Return     | 308      |
| Reward Loss         | -59.8    |
| Running Env Steps   | 114000   |
| Running Forward KL  | -3.13    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 228      |
----------------------------------
2025-02-01 12:32:27.668711 Eastern Standard Time
| Itration            | 229      |
| Real Det Return     | 374      |
| Real Sto Return     | 290      |
| Reward Loss         | -60.6    |
| Running Env Steps   | 114500   |
| Running Forward KL  | -2.98    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 229      |
----------------------------------
2025-02-01 12:32:43.474540 Eastern Standard Time
| Itration            | 230      |
| Real Det Return     | 385      |
| Real Sto Return     | 309      |
| Reward Loss         | -71      |
| Running Env Steps   | 115000   |
| Running Forward KL  | -3.24    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 230      |
----------------------------------
2025-02-01 12:32:59.027147 Eastern Standard Time
| Itration            | 231      |
| Real Det Return     | 379      |
| Real Sto Return     | 296      |
| Reward Loss         | -60.7    |
| Running Env Steps   | 115500   |
| Running Forward KL  | -2.83    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 231      |
----------------------------------
2025-02-01 12:33:14.819848 Eastern Standard Time
| Itration            | 232      |
| Real Det Return     | 383      |
| Real Sto Return     | 306      |
| Reward Loss         | -69.4    |
| Running Env Steps   | 116000   |
| Running Forward KL  | -2.9     |
| Running Reverse KL  | 4.51     |
| Running Update Time | 232      |
----------------------------------
2025-02-01 12:33:30.730487 Eastern Standard Time
| Itration            | 233      |
| Real Det Return     | 375      |
| Real Sto Return     | 302      |
| Reward Loss         | -73.1    |
| Running Env Steps   | 116500   |
| Running Forward KL  | -2.84    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 233      |
----------------------------------
2025-02-01 12:33:47.614084 Eastern Standard Time
| Itration            | 234      |
| Real Det Return     | 400      |
| Real Sto Return     | 307      |
| Reward Loss         | -57.4    |
| Running Env Steps   | 117000   |
| Running Forward KL  | -2.87    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 234      |
----------------------------------
2025-02-01 12:34:03.504009 Eastern Standard Time
| Itration            | 235      |
| Real Det Return     | 373      |
| Real Sto Return     | 286      |
| Reward Loss         | -64.1    |
| Running Env Steps   | 117500   |
| Running Forward KL  | -3.03    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 235      |
----------------------------------
2025-02-01 12:34:19.457412 Eastern Standard Time
| Itration            | 236      |
| Real Det Return     | 413      |
| Real Sto Return     | 313      |
| Reward Loss         | -70.6    |
| Running Env Steps   | 118000   |
| Running Forward KL  | -2.96    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 236      |
----------------------------------
2025-02-01 12:34:35.342869 Eastern Standard Time
| Itration            | 237      |
| Real Det Return     | 389      |
| Real Sto Return     | 312      |
| Reward Loss         | -73.7    |
| Running Env Steps   | 118500   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 3.87     |
| Running Update Time | 237      |
----------------------------------
2025-02-01 12:34:51.203270 Eastern Standard Time
| Itration            | 238      |
| Real Det Return     | 372      |
| Real Sto Return     | 301      |
| Reward Loss         | -69.5    |
| Running Env Steps   | 119000   |
| Running Forward KL  | -3.14    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 238      |
----------------------------------
2025-02-01 12:35:07.076231 Eastern Standard Time
| Itration            | 239      |
| Real Det Return     | 405      |
| Real Sto Return     | 321      |
| Reward Loss         | -61.5    |
| Running Env Steps   | 119500   |
| Running Forward KL  | -3.03    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 239      |
----------------------------------
2025-02-01 12:35:22.985842 Eastern Standard Time
| Itration            | 240      |
| Real Det Return     | 393      |
| Real Sto Return     | 324      |
| Reward Loss         | -53.4    |
| Running Env Steps   | 120000   |
| Running Forward KL  | -2.78    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 240      |
----------------------------------
2025-02-01 12:35:38.807591 Eastern Standard Time
| Itration            | 241      |
| Real Det Return     | 400      |
| Real Sto Return     | 332      |
| Reward Loss         | -61.4    |
| Running Env Steps   | 120500   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 241      |
----------------------------------
2025-02-01 12:35:55.545912 Eastern Standard Time
| Itration            | 242      |
| Real Det Return     | 400      |
| Real Sto Return     | 326      |
| Reward Loss         | -55.1    |
| Running Env Steps   | 121000   |
| Running Forward KL  | -3.5     |
| Running Reverse KL  | 4.18     |
| Running Update Time | 242      |
----------------------------------
2025-02-01 12:36:12.091300 Eastern Standard Time
| Itration            | 243      |
| Real Det Return     | 395      |
| Real Sto Return     | 313      |
| Reward Loss         | -56.8    |
| Running Env Steps   | 121500   |
| Running Forward KL  | -3.1     |
| Running Reverse KL  | 4.49     |
| Running Update Time | 243      |
----------------------------------
2025-02-01 12:36:28.742860 Eastern Standard Time
| Itration            | 244      |
| Real Det Return     | 402      |
| Real Sto Return     | 327      |
| Reward Loss         | -62.8    |
| Running Env Steps   | 122000   |
| Running Forward KL  | -3.21    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 244      |
----------------------------------
2025-02-01 12:36:44.951065 Eastern Standard Time
| Itration            | 245      |
| Real Det Return     | 407      |
| Real Sto Return     | 310      |
| Reward Loss         | -65.6    |
| Running Env Steps   | 122500   |
| Running Forward KL  | -3.01    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 245      |
----------------------------------
2025-02-01 12:37:01.195918 Eastern Standard Time
| Itration            | 246      |
| Real Det Return     | 387      |
| Real Sto Return     | 323      |
| Reward Loss         | -63.4    |
| Running Env Steps   | 123000   |
| Running Forward KL  | -3.51    |
| Running Reverse KL  | 4.1      |
| Running Update Time | 246      |
----------------------------------
2025-02-01 12:37:18.318755 Eastern Standard Time
| Itration            | 247      |
| Real Det Return     | 399      |
| Real Sto Return     | 318      |
| Reward Loss         | -60.3    |
| Running Env Steps   | 123500   |
| Running Forward KL  | -3.29    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 247      |
----------------------------------
2025-02-01 12:37:34.470955 Eastern Standard Time
| Itration            | 248      |
| Real Det Return     | 396      |
| Real Sto Return     | 318      |
| Reward Loss         | -67.5    |
| Running Env Steps   | 124000   |
| Running Forward KL  | -3.2     |
| Running Reverse KL  | 4.45     |
| Running Update Time | 248      |
----------------------------------
2025-02-01 12:37:51.192507 Eastern Standard Time
| Itration            | 249      |
| Real Det Return     | 404      |
| Real Sto Return     | 311      |
| Reward Loss         | -61.5    |
| Running Env Steps   | 124500   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 249      |
----------------------------------
2025-02-01 12:38:10.920334 Eastern Standard Time
| Itration            | 250      |
| Real Det Return     | 402      |
| Real Sto Return     | 327      |
| Reward Loss         | -58.5    |
| Running Env Steps   | 125000   |
| Running Forward KL  | -3.65    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 250      |
----------------------------------
2025-02-01 12:38:26.574664 Eastern Standard Time
| Itration            | 251      |
| Real Det Return     | 412      |
| Real Sto Return     | 324      |
| Reward Loss         | -63.8    |
| Running Env Steps   | 125500   |
| Running Forward KL  | -3.04    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 251      |
----------------------------------
2025-02-01 12:38:42.206156 Eastern Standard Time
| Itration            | 252      |
| Real Det Return     | 397      |
| Real Sto Return     | 307      |
| Reward Loss         | -73.4    |
| Running Env Steps   | 126000   |
| Running Forward KL  | -3.51    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 252      |
----------------------------------
2025-02-01 12:38:59.029669 Eastern Standard Time
| Itration            | 253      |
| Real Det Return     | 406      |
| Real Sto Return     | 315      |
| Reward Loss         | -52.6    |
| Running Env Steps   | 126500   |
| Running Forward KL  | -3.27    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 253      |
----------------------------------
2025-02-01 12:39:14.941533 Eastern Standard Time
| Itration            | 254      |
| Real Det Return     | 411      |
| Real Sto Return     | 339      |
| Reward Loss         | -62.5    |
| Running Env Steps   | 127000   |
| Running Forward KL  | -3.48    |
| Running Reverse KL  | 4.05     |
| Running Update Time | 254      |
----------------------------------
2025-02-01 12:39:30.752546 Eastern Standard Time
| Itration            | 255      |
| Real Det Return     | 386      |
| Real Sto Return     | 313      |
| Reward Loss         | -60      |
| Running Env Steps   | 127500   |
| Running Forward KL  | -3.24    |
| Running Reverse KL  | 4.08     |
| Running Update Time | 255      |
----------------------------------
2025-02-01 12:39:46.465315 Eastern Standard Time
| Itration            | 256      |
| Real Det Return     | 393      |
| Real Sto Return     | 315      |
| Reward Loss         | -64.2    |
| Running Env Steps   | 128000   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 4.24     |
| Running Update Time | 256      |
----------------------------------
2025-02-01 12:40:02.324804 Eastern Standard Time
| Itration            | 257      |
| Real Det Return     | 406      |
| Real Sto Return     | 326      |
| Reward Loss         | -65      |
| Running Env Steps   | 128500   |
| Running Forward KL  | -3.03    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 257      |
----------------------------------
2025-02-01 12:40:17.905809 Eastern Standard Time
| Itration            | 258      |
| Real Det Return     | 414      |
| Real Sto Return     | 322      |
| Reward Loss         | -63.2    |
| Running Env Steps   | 129000   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 258      |
----------------------------------
2025-02-01 12:40:33.613291 Eastern Standard Time
| Itration            | 259      |
| Real Det Return     | 411      |
| Real Sto Return     | 322      |
| Reward Loss         | -61.1    |
| Running Env Steps   | 129500   |
| Running Forward KL  | -3.2     |
| Running Reverse KL  | 4.06     |
| Running Update Time | 259      |
----------------------------------
2025-02-01 12:40:49.793711 Eastern Standard Time
| Itration            | 260      |
| Real Det Return     | 413      |
| Real Sto Return     | 319      |
| Reward Loss         | -71.2    |
| Running Env Steps   | 130000   |
| Running Forward KL  | -3.2     |
| Running Reverse KL  | 4.1      |
| Running Update Time | 260      |
----------------------------------
2025-02-01 12:41:07.113644 Eastern Standard Time
| Itration            | 261      |
| Real Det Return     | 403      |
| Real Sto Return     | 318      |
| Reward Loss         | -63.4    |
| Running Env Steps   | 130500   |
| Running Forward KL  | -3.52    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 261      |
----------------------------------
2025-02-01 12:41:23.375355 Eastern Standard Time
| Itration            | 262      |
| Real Det Return     | 433      |
| Real Sto Return     | 332      |
| Reward Loss         | -65.4    |
| Running Env Steps   | 131000   |
| Running Forward KL  | -3.49    |
| Running Reverse KL  | 4        |
| Running Update Time | 262      |
----------------------------------
2025-02-01 12:41:39.446643 Eastern Standard Time
| Itration            | 263      |
| Real Det Return     | 420      |
| Real Sto Return     | 333      |
| Reward Loss         | -62.2    |
| Running Env Steps   | 131500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 263      |
----------------------------------
2025-02-01 12:41:55.226980 Eastern Standard Time
| Itration            | 264      |
| Real Det Return     | 423      |
| Real Sto Return     | 346      |
| Reward Loss         | -53.9    |
| Running Env Steps   | 132000   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 264      |
----------------------------------
2025-02-01 12:42:10.932034 Eastern Standard Time
| Itration            | 265      |
| Real Det Return     | 418      |
| Real Sto Return     | 331      |
| Reward Loss         | -62.7    |
| Running Env Steps   | 132500   |
| Running Forward KL  | -3.34    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 265      |
----------------------------------
2025-02-01 12:42:26.876115 Eastern Standard Time
| Itration            | 266      |
| Real Det Return     | 406      |
| Real Sto Return     | 326      |
| Reward Loss         | -64.8    |
| Running Env Steps   | 133000   |
| Running Forward KL  | -3.5     |
| Running Reverse KL  | 4.35     |
| Running Update Time | 266      |
----------------------------------
2025-02-01 12:42:42.958731 Eastern Standard Time
| Itration            | 267      |
| Real Det Return     | 434      |
| Real Sto Return     | 326      |
| Reward Loss         | -65      |
| Running Env Steps   | 133500   |
| Running Forward KL  | -3.32    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 267      |
----------------------------------
2025-02-01 12:42:59.745093 Eastern Standard Time
| Itration            | 268      |
| Real Det Return     | 435      |
| Real Sto Return     | 341      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 134000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 268      |
----------------------------------
2025-02-01 12:43:15.359166 Eastern Standard Time
| Itration            | 269      |
| Real Det Return     | 416      |
| Real Sto Return     | 334      |
| Reward Loss         | -62.4    |
| Running Env Steps   | 134500   |
| Running Forward KL  | -3.18    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 269      |
----------------------------------
2025-02-01 12:43:30.884514 Eastern Standard Time
| Itration            | 270      |
| Real Det Return     | 443      |
| Real Sto Return     | 339      |
| Reward Loss         | -66.7    |
| Running Env Steps   | 135000   |
| Running Forward KL  | -3.24    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 270      |
----------------------------------
2025-02-01 12:43:46.496116 Eastern Standard Time
| Itration            | 271      |
| Real Det Return     | 417      |
| Real Sto Return     | 339      |
| Reward Loss         | -68.3    |
| Running Env Steps   | 135500   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 3.7      |
| Running Update Time | 271      |
----------------------------------
2025-02-01 12:44:02.055084 Eastern Standard Time
| Itration            | 272      |
| Real Det Return     | 432      |
| Real Sto Return     | 355      |
| Reward Loss         | -54.2    |
| Running Env Steps   | 136000   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 272      |
----------------------------------
2025-02-01 12:44:17.593559 Eastern Standard Time
| Itration            | 273      |
| Real Det Return     | 412      |
| Real Sto Return     | 346      |
| Reward Loss         | -57      |
| Running Env Steps   | 136500   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 273      |
----------------------------------
2025-02-01 12:44:33.217442 Eastern Standard Time
| Itration            | 274      |
| Real Det Return     | 400      |
| Real Sto Return     | 326      |
| Reward Loss         | -65.3    |
| Running Env Steps   | 137000   |
| Running Forward KL  | -3.36    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 274      |
----------------------------------
2025-02-01 12:44:48.761238 Eastern Standard Time
| Itration            | 275      |
| Real Det Return     | 419      |
| Real Sto Return     | 340      |
| Reward Loss         | -59.3    |
| Running Env Steps   | 137500   |
| Running Forward KL  | -3.49    |
| Running Reverse KL  | 4.03     |
| Running Update Time | 275      |
----------------------------------
2025-02-01 12:45:04.264806 Eastern Standard Time
| Itration            | 276      |
| Real Det Return     | 427      |
| Real Sto Return     | 340      |
| Reward Loss         | -59.1    |
| Running Env Steps   | 138000   |
| Running Forward KL  | -3.18    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 276      |
----------------------------------
2025-02-01 12:45:19.898693 Eastern Standard Time
| Itration            | 277      |
| Real Det Return     | 413      |
| Real Sto Return     | 338      |
| Reward Loss         | -59.2    |
| Running Env Steps   | 138500   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 4.37     |
| Running Update Time | 277      |
----------------------------------
2025-02-01 12:45:35.470674 Eastern Standard Time
| Itration            | 278      |
| Real Det Return     | 446      |
| Real Sto Return     | 367      |
| Reward Loss         | -57.3    |
| Running Env Steps   | 139000   |
| Running Forward KL  | -3.15    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 278      |
----------------------------------
2025-02-01 12:45:51.052648 Eastern Standard Time
| Itration            | 279      |
| Real Det Return     | 409      |
| Real Sto Return     | 352      |
| Reward Loss         | -68.4    |
| Running Env Steps   | 139500   |
| Running Forward KL  | -3.52    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 279      |
----------------------------------
2025-02-01 12:46:06.688499 Eastern Standard Time
| Itration            | 280      |
| Real Det Return     | 431      |
| Real Sto Return     | 351      |
| Reward Loss         | -62.3    |
| Running Env Steps   | 140000   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 3.9      |
| Running Update Time | 280      |
----------------------------------
2025-02-01 12:46:22.220509 Eastern Standard Time
| Itration            | 281      |
| Real Det Return     | 428      |
| Real Sto Return     | 344      |
| Reward Loss         | -53.8    |
| Running Env Steps   | 140500   |
| Running Forward KL  | -3.59    |
| Running Reverse KL  | 3.78     |
| Running Update Time | 281      |
----------------------------------
2025-02-01 12:46:37.750473 Eastern Standard Time
| Itration            | 282      |
| Real Det Return     | 428      |
| Real Sto Return     | 351      |
| Reward Loss         | -59.6    |
| Running Env Steps   | 141000   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 282      |
----------------------------------
2025-02-01 12:46:53.256248 Eastern Standard Time
| Itration            | 283      |
| Real Det Return     | 420      |
| Real Sto Return     | 352      |
| Reward Loss         | -60.6    |
| Running Env Steps   | 141500   |
| Running Forward KL  | -3.22    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 283      |
----------------------------------
2025-02-01 12:47:10.059802 Eastern Standard Time
| Itration            | 284      |
| Real Det Return     | 432      |
| Real Sto Return     | 347      |
| Reward Loss         | -49.5    |
| Running Env Steps   | 142000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 284      |
----------------------------------
2025-02-01 12:47:26.460320 Eastern Standard Time
| Itration            | 285      |
| Real Det Return     | 455      |
| Real Sto Return     | 369      |
| Reward Loss         | -53.7    |
| Running Env Steps   | 142500   |
| Running Forward KL  | -3.66    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 285      |
----------------------------------
2025-02-01 12:47:42.064529 Eastern Standard Time
| Itration            | 286      |
| Real Det Return     | 431      |
| Real Sto Return     | 338      |
| Reward Loss         | -65.1    |
| Running Env Steps   | 143000   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 286      |
----------------------------------
2025-02-01 12:47:59.097115 Eastern Standard Time
| Itration            | 287      |
| Real Det Return     | 448      |
| Real Sto Return     | 352      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 143500   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 287      |
----------------------------------
2025-02-01 12:48:15.301867 Eastern Standard Time
| Itration            | 288      |
| Real Det Return     | 423      |
| Real Sto Return     | 362      |
| Reward Loss         | -63      |
| Running Env Steps   | 144000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 288      |
----------------------------------
2025-02-01 12:48:30.920841 Eastern Standard Time
| Itration            | 289      |
| Real Det Return     | 435      |
| Real Sto Return     | 341      |
| Reward Loss         | -62.2    |
| Running Env Steps   | 144500   |
| Running Forward KL  | -3.17    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 289      |
----------------------------------
2025-02-01 12:48:46.614982 Eastern Standard Time
| Itration            | 290      |
| Real Det Return     | 467      |
| Real Sto Return     | 371      |
| Reward Loss         | -55.6    |
| Running Env Steps   | 145000   |
| Running Forward KL  | -3.52    |
| Running Reverse KL  | 3.66     |
| Running Update Time | 290      |
----------------------------------
2025-02-01 12:49:02.697158 Eastern Standard Time
| Itration            | 291      |
| Real Det Return     | 440      |
| Real Sto Return     | 358      |
| Reward Loss         | -54.7    |
| Running Env Steps   | 145500   |
| Running Forward KL  | -3.78    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 291      |
----------------------------------
2025-02-01 12:49:18.380634 Eastern Standard Time
| Itration            | 292      |
| Real Det Return     | 445      |
| Real Sto Return     | 355      |
| Reward Loss         | -61.8    |
| Running Env Steps   | 146000   |
| Running Forward KL  | -3.54    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 292      |
----------------------------------
2025-02-01 12:49:34.504566 Eastern Standard Time
| Itration            | 293      |
| Real Det Return     | 432      |
| Real Sto Return     | 367      |
| Reward Loss         | -58.2    |
| Running Env Steps   | 146500   |
| Running Forward KL  | -2.88    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 293      |
----------------------------------
2025-02-01 12:49:50.311065 Eastern Standard Time
| Itration            | 294      |
| Real Det Return     | 446      |
| Real Sto Return     | 354      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 147000   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 294      |
----------------------------------
2025-02-01 12:50:05.953383 Eastern Standard Time
| Itration            | 295      |
| Real Det Return     | 438      |
| Real Sto Return     | 352      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 147500   |
| Running Forward KL  | -3.14    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 295      |
----------------------------------
2025-02-01 12:50:23.063684 Eastern Standard Time
| Itration            | 296      |
| Real Det Return     | 437      |
| Real Sto Return     | 360      |
| Reward Loss         | -48.4    |
| Running Env Steps   | 148000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 3.52     |
| Running Update Time | 296      |
----------------------------------
2025-02-01 12:50:38.792282 Eastern Standard Time
| Itration            | 297      |
| Real Det Return     | 435      |
| Real Sto Return     | 362      |
| Reward Loss         | -56.2    |
| Running Env Steps   | 148500   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 297      |
----------------------------------
2025-02-01 12:50:54.743422 Eastern Standard Time
| Itration            | 298      |
| Real Det Return     | 442      |
| Real Sto Return     | 377      |
| Reward Loss         | -41.1    |
| Running Env Steps   | 149000   |
| Running Forward KL  | -4.1     |
| Running Reverse KL  | 3.9      |
| Running Update Time | 298      |
----------------------------------
2025-02-01 12:51:12.680564 Eastern Standard Time
| Itration            | 299      |
| Real Det Return     | 460      |
| Real Sto Return     | 364      |
| Reward Loss         | -43.5    |
| Running Env Steps   | 149500   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 3.77     |
| Running Update Time | 299      |
----------------------------------
2025-02-01 12:51:28.512003 Eastern Standard Time
| Itration            | 300      |
| Real Det Return     | 446      |
| Real Sto Return     | 364      |
| Reward Loss         | -56.5    |
| Running Env Steps   | 150000   |
| Running Forward KL  | -3.19    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 300      |
----------------------------------
2025-02-01 12:51:47.557504 Eastern Standard Time
| Itration            | 301      |
| Real Det Return     | 450      |
| Real Sto Return     | 363      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 150500   |
| Running Forward KL  | -3.55    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 301      |
----------------------------------
2025-02-01 12:52:03.936834 Eastern Standard Time
| Itration            | 302      |
| Real Det Return     | 452      |
| Real Sto Return     | 361      |
| Reward Loss         | -53.8    |
| Running Env Steps   | 151000   |
| Running Forward KL  | -3.61    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 302      |
----------------------------------
2025-02-01 12:52:19.880329 Eastern Standard Time
| Itration            | 303      |
| Real Det Return     | 438      |
| Real Sto Return     | 372      |
| Reward Loss         | -53      |
| Running Env Steps   | 151500   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 303      |
----------------------------------
2025-02-01 12:52:37.998152 Eastern Standard Time
| Itration            | 304      |
| Real Det Return     | 444      |
| Real Sto Return     | 365      |
| Reward Loss         | -59.3    |
| Running Env Steps   | 152000   |
| Running Forward KL  | -3.52    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 304      |
----------------------------------
2025-02-01 12:52:54.354422 Eastern Standard Time
| Itration            | 305      |
| Real Det Return     | 441      |
| Real Sto Return     | 358      |
| Reward Loss         | -58.5    |
| Running Env Steps   | 152500   |
| Running Forward KL  | -3.71    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 305      |
----------------------------------
2025-02-01 12:53:10.828710 Eastern Standard Time
| Itration            | 306      |
| Real Det Return     | 427      |
| Real Sto Return     | 367      |
| Reward Loss         | -50.8    |
| Running Env Steps   | 153000   |
| Running Forward KL  | -3.66    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 306      |
----------------------------------
2025-02-01 12:53:27.209057 Eastern Standard Time
| Itration            | 307      |
| Real Det Return     | 429      |
| Real Sto Return     | 371      |
| Reward Loss         | -49      |
| Running Env Steps   | 153500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 307      |
----------------------------------
2025-02-01 12:53:43.331474 Eastern Standard Time
| Itration            | 308      |
| Real Det Return     | 458      |
| Real Sto Return     | 372      |
| Reward Loss         | -49.3    |
| Running Env Steps   | 154000   |
| Running Forward KL  | -3.29    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 308      |
----------------------------------
2025-02-01 12:53:58.956849 Eastern Standard Time
| Itration            | 309      |
| Real Det Return     | 456      |
| Real Sto Return     | 383      |
| Reward Loss         | -46.3    |
| Running Env Steps   | 154500   |
| Running Forward KL  | -3.41    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 309      |
----------------------------------
2025-02-01 12:54:14.591300 Eastern Standard Time
| Itration            | 310      |
| Real Det Return     | 447      |
| Real Sto Return     | 365      |
| Reward Loss         | -59.2    |
| Running Env Steps   | 155000   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 310      |
----------------------------------
2025-02-01 12:54:30.607180 Eastern Standard Time
| Itration            | 311      |
| Real Det Return     | 464      |
| Real Sto Return     | 375      |
| Reward Loss         | -64.1    |
| Running Env Steps   | 155500   |
| Running Forward KL  | -3.65    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 311      |
----------------------------------
2025-02-01 12:54:47.402226 Eastern Standard Time
| Itration            | 312      |
| Real Det Return     | 443      |
| Real Sto Return     | 365      |
| Reward Loss         | -55      |
| Running Env Steps   | 156000   |
| Running Forward KL  | -3.46    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 312      |
----------------------------------
2025-02-01 12:55:03.472881 Eastern Standard Time
| Itration            | 313      |
| Real Det Return     | 452      |
| Real Sto Return     | 383      |
| Reward Loss         | -43      |
| Running Env Steps   | 156500   |
| Running Forward KL  | -3.37    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 313      |
----------------------------------
2025-02-01 12:55:18.965513 Eastern Standard Time
| Itration            | 314      |
| Real Det Return     | 434      |
| Real Sto Return     | 366      |
| Reward Loss         | -55.3    |
| Running Env Steps   | 157000   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 3.84     |
| Running Update Time | 314      |
----------------------------------
2025-02-01 12:55:34.517693 Eastern Standard Time
| Itration            | 315      |
| Real Det Return     | 452      |
| Real Sto Return     | 373      |
| Reward Loss         | -46.7    |
| Running Env Steps   | 157500   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 315      |
----------------------------------
2025-02-01 12:55:50.488194 Eastern Standard Time
| Itration            | 316      |
| Real Det Return     | 457      |
| Real Sto Return     | 390      |
| Reward Loss         | -56.4    |
| Running Env Steps   | 158000   |
| Running Forward KL  | -3.6     |
| Running Reverse KL  | 4.07     |
| Running Update Time | 316      |
----------------------------------
2025-02-01 12:56:06.284411 Eastern Standard Time
| Itration            | 317      |
| Real Det Return     | 438      |
| Real Sto Return     | 371      |
| Reward Loss         | -52.2    |
| Running Env Steps   | 158500   |
| Running Forward KL  | -3.91    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 317      |
----------------------------------
2025-02-01 12:56:22.140624 Eastern Standard Time
| Itration            | 318      |
| Real Det Return     | 455      |
| Real Sto Return     | 377      |
| Reward Loss         | -65.1    |
| Running Env Steps   | 159000   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 4.36     |
| Running Update Time | 318      |
----------------------------------
2025-02-01 12:56:37.843305 Eastern Standard Time
| Itration            | 319      |
| Real Det Return     | 456      |
| Real Sto Return     | 383      |
| Reward Loss         | -43.5    |
| Running Env Steps   | 159500   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 319      |
----------------------------------
2025-02-01 12:56:53.408603 Eastern Standard Time
| Itration            | 320      |
| Real Det Return     | 431      |
| Real Sto Return     | 370      |
| Reward Loss         | -52.2    |
| Running Env Steps   | 160000   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 4.25     |
| Running Update Time | 320      |
----------------------------------
2025-02-01 12:57:09.018151 Eastern Standard Time
| Itration            | 321      |
| Real Det Return     | 478      |
| Real Sto Return     | 389      |
| Reward Loss         | -52.7    |
| Running Env Steps   | 160500   |
| Running Forward KL  | -3.58    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 321      |
----------------------------------
2025-02-01 12:57:24.668468 Eastern Standard Time
| Itration            | 322      |
| Real Det Return     | 444      |
| Real Sto Return     | 379      |
| Reward Loss         | -44.9    |
| Running Env Steps   | 161000   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 322      |
----------------------------------
2025-02-01 12:57:40.221632 Eastern Standard Time
| Itration            | 323      |
| Real Det Return     | 479      |
| Real Sto Return     | 390      |
| Reward Loss         | -53.5    |
| Running Env Steps   | 161500   |
| Running Forward KL  | -3.44    |
| Running Reverse KL  | 3.73     |
| Running Update Time | 323      |
----------------------------------
2025-02-01 12:57:55.909995 Eastern Standard Time
| Itration            | 324      |
| Real Det Return     | 460      |
| Real Sto Return     | 391      |
| Reward Loss         | -59      |
| Running Env Steps   | 162000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 3.56     |
| Running Update Time | 324      |
----------------------------------
2025-02-01 12:58:11.972738 Eastern Standard Time
| Itration            | 325      |
| Real Det Return     | 478      |
| Real Sto Return     | 388      |
| Reward Loss         | -60.5    |
| Running Env Steps   | 162500   |
| Running Forward KL  | -3.37    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 325      |
----------------------------------
2025-02-01 12:58:27.668746 Eastern Standard Time
| Itration            | 326      |
| Real Det Return     | 478      |
| Real Sto Return     | 405      |
| Reward Loss         | -38.8    |
| Running Env Steps   | 163000   |
| Running Forward KL  | -3.55    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 326      |
----------------------------------
2025-02-01 12:58:43.388587 Eastern Standard Time
| Itration            | 327      |
| Real Det Return     | 464      |
| Real Sto Return     | 389      |
| Reward Loss         | -49.8    |
| Running Env Steps   | 163500   |
| Running Forward KL  | -3.62    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 327      |
----------------------------------
2025-02-01 12:58:59.874543 Eastern Standard Time
| Itration            | 328      |
| Real Det Return     | 470      |
| Real Sto Return     | 391      |
| Reward Loss         | -52.2    |
| Running Env Steps   | 164000   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 3.76     |
| Running Update Time | 328      |
----------------------------------
2025-02-01 12:59:16.184025 Eastern Standard Time
| Itration            | 329      |
| Real Det Return     | 468      |
| Real Sto Return     | 390      |
| Reward Loss         | -49.4    |
| Running Env Steps   | 164500   |
| Running Forward KL  | -3.56    |
| Running Reverse KL  | 4.1      |
| Running Update Time | 329      |
----------------------------------
2025-02-01 12:59:32.473555 Eastern Standard Time
| Itration            | 330      |
| Real Det Return     | 470      |
| Real Sto Return     | 398      |
| Reward Loss         | -54      |
| Running Env Steps   | 165000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 3.71     |
| Running Update Time | 330      |
----------------------------------
2025-02-01 12:59:48.630549 Eastern Standard Time
| Itration            | 331      |
| Real Det Return     | 473      |
| Real Sto Return     | 409      |
| Reward Loss         | -42      |
| Running Env Steps   | 165500   |
| Running Forward KL  | -3.61    |
| Running Reverse KL  | 3.93     |
| Running Update Time | 331      |
----------------------------------
2025-02-01 13:00:04.478359 Eastern Standard Time
| Itration            | 332      |
| Real Det Return     | 487      |
| Real Sto Return     | 395      |
| Reward Loss         | -44.3    |
| Running Env Steps   | 166000   |
| Running Forward KL  | -3.7     |
| Running Reverse KL  | 4.22     |
| Running Update Time | 332      |
----------------------------------
2025-02-01 13:00:20.842243 Eastern Standard Time
| Itration            | 333      |
| Real Det Return     | 451      |
| Real Sto Return     | 387      |
| Reward Loss         | -57.8    |
| Running Env Steps   | 166500   |
| Running Forward KL  | -3.59    |
| Running Reverse KL  | 3.82     |
| Running Update Time | 333      |
----------------------------------
2025-02-01 13:00:36.896993 Eastern Standard Time
| Itration            | 334      |
| Real Det Return     | 449      |
| Real Sto Return     | 382      |
| Reward Loss         | -53.2    |
| Running Env Steps   | 167000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 334      |
----------------------------------
2025-02-01 13:00:52.748748 Eastern Standard Time
| Itration            | 335      |
| Real Det Return     | 448      |
| Real Sto Return     | 380      |
| Reward Loss         | -47.9    |
| Running Env Steps   | 167500   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 335      |
----------------------------------
2025-02-01 13:01:09.180726 Eastern Standard Time
| Itration            | 336      |
| Real Det Return     | 467      |
| Real Sto Return     | 399      |
| Reward Loss         | -47      |
| Running Env Steps   | 168000   |
| Running Forward KL  | -3.3     |
| Running Reverse KL  | 3.95     |
| Running Update Time | 336      |
----------------------------------
2025-02-01 13:01:25.884711 Eastern Standard Time
| Itration            | 337      |
| Real Det Return     | 455      |
| Real Sto Return     | 384      |
| Reward Loss         | -44.3    |
| Running Env Steps   | 168500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 337      |
----------------------------------
2025-02-01 13:01:42.459349 Eastern Standard Time
| Itration            | 338      |
| Real Det Return     | 467      |
| Real Sto Return     | 409      |
| Reward Loss         | -43.8    |
| Running Env Steps   | 169000   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 3.64     |
| Running Update Time | 338      |
----------------------------------
2025-02-01 13:01:59.008160 Eastern Standard Time
| Itration            | 339      |
| Real Det Return     | 471      |
| Real Sto Return     | 391      |
| Reward Loss         | -51      |
| Running Env Steps   | 169500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 339      |
----------------------------------
2025-02-01 13:02:15.224458 Eastern Standard Time
| Itration            | 340      |
| Real Det Return     | 457      |
| Real Sto Return     | 396      |
| Reward Loss         | -39.9    |
| Running Env Steps   | 170000   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 340      |
----------------------------------
2025-02-01 13:02:30.783353 Eastern Standard Time
| Itration            | 341      |
| Real Det Return     | 485      |
| Real Sto Return     | 414      |
| Reward Loss         | -39.7    |
| Running Env Steps   | 170500   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 341      |
----------------------------------
2025-02-01 13:02:46.625247 Eastern Standard Time
| Itration            | 342      |
| Real Det Return     | 482      |
| Real Sto Return     | 406      |
| Reward Loss         | -51.9    |
| Running Env Steps   | 171000   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 342      |
----------------------------------
2025-02-01 13:03:02.176076 Eastern Standard Time
| Itration            | 343      |
| Real Det Return     | 470      |
| Real Sto Return     | 388      |
| Reward Loss         | -64.1    |
| Running Env Steps   | 171500   |
| Running Forward KL  | -3.14    |
| Running Reverse KL  | 3.61     |
| Running Update Time | 343      |
----------------------------------
2025-02-01 13:03:17.774846 Eastern Standard Time
| Itration            | 344      |
| Real Det Return     | 484      |
| Real Sto Return     | 400      |
| Reward Loss         | -42.6    |
| Running Env Steps   | 172000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 344      |
----------------------------------
2025-02-01 13:03:33.252200 Eastern Standard Time
| Itration            | 345      |
| Real Det Return     | 504      |
| Real Sto Return     | 400      |
| Reward Loss         | -47      |
| Running Env Steps   | 172500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 345      |
----------------------------------
2025-02-01 13:03:48.788130 Eastern Standard Time
| Itration            | 346      |
| Real Det Return     | 458      |
| Real Sto Return     | 387      |
| Reward Loss         | -54.8    |
| Running Env Steps   | 173000   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 346      |
----------------------------------
2025-02-01 13:04:04.409118 Eastern Standard Time
| Itration            | 347      |
| Real Det Return     | 463      |
| Real Sto Return     | 395      |
| Reward Loss         | -49.3    |
| Running Env Steps   | 173500   |
| Running Forward KL  | -3.55    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 347      |
----------------------------------
2025-02-01 13:04:20.002231 Eastern Standard Time
| Itration            | 348      |
| Real Det Return     | 479      |
| Real Sto Return     | 411      |
| Reward Loss         | -48.1    |
| Running Env Steps   | 174000   |
| Running Forward KL  | -3.51    |
| Running Reverse KL  | 3.97     |
| Running Update Time | 348      |
----------------------------------
2025-02-01 13:04:35.534743 Eastern Standard Time
| Itration            | 349      |
| Real Det Return     | 513      |
| Real Sto Return     | 427      |
| Reward Loss         | -48.9    |
| Running Env Steps   | 174500   |
| Running Forward KL  | -3.31    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 349      |
----------------------------------
2025-02-01 13:04:51.141055 Eastern Standard Time
| Itration            | 350      |
| Real Det Return     | 473      |
| Real Sto Return     | 405      |
| Reward Loss         | -51      |
| Running Env Steps   | 175000   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 4        |
| Running Update Time | 350      |
----------------------------------
2025-02-01 13:05:06.790011 Eastern Standard Time
| Itration            | 351      |
| Real Det Return     | 455      |
| Real Sto Return     | 400      |
| Reward Loss         | -48.4    |
| Running Env Steps   | 175500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 3.73     |
| Running Update Time | 351      |
----------------------------------
2025-02-01 13:05:23.396223 Eastern Standard Time
| Itration            | 352      |
| Real Det Return     | 498      |
| Real Sto Return     | 409      |
| Reward Loss         | -45.6    |
| Running Env Steps   | 176000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 4.12     |
| Running Update Time | 352      |
----------------------------------
2025-02-01 13:05:39.202889 Eastern Standard Time
| Itration            | 353      |
| Real Det Return     | 473      |
| Real Sto Return     | 397      |
| Reward Loss         | -46.1    |
| Running Env Steps   | 176500   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 353      |
----------------------------------
2025-02-01 13:05:55.229774 Eastern Standard Time
| Itration            | 354      |
| Real Det Return     | 494      |
| Real Sto Return     | 418      |
| Reward Loss         | -36.6    |
| Running Env Steps   | 177000   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 354      |
----------------------------------
2025-02-01 13:06:11.655341 Eastern Standard Time
| Itration            | 355      |
| Real Det Return     | 472      |
| Real Sto Return     | 390      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 177500   |
| Running Forward KL  | -3.96    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 355      |
----------------------------------
2025-02-01 13:06:27.879072 Eastern Standard Time
| Itration            | 356      |
| Real Det Return     | 478      |
| Real Sto Return     | 401      |
| Reward Loss         | -46.9    |
| Running Env Steps   | 178000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 3.8      |
| Running Update Time | 356      |
----------------------------------
2025-02-01 13:06:43.422887 Eastern Standard Time
| Itration            | 357      |
| Real Det Return     | 484      |
| Real Sto Return     | 406      |
| Reward Loss         | -50.3    |
| Running Env Steps   | 178500   |
| Running Forward KL  | -3.43    |
| Running Reverse KL  | 3.77     |
| Running Update Time | 357      |
----------------------------------
2025-02-01 13:06:58.900523 Eastern Standard Time
| Itration            | 358      |
| Real Det Return     | 493      |
| Real Sto Return     | 413      |
| Reward Loss         | -32.7    |
| Running Env Steps   | 179000   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 3.98     |
| Running Update Time | 358      |
----------------------------------
2025-02-01 13:07:14.509518 Eastern Standard Time
| Itration            | 359      |
| Real Det Return     | 486      |
| Real Sto Return     | 413      |
| Reward Loss         | -44.3    |
| Running Env Steps   | 179500   |
| Running Forward KL  | -3.37    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 359      |
----------------------------------
2025-02-01 13:07:30.084143 Eastern Standard Time
| Itration            | 360      |
| Real Det Return     | 476      |
| Real Sto Return     | 413      |
| Reward Loss         | -48.9    |
| Running Env Steps   | 180000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 4.12     |
| Running Update Time | 360      |
----------------------------------
2025-02-01 13:07:46.312527 Eastern Standard Time
| Itration            | 361      |
| Real Det Return     | 479      |
| Real Sto Return     | 412      |
| Reward Loss         | -47      |
| Running Env Steps   | 180500   |
| Running Forward KL  | -3.91    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 361      |
----------------------------------
2025-02-01 13:08:01.970733 Eastern Standard Time
| Itration            | 362      |
| Real Det Return     | 467      |
| Real Sto Return     | 414      |
| Reward Loss         | -55.6    |
| Running Env Steps   | 181000   |
| Running Forward KL  | -3.49    |
| Running Reverse KL  | 3.73     |
| Running Update Time | 362      |
----------------------------------
2025-02-01 13:08:17.544910 Eastern Standard Time
| Itration            | 363      |
| Real Det Return     | 454      |
| Real Sto Return     | 397      |
| Reward Loss         | -48      |
| Running Env Steps   | 181500   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 363      |
----------------------------------
2025-02-01 13:08:33.097631 Eastern Standard Time
| Itration            | 364      |
| Real Det Return     | 486      |
| Real Sto Return     | 408      |
| Reward Loss         | -37.4    |
| Running Env Steps   | 182000   |
| Running Forward KL  | -3.96    |
| Running Reverse KL  | 3.6      |
| Running Update Time | 364      |
----------------------------------
2025-02-01 13:08:48.706913 Eastern Standard Time
| Itration            | 365      |
| Real Det Return     | 504      |
| Real Sto Return     | 413      |
| Reward Loss         | -25.7    |
| Running Env Steps   | 182500   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 365      |
----------------------------------
2025-02-01 13:09:04.381533 Eastern Standard Time
| Itration            | 366      |
| Real Det Return     | 501      |
| Real Sto Return     | 425      |
| Reward Loss         | -55.1    |
| Running Env Steps   | 183000   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 366      |
----------------------------------
2025-02-01 13:09:19.948204 Eastern Standard Time
| Itration            | 367      |
| Real Det Return     | 482      |
| Real Sto Return     | 420      |
| Reward Loss         | -36      |
| Running Env Steps   | 183500   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 367      |
----------------------------------
2025-02-01 13:09:35.529925 Eastern Standard Time
| Itration            | 368      |
| Real Det Return     | 474      |
| Real Sto Return     | 413      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 184000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 3.76     |
| Running Update Time | 368      |
----------------------------------
2025-02-01 13:09:51.083209 Eastern Standard Time
| Itration            | 369      |
| Real Det Return     | 473      |
| Real Sto Return     | 405      |
| Reward Loss         | -38.1    |
| Running Env Steps   | 184500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 3.93     |
| Running Update Time | 369      |
----------------------------------
2025-02-01 13:10:06.754507 Eastern Standard Time
| Itration            | 370      |
| Real Det Return     | 484      |
| Real Sto Return     | 427      |
| Reward Loss         | -37.2    |
| Running Env Steps   | 185000   |
| Running Forward KL  | -3.72    |
| Running Reverse KL  | 3.76     |
| Running Update Time | 370      |
----------------------------------
2025-02-01 13:10:22.308237 Eastern Standard Time
| Itration            | 371      |
| Real Det Return     | 480      |
| Real Sto Return     | 404      |
| Reward Loss         | -46.9    |
| Running Env Steps   | 185500   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 371      |
----------------------------------
2025-02-01 13:10:37.899020 Eastern Standard Time
| Itration            | 372      |
| Real Det Return     | 492      |
| Real Sto Return     | 413      |
| Reward Loss         | -33.3    |
| Running Env Steps   | 186000   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 372      |
----------------------------------
2025-02-01 13:10:53.554631 Eastern Standard Time
| Itration            | 373      |
| Real Det Return     | 470      |
| Real Sto Return     | 425      |
| Reward Loss         | -41.8    |
| Running Env Steps   | 186500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 373      |
----------------------------------
2025-02-01 13:11:09.301388 Eastern Standard Time
| Itration            | 374      |
| Real Det Return     | 476      |
| Real Sto Return     | 405      |
| Reward Loss         | -33.8    |
| Running Env Steps   | 187000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 4.03     |
| Running Update Time | 374      |
----------------------------------
2025-02-01 13:11:24.930126 Eastern Standard Time
| Itration            | 375      |
| Real Det Return     | 469      |
| Real Sto Return     | 406      |
| Reward Loss         | -59.7    |
| Running Env Steps   | 187500   |
| Running Forward KL  | -3.8     |
| Running Reverse KL  | 3.86     |
| Running Update Time | 375      |
----------------------------------
2025-02-01 13:11:40.701050 Eastern Standard Time
| Itration            | 376      |
| Real Det Return     | 498      |
| Real Sto Return     | 423      |
| Reward Loss         | -35.9    |
| Running Env Steps   | 188000   |
| Running Forward KL  | -3.35    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 376      |
----------------------------------
2025-02-01 13:11:56.353983 Eastern Standard Time
| Itration            | 377      |
| Real Det Return     | 472      |
| Real Sto Return     | 405      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 188500   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 377      |
----------------------------------
2025-02-01 13:12:12.048216 Eastern Standard Time
| Itration            | 378      |
| Real Det Return     | 473      |
| Real Sto Return     | 415      |
| Reward Loss         | -46.5    |
| Running Env Steps   | 189000   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 3.73     |
| Running Update Time | 378      |
----------------------------------
2025-02-01 13:12:27.828609 Eastern Standard Time
| Itration            | 379      |
| Real Det Return     | 493      |
| Real Sto Return     | 426      |
| Reward Loss         | -32.4    |
| Running Env Steps   | 189500   |
| Running Forward KL  | -3.93    |
| Running Reverse KL  | 3.78     |
| Running Update Time | 379      |
----------------------------------
2025-02-01 13:12:43.657382 Eastern Standard Time
| Itration            | 380      |
| Real Det Return     | 494      |
| Real Sto Return     | 431      |
| Reward Loss         | -29.5    |
| Running Env Steps   | 190000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 3.41     |
| Running Update Time | 380      |
----------------------------------
2025-02-01 13:12:59.274928 Eastern Standard Time
| Itration            | 381      |
| Real Det Return     | 499      |
| Real Sto Return     | 420      |
| Reward Loss         | -38.1    |
| Running Env Steps   | 190500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 381      |
----------------------------------
2025-02-01 13:13:14.928487 Eastern Standard Time
| Itration            | 382      |
| Real Det Return     | 462      |
| Real Sto Return     | 399      |
| Reward Loss         | -49      |
| Running Env Steps   | 191000   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 382      |
----------------------------------
2025-02-01 13:13:30.582243 Eastern Standard Time
| Itration            | 383      |
| Real Det Return     | 497      |
| Real Sto Return     | 432      |
| Reward Loss         | -40.7    |
| Running Env Steps   | 191500   |
| Running Forward KL  | -3.59    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 383      |
----------------------------------
2025-02-01 13:13:46.110292 Eastern Standard Time
| Itration            | 384      |
| Real Det Return     | 465      |
| Real Sto Return     | 411      |
| Reward Loss         | -49.1    |
| Running Env Steps   | 192000   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 4.05     |
| Running Update Time | 384      |
----------------------------------
2025-02-01 13:14:01.642560 Eastern Standard Time
| Itration            | 385      |
| Real Det Return     | 480      |
| Real Sto Return     | 415      |
| Reward Loss         | -33.9    |
| Running Env Steps   | 192500   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 385      |
----------------------------------
2025-02-01 13:14:17.355544 Eastern Standard Time
| Itration            | 386      |
| Real Det Return     | 492      |
| Real Sto Return     | 434      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 193000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 386      |
----------------------------------
2025-02-01 13:14:32.914174 Eastern Standard Time
| Itration            | 387      |
| Real Det Return     | 498      |
| Real Sto Return     | 431      |
| Reward Loss         | -35.3    |
| Running Env Steps   | 193500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 3.45     |
| Running Update Time | 387      |
----------------------------------
2025-02-01 13:14:49.136707 Eastern Standard Time
| Itration            | 388      |
| Real Det Return     | 498      |
| Real Sto Return     | 436      |
| Reward Loss         | -54.4    |
| Running Env Steps   | 194000   |
| Running Forward KL  | -3.33    |
| Running Reverse KL  | 3.8      |
| Running Update Time | 388      |
----------------------------------
2025-02-01 13:15:04.785354 Eastern Standard Time
| Itration            | 389      |
| Real Det Return     | 463      |
| Real Sto Return     | 420      |
| Reward Loss         | -44.2    |
| Running Env Steps   | 194500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 389      |
----------------------------------
2025-02-01 13:15:20.779207 Eastern Standard Time
| Itration            | 390      |
| Real Det Return     | 490      |
| Real Sto Return     | 430      |
| Reward Loss         | -42.9    |
| Running Env Steps   | 195000   |
| Running Forward KL  | -3.75    |
| Running Reverse KL  | 3.72     |
| Running Update Time | 390      |
----------------------------------
2025-02-01 13:15:36.323490 Eastern Standard Time
| Itration            | 391      |
| Real Det Return     | 508      |
| Real Sto Return     | 434      |
| Reward Loss         | -19.5    |
| Running Env Steps   | 195500   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 3.55     |
| Running Update Time | 391      |
----------------------------------
2025-02-01 13:15:52.295717 Eastern Standard Time
| Itration            | 392      |
| Real Det Return     | 502      |
| Real Sto Return     | 421      |
| Reward Loss         | -56.6    |
| Running Env Steps   | 196000   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 3.77     |
| Running Update Time | 392      |
----------------------------------
2025-02-01 13:16:09.062611 Eastern Standard Time
| Itration            | 393      |
| Real Det Return     | 498      |
| Real Sto Return     | 430      |
| Reward Loss         | -35.5    |
| Running Env Steps   | 196500   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 393      |
----------------------------------
2025-02-01 13:16:25.636503 Eastern Standard Time
| Itration            | 394      |
| Real Det Return     | 500      |
| Real Sto Return     | 425      |
| Reward Loss         | -45.4    |
| Running Env Steps   | 197000   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 3.72     |
| Running Update Time | 394      |
----------------------------------
2025-02-01 13:16:43.847939 Eastern Standard Time
| Itration            | 395      |
| Real Det Return     | 478      |
| Real Sto Return     | 424      |
| Reward Loss         | -40.9    |
| Running Env Steps   | 197500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 3.7      |
| Running Update Time | 395      |
----------------------------------
2025-02-01 13:16:59.644726 Eastern Standard Time
| Itration            | 396      |
| Real Det Return     | 492      |
| Real Sto Return     | 430      |
| Reward Loss         | -34.1    |
| Running Env Steps   | 198000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 396      |
----------------------------------
2025-02-01 13:17:15.864378 Eastern Standard Time
| Itration            | 397      |
| Real Det Return     | 485      |
| Real Sto Return     | 424      |
| Reward Loss         | -45.7    |
| Running Env Steps   | 198500   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 3.68     |
| Running Update Time | 397      |
----------------------------------
2025-02-01 13:17:31.536697 Eastern Standard Time
| Itration            | 398      |
| Real Det Return     | 481      |
| Real Sto Return     | 409      |
| Reward Loss         | -38.8    |
| Running Env Steps   | 199000   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 398      |
----------------------------------
2025-02-01 13:17:47.728138 Eastern Standard Time
| Itration            | 399      |
| Real Det Return     | 517      |
| Real Sto Return     | 432      |
| Reward Loss         | -37.8    |
| Running Env Steps   | 199500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 399      |
----------------------------------
2025-02-01 13:18:04.700236 Eastern Standard Time
| Itration            | 400      |
| Real Det Return     | 505      |
| Real Sto Return     | 450      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 200000   |
| Running Forward KL  | -3.67    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 400      |
----------------------------------
2025-02-01 13:18:21.195288 Eastern Standard Time
| Itration            | 401      |
| Real Det Return     | 482      |
| Real Sto Return     | 430      |
| Reward Loss         | -33.4    |
| Running Env Steps   | 200500   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 401      |
----------------------------------
2025-02-01 13:18:38.050361 Eastern Standard Time
| Itration            | 402      |
| Real Det Return     | 482      |
| Real Sto Return     | 429      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 201000   |
| Running Forward KL  | -3.93    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 402      |
----------------------------------
2025-02-01 13:18:54.959506 Eastern Standard Time
| Itration            | 403      |
| Real Det Return     | 497      |
| Real Sto Return     | 427      |
| Reward Loss         | -51.7    |
| Running Env Steps   | 201500   |
| Running Forward KL  | -3.67    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 403      |
----------------------------------
2025-02-01 13:19:10.703770 Eastern Standard Time
| Itration            | 404      |
| Real Det Return     | 482      |
| Real Sto Return     | 423      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 202000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 3.63     |
| Running Update Time | 404      |
----------------------------------
2025-02-01 13:19:26.284668 Eastern Standard Time
| Itration            | 405      |
| Real Det Return     | 483      |
| Real Sto Return     | 419      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 202500   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 3.98     |
| Running Update Time | 405      |
----------------------------------
2025-02-01 13:19:41.753203 Eastern Standard Time
| Itration            | 406      |
| Real Det Return     | 480      |
| Real Sto Return     | 413      |
| Reward Loss         | -45.2    |
| Running Env Steps   | 203000   |
| Running Forward KL  | -3.62    |
| Running Reverse KL  | 4.03     |
| Running Update Time | 406      |
----------------------------------
2025-02-01 13:19:57.348169 Eastern Standard Time
| Itration            | 407      |
| Real Det Return     | 495      |
| Real Sto Return     | 428      |
| Reward Loss         | -36.8    |
| Running Env Steps   | 203500   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 407      |
----------------------------------
2025-02-01 13:20:12.960172 Eastern Standard Time
| Itration            | 408      |
| Real Det Return     | 485      |
| Real Sto Return     | 417      |
| Reward Loss         | -39.8    |
| Running Env Steps   | 204000   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 3.85     |
| Running Update Time | 408      |
----------------------------------
2025-02-01 13:20:28.519916 Eastern Standard Time
| Itration            | 409      |
| Real Det Return     | 472      |
| Real Sto Return     | 411      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 204500   |
| Running Forward KL  | -3.63    |
| Running Reverse KL  | 3.78     |
| Running Update Time | 409      |
----------------------------------
2025-02-01 13:20:44.067422 Eastern Standard Time
| Itration            | 410      |
| Real Det Return     | 492      |
| Real Sto Return     | 427      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 205000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 3.74     |
| Running Update Time | 410      |
----------------------------------
2025-02-01 13:20:59.640656 Eastern Standard Time
| Itration            | 411      |
| Real Det Return     | 507      |
| Real Sto Return     | 450      |
| Reward Loss         | -33      |
| Running Env Steps   | 205500   |
| Running Forward KL  | -4.1     |
| Running Reverse KL  | 3.65     |
| Running Update Time | 411      |
----------------------------------
2025-02-01 13:21:15.287893 Eastern Standard Time
| Itration            | 412      |
| Real Det Return     | 490      |
| Real Sto Return     | 422      |
| Reward Loss         | -47.5    |
| Running Env Steps   | 206000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 412      |
----------------------------------
2025-02-01 13:21:31.368791 Eastern Standard Time
| Itration            | 413      |
| Real Det Return     | 507      |
| Real Sto Return     | 439      |
| Reward Loss         | -29.1    |
| Running Env Steps   | 206500   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 3.95     |
| Running Update Time | 413      |
----------------------------------
2025-02-01 13:21:47.281672 Eastern Standard Time
| Itration            | 414      |
| Real Det Return     | 501      |
| Real Sto Return     | 430      |
| Reward Loss         | -44.9    |
| Running Env Steps   | 207000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 3.67     |
| Running Update Time | 414      |
----------------------------------
2025-02-01 13:22:03.324485 Eastern Standard Time
| Itration            | 415      |
| Real Det Return     | 500      |
| Real Sto Return     | 438      |
| Reward Loss         | -24.7    |
| Running Env Steps   | 207500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 3.61     |
| Running Update Time | 415      |
----------------------------------
2025-02-01 13:22:19.860515 Eastern Standard Time
| Itration            | 416      |
| Real Det Return     | 489      |
| Real Sto Return     | 440      |
| Reward Loss         | -32.2    |
| Running Env Steps   | 208000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 416      |
----------------------------------
2025-02-01 13:22:36.083144 Eastern Standard Time
| Itration            | 417      |
| Real Det Return     | 498      |
| Real Sto Return     | 425      |
| Reward Loss         | -37.5    |
| Running Env Steps   | 208500   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 3.57     |
| Running Update Time | 417      |
----------------------------------
2025-02-01 13:22:51.752940 Eastern Standard Time
| Itration            | 418      |
| Real Det Return     | 502      |
| Real Sto Return     | 437      |
| Reward Loss         | -35.1    |
| Running Env Steps   | 209000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 418      |
----------------------------------
2025-02-01 13:23:07.455083 Eastern Standard Time
| Itration            | 419      |
| Real Det Return     | 505      |
| Real Sto Return     | 439      |
| Reward Loss         | -30.3    |
| Running Env Steps   | 209500   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 3.85     |
| Running Update Time | 419      |
----------------------------------
2025-02-01 13:23:23.380209 Eastern Standard Time
| Itration            | 420      |
| Real Det Return     | 502      |
| Real Sto Return     | 455      |
| Reward Loss         | -32.4    |
| Running Env Steps   | 210000   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 3.52     |
| Running Update Time | 420      |
----------------------------------
2025-02-01 13:23:42.729956 Eastern Standard Time
| Itration            | 421      |
| Real Det Return     | 476      |
| Real Sto Return     | 431      |
| Reward Loss         | -49.1    |
| Running Env Steps   | 210500   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 421      |
----------------------------------
2025-02-01 13:24:03.826586 Eastern Standard Time
| Itration            | 422      |
| Real Det Return     | 517      |
| Real Sto Return     | 432      |
| Reward Loss         | -44.5    |
| Running Env Steps   | 211000   |
| Running Forward KL  | -3.91    |
| Running Reverse KL  | 3.75     |
| Running Update Time | 422      |
----------------------------------
2025-02-01 13:24:20.021497 Eastern Standard Time
| Itration            | 423      |
| Real Det Return     | 486      |
| Real Sto Return     | 428      |
| Reward Loss         | -42.9    |
| Running Env Steps   | 211500   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 3.97     |
| Running Update Time | 423      |
----------------------------------
2025-02-01 13:24:35.698918 Eastern Standard Time
| Itration            | 424      |
| Real Det Return     | 503      |
| Real Sto Return     | 433      |
| Reward Loss         | -33.3    |
| Running Env Steps   | 212000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 3.26     |
| Running Update Time | 424      |
----------------------------------
2025-02-01 13:24:51.220437 Eastern Standard Time
| Itration            | 425      |
| Real Det Return     | 484      |
| Real Sto Return     | 425      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 212500   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 425      |
----------------------------------
2025-02-01 13:25:06.760688 Eastern Standard Time
| Itration            | 426      |
| Real Det Return     | 489      |
| Real Sto Return     | 427      |
| Reward Loss         | -32.4    |
| Running Env Steps   | 213000   |
| Running Forward KL  | -3.77    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 426      |
----------------------------------
2025-02-01 13:25:22.280122 Eastern Standard Time
| Itration            | 427      |
| Real Det Return     | 499      |
| Real Sto Return     | 435      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 213500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 427      |
----------------------------------
2025-02-01 13:25:37.855206 Eastern Standard Time
| Itration            | 428      |
| Real Det Return     | 484      |
| Real Sto Return     | 428      |
| Reward Loss         | -31.5    |
| Running Env Steps   | 214000   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 428      |
----------------------------------
2025-02-01 13:25:53.479100 Eastern Standard Time
| Itration            | 429      |
| Real Det Return     | 505      |
| Real Sto Return     | 440      |
| Reward Loss         | -24.6    |
| Running Env Steps   | 214500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 3.51     |
| Running Update Time | 429      |
----------------------------------
2025-02-01 13:26:08.969208 Eastern Standard Time
| Itration            | 430      |
| Real Det Return     | 488      |
| Real Sto Return     | 425      |
| Reward Loss         | -30.5    |
| Running Env Steps   | 215000   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 430      |
----------------------------------
2025-02-01 13:26:24.521995 Eastern Standard Time
| Itration            | 431      |
| Real Det Return     | 514      |
| Real Sto Return     | 443      |
| Reward Loss         | -33.5    |
| Running Env Steps   | 215500   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 431      |
----------------------------------
2025-02-01 13:26:40.025285 Eastern Standard Time
| Itration            | 432      |
| Real Det Return     | 486      |
| Real Sto Return     | 427      |
| Reward Loss         | -57.2    |
| Running Env Steps   | 216000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 3.67     |
| Running Update Time | 432      |
----------------------------------
2025-02-01 13:26:55.770753 Eastern Standard Time
| Itration            | 433      |
| Real Det Return     | 486      |
| Real Sto Return     | 438      |
| Reward Loss         | -39.6    |
| Running Env Steps   | 216500   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 3.75     |
| Running Update Time | 433      |
----------------------------------
2025-02-01 13:27:11.395489 Eastern Standard Time
| Itration            | 434      |
| Real Det Return     | 482      |
| Real Sto Return     | 430      |
| Reward Loss         | -36.2    |
| Running Env Steps   | 217000   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 434      |
----------------------------------
2025-02-01 13:27:26.880596 Eastern Standard Time
| Itration            | 435      |
| Real Det Return     | 494      |
| Real Sto Return     | 434      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 217500   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 435      |
----------------------------------
2025-02-01 13:27:42.383062 Eastern Standard Time
| Itration            | 436      |
| Real Det Return     | 487      |
| Real Sto Return     | 421      |
| Reward Loss         | -54.4    |
| Running Env Steps   | 218000   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 3.98     |
| Running Update Time | 436      |
----------------------------------
2025-02-01 13:27:57.972913 Eastern Standard Time
| Itration            | 437      |
| Real Det Return     | 495      |
| Real Sto Return     | 441      |
| Reward Loss         | -30.9    |
| Running Env Steps   | 218500   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 437      |
----------------------------------
2025-02-01 13:28:13.627023 Eastern Standard Time
| Itration            | 438      |
| Real Det Return     | 503      |
| Real Sto Return     | 449      |
| Reward Loss         | -30.3    |
| Running Env Steps   | 219000   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 438      |
----------------------------------
2025-02-01 13:28:29.179477 Eastern Standard Time
| Itration            | 439      |
| Real Det Return     | 498      |
| Real Sto Return     | 430      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 219500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 3.97     |
| Running Update Time | 439      |
----------------------------------
2025-02-01 13:28:44.836121 Eastern Standard Time
| Itration            | 440      |
| Real Det Return     | 519      |
| Real Sto Return     | 451      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 220000   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 3.98     |
| Running Update Time | 440      |
----------------------------------
2025-02-01 13:29:00.548215 Eastern Standard Time
| Itration            | 441      |
| Real Det Return     | 497      |
| Real Sto Return     | 437      |
| Reward Loss         | -29.9    |
| Running Env Steps   | 220500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 441      |
----------------------------------
2025-02-01 13:29:16.070393 Eastern Standard Time
| Itration            | 442      |
| Real Det Return     | 502      |
| Real Sto Return     | 438      |
| Reward Loss         | -29.7    |
| Running Env Steps   | 221000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 442      |
----------------------------------
2025-02-01 13:29:31.593922 Eastern Standard Time
| Itration            | 443      |
| Real Det Return     | 502      |
| Real Sto Return     | 441      |
| Reward Loss         | -41.1    |
| Running Env Steps   | 221500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 443      |
----------------------------------
2025-02-01 13:29:47.112668 Eastern Standard Time
| Itration            | 444      |
| Real Det Return     | 483      |
| Real Sto Return     | 424      |
| Reward Loss         | -28.9    |
| Running Env Steps   | 222000   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 444      |
----------------------------------
2025-02-01 13:30:02.677389 Eastern Standard Time
| Itration            | 445      |
| Real Det Return     | 515      |
| Real Sto Return     | 441      |
| Reward Loss         | -29      |
| Running Env Steps   | 222500   |
| Running Forward KL  | -3.84    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 445      |
----------------------------------
2025-02-01 13:30:18.175443 Eastern Standard Time
| Itration            | 446      |
| Real Det Return     | 464      |
| Real Sto Return     | 417      |
| Reward Loss         | -40.3    |
| Running Env Steps   | 223000   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 446      |
----------------------------------
2025-02-01 13:30:33.727229 Eastern Standard Time
| Itration            | 447      |
| Real Det Return     | 506      |
| Real Sto Return     | 443      |
| Reward Loss         | -46.3    |
| Running Env Steps   | 223500   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 3.75     |
| Running Update Time | 447      |
----------------------------------
2025-02-01 13:30:49.305966 Eastern Standard Time
| Itration            | 448      |
| Real Det Return     | 499      |
| Real Sto Return     | 446      |
| Reward Loss         | -41.5    |
| Running Env Steps   | 224000   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 448      |
----------------------------------
2025-02-01 13:31:04.900465 Eastern Standard Time
| Itration            | 449      |
| Real Det Return     | 512      |
| Real Sto Return     | 445      |
| Reward Loss         | -38.1    |
| Running Env Steps   | 224500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 3.82     |
| Running Update Time | 449      |
----------------------------------
2025-02-01 13:31:20.522724 Eastern Standard Time
| Itration            | 450      |
| Real Det Return     | 491      |
| Real Sto Return     | 429      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 225000   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 450      |
----------------------------------
2025-02-01 13:31:36.195473 Eastern Standard Time
| Itration            | 451      |
| Real Det Return     | 492      |
| Real Sto Return     | 434      |
| Reward Loss         | -32      |
| Running Env Steps   | 225500   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 451      |
----------------------------------
2025-02-01 13:31:51.678917 Eastern Standard Time
| Itration            | 452      |
| Real Det Return     | 500      |
| Real Sto Return     | 433      |
| Reward Loss         | -48.6    |
| Running Env Steps   | 226000   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 3.73     |
| Running Update Time | 452      |
----------------------------------
2025-02-01 13:32:07.255259 Eastern Standard Time
| Itration            | 453      |
| Real Det Return     | 499      |
| Real Sto Return     | 440      |
| Reward Loss         | -45.3    |
| Running Env Steps   | 226500   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 4        |
| Running Update Time | 453      |
----------------------------------
2025-02-01 13:32:22.764524 Eastern Standard Time
| Itration            | 454      |
| Real Det Return     | 487      |
| Real Sto Return     | 428      |
| Reward Loss         | -42.6    |
| Running Env Steps   | 227000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 454      |
----------------------------------
2025-02-01 13:32:38.299034 Eastern Standard Time
| Itration            | 455      |
| Real Det Return     | 504      |
| Real Sto Return     | 436      |
| Reward Loss         | -33.8    |
| Running Env Steps   | 227500   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 3.56     |
| Running Update Time | 455      |
----------------------------------
2025-02-01 13:32:53.833930 Eastern Standard Time
| Itration            | 456      |
| Real Det Return     | 514      |
| Real Sto Return     | 451      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 228000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 456      |
----------------------------------
2025-02-01 13:33:09.378094 Eastern Standard Time
| Itration            | 457      |
| Real Det Return     | 508      |
| Real Sto Return     | 445      |
| Reward Loss         | -33.5    |
| Running Env Steps   | 228500   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 457      |
----------------------------------
2025-02-01 13:33:24.906981 Eastern Standard Time
| Itration            | 458      |
| Real Det Return     | 526      |
| Real Sto Return     | 463      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 229000   |
| Running Forward KL  | -3.59    |
| Running Reverse KL  | 3.76     |
| Running Update Time | 458      |
----------------------------------
2025-02-01 13:33:40.438913 Eastern Standard Time
| Itration            | 459      |
| Real Det Return     | 518      |
| Real Sto Return     | 453      |
| Reward Loss         | -43.5    |
| Running Env Steps   | 229500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 459      |
----------------------------------
2025-02-01 13:33:55.960481 Eastern Standard Time
| Itration            | 460      |
| Real Det Return     | 504      |
| Real Sto Return     | 450      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 230000   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 460      |
----------------------------------
2025-02-01 13:34:11.521507 Eastern Standard Time
| Itration            | 461      |
| Real Det Return     | 506      |
| Real Sto Return     | 444      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 230500   |
| Running Forward KL  | -3.96    |
| Running Reverse KL  | 4        |
| Running Update Time | 461      |
----------------------------------
2025-02-01 13:34:27.055271 Eastern Standard Time
| Itration            | 462      |
| Real Det Return     | 517      |
| Real Sto Return     | 443      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 231000   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 462      |
----------------------------------
2025-02-01 13:34:42.680724 Eastern Standard Time
| Itration            | 463      |
| Real Det Return     | 508      |
| Real Sto Return     | 431      |
| Reward Loss         | -41.9    |
| Running Env Steps   | 231500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 463      |
----------------------------------
2025-02-01 13:34:58.365065 Eastern Standard Time
| Itration            | 464      |
| Real Det Return     | 519      |
| Real Sto Return     | 456      |
| Reward Loss         | -26.8    |
| Running Env Steps   | 232000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 464      |
----------------------------------
2025-02-01 13:35:13.966411 Eastern Standard Time
| Itration            | 465      |
| Real Det Return     | 502      |
| Real Sto Return     | 446      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 232500   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 465      |
----------------------------------
2025-02-01 13:35:29.453262 Eastern Standard Time
| Itration            | 466      |
| Real Det Return     | 506      |
| Real Sto Return     | 440      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 233000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 466      |
----------------------------------
2025-02-01 13:35:45.012627 Eastern Standard Time
| Itration            | 467      |
| Real Det Return     | 485      |
| Real Sto Return     | 439      |
| Reward Loss         | -41.2    |
| Running Env Steps   | 233500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 4.08     |
| Running Update Time | 467      |
----------------------------------
2025-02-01 13:36:00.536723 Eastern Standard Time
| Itration            | 468      |
| Real Det Return     | 503      |
| Real Sto Return     | 449      |
| Reward Loss         | -26.4    |
| Running Env Steps   | 234000   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 3.52     |
| Running Update Time | 468      |
----------------------------------
2025-02-01 13:36:16.356126 Eastern Standard Time
| Itration            | 469      |
| Real Det Return     | 496      |
| Real Sto Return     | 439      |
| Reward Loss         | -35.1    |
| Running Env Steps   | 234500   |
| Running Forward KL  | -3.83    |
| Running Reverse KL  | 3.6      |
| Running Update Time | 469      |
----------------------------------
2025-02-01 13:36:31.982706 Eastern Standard Time
| Itration            | 470      |
| Real Det Return     | 512      |
| Real Sto Return     | 433      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 235000   |
| Running Forward KL  | -3.93    |
| Running Reverse KL  | 3.72     |
| Running Update Time | 470      |
----------------------------------
2025-02-01 13:36:47.550600 Eastern Standard Time
| Itration            | 471      |
| Real Det Return     | 511      |
| Real Sto Return     | 459      |
| Reward Loss         | -33.2    |
| Running Env Steps   | 235500   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 3.98     |
| Running Update Time | 471      |
----------------------------------
2025-02-01 13:37:03.182524 Eastern Standard Time
| Itration            | 472      |
| Real Det Return     | 495      |
| Real Sto Return     | 441      |
| Reward Loss         | -39.5    |
| Running Env Steps   | 236000   |
| Running Forward KL  | -3.87    |
| Running Reverse KL  | 3.89     |
| Running Update Time | 472      |
----------------------------------
2025-02-01 13:37:18.720751 Eastern Standard Time
| Itration            | 473      |
| Real Det Return     | 522      |
| Real Sto Return     | 458      |
| Reward Loss         | -48.8    |
| Running Env Steps   | 236500   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 473      |
----------------------------------
2025-02-01 13:37:34.310407 Eastern Standard Time
| Itration            | 474      |
| Real Det Return     | 507      |
| Real Sto Return     | 453      |
| Reward Loss         | -38.1    |
| Running Env Steps   | 237000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 3.66     |
| Running Update Time | 474      |
----------------------------------
2025-02-01 13:37:49.969081 Eastern Standard Time
| Itration            | 475      |
| Real Det Return     | 507      |
| Real Sto Return     | 455      |
| Reward Loss         | -23.5    |
| Running Env Steps   | 237500   |
| Running Forward KL  | -3.86    |
| Running Reverse KL  | 4.08     |
| Running Update Time | 475      |
----------------------------------
2025-02-01 13:38:05.633972 Eastern Standard Time
| Itration            | 476      |
| Real Det Return     | 524      |
| Real Sto Return     | 452      |
| Reward Loss         | -23.7    |
| Running Env Steps   | 238000   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 476      |
----------------------------------
2025-02-01 13:38:21.179973 Eastern Standard Time
| Itration            | 477      |
| Real Det Return     | 523      |
| Real Sto Return     | 458      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 238500   |
| Running Forward KL  | -3.68    |
| Running Reverse KL  | 4.1      |
| Running Update Time | 477      |
----------------------------------
2025-02-01 13:38:36.750755 Eastern Standard Time
| Itration            | 478      |
| Real Det Return     | 497      |
| Real Sto Return     | 437      |
| Reward Loss         | -55.8    |
| Running Env Steps   | 239000   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 478      |
----------------------------------
2025-02-01 13:38:52.317114 Eastern Standard Time
| Itration            | 479      |
| Real Det Return     | 518      |
| Real Sto Return     | 455      |
| Reward Loss         | -30      |
| Running Env Steps   | 239500   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 3.98     |
| Running Update Time | 479      |
----------------------------------
2025-02-01 13:39:07.919424 Eastern Standard Time
| Itration            | 480      |
| Real Det Return     | 483      |
| Real Sto Return     | 442      |
| Reward Loss         | -37.5    |
| Running Env Steps   | 240000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 480      |
----------------------------------
2025-02-01 13:39:23.481132 Eastern Standard Time
| Itration            | 481      |
| Real Det Return     | 497      |
| Real Sto Return     | 444      |
| Reward Loss         | -31.2    |
| Running Env Steps   | 240500   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 481      |
----------------------------------
2025-02-01 13:39:39.013637 Eastern Standard Time
| Itration            | 482      |
| Real Det Return     | 521      |
| Real Sto Return     | 451      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 241000   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 482      |
----------------------------------
2025-02-01 13:39:54.596871 Eastern Standard Time
| Itration            | 483      |
| Real Det Return     | 495      |
| Real Sto Return     | 449      |
| Reward Loss         | -30.7    |
| Running Env Steps   | 241500   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 483      |
----------------------------------
2025-02-01 13:40:10.211570 Eastern Standard Time
| Itration            | 484      |
| Real Det Return     | 511      |
| Real Sto Return     | 448      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 242000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.28     |
| Running Update Time | 484      |
----------------------------------
2025-02-01 13:40:25.798984 Eastern Standard Time
| Itration            | 485      |
| Real Det Return     | 499      |
| Real Sto Return     | 435      |
| Reward Loss         | -38.4    |
| Running Env Steps   | 242500   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 485      |
----------------------------------
2025-02-01 13:40:41.425936 Eastern Standard Time
| Itration            | 486      |
| Real Det Return     | 504      |
| Real Sto Return     | 436      |
| Reward Loss         | -37.4    |
| Running Env Steps   | 243000   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 3.54     |
| Running Update Time | 486      |
----------------------------------
2025-02-01 13:40:56.981656 Eastern Standard Time
| Itration            | 487      |
| Real Det Return     | 509      |
| Real Sto Return     | 444      |
| Reward Loss         | -33.8    |
| Running Env Steps   | 243500   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 487      |
----------------------------------
2025-02-01 13:41:12.536515 Eastern Standard Time
| Itration            | 488      |
| Real Det Return     | 488      |
| Real Sto Return     | 450      |
| Reward Loss         | -37      |
| Running Env Steps   | 244000   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 488      |
----------------------------------
2025-02-01 13:41:28.076734 Eastern Standard Time
| Itration            | 489      |
| Real Det Return     | 530      |
| Real Sto Return     | 457      |
| Reward Loss         | -28.6    |
| Running Env Steps   | 244500   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 489      |
----------------------------------
2025-02-01 13:41:43.702380 Eastern Standard Time
| Itration            | 490      |
| Real Det Return     | 510      |
| Real Sto Return     | 449      |
| Reward Loss         | -20.5    |
| Running Env Steps   | 245000   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 490      |
----------------------------------
2025-02-01 13:41:59.350957 Eastern Standard Time
| Itration            | 491      |
| Real Det Return     | 512      |
| Real Sto Return     | 445      |
| Reward Loss         | -41.9    |
| Running Env Steps   | 245500   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 491      |
----------------------------------
2025-02-01 13:42:14.966006 Eastern Standard Time
| Itration            | 492      |
| Real Det Return     | 502      |
| Real Sto Return     | 443      |
| Reward Loss         | -23.4    |
| Running Env Steps   | 246000   |
| Running Forward KL  | -3.91    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 492      |
----------------------------------
2025-02-01 13:42:30.568212 Eastern Standard Time
| Itration            | 493      |
| Real Det Return     | 486      |
| Real Sto Return     | 437      |
| Reward Loss         | -35.9    |
| Running Env Steps   | 246500   |
| Running Forward KL  | -4.1     |
| Running Reverse KL  | 3.99     |
| Running Update Time | 493      |
----------------------------------
2025-02-01 13:42:46.196309 Eastern Standard Time
| Itration            | 494      |
| Real Det Return     | 523      |
| Real Sto Return     | 455      |
| Reward Loss         | -48.6    |
| Running Env Steps   | 247000   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 4.03     |
| Running Update Time | 494      |
----------------------------------
2025-02-01 13:43:01.797995 Eastern Standard Time
| Itration            | 495      |
| Real Det Return     | 499      |
| Real Sto Return     | 437      |
| Reward Loss         | -21.9    |
| Running Env Steps   | 247500   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 495      |
----------------------------------
2025-02-01 13:43:17.602540 Eastern Standard Time
| Itration            | 496      |
| Real Det Return     | 493      |
| Real Sto Return     | 437      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 248000   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 3.95     |
| Running Update Time | 496      |
----------------------------------
2025-02-01 13:43:33.362348 Eastern Standard Time
| Itration            | 497      |
| Real Det Return     | 503      |
| Real Sto Return     | 442      |
| Reward Loss         | -32.1    |
| Running Env Steps   | 248500   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 497      |
----------------------------------
2025-02-01 13:43:49.251463 Eastern Standard Time
| Itration            | 498      |
| Real Det Return     | 513      |
| Real Sto Return     | 465      |
| Reward Loss         | -16.5    |
| Running Env Steps   | 249000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 498      |
----------------------------------
2025-02-01 13:44:04.966661 Eastern Standard Time
| Itration            | 499      |
| Real Det Return     | 500      |
| Real Sto Return     | 444      |
| Reward Loss         | -49.2    |
| Running Env Steps   | 249500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 499      |
----------------------------------
2025-02-01 13:44:20.551964 Eastern Standard Time
| Itration            | 500      |
| Real Det Return     | 532      |
| Real Sto Return     | 472      |
| Reward Loss         | -33.3    |
| Running Env Steps   | 250000   |
| Running Forward KL  | -3.82    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 500      |
----------------------------------
2025-02-01 13:44:36.161529 Eastern Standard Time
| Itration            | 501      |
| Real Det Return     | 523      |
| Real Sto Return     | 453      |
| Reward Loss         | -13.7    |
| Running Env Steps   | 250500   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 501      |
----------------------------------
2025-02-01 13:44:51.756678 Eastern Standard Time
| Itration            | 502      |
| Real Det Return     | 520      |
| Real Sto Return     | 455      |
| Reward Loss         | -33.1    |
| Running Env Steps   | 251000   |
| Running Forward KL  | -4.17    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 502      |
----------------------------------
2025-02-01 13:45:07.359361 Eastern Standard Time
| Itration            | 503      |
| Real Det Return     | 520      |
| Real Sto Return     | 456      |
| Reward Loss         | -37.7    |
| Running Env Steps   | 251500   |
| Running Forward KL  | -3.94    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 503      |
----------------------------------
2025-02-01 13:45:22.962560 Eastern Standard Time
| Itration            | 504      |
| Real Det Return     | 509      |
| Real Sto Return     | 454      |
| Reward Loss         | -22.1    |
| Running Env Steps   | 252000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 504      |
----------------------------------
2025-02-01 13:45:38.553240 Eastern Standard Time
| Itration            | 505      |
| Real Det Return     | 529      |
| Real Sto Return     | 461      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 252500   |
| Running Forward KL  | -3.99    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 505      |
----------------------------------
2025-02-01 13:45:54.179866 Eastern Standard Time
| Itration            | 506      |
| Real Det Return     | 511      |
| Real Sto Return     | 449      |
| Reward Loss         | -23      |
| Running Env Steps   | 253000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 506      |
----------------------------------
2025-02-01 13:46:09.808486 Eastern Standard Time
| Itration            | 507      |
| Real Det Return     | 528      |
| Real Sto Return     | 457      |
| Reward Loss         | -21.9    |
| Running Env Steps   | 253500   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.45     |
| Running Update Time | 507      |
----------------------------------
2025-02-01 13:46:25.407278 Eastern Standard Time
| Itration            | 508      |
| Real Det Return     | 501      |
| Real Sto Return     | 448      |
| Reward Loss         | -24      |
| Running Env Steps   | 254000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 508      |
----------------------------------
2025-02-01 13:46:41.049334 Eastern Standard Time
| Itration            | 509      |
| Real Det Return     | 520      |
| Real Sto Return     | 466      |
| Reward Loss         | -24.3    |
| Running Env Steps   | 254500   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 4.12     |
| Running Update Time | 509      |
----------------------------------
2025-02-01 13:46:56.726134 Eastern Standard Time
| Itration            | 510      |
| Real Det Return     | 505      |
| Real Sto Return     | 441      |
| Reward Loss         | -40.2    |
| Running Env Steps   | 255000   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 510      |
----------------------------------
2025-02-01 13:47:12.345715 Eastern Standard Time
| Itration            | 511      |
| Real Det Return     | 518      |
| Real Sto Return     | 462      |
| Reward Loss         | -21.5    |
| Running Env Steps   | 255500   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 4.08     |
| Running Update Time | 511      |
----------------------------------
2025-02-01 13:47:27.970852 Eastern Standard Time
| Itration            | 512      |
| Real Det Return     | 518      |
| Real Sto Return     | 459      |
| Reward Loss         | -33.6    |
| Running Env Steps   | 256000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.28     |
| Running Update Time | 512      |
----------------------------------
2025-02-01 13:47:43.687580 Eastern Standard Time
| Itration            | 513      |
| Real Det Return     | 517      |
| Real Sto Return     | 457      |
| Reward Loss         | -23.7    |
| Running Env Steps   | 256500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 3.69     |
| Running Update Time | 513      |
----------------------------------
2025-02-01 13:47:59.261595 Eastern Standard Time
| Itration            | 514      |
| Real Det Return     | 528      |
| Real Sto Return     | 466      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 257000   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 514      |
----------------------------------
2025-02-01 13:48:14.876448 Eastern Standard Time
| Itration            | 515      |
| Real Det Return     | 507      |
| Real Sto Return     | 448      |
| Reward Loss         | -29      |
| Running Env Steps   | 257500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 515      |
----------------------------------
2025-02-01 13:48:30.461171 Eastern Standard Time
| Itration            | 516      |
| Real Det Return     | 536      |
| Real Sto Return     | 454      |
| Reward Loss         | -28.5    |
| Running Env Steps   | 258000   |
| Running Forward KL  | -3.81    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 516      |
----------------------------------
2025-02-01 13:48:46.155500 Eastern Standard Time
| Itration            | 517      |
| Real Det Return     | 504      |
| Real Sto Return     | 451      |
| Reward Loss         | -40.7    |
| Running Env Steps   | 258500   |
| Running Forward KL  | -4.08    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 517      |
----------------------------------
2025-02-01 13:49:01.766408 Eastern Standard Time
| Itration            | 518      |
| Real Det Return     | 519      |
| Real Sto Return     | 445      |
| Reward Loss         | -31.4    |
| Running Env Steps   | 259000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 518      |
----------------------------------
2025-02-01 13:49:17.405837 Eastern Standard Time
| Itration            | 519      |
| Real Det Return     | 530      |
| Real Sto Return     | 459      |
| Reward Loss         | -13.6    |
| Running Env Steps   | 259500   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 519      |
----------------------------------
2025-02-01 13:49:33.002006 Eastern Standard Time
| Itration            | 520      |
| Real Det Return     | 522      |
| Real Sto Return     | 473      |
| Reward Loss         | -37.2    |
| Running Env Steps   | 260000   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 520      |
----------------------------------
2025-02-01 13:49:48.593463 Eastern Standard Time
| Itration            | 521      |
| Real Det Return     | 517      |
| Real Sto Return     | 455      |
| Reward Loss         | -23.4    |
| Running Env Steps   | 260500   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 521      |
----------------------------------
2025-02-01 13:50:04.248264 Eastern Standard Time
| Itration            | 522      |
| Real Det Return     | 524      |
| Real Sto Return     | 464      |
| Reward Loss         | -34.3    |
| Running Env Steps   | 261000   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 522      |
----------------------------------
2025-02-01 13:50:19.841940 Eastern Standard Time
| Itration            | 523      |
| Real Det Return     | 523      |
| Real Sto Return     | 469      |
| Reward Loss         | -22.8    |
| Running Env Steps   | 261500   |
| Running Forward KL  | -4.03    |
| Running Reverse KL  | 3.77     |
| Running Update Time | 523      |
----------------------------------
2025-02-01 13:50:35.397811 Eastern Standard Time
| Itration            | 524      |
| Real Det Return     | 526      |
| Real Sto Return     | 470      |
| Reward Loss         | -31.5    |
| Running Env Steps   | 262000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 3.89     |
| Running Update Time | 524      |
----------------------------------
2025-02-01 13:50:51.002900 Eastern Standard Time
| Itration            | 525      |
| Real Det Return     | 499      |
| Real Sto Return     | 440      |
| Reward Loss         | -36      |
| Running Env Steps   | 262500   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 525      |
----------------------------------
2025-02-01 13:51:06.599990 Eastern Standard Time
| Itration            | 526      |
| Real Det Return     | 520      |
| Real Sto Return     | 467      |
| Reward Loss         | -12.7    |
| Running Env Steps   | 263000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 526      |
----------------------------------
2025-02-01 13:51:22.209222 Eastern Standard Time
| Itration            | 527      |
| Real Det Return     | 509      |
| Real Sto Return     | 447      |
| Reward Loss         | -30.5    |
| Running Env Steps   | 263500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 527      |
----------------------------------
2025-02-01 13:51:37.822869 Eastern Standard Time
| Itration            | 528      |
| Real Det Return     | 513      |
| Real Sto Return     | 455      |
| Reward Loss         | -22.4    |
| Running Env Steps   | 264000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 528      |
----------------------------------
2025-02-01 13:51:53.365534 Eastern Standard Time
| Itration            | 529      |
| Real Det Return     | 517      |
| Real Sto Return     | 465      |
| Reward Loss         | -38.9    |
| Running Env Steps   | 264500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 529      |
----------------------------------
2025-02-01 13:52:08.966578 Eastern Standard Time
| Itration            | 530      |
| Real Det Return     | 525      |
| Real Sto Return     | 459      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 265000   |
| Running Forward KL  | -3.88    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 530      |
----------------------------------
2025-02-01 13:52:24.618269 Eastern Standard Time
| Itration            | 531      |
| Real Det Return     | 520      |
| Real Sto Return     | 462      |
| Reward Loss         | -33.2    |
| Running Env Steps   | 265500   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 3.68     |
| Running Update Time | 531      |
----------------------------------
2025-02-01 13:52:40.226858 Eastern Standard Time
| Itration            | 532      |
| Real Det Return     | 520      |
| Real Sto Return     | 451      |
| Reward Loss         | -17.7    |
| Running Env Steps   | 266000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 532      |
----------------------------------
2025-02-01 13:52:55.904710 Eastern Standard Time
| Itration            | 533      |
| Real Det Return     | 497      |
| Real Sto Return     | 455      |
| Reward Loss         | -27.5    |
| Running Env Steps   | 266500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 533      |
----------------------------------
2025-02-01 13:53:11.477980 Eastern Standard Time
| Itration            | 534      |
| Real Det Return     | 509      |
| Real Sto Return     | 448      |
| Reward Loss         | -22.6    |
| Running Env Steps   | 267000   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 3.85     |
| Running Update Time | 534      |
----------------------------------
2025-02-01 13:53:27.092189 Eastern Standard Time
| Itration            | 535      |
| Real Det Return     | 521      |
| Real Sto Return     | 467      |
| Reward Loss         | -28.2    |
| Running Env Steps   | 267500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 535      |
----------------------------------
2025-02-01 13:53:42.702796 Eastern Standard Time
| Itration            | 536      |
| Real Det Return     | 505      |
| Real Sto Return     | 446      |
| Reward Loss         | -26      |
| Running Env Steps   | 268000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 536      |
----------------------------------
2025-02-01 13:53:58.297083 Eastern Standard Time
| Itration            | 537      |
| Real Det Return     | 529      |
| Real Sto Return     | 472      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 268500   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 3.48     |
| Running Update Time | 537      |
----------------------------------
2025-02-01 13:54:13.888617 Eastern Standard Time
| Itration            | 538      |
| Real Det Return     | 510      |
| Real Sto Return     | 457      |
| Reward Loss         | -25      |
| Running Env Steps   | 269000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 538      |
----------------------------------
2025-02-01 13:54:29.445657 Eastern Standard Time
| Itration            | 539      |
| Real Det Return     | 509      |
| Real Sto Return     | 452      |
| Reward Loss         | -8.34    |
| Running Env Steps   | 269500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 539      |
----------------------------------
2025-02-01 13:54:44.993760 Eastern Standard Time
| Itration            | 540      |
| Real Det Return     | 513      |
| Real Sto Return     | 449      |
| Reward Loss         | -40.2    |
| Running Env Steps   | 270000   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 540      |
----------------------------------
2025-02-01 13:55:00.602087 Eastern Standard Time
| Itration            | 541      |
| Real Det Return     | 535      |
| Real Sto Return     | 478      |
| Reward Loss         | -21.7    |
| Running Env Steps   | 270500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 541      |
----------------------------------
2025-02-01 13:55:16.248895 Eastern Standard Time
| Itration            | 542      |
| Real Det Return     | 504      |
| Real Sto Return     | 454      |
| Reward Loss         | -40.5    |
| Running Env Steps   | 271000   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 542      |
----------------------------------
2025-02-01 13:55:31.823944 Eastern Standard Time
| Itration            | 543      |
| Real Det Return     | 533      |
| Real Sto Return     | 454      |
| Reward Loss         | -35      |
| Running Env Steps   | 271500   |
| Running Forward KL  | -4.02    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 543      |
----------------------------------
2025-02-01 13:55:47.416602 Eastern Standard Time
| Itration            | 544      |
| Real Det Return     | 535      |
| Real Sto Return     | 470      |
| Reward Loss         | -34.4    |
| Running Env Steps   | 272000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 544      |
----------------------------------
2025-02-01 13:56:02.955269 Eastern Standard Time
| Itration            | 545      |
| Real Det Return     | 502      |
| Real Sto Return     | 464      |
| Reward Loss         | -30.3    |
| Running Env Steps   | 272500   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 3.68     |
| Running Update Time | 545      |
----------------------------------
2025-02-01 13:56:18.529800 Eastern Standard Time
| Itration            | 546      |
| Real Det Return     | 502      |
| Real Sto Return     | 455      |
| Reward Loss         | -38.1    |
| Running Env Steps   | 273000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 546      |
----------------------------------
2025-02-01 13:56:34.063829 Eastern Standard Time
| Itration            | 547      |
| Real Det Return     | 524      |
| Real Sto Return     | 461      |
| Reward Loss         | -31.7    |
| Running Env Steps   | 273500   |
| Running Forward KL  | -3.9     |
| Running Reverse KL  | 3.87     |
| Running Update Time | 547      |
----------------------------------
2025-02-01 13:56:49.655460 Eastern Standard Time
| Itration            | 548      |
| Real Det Return     | 519      |
| Real Sto Return     | 465      |
| Reward Loss         | -25.4    |
| Running Env Steps   | 274000   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 548      |
----------------------------------
2025-02-01 13:57:05.217479 Eastern Standard Time
| Itration            | 549      |
| Real Det Return     | 544      |
| Real Sto Return     | 486      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 274500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 3.93     |
| Running Update Time | 549      |
----------------------------------
2025-02-01 13:57:20.786954 Eastern Standard Time
| Itration            | 550      |
| Real Det Return     | 508      |
| Real Sto Return     | 447      |
| Reward Loss         | -21.9    |
| Running Env Steps   | 275000   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 550      |
----------------------------------
2025-02-01 13:57:36.404613 Eastern Standard Time
| Itration            | 551      |
| Real Det Return     | 532      |
| Real Sto Return     | 467      |
| Reward Loss         | -18.1    |
| Running Env Steps   | 275500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 3.77     |
| Running Update Time | 551      |
----------------------------------
2025-02-01 13:57:51.997997 Eastern Standard Time
| Itration            | 552      |
| Real Det Return     | 503      |
| Real Sto Return     | 455      |
| Reward Loss         | -56.2    |
| Running Env Steps   | 276000   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 552      |
----------------------------------
2025-02-01 13:58:07.657246 Eastern Standard Time
| Itration            | 553      |
| Real Det Return     | 530      |
| Real Sto Return     | 463      |
| Reward Loss         | -31      |
| Running Env Steps   | 276500   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 553      |
----------------------------------
2025-02-01 13:58:23.207269 Eastern Standard Time
| Itration            | 554      |
| Real Det Return     | 532      |
| Real Sto Return     | 464      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 277000   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 3.55     |
| Running Update Time | 554      |
----------------------------------
2025-02-01 13:58:38.859485 Eastern Standard Time
| Itration            | 555      |
| Real Det Return     | 514      |
| Real Sto Return     | 460      |
| Reward Loss         | -29.2    |
| Running Env Steps   | 277500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 555      |
----------------------------------
2025-02-01 13:58:54.552214 Eastern Standard Time
| Itration            | 556      |
| Real Det Return     | 519      |
| Real Sto Return     | 459      |
| Reward Loss         | -33.3    |
| Running Env Steps   | 278000   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 556      |
----------------------------------
2025-02-01 13:59:10.226677 Eastern Standard Time
| Itration            | 557      |
| Real Det Return     | 527      |
| Real Sto Return     | 464      |
| Reward Loss         | -31.9    |
| Running Env Steps   | 278500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 557      |
----------------------------------
2025-02-01 13:59:25.927479 Eastern Standard Time
| Itration            | 558      |
| Real Det Return     | 520      |
| Real Sto Return     | 456      |
| Reward Loss         | -29.4    |
| Running Env Steps   | 279000   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 3.95     |
| Running Update Time | 558      |
----------------------------------
2025-02-01 13:59:41.523285 Eastern Standard Time
| Itration            | 559      |
| Real Det Return     | 515      |
| Real Sto Return     | 457      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 279500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 3.78     |
| Running Update Time | 559      |
----------------------------------
2025-02-01 13:59:57.066288 Eastern Standard Time
| Itration            | 560      |
| Real Det Return     | 503      |
| Real Sto Return     | 455      |
| Reward Loss         | -25.1    |
| Running Env Steps   | 280000   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 560      |
----------------------------------
2025-02-01 14:00:12.689879 Eastern Standard Time
| Itration            | 561      |
| Real Det Return     | 540      |
| Real Sto Return     | 470      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 280500   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 561      |
----------------------------------
2025-02-01 14:00:28.268877 Eastern Standard Time
| Itration            | 562      |
| Real Det Return     | 525      |
| Real Sto Return     | 460      |
| Reward Loss         | -20.4    |
| Running Env Steps   | 281000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 562      |
----------------------------------
2025-02-01 14:00:43.901658 Eastern Standard Time
| Itration            | 563      |
| Real Det Return     | 536      |
| Real Sto Return     | 470      |
| Reward Loss         | -5.69    |
| Running Env Steps   | 281500   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 563      |
----------------------------------
2025-02-01 14:00:59.536913 Eastern Standard Time
| Itration            | 564      |
| Real Det Return     | 514      |
| Real Sto Return     | 456      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 282000   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 564      |
----------------------------------
2025-02-01 14:01:15.173480 Eastern Standard Time
| Itration            | 565      |
| Real Det Return     | 524      |
| Real Sto Return     | 457      |
| Reward Loss         | -22.5    |
| Running Env Steps   | 282500   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 3.6      |
| Running Update Time | 565      |
----------------------------------
2025-02-01 14:01:30.727001 Eastern Standard Time
| Itration            | 566      |
| Real Det Return     | 520      |
| Real Sto Return     | 463      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 283000   |
| Running Forward KL  | -3.36    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 566      |
----------------------------------
2025-02-01 14:01:46.278611 Eastern Standard Time
| Itration            | 567      |
| Real Det Return     | 525      |
| Real Sto Return     | 468      |
| Reward Loss         | -28.6    |
| Running Env Steps   | 283500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 3.75     |
| Running Update Time | 567      |
----------------------------------
2025-02-01 14:02:01.936473 Eastern Standard Time
| Itration            | 568      |
| Real Det Return     | 493      |
| Real Sto Return     | 447      |
| Reward Loss         | -22.6    |
| Running Env Steps   | 284000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 568      |
----------------------------------
2025-02-01 14:02:17.631973 Eastern Standard Time
| Itration            | 569      |
| Real Det Return     | 535      |
| Real Sto Return     | 460      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 284500   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 3.63     |
| Running Update Time | 569      |
----------------------------------
2025-02-01 14:02:33.247362 Eastern Standard Time
| Itration            | 570      |
| Real Det Return     | 510      |
| Real Sto Return     | 459      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 285000   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 570      |
----------------------------------
2025-02-01 14:02:48.816858 Eastern Standard Time
| Itration            | 571      |
| Real Det Return     | 525      |
| Real Sto Return     | 476      |
| Reward Loss         | -22.7    |
| Running Env Steps   | 285500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 3.99     |
| Running Update Time | 571      |
----------------------------------
2025-02-01 14:03:04.441108 Eastern Standard Time
| Itration            | 572      |
| Real Det Return     | 506      |
| Real Sto Return     | 452      |
| Reward Loss         | -27.1    |
| Running Env Steps   | 286000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 572      |
----------------------------------
2025-02-01 14:03:20.111139 Eastern Standard Time
| Itration            | 573      |
| Real Det Return     | 503      |
| Real Sto Return     | 457      |
| Reward Loss         | -21.2    |
| Running Env Steps   | 286500   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 573      |
----------------------------------
2025-02-01 14:03:35.755175 Eastern Standard Time
| Itration            | 574      |
| Real Det Return     | 520      |
| Real Sto Return     | 458      |
| Reward Loss         | -14.5    |
| Running Env Steps   | 287000   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 574      |
----------------------------------
2025-02-01 14:03:51.270808 Eastern Standard Time
| Itration            | 575      |
| Real Det Return     | 511      |
| Real Sto Return     | 455      |
| Reward Loss         | -43.8    |
| Running Env Steps   | 287500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 575      |
----------------------------------
2025-02-01 14:04:06.928922 Eastern Standard Time
| Itration            | 576      |
| Real Det Return     | 537      |
| Real Sto Return     | 459      |
| Reward Loss         | -26.9    |
| Running Env Steps   | 288000   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 576      |
----------------------------------
2025-02-01 14:04:22.598935 Eastern Standard Time
| Itration            | 577      |
| Real Det Return     | 514      |
| Real Sto Return     | 467      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 288500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 577      |
----------------------------------
2025-02-01 14:04:38.184090 Eastern Standard Time
| Itration            | 578      |
| Real Det Return     | 509      |
| Real Sto Return     | 457      |
| Reward Loss         | -30.4    |
| Running Env Steps   | 289000   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 578      |
----------------------------------
2025-02-01 14:04:53.734739 Eastern Standard Time
| Itration            | 579      |
| Real Det Return     | 528      |
| Real Sto Return     | 452      |
| Reward Loss         | -22.3    |
| Running Env Steps   | 289500   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 579      |
----------------------------------
2025-02-01 14:05:09.431202 Eastern Standard Time
| Itration            | 580      |
| Real Det Return     | 509      |
| Real Sto Return     | 452      |
| Reward Loss         | -18.9    |
| Running Env Steps   | 290000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 580      |
----------------------------------
2025-02-01 14:05:25.024987 Eastern Standard Time
| Itration            | 581      |
| Real Det Return     | 524      |
| Real Sto Return     | 467      |
| Reward Loss         | -33.1    |
| Running Env Steps   | 290500   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 3.97     |
| Running Update Time | 581      |
----------------------------------
2025-02-01 14:05:40.566355 Eastern Standard Time
| Itration            | 582      |
| Real Det Return     | 516      |
| Real Sto Return     | 465      |
| Reward Loss         | -16.8    |
| Running Env Steps   | 291000   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 582      |
----------------------------------
2025-02-01 14:05:56.137564 Eastern Standard Time
| Itration            | 583      |
| Real Det Return     | 531      |
| Real Sto Return     | 477      |
| Reward Loss         | -18      |
| Running Env Steps   | 291500   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 583      |
----------------------------------
2025-02-01 14:06:11.649322 Eastern Standard Time
| Itration            | 584      |
| Real Det Return     | 521      |
| Real Sto Return     | 462      |
| Reward Loss         | -30.9    |
| Running Env Steps   | 292000   |
| Running Forward KL  | -3.95    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 584      |
----------------------------------
2025-02-01 14:06:27.252709 Eastern Standard Time
| Itration            | 585      |
| Real Det Return     | 502      |
| Real Sto Return     | 448      |
| Reward Loss         | -16.6    |
| Running Env Steps   | 292500   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 585      |
----------------------------------
2025-02-01 14:06:42.874072 Eastern Standard Time
| Itration            | 586      |
| Real Det Return     | 530      |
| Real Sto Return     | 479      |
| Reward Loss         | -21.7    |
| Running Env Steps   | 293000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 4.28     |
| Running Update Time | 586      |
----------------------------------
2025-02-01 14:06:58.537082 Eastern Standard Time
| Itration            | 587      |
| Real Det Return     | 536      |
| Real Sto Return     | 485      |
| Reward Loss         | -38.3    |
| Running Env Steps   | 293500   |
| Running Forward KL  | -4.13    |
| Running Reverse KL  | 3.76     |
| Running Update Time | 587      |
----------------------------------
2025-02-01 14:07:14.136337 Eastern Standard Time
| Itration            | 588      |
| Real Det Return     | 517      |
| Real Sto Return     | 462      |
| Reward Loss         | -15.5    |
| Running Env Steps   | 294000   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 588      |
----------------------------------
2025-02-01 14:07:29.756891 Eastern Standard Time
| Itration            | 589      |
| Real Det Return     | 519      |
| Real Sto Return     | 457      |
| Reward Loss         | -33.7    |
| Running Env Steps   | 294500   |
| Running Forward KL  | -3.96    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 589      |
----------------------------------
2025-02-01 14:07:45.348213 Eastern Standard Time
| Itration            | 590      |
| Real Det Return     | 521      |
| Real Sto Return     | 475      |
| Reward Loss         | -23      |
| Running Env Steps   | 295000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 590      |
----------------------------------
2025-02-01 14:08:00.922073 Eastern Standard Time
| Itration            | 591      |
| Real Det Return     | 523      |
| Real Sto Return     | 465      |
| Reward Loss         | -23      |
| Running Env Steps   | 295500   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 591      |
----------------------------------
2025-02-01 14:08:16.455104 Eastern Standard Time
| Itration            | 592      |
| Real Det Return     | 539      |
| Real Sto Return     | 475      |
| Reward Loss         | -22.5    |
| Running Env Steps   | 296000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 592      |
----------------------------------
2025-02-01 14:08:32.036887 Eastern Standard Time
| Itration            | 593      |
| Real Det Return     | 521      |
| Real Sto Return     | 467      |
| Reward Loss         | -33      |
| Running Env Steps   | 296500   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 593      |
----------------------------------
2025-02-01 14:08:47.598450 Eastern Standard Time
| Itration            | 594      |
| Real Det Return     | 529      |
| Real Sto Return     | 460      |
| Reward Loss         | -19      |
| Running Env Steps   | 297000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 3.89     |
| Running Update Time | 594      |
----------------------------------
2025-02-01 14:09:03.195782 Eastern Standard Time
| Itration            | 595      |
| Real Det Return     | 528      |
| Real Sto Return     | 463      |
| Reward Loss         | -13.3    |
| Running Env Steps   | 297500   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 595      |
----------------------------------
2025-02-01 14:09:18.750510 Eastern Standard Time
| Itration            | 596      |
| Real Det Return     | 501      |
| Real Sto Return     | 454      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 298000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 596      |
----------------------------------
2025-02-01 14:09:34.404951 Eastern Standard Time
| Itration            | 597      |
| Real Det Return     | 528      |
| Real Sto Return     | 464      |
| Reward Loss         | -16.9    |
| Running Env Steps   | 298500   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 3.84     |
| Running Update Time | 597      |
----------------------------------
2025-02-01 14:09:50.021950 Eastern Standard Time
| Itration            | 598      |
| Real Det Return     | 525      |
| Real Sto Return     | 469      |
| Reward Loss         | -27.9    |
| Running Env Steps   | 299000   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 598      |
----------------------------------
2025-02-01 14:10:05.509765 Eastern Standard Time
| Itration            | 599      |
| Real Det Return     | 507      |
| Real Sto Return     | 465      |
| Reward Loss         | -29.3    |
| Running Env Steps   | 299500   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 599      |
----------------------------------
2025-02-01 14:10:21.103536 Eastern Standard Time
| Itration            | 600      |
| Real Det Return     | 516      |
| Real Sto Return     | 463      |
| Reward Loss         | -11.7    |
| Running Env Steps   | 300000   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 4.28     |
| Running Update Time | 600      |
----------------------------------
2025-02-01 14:10:36.656061 Eastern Standard Time
| Itration            | 601      |
| Real Det Return     | 509      |
| Real Sto Return     | 466      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 300500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 3.72     |
| Running Update Time | 601      |
----------------------------------
2025-02-01 14:10:52.275201 Eastern Standard Time
| Itration            | 602      |
| Real Det Return     | 502      |
| Real Sto Return     | 457      |
| Reward Loss         | -39.4    |
| Running Env Steps   | 301000   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 602      |
----------------------------------
2025-02-01 14:11:07.874461 Eastern Standard Time
| Itration            | 603      |
| Real Det Return     | 521      |
| Real Sto Return     | 478      |
| Reward Loss         | -18      |
| Running Env Steps   | 301500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 603      |
----------------------------------
2025-02-01 14:11:23.571354 Eastern Standard Time
| Itration            | 604      |
| Real Det Return     | 528      |
| Real Sto Return     | 470      |
| Reward Loss         | -21.1    |
| Running Env Steps   | 302000   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 3.97     |
| Running Update Time | 604      |
----------------------------------
2025-02-01 14:11:39.086277 Eastern Standard Time
| Itration            | 605      |
| Real Det Return     | 523      |
| Real Sto Return     | 463      |
| Reward Loss         | -25.2    |
| Running Env Steps   | 302500   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 605      |
----------------------------------
2025-02-01 14:11:54.700510 Eastern Standard Time
| Itration            | 606      |
| Real Det Return     | 521      |
| Real Sto Return     | 466      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 303000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.34     |
| Running Update Time | 606      |
----------------------------------
2025-02-01 14:12:10.372415 Eastern Standard Time
| Itration            | 607      |
| Real Det Return     | 516      |
| Real Sto Return     | 459      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 303500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.18     |
| Running Update Time | 607      |
----------------------------------
2025-02-01 14:12:25.958842 Eastern Standard Time
| Itration            | 608      |
| Real Det Return     | 522      |
| Real Sto Return     | 468      |
| Reward Loss         | -25.7    |
| Running Env Steps   | 304000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 3.95     |
| Running Update Time | 608      |
----------------------------------
2025-02-01 14:12:41.509769 Eastern Standard Time
| Itration            | 609      |
| Real Det Return     | 516      |
| Real Sto Return     | 466      |
| Reward Loss         | -38.5    |
| Running Env Steps   | 304500   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 609      |
----------------------------------
2025-02-01 14:12:57.027864 Eastern Standard Time
| Itration            | 610      |
| Real Det Return     | 538      |
| Real Sto Return     | 463      |
| Reward Loss         | -30.1    |
| Running Env Steps   | 305000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 610      |
----------------------------------
2025-02-01 14:13:12.712340 Eastern Standard Time
| Itration            | 611      |
| Real Det Return     | 516      |
| Real Sto Return     | 474      |
| Reward Loss         | -30.4    |
| Running Env Steps   | 305500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 611      |
----------------------------------
2025-02-01 14:13:28.269359 Eastern Standard Time
| Itration            | 612      |
| Real Det Return     | 527      |
| Real Sto Return     | 475      |
| Reward Loss         | -16.5    |
| Running Env Steps   | 306000   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 3.84     |
| Running Update Time | 612      |
----------------------------------
2025-02-01 14:13:43.925267 Eastern Standard Time
| Itration            | 613      |
| Real Det Return     | 527      |
| Real Sto Return     | 462      |
| Reward Loss         | -23.2    |
| Running Env Steps   | 306500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 613      |
----------------------------------
2025-02-01 14:13:59.560273 Eastern Standard Time
| Itration            | 614      |
| Real Det Return     | 524      |
| Real Sto Return     | 463      |
| Reward Loss         | -16      |
| Running Env Steps   | 307000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 614      |
----------------------------------
2025-02-01 14:14:15.215945 Eastern Standard Time
| Itration            | 615      |
| Real Det Return     | 528      |
| Real Sto Return     | 464      |
| Reward Loss         | -32      |
| Running Env Steps   | 307500   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 615      |
----------------------------------
2025-02-01 14:14:30.879853 Eastern Standard Time
| Itration            | 616      |
| Real Det Return     | 504      |
| Real Sto Return     | 459      |
| Reward Loss         | -15.8    |
| Running Env Steps   | 308000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 616      |
----------------------------------
2025-02-01 14:14:46.451181 Eastern Standard Time
| Itration            | 617      |
| Real Det Return     | 516      |
| Real Sto Return     | 465      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 308500   |
| Running Forward KL  | -3.73    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 617      |
----------------------------------
2025-02-01 14:15:01.980758 Eastern Standard Time
| Itration            | 618      |
| Real Det Return     | 528      |
| Real Sto Return     | 478      |
| Reward Loss         | -16.7    |
| Running Env Steps   | 309000   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 618      |
----------------------------------
2025-02-01 14:15:17.477095 Eastern Standard Time
| Itration            | 619      |
| Real Det Return     | 508      |
| Real Sto Return     | 453      |
| Reward Loss         | -27.9    |
| Running Env Steps   | 309500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 619      |
----------------------------------
2025-02-01 14:15:33.074587 Eastern Standard Time
| Itration            | 620      |
| Real Det Return     | 505      |
| Real Sto Return     | 457      |
| Reward Loss         | -23.9    |
| Running Env Steps   | 310000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 620      |
----------------------------------
2025-02-01 14:15:48.661803 Eastern Standard Time
| Itration            | 621      |
| Real Det Return     | 517      |
| Real Sto Return     | 460      |
| Reward Loss         | -22.3    |
| Running Env Steps   | 310500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 4        |
| Running Update Time | 621      |
----------------------------------
2025-02-01 14:16:04.203939 Eastern Standard Time
| Itration            | 622      |
| Real Det Return     | 551      |
| Real Sto Return     | 484      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 311000   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 622      |
----------------------------------
2025-02-01 14:16:19.783794 Eastern Standard Time
| Itration            | 623      |
| Real Det Return     | 524      |
| Real Sto Return     | 469      |
| Reward Loss         | -19.1    |
| Running Env Steps   | 311500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 623      |
----------------------------------
2025-02-01 14:16:35.303884 Eastern Standard Time
| Itration            | 624      |
| Real Det Return     | 519      |
| Real Sto Return     | 475      |
| Reward Loss         | -12.7    |
| Running Env Steps   | 312000   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 3.64     |
| Running Update Time | 624      |
----------------------------------
2025-02-01 14:16:50.923110 Eastern Standard Time
| Itration            | 625      |
| Real Det Return     | 523      |
| Real Sto Return     | 456      |
| Reward Loss         | -42      |
| Running Env Steps   | 312500   |
| Running Forward KL  | -3.71    |
| Running Reverse KL  | 4.12     |
| Running Update Time | 625      |
----------------------------------
2025-02-01 14:17:06.548898 Eastern Standard Time
| Itration            | 626      |
| Real Det Return     | 524      |
| Real Sto Return     | 471      |
| Reward Loss         | -11.9    |
| Running Env Steps   | 313000   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 626      |
----------------------------------
2025-02-01 14:17:22.332418 Eastern Standard Time
| Itration            | 627      |
| Real Det Return     | 534      |
| Real Sto Return     | 484      |
| Reward Loss         | -14.9    |
| Running Env Steps   | 313500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 627      |
----------------------------------
2025-02-01 14:17:37.931556 Eastern Standard Time
| Itration            | 628      |
| Real Det Return     | 515      |
| Real Sto Return     | 453      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 314000   |
| Running Forward KL  | -4.05    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 628      |
----------------------------------
2025-02-01 14:17:53.410272 Eastern Standard Time
| Itration            | 629      |
| Real Det Return     | 522      |
| Real Sto Return     | 471      |
| Reward Loss         | -18.2    |
| Running Env Steps   | 314500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 629      |
----------------------------------
2025-02-01 14:18:09.052439 Eastern Standard Time
| Itration            | 630      |
| Real Det Return     | 512      |
| Real Sto Return     | 460      |
| Reward Loss         | -26.9    |
| Running Env Steps   | 315000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 630      |
----------------------------------
2025-02-01 14:18:24.697528 Eastern Standard Time
| Itration            | 631      |
| Real Det Return     | 509      |
| Real Sto Return     | 461      |
| Reward Loss         | -23.1    |
| Running Env Steps   | 315500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 631      |
----------------------------------
2025-02-01 14:18:40.171354 Eastern Standard Time
| Itration            | 632      |
| Real Det Return     | 523      |
| Real Sto Return     | 470      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 316000   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 3.98     |
| Running Update Time | 632      |
----------------------------------
2025-02-01 14:18:55.719946 Eastern Standard Time
| Itration            | 633      |
| Real Det Return     | 513      |
| Real Sto Return     | 451      |
| Reward Loss         | -22.8    |
| Running Env Steps   | 316500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 633      |
----------------------------------
2025-02-01 14:19:11.396269 Eastern Standard Time
| Itration            | 634      |
| Real Det Return     | 507      |
| Real Sto Return     | 462      |
| Reward Loss         | -30.7    |
| Running Env Steps   | 317000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 634      |
----------------------------------
2025-02-01 14:19:27.006902 Eastern Standard Time
| Itration            | 635      |
| Real Det Return     | 519      |
| Real Sto Return     | 460      |
| Reward Loss         | -49.8    |
| Running Env Steps   | 317500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 3.8      |
| Running Update Time | 635      |
----------------------------------
2025-02-01 14:19:42.570840 Eastern Standard Time
| Itration            | 636      |
| Real Det Return     | 528      |
| Real Sto Return     | 470      |
| Reward Loss         | -16.2    |
| Running Env Steps   | 318000   |
| Running Forward KL  | -4.09    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 636      |
----------------------------------
2025-02-01 14:19:58.098579 Eastern Standard Time
| Itration            | 637      |
| Real Det Return     | 538      |
| Real Sto Return     | 479      |
| Reward Loss         | -22.7    |
| Running Env Steps   | 318500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 637      |
----------------------------------
2025-02-01 14:20:13.672480 Eastern Standard Time
| Itration            | 638      |
| Real Det Return     | 517      |
| Real Sto Return     | 468      |
| Reward Loss         | -39.8    |
| Running Env Steps   | 319000   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 638      |
----------------------------------
2025-02-01 14:20:29.192103 Eastern Standard Time
| Itration            | 639      |
| Real Det Return     | 510      |
| Real Sto Return     | 462      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 319500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 3.97     |
| Running Update Time | 639      |
----------------------------------
2025-02-01 14:20:44.805356 Eastern Standard Time
| Itration            | 640      |
| Real Det Return     | 525      |
| Real Sto Return     | 475      |
| Reward Loss         | -15.3    |
| Running Env Steps   | 320000   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 640      |
----------------------------------
2025-02-01 14:21:00.428202 Eastern Standard Time
| Itration            | 641      |
| Real Det Return     | 543      |
| Real Sto Return     | 475      |
| Reward Loss         | -14      |
| Running Env Steps   | 320500   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 641      |
----------------------------------
2025-02-01 14:21:16.012964 Eastern Standard Time
| Itration            | 642      |
| Real Det Return     | 527      |
| Real Sto Return     | 477      |
| Reward Loss         | -21.8    |
| Running Env Steps   | 321000   |
| Running Forward KL  | -4.33    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 642      |
----------------------------------
2025-02-01 14:21:31.626421 Eastern Standard Time
| Itration            | 643      |
| Real Det Return     | 522      |
| Real Sto Return     | 471      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 321500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 3.77     |
| Running Update Time | 643      |
----------------------------------
2025-02-01 14:21:47.221045 Eastern Standard Time
| Itration            | 644      |
| Real Det Return     | 527      |
| Real Sto Return     | 476      |
| Reward Loss         | -22      |
| Running Env Steps   | 322000   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 644      |
----------------------------------
2025-02-01 14:22:02.859851 Eastern Standard Time
| Itration            | 645      |
| Real Det Return     | 522      |
| Real Sto Return     | 466      |
| Reward Loss         | -19.6    |
| Running Env Steps   | 322500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 645      |
----------------------------------
2025-02-01 14:22:18.481346 Eastern Standard Time
| Itration            | 646      |
| Real Det Return     | 536      |
| Real Sto Return     | 493      |
| Reward Loss         | -17.8    |
| Running Env Steps   | 323000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 646      |
----------------------------------
2025-02-01 14:22:34.061837 Eastern Standard Time
| Itration            | 647      |
| Real Det Return     | 530      |
| Real Sto Return     | 466      |
| Reward Loss         | -39.4    |
| Running Env Steps   | 323500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 3.69     |
| Running Update Time | 647      |
----------------------------------
2025-02-01 14:22:49.657354 Eastern Standard Time
| Itration            | 648      |
| Real Det Return     | 533      |
| Real Sto Return     | 485      |
| Reward Loss         | -8.75    |
| Running Env Steps   | 324000   |
| Running Forward KL  | -4.04    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 648      |
----------------------------------
2025-02-01 14:23:05.238672 Eastern Standard Time
| Itration            | 649      |
| Real Det Return     | 521      |
| Real Sto Return     | 472      |
| Reward Loss         | -38.3    |
| Running Env Steps   | 324500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 3.8      |
| Running Update Time | 649      |
----------------------------------
2025-02-01 14:23:20.820697 Eastern Standard Time
| Itration            | 650      |
| Real Det Return     | 527      |
| Real Sto Return     | 469      |
| Reward Loss         | -25.3    |
| Running Env Steps   | 325000   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 650      |
----------------------------------
2025-02-01 14:23:36.411150 Eastern Standard Time
| Itration            | 651      |
| Real Det Return     | 503      |
| Real Sto Return     | 463      |
| Reward Loss         | -20.3    |
| Running Env Steps   | 325500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 651      |
----------------------------------
2025-02-01 14:23:52.004039 Eastern Standard Time
| Itration            | 652      |
| Real Det Return     | 535      |
| Real Sto Return     | 485      |
| Reward Loss         | -18.5    |
| Running Env Steps   | 326000   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 3.89     |
| Running Update Time | 652      |
----------------------------------
2025-02-01 14:24:07.567554 Eastern Standard Time
| Itration            | 653      |
| Real Det Return     | 517      |
| Real Sto Return     | 466      |
| Reward Loss         | -25.1    |
| Running Env Steps   | 326500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 653      |
----------------------------------
2025-02-01 14:24:23.450373 Eastern Standard Time
| Itration            | 654      |
| Real Det Return     | 521      |
| Real Sto Return     | 477      |
| Reward Loss         | -18.2    |
| Running Env Steps   | 327000   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 654      |
----------------------------------
2025-02-01 14:24:39.039131 Eastern Standard Time
| Itration            | 655      |
| Real Det Return     | 507      |
| Real Sto Return     | 472      |
| Reward Loss         | -25.9    |
| Running Env Steps   | 327500   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 655      |
----------------------------------
2025-02-01 14:24:54.568527 Eastern Standard Time
| Itration            | 656      |
| Real Det Return     | 522      |
| Real Sto Return     | 461      |
| Reward Loss         | -19.9    |
| Running Env Steps   | 328000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 3.72     |
| Running Update Time | 656      |
----------------------------------
2025-02-01 14:25:10.142370 Eastern Standard Time
| Itration            | 657      |
| Real Det Return     | 528      |
| Real Sto Return     | 469      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 328500   |
| Running Forward KL  | -3.97    |
| Running Reverse KL  | 4        |
| Running Update Time | 657      |
----------------------------------
2025-02-01 14:25:28.466391 Eastern Standard Time
| Itration            | 658      |
| Real Det Return     | 537      |
| Real Sto Return     | 478      |
| Reward Loss         | -28      |
| Running Env Steps   | 329000   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 3.83     |
| Running Update Time | 658      |
----------------------------------
2025-02-01 14:25:45.866484 Eastern Standard Time
| Itration            | 659      |
| Real Det Return     | 521      |
| Real Sto Return     | 463      |
| Reward Loss         | -18.8    |
| Running Env Steps   | 329500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 3.77     |
| Running Update Time | 659      |
----------------------------------
2025-02-01 14:26:03.861223 Eastern Standard Time
| Itration            | 660      |
| Real Det Return     | 516      |
| Real Sto Return     | 467      |
| Reward Loss         | -28.6    |
| Running Env Steps   | 330000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 660      |
----------------------------------
2025-02-01 14:26:24.254474 Eastern Standard Time
| Itration            | 661      |
| Real Det Return     | 504      |
| Real Sto Return     | 468      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 330500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.08     |
| Running Update Time | 661      |
----------------------------------
2025-02-01 14:26:41.359755 Eastern Standard Time
| Itration            | 662      |
| Real Det Return     | 536      |
| Real Sto Return     | 476      |
| Reward Loss         | -27.7    |
| Running Env Steps   | 331000   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 662      |
----------------------------------
2025-02-01 14:26:58.325697 Eastern Standard Time
| Itration            | 663      |
| Real Det Return     | 522      |
| Real Sto Return     | 472      |
| Reward Loss         | -26.3    |
| Running Env Steps   | 331500   |
| Running Forward KL  | -4.87    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 663      |
----------------------------------
2025-02-01 14:27:15.366669 Eastern Standard Time
| Itration            | 664      |
| Real Det Return     | 531      |
| Real Sto Return     | 473      |
| Reward Loss         | -27.5    |
| Running Env Steps   | 332000   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 664      |
----------------------------------
2025-02-01 14:27:32.322089 Eastern Standard Time
| Itration            | 665      |
| Real Det Return     | 510      |
| Real Sto Return     | 463      |
| Reward Loss         | -26      |
| Running Env Steps   | 332500   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 665      |
----------------------------------
2025-02-01 14:27:49.286622 Eastern Standard Time
| Itration            | 666      |
| Real Det Return     | 534      |
| Real Sto Return     | 465      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 333000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 666      |
----------------------------------
2025-02-01 14:28:06.256281 Eastern Standard Time
| Itration            | 667      |
| Real Det Return     | 526      |
| Real Sto Return     | 467      |
| Reward Loss         | -21.7    |
| Running Env Steps   | 333500   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 4.08     |
| Running Update Time | 667      |
----------------------------------
2025-02-01 14:28:23.332494 Eastern Standard Time
| Itration            | 668      |
| Real Det Return     | 525      |
| Real Sto Return     | 465      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 334000   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 668      |
----------------------------------
2025-02-01 14:28:40.367926 Eastern Standard Time
| Itration            | 669      |
| Real Det Return     | 520      |
| Real Sto Return     | 471      |
| Reward Loss         | -18.8    |
| Running Env Steps   | 334500   |
| Running Forward KL  | -4.15    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 669      |
----------------------------------
2025-02-01 14:28:57.299074 Eastern Standard Time
| Itration            | 670      |
| Real Det Return     | 515      |
| Real Sto Return     | 463      |
| Reward Loss         | -40.2    |
| Running Env Steps   | 335000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 670      |
----------------------------------
2025-02-01 14:29:14.235270 Eastern Standard Time
| Itration            | 671      |
| Real Det Return     | 509      |
| Real Sto Return     | 467      |
| Reward Loss         | -30.6    |
| Running Env Steps   | 335500   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 671      |
----------------------------------
2025-02-01 14:29:31.199285 Eastern Standard Time
| Itration            | 672      |
| Real Det Return     | 532      |
| Real Sto Return     | 473      |
| Reward Loss         | -19.2    |
| Running Env Steps   | 336000   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4.14     |
| Running Update Time | 672      |
----------------------------------
2025-02-01 14:29:48.162424 Eastern Standard Time
| Itration            | 673      |
| Real Det Return     | 531      |
| Real Sto Return     | 469      |
| Reward Loss         | -12.8    |
| Running Env Steps   | 336500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 673      |
----------------------------------
2025-02-01 14:30:05.067033 Eastern Standard Time
| Itration            | 674      |
| Real Det Return     | 534      |
| Real Sto Return     | 477      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 337000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 674      |
----------------------------------
2025-02-01 14:30:22.048802 Eastern Standard Time
| Itration            | 675      |
| Real Det Return     | 527      |
| Real Sto Return     | 474      |
| Reward Loss         | -21.1    |
| Running Env Steps   | 337500   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 675      |
----------------------------------
2025-02-01 14:30:38.972260 Eastern Standard Time
| Itration            | 676      |
| Real Det Return     | 519      |
| Real Sto Return     | 482      |
| Reward Loss         | -42.9    |
| Running Env Steps   | 338000   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 676      |
----------------------------------
2025-02-01 14:30:56.014172 Eastern Standard Time
| Itration            | 677      |
| Real Det Return     | 523      |
| Real Sto Return     | 474      |
| Reward Loss         | -30.4    |
| Running Env Steps   | 338500   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 677      |
----------------------------------
2025-02-01 14:31:13.055919 Eastern Standard Time
| Itration            | 678      |
| Real Det Return     | 526      |
| Real Sto Return     | 463      |
| Reward Loss         | -23.5    |
| Running Env Steps   | 339000   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 678      |
----------------------------------
2025-02-01 14:31:29.973557 Eastern Standard Time
| Itration            | 679      |
| Real Det Return     | 520      |
| Real Sto Return     | 467      |
| Reward Loss         | -29.1    |
| Running Env Steps   | 339500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 679      |
----------------------------------
2025-02-01 14:31:46.919616 Eastern Standard Time
| Itration            | 680      |
| Real Det Return     | 533      |
| Real Sto Return     | 469      |
| Reward Loss         | -16.9    |
| Running Env Steps   | 340000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 680      |
----------------------------------
2025-02-01 14:32:03.816369 Eastern Standard Time
| Itration            | 681      |
| Real Det Return     | 522      |
| Real Sto Return     | 478      |
| Reward Loss         | -14.9    |
| Running Env Steps   | 340500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 681      |
----------------------------------
2025-02-01 14:32:20.951776 Eastern Standard Time
| Itration            | 682      |
| Real Det Return     | 526      |
| Real Sto Return     | 465      |
| Reward Loss         | -17.9    |
| Running Env Steps   | 341000   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 682      |
----------------------------------
2025-02-01 14:32:37.890080 Eastern Standard Time
| Itration            | 683      |
| Real Det Return     | 516      |
| Real Sto Return     | 471      |
| Reward Loss         | -45.2    |
| Running Env Steps   | 341500   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 683      |
----------------------------------
2025-02-01 14:32:55.114077 Eastern Standard Time
| Itration            | 684      |
| Real Det Return     | 515      |
| Real Sto Return     | 463      |
| Reward Loss         | -41.9    |
| Running Env Steps   | 342000   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 684      |
----------------------------------
2025-02-01 14:33:12.262173 Eastern Standard Time
| Itration            | 685      |
| Real Det Return     | 536      |
| Real Sto Return     | 484      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 342500   |
| Running Forward KL  | -4       |
| Running Reverse KL  | 3.82     |
| Running Update Time | 685      |
----------------------------------
2025-02-01 14:33:29.200715 Eastern Standard Time
| Itration            | 686      |
| Real Det Return     | 538      |
| Real Sto Return     | 488      |
| Reward Loss         | -26.8    |
| Running Env Steps   | 343000   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 686      |
----------------------------------
2025-02-01 14:33:46.203406 Eastern Standard Time
| Itration            | 687      |
| Real Det Return     | 506      |
| Real Sto Return     | 465      |
| Reward Loss         | -17.3    |
| Running Env Steps   | 343500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.04     |
| Running Update Time | 687      |
----------------------------------
2025-02-01 14:34:03.123105 Eastern Standard Time
| Itration            | 688      |
| Real Det Return     | 521      |
| Real Sto Return     | 462      |
| Reward Loss         | -39.2    |
| Running Env Steps   | 344000   |
| Running Forward KL  | -4.01    |
| Running Reverse KL  | 3.66     |
| Running Update Time | 688      |
----------------------------------
2025-02-01 14:34:20.131463 Eastern Standard Time
| Itration            | 689      |
| Real Det Return     | 533      |
| Real Sto Return     | 475      |
| Reward Loss         | -11.3    |
| Running Env Steps   | 344500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 3.68     |
| Running Update Time | 689      |
----------------------------------
2025-02-01 14:34:37.061396 Eastern Standard Time
| Itration            | 690      |
| Real Det Return     | 517      |
| Real Sto Return     | 474      |
| Reward Loss         | -26.4    |
| Running Env Steps   | 345000   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 690      |
----------------------------------
2025-02-01 14:34:53.967934 Eastern Standard Time
| Itration            | 691      |
| Real Det Return     | 536      |
| Real Sto Return     | 472      |
| Reward Loss         | -10.7    |
| Running Env Steps   | 345500   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 691      |
----------------------------------
2025-02-01 14:35:11.048615 Eastern Standard Time
| Itration            | 692      |
| Real Det Return     | 529      |
| Real Sto Return     | 486      |
| Reward Loss         | -20.6    |
| Running Env Steps   | 346000   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 3.91     |
| Running Update Time | 692      |
----------------------------------
2025-02-01 14:35:28.004092 Eastern Standard Time
| Itration            | 693      |
| Real Det Return     | 520      |
| Real Sto Return     | 471      |
| Reward Loss         | -23.3    |
| Running Env Steps   | 346500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 693      |
----------------------------------
2025-02-01 14:35:44.966078 Eastern Standard Time
| Itration            | 694      |
| Real Det Return     | 514      |
| Real Sto Return     | 468      |
| Reward Loss         | -13.4    |
| Running Env Steps   | 347000   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 3.9      |
| Running Update Time | 694      |
----------------------------------
2025-02-01 14:36:01.869044 Eastern Standard Time
| Itration            | 695      |
| Real Det Return     | 502      |
| Real Sto Return     | 448      |
| Reward Loss         | -28.1    |
| Running Env Steps   | 347500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 695      |
----------------------------------
2025-02-01 14:36:18.716885 Eastern Standard Time
| Itration            | 696      |
| Real Det Return     | 535      |
| Real Sto Return     | 471      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 348000   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 3.81     |
| Running Update Time | 696      |
----------------------------------
2025-02-01 14:36:35.592111 Eastern Standard Time
| Itration            | 697      |
| Real Det Return     | 527      |
| Real Sto Return     | 457      |
| Reward Loss         | -41.8    |
| Running Env Steps   | 348500   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 3.98     |
| Running Update Time | 697      |
----------------------------------
2025-02-01 14:36:52.668226 Eastern Standard Time
| Itration            | 698      |
| Real Det Return     | 529      |
| Real Sto Return     | 477      |
| Reward Loss         | -31.5    |
| Running Env Steps   | 349000   |
| Running Forward KL  | -4.06    |
| Running Reverse KL  | 4.08     |
| Running Update Time | 698      |
----------------------------------
2025-02-01 14:37:09.778599 Eastern Standard Time
| Itration            | 699      |
| Real Det Return     | 518      |
| Real Sto Return     | 467      |
| Reward Loss         | -25.7    |
| Running Env Steps   | 349500   |
| Running Forward KL  | -3.85    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 699      |
----------------------------------
2025-02-01 14:37:26.790006 Eastern Standard Time
| Itration            | 700      |
| Real Det Return     | 505      |
| Real Sto Return     | 482      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 350000   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 700      |
----------------------------------
2025-02-01 14:37:43.690522 Eastern Standard Time
| Itration            | 701      |
| Real Det Return     | 514      |
| Real Sto Return     | 466      |
| Reward Loss         | -18.7    |
| Running Env Steps   | 350500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 701      |
----------------------------------
2025-02-01 14:38:00.920924 Eastern Standard Time
| Itration            | 702      |
| Real Det Return     | 545      |
| Real Sto Return     | 484      |
| Reward Loss         | -17.9    |
| Running Env Steps   | 351000   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 3.82     |
| Running Update Time | 702      |
----------------------------------
2025-02-01 14:38:17.853666 Eastern Standard Time
| Itration            | 703      |
| Real Det Return     | 510      |
| Real Sto Return     | 472      |
| Reward Loss         | -22.6    |
| Running Env Steps   | 351500   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 703      |
----------------------------------
2025-02-01 14:38:34.773390 Eastern Standard Time
| Itration            | 704      |
| Real Det Return     | 545      |
| Real Sto Return     | 481      |
| Reward Loss         | -23.5    |
| Running Env Steps   | 352000   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 3.7      |
| Running Update Time | 704      |
----------------------------------
2025-02-01 14:38:51.634741 Eastern Standard Time
| Itration            | 705      |
| Real Det Return     | 530      |
| Real Sto Return     | 473      |
| Reward Loss         | -31.4    |
| Running Env Steps   | 352500   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 705      |
----------------------------------
2025-02-01 14:39:08.705844 Eastern Standard Time
| Itration            | 706      |
| Real Det Return     | 532      |
| Real Sto Return     | 482      |
| Reward Loss         | -25.7    |
| Running Env Steps   | 353000   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 3.98     |
| Running Update Time | 706      |
----------------------------------
2025-02-01 14:39:25.088628 Eastern Standard Time
| Itration            | 707      |
| Real Det Return     | 539      |
| Real Sto Return     | 484      |
| Reward Loss         | -24.9    |
| Running Env Steps   | 353500   |
| Running Forward KL  | -3.89    |
| Running Reverse KL  | 4.1      |
| Running Update Time | 707      |
----------------------------------
2025-02-01 14:39:41.530793 Eastern Standard Time
| Itration            | 708      |
| Real Det Return     | 529      |
| Real Sto Return     | 478      |
| Reward Loss         | -15.1    |
| Running Env Steps   | 354000   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 708      |
----------------------------------
2025-02-01 14:39:57.848967 Eastern Standard Time
| Itration            | 709      |
| Real Det Return     | 534      |
| Real Sto Return     | 471      |
| Reward Loss         | -25.4    |
| Running Env Steps   | 354500   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 3.86     |
| Running Update Time | 709      |
----------------------------------
2025-02-01 14:40:14.237218 Eastern Standard Time
| Itration            | 710      |
| Real Det Return     | 545      |
| Real Sto Return     | 487      |
| Reward Loss         | -10.1    |
| Running Env Steps   | 355000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.28     |
| Running Update Time | 710      |
----------------------------------
2025-02-01 14:40:30.636249 Eastern Standard Time
| Itration            | 711      |
| Real Det Return     | 521      |
| Real Sto Return     | 473      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 355500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 4.08     |
| Running Update Time | 711      |
----------------------------------
2025-02-01 14:40:50.225561 Eastern Standard Time
| Itration            | 712      |
| Real Det Return     | 523      |
| Real Sto Return     | 477      |
| Reward Loss         | -39.4    |
| Running Env Steps   | 356000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 712      |
----------------------------------
2025-02-01 14:41:08.807060 Eastern Standard Time
| Itration            | 713      |
| Real Det Return     | 524      |
| Real Sto Return     | 473      |
| Reward Loss         | -12.9    |
| Running Env Steps   | 356500   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 713      |
----------------------------------
2025-02-01 14:41:25.820584 Eastern Standard Time
| Itration            | 714      |
| Real Det Return     | 538      |
| Real Sto Return     | 474      |
| Reward Loss         | -25.3    |
| Running Env Steps   | 357000   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4        |
| Running Update Time | 714      |
----------------------------------
2025-02-01 14:41:43.201561 Eastern Standard Time
| Itration            | 715      |
| Real Det Return     | 526      |
| Real Sto Return     | 480      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 357500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 715      |
----------------------------------
2025-02-01 14:42:00.206019 Eastern Standard Time
| Itration            | 716      |
| Real Det Return     | 518      |
| Real Sto Return     | 464      |
| Reward Loss         | -12.5    |
| Running Env Steps   | 358000   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 716      |
----------------------------------
2025-02-01 14:42:17.128980 Eastern Standard Time
| Itration            | 717      |
| Real Det Return     | 526      |
| Real Sto Return     | 474      |
| Reward Loss         | -22      |
| Running Env Steps   | 358500   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 717      |
----------------------------------
2025-02-01 14:42:34.078607 Eastern Standard Time
| Itration            | 718      |
| Real Det Return     | 532      |
| Real Sto Return     | 476      |
| Reward Loss         | -23.7    |
| Running Env Steps   | 359000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 718      |
----------------------------------
2025-02-01 14:42:51.042482 Eastern Standard Time
| Itration            | 719      |
| Real Det Return     | 529      |
| Real Sto Return     | 480      |
| Reward Loss         | -33.6    |
| Running Env Steps   | 359500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 719      |
----------------------------------
2025-02-01 14:43:09.552704 Eastern Standard Time
| Itration            | 720      |
| Real Det Return     | 519      |
| Real Sto Return     | 471      |
| Reward Loss         | -38.7    |
| Running Env Steps   | 360000   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 3.82     |
| Running Update Time | 720      |
----------------------------------
2025-02-01 14:43:27.118280 Eastern Standard Time
| Itration            | 721      |
| Real Det Return     | 505      |
| Real Sto Return     | 468      |
| Reward Loss         | -21      |
| Running Env Steps   | 360500   |
| Running Forward KL  | -4.12    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 721      |
----------------------------------
2025-02-01 14:43:44.058652 Eastern Standard Time
| Itration            | 722      |
| Real Det Return     | 534      |
| Real Sto Return     | 494      |
| Reward Loss         | -19.2    |
| Running Env Steps   | 361000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.84     |
| Running Update Time | 722      |
----------------------------------
2025-02-01 14:44:01.015238 Eastern Standard Time
| Itration            | 723      |
| Real Det Return     | 519      |
| Real Sto Return     | 466      |
| Reward Loss         | -30.7    |
| Running Env Steps   | 361500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 723      |
----------------------------------
2025-02-01 14:44:17.962719 Eastern Standard Time
| Itration            | 724      |
| Real Det Return     | 546      |
| Real Sto Return     | 476      |
| Reward Loss         | -29.5    |
| Running Env Steps   | 362000   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 724      |
----------------------------------
2025-02-01 14:44:34.901363 Eastern Standard Time
| Itration            | 725      |
| Real Det Return     | 519      |
| Real Sto Return     | 471      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 362500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 725      |
----------------------------------
2025-02-01 14:44:51.968399 Eastern Standard Time
| Itration            | 726      |
| Real Det Return     | 514      |
| Real Sto Return     | 468      |
| Reward Loss         | -25.7    |
| Running Env Steps   | 363000   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 726      |
----------------------------------
2025-02-01 14:45:08.942485 Eastern Standard Time
| Itration            | 727      |
| Real Det Return     | 520      |
| Real Sto Return     | 473      |
| Reward Loss         | -17      |
| Running Env Steps   | 363500   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 727      |
----------------------------------
2025-02-01 14:45:25.827087 Eastern Standard Time
| Itration            | 728      |
| Real Det Return     | 506      |
| Real Sto Return     | 454      |
| Reward Loss         | -29.5    |
| Running Env Steps   | 364000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 728      |
----------------------------------
2025-02-01 14:45:42.790059 Eastern Standard Time
| Itration            | 729      |
| Real Det Return     | 517      |
| Real Sto Return     | 478      |
| Reward Loss         | -15.3    |
| Running Env Steps   | 364500   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 729      |
----------------------------------
2025-02-01 14:46:00.023146 Eastern Standard Time
| Itration            | 730      |
| Real Det Return     | 513      |
| Real Sto Return     | 471      |
| Reward Loss         | -28      |
| Running Env Steps   | 365000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.05     |
| Running Update Time | 730      |
----------------------------------
2025-02-01 14:46:17.718750 Eastern Standard Time
| Itration            | 731      |
| Real Det Return     | 539      |
| Real Sto Return     | 492      |
| Reward Loss         | -27.1    |
| Running Env Steps   | 365500   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 731      |
----------------------------------
2025-02-01 14:46:34.728467 Eastern Standard Time
| Itration            | 732      |
| Real Det Return     | 531      |
| Real Sto Return     | 466      |
| Reward Loss         | -33.5    |
| Running Env Steps   | 366000   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 732      |
----------------------------------
2025-02-01 14:46:51.931977 Eastern Standard Time
| Itration            | 733      |
| Real Det Return     | 537      |
| Real Sto Return     | 472      |
| Reward Loss         | -38.9    |
| Running Env Steps   | 366500   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 733      |
----------------------------------
2025-02-01 14:47:09.513218 Eastern Standard Time
| Itration            | 734      |
| Real Det Return     | 541      |
| Real Sto Return     | 477      |
| Reward Loss         | -26      |
| Running Env Steps   | 367000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 734      |
----------------------------------
2025-02-01 14:47:26.837214 Eastern Standard Time
| Itration            | 735      |
| Real Det Return     | 539      |
| Real Sto Return     | 471      |
| Reward Loss         | -37.8    |
| Running Env Steps   | 367500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 735      |
----------------------------------
2025-02-01 14:47:44.720291 Eastern Standard Time
| Itration            | 736      |
| Real Det Return     | 536      |
| Real Sto Return     | 490      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 368000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 736      |
----------------------------------
2025-02-01 14:48:01.812452 Eastern Standard Time
| Itration            | 737      |
| Real Det Return     | 541      |
| Real Sto Return     | 475      |
| Reward Loss         | -26.8    |
| Running Env Steps   | 368500   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.04     |
| Running Update Time | 737      |
----------------------------------
2025-02-01 14:48:18.588791 Eastern Standard Time
| Itration            | 738      |
| Real Det Return     | 533      |
| Real Sto Return     | 465      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 369000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 738      |
----------------------------------
2025-02-01 14:48:35.076473 Eastern Standard Time
| Itration            | 739      |
| Real Det Return     | 539      |
| Real Sto Return     | 478      |
| Reward Loss         | -26.5    |
| Running Env Steps   | 369500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 739      |
----------------------------------
2025-02-01 14:48:51.984022 Eastern Standard Time
| Itration            | 740      |
| Real Det Return     | 537      |
| Real Sto Return     | 474      |
| Reward Loss         | -12.8    |
| Running Env Steps   | 370000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 740      |
----------------------------------
2025-02-01 14:49:10.919077 Eastern Standard Time
| Itration            | 741      |
| Real Det Return     | 520      |
| Real Sto Return     | 463      |
| Reward Loss         | -22.1    |
| Running Env Steps   | 370500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 741      |
----------------------------------
2025-02-01 14:49:27.375950 Eastern Standard Time
| Itration            | 742      |
| Real Det Return     | 533      |
| Real Sto Return     | 490      |
| Reward Loss         | -14.6    |
| Running Env Steps   | 371000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 742      |
----------------------------------
2025-02-01 14:49:43.756388 Eastern Standard Time
| Itration            | 743      |
| Real Det Return     | 506      |
| Real Sto Return     | 455      |
| Reward Loss         | -39.8    |
| Running Env Steps   | 371500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 743      |
----------------------------------
2025-02-01 14:50:00.059657 Eastern Standard Time
| Itration            | 744      |
| Real Det Return     | 534      |
| Real Sto Return     | 487      |
| Reward Loss         | -23.1    |
| Running Env Steps   | 372000   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 744      |
----------------------------------
2025-02-01 14:50:15.808466 Eastern Standard Time
| Itration            | 745      |
| Real Det Return     | 522      |
| Real Sto Return     | 475      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 372500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 745      |
----------------------------------
2025-02-01 14:50:31.348803 Eastern Standard Time
| Itration            | 746      |
| Real Det Return     | 515      |
| Real Sto Return     | 466      |
| Reward Loss         | -25.3    |
| Running Env Steps   | 373000   |
| Running Forward KL  | -4.19    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 746      |
----------------------------------
2025-02-01 14:50:46.927093 Eastern Standard Time
| Itration            | 747      |
| Real Det Return     | 525      |
| Real Sto Return     | 474      |
| Reward Loss         | -30      |
| Running Env Steps   | 373500   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 3.79     |
| Running Update Time | 747      |
----------------------------------
2025-02-01 14:51:02.404417 Eastern Standard Time
| Itration            | 748      |
| Real Det Return     | 525      |
| Real Sto Return     | 482      |
| Reward Loss         | -15.4    |
| Running Env Steps   | 374000   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 748      |
----------------------------------
2025-02-01 14:51:17.914737 Eastern Standard Time
| Itration            | 749      |
| Real Det Return     | 520      |
| Real Sto Return     | 465      |
| Reward Loss         | -33.9    |
| Running Env Steps   | 374500   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 749      |
----------------------------------
2025-02-01 14:51:33.542150 Eastern Standard Time
| Itration            | 750      |
| Real Det Return     | 529      |
| Real Sto Return     | 472      |
| Reward Loss         | -46.4    |
| Running Env Steps   | 375000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 750      |
----------------------------------
2025-02-01 14:51:49.146973 Eastern Standard Time
| Itration            | 751      |
| Real Det Return     | 533      |
| Real Sto Return     | 472      |
| Reward Loss         | -24.4    |
| Running Env Steps   | 375500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 751      |
----------------------------------
2025-02-01 14:52:04.720088 Eastern Standard Time
| Itration            | 752      |
| Real Det Return     | 530      |
| Real Sto Return     | 476      |
| Reward Loss         | -27.7    |
| Running Env Steps   | 376000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 3.72     |
| Running Update Time | 752      |
----------------------------------
2025-02-01 14:52:20.302136 Eastern Standard Time
| Itration            | 753      |
| Real Det Return     | 514      |
| Real Sto Return     | 462      |
| Reward Loss         | -42.5    |
| Running Env Steps   | 376500   |
| Running Forward KL  | -4.18    |
| Running Reverse KL  | 3.86     |
| Running Update Time | 753      |
----------------------------------
2025-02-01 14:52:35.780391 Eastern Standard Time
| Itration            | 754      |
| Real Det Return     | 517      |
| Real Sto Return     | 472      |
| Reward Loss         | -36      |
| Running Env Steps   | 377000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 3.52     |
| Running Update Time | 754      |
----------------------------------
2025-02-01 14:52:51.404401 Eastern Standard Time
| Itration            | 755      |
| Real Det Return     | 539      |
| Real Sto Return     | 481      |
| Reward Loss         | -20.3    |
| Running Env Steps   | 377500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 755      |
----------------------------------
2025-02-01 14:53:06.918457 Eastern Standard Time
| Itration            | 756      |
| Real Det Return     | 543      |
| Real Sto Return     | 497      |
| Reward Loss         | -13      |
| Running Env Steps   | 378000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 756      |
----------------------------------
2025-02-01 14:53:22.415139 Eastern Standard Time
| Itration            | 757      |
| Real Det Return     | 531      |
| Real Sto Return     | 486      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 378500   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 757      |
----------------------------------
2025-02-01 14:53:38.067351 Eastern Standard Time
| Itration            | 758      |
| Real Det Return     | 537      |
| Real Sto Return     | 492      |
| Reward Loss         | -19.6    |
| Running Env Steps   | 379000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 758      |
----------------------------------
2025-02-01 14:53:53.661711 Eastern Standard Time
| Itration            | 759      |
| Real Det Return     | 543      |
| Real Sto Return     | 487      |
| Reward Loss         | -16.5    |
| Running Env Steps   | 379500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 759      |
----------------------------------
2025-02-01 14:54:09.425280 Eastern Standard Time
| Itration            | 760      |
| Real Det Return     | 520      |
| Real Sto Return     | 485      |
| Reward Loss         | -23.6    |
| Running Env Steps   | 380000   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 760      |
----------------------------------
2025-02-01 14:54:25.049331 Eastern Standard Time
| Itration            | 761      |
| Real Det Return     | 534      |
| Real Sto Return     | 496      |
| Reward Loss         | -26.8    |
| Running Env Steps   | 380500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 761      |
----------------------------------
2025-02-01 14:54:40.631995 Eastern Standard Time
| Itration            | 762      |
| Real Det Return     | 534      |
| Real Sto Return     | 478      |
| Reward Loss         | -23.9    |
| Running Env Steps   | 381000   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 762      |
----------------------------------
2025-02-01 14:54:56.170862 Eastern Standard Time
| Itration            | 763      |
| Real Det Return     | 540      |
| Real Sto Return     | 481      |
| Reward Loss         | -18.9    |
| Running Env Steps   | 381500   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 763      |
----------------------------------
2025-02-01 14:55:12.685709 Eastern Standard Time
| Itration            | 764      |
| Real Det Return     | 520      |
| Real Sto Return     | 473      |
| Reward Loss         | -29.8    |
| Running Env Steps   | 382000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 764      |
----------------------------------
2025-02-01 14:55:29.913734 Eastern Standard Time
| Itration            | 765      |
| Real Det Return     | 526      |
| Real Sto Return     | 481      |
| Reward Loss         | -25.3    |
| Running Env Steps   | 382500   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 3.75     |
| Running Update Time | 765      |
----------------------------------
2025-02-01 14:55:46.621965 Eastern Standard Time
| Itration            | 766      |
| Real Det Return     | 539      |
| Real Sto Return     | 483      |
| Reward Loss         | -31.7    |
| Running Env Steps   | 383000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.1      |
| Running Update Time | 766      |
----------------------------------
2025-02-01 14:56:02.496823 Eastern Standard Time
| Itration            | 767      |
| Real Det Return     | 543      |
| Real Sto Return     | 494      |
| Reward Loss         | -13.3    |
| Running Env Steps   | 383500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 767      |
----------------------------------
2025-02-01 14:56:18.567463 Eastern Standard Time
| Itration            | 768      |
| Real Det Return     | 545      |
| Real Sto Return     | 479      |
| Reward Loss         | -20.8    |
| Running Env Steps   | 384000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 4.19     |
| Running Update Time | 768      |
----------------------------------
2025-02-01 14:56:35.588064 Eastern Standard Time
| Itration            | 769      |
| Real Det Return     | 532      |
| Real Sto Return     | 492      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 384500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 769      |
----------------------------------
2025-02-01 14:56:51.545731 Eastern Standard Time
| Itration            | 770      |
| Real Det Return     | 540      |
| Real Sto Return     | 472      |
| Reward Loss         | -29.8    |
| Running Env Steps   | 385000   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 770      |
----------------------------------
2025-02-01 14:57:07.310996 Eastern Standard Time
| Itration            | 771      |
| Real Det Return     | 508      |
| Real Sto Return     | 453      |
| Reward Loss         | -22.2    |
| Running Env Steps   | 385500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.62     |
| Running Update Time | 771      |
----------------------------------
2025-02-01 14:57:23.184003 Eastern Standard Time
| Itration            | 772      |
| Real Det Return     | 529      |
| Real Sto Return     | 491      |
| Reward Loss         | -31.3    |
| Running Env Steps   | 386000   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 3.75     |
| Running Update Time | 772      |
----------------------------------
2025-02-01 14:57:38.913658 Eastern Standard Time
| Itration            | 773      |
| Real Det Return     | 540      |
| Real Sto Return     | 491      |
| Reward Loss         | -26.6    |
| Running Env Steps   | 386500   |
| Running Forward KL  | -4.14    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 773      |
----------------------------------
2025-02-01 14:57:55.426973 Eastern Standard Time
| Itration            | 774      |
| Real Det Return     | 538      |
| Real Sto Return     | 488      |
| Reward Loss         | -16.9    |
| Running Env Steps   | 387000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 774      |
----------------------------------
2025-02-01 14:58:12.300724 Eastern Standard Time
| Itration            | 775      |
| Real Det Return     | 531      |
| Real Sto Return     | 468      |
| Reward Loss         | -10.9    |
| Running Env Steps   | 387500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 775      |
----------------------------------
2025-02-01 14:58:28.438747 Eastern Standard Time
| Itration            | 776      |
| Real Det Return     | 549      |
| Real Sto Return     | 487      |
| Reward Loss         | -14.2    |
| Running Env Steps   | 388000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 776      |
----------------------------------
2025-02-01 14:58:44.152777 Eastern Standard Time
| Itration            | 777      |
| Real Det Return     | 526      |
| Real Sto Return     | 486      |
| Reward Loss         | -13.9    |
| Running Env Steps   | 388500   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.27     |
| Running Update Time | 777      |
----------------------------------
2025-02-01 14:58:59.824622 Eastern Standard Time
| Itration            | 778      |
| Real Det Return     | 525      |
| Real Sto Return     | 467      |
| Reward Loss         | -27.8    |
| Running Env Steps   | 389000   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.12     |
| Running Update Time | 778      |
----------------------------------
2025-02-01 14:59:15.508142 Eastern Standard Time
| Itration            | 779      |
| Real Det Return     | 528      |
| Real Sto Return     | 480      |
| Reward Loss         | -12.6    |
| Running Env Steps   | 389500   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 779      |
----------------------------------
2025-02-01 14:59:31.216889 Eastern Standard Time
| Itration            | 780      |
| Real Det Return     | 530      |
| Real Sto Return     | 478      |
| Reward Loss         | -24      |
| Running Env Steps   | 390000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 780      |
----------------------------------
2025-02-01 14:59:47.001549 Eastern Standard Time
| Itration            | 781      |
| Real Det Return     | 516      |
| Real Sto Return     | 468      |
| Reward Loss         | -31      |
| Running Env Steps   | 390500   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 781      |
----------------------------------
2025-02-01 15:00:02.863445 Eastern Standard Time
| Itration            | 782      |
| Real Det Return     | 535      |
| Real Sto Return     | 487      |
| Reward Loss         | -26.7    |
| Running Env Steps   | 391000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 782      |
----------------------------------
2025-02-01 15:00:18.657580 Eastern Standard Time
| Itration            | 783      |
| Real Det Return     | 537      |
| Real Sto Return     | 477      |
| Reward Loss         | -26.1    |
| Running Env Steps   | 391500   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 783      |
----------------------------------
2025-02-01 15:00:34.361030 Eastern Standard Time
| Itration            | 784      |
| Real Det Return     | 532      |
| Real Sto Return     | 463      |
| Reward Loss         | -42.5    |
| Running Env Steps   | 392000   |
| Running Forward KL  | -4.16    |
| Running Reverse KL  | 4        |
| Running Update Time | 784      |
----------------------------------
2025-02-01 15:00:50.075417 Eastern Standard Time
| Itration            | 785      |
| Real Det Return     | 520      |
| Real Sto Return     | 461      |
| Reward Loss         | -23.1    |
| Running Env Steps   | 392500   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 785      |
----------------------------------
2025-02-01 15:01:05.784403 Eastern Standard Time
| Itration            | 786      |
| Real Det Return     | 530      |
| Real Sto Return     | 469      |
| Reward Loss         | -26.7    |
| Running Env Steps   | 393000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 786      |
----------------------------------
2025-02-01 15:01:21.425707 Eastern Standard Time
| Itration            | 787      |
| Real Det Return     | 548      |
| Real Sto Return     | 488      |
| Reward Loss         | -21.8    |
| Running Env Steps   | 393500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 787      |
----------------------------------
2025-02-01 15:01:37.208248 Eastern Standard Time
| Itration            | 788      |
| Real Det Return     | 514      |
| Real Sto Return     | 470      |
| Reward Loss         | -32.8    |
| Running Env Steps   | 394000   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 788      |
----------------------------------
2025-02-01 15:01:52.867458 Eastern Standard Time
| Itration            | 789      |
| Real Det Return     | 519      |
| Real Sto Return     | 479      |
| Reward Loss         | -20.9    |
| Running Env Steps   | 394500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 789      |
----------------------------------
2025-02-01 15:02:08.579789 Eastern Standard Time
| Itration            | 790      |
| Real Det Return     | 502      |
| Real Sto Return     | 457      |
| Reward Loss         | -25.4    |
| Running Env Steps   | 395000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 790      |
----------------------------------
2025-02-01 15:02:24.301644 Eastern Standard Time
| Itration            | 791      |
| Real Det Return     | 520      |
| Real Sto Return     | 482      |
| Reward Loss         | -7.4     |
| Running Env Steps   | 395500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 791      |
----------------------------------
2025-02-01 15:02:40.026035 Eastern Standard Time
| Itration            | 792      |
| Real Det Return     | 523      |
| Real Sto Return     | 475      |
| Reward Loss         | -22.6    |
| Running Env Steps   | 396000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.35     |
| Running Update Time | 792      |
----------------------------------
2025-02-01 15:02:55.691474 Eastern Standard Time
| Itration            | 793      |
| Real Det Return     | 527      |
| Real Sto Return     | 465      |
| Reward Loss         | -17.2    |
| Running Env Steps   | 396500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 793      |
----------------------------------
2025-02-01 15:03:11.503800 Eastern Standard Time
| Itration            | 794      |
| Real Det Return     | 530      |
| Real Sto Return     | 488      |
| Reward Loss         | -6.74    |
| Running Env Steps   | 397000   |
| Running Forward KL  | -4.21    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 794      |
----------------------------------
2025-02-01 15:03:27.138294 Eastern Standard Time
| Itration            | 795      |
| Real Det Return     | 524      |
| Real Sto Return     | 473      |
| Reward Loss         | -22.4    |
| Running Env Steps   | 397500   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 795      |
----------------------------------
2025-02-01 15:03:42.835618 Eastern Standard Time
| Itration            | 796      |
| Real Det Return     | 520      |
| Real Sto Return     | 468      |
| Reward Loss         | -13      |
| Running Env Steps   | 398000   |
| Running Forward KL  | -4.87    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 796      |
----------------------------------
2025-02-01 15:03:58.524418 Eastern Standard Time
| Itration            | 797      |
| Real Det Return     | 525      |
| Real Sto Return     | 476      |
| Reward Loss         | -37.6    |
| Running Env Steps   | 398500   |
| Running Forward KL  | -3.96    |
| Running Reverse KL  | 4.12     |
| Running Update Time | 797      |
----------------------------------
2025-02-01 15:04:14.177501 Eastern Standard Time
| Itration            | 798      |
| Real Det Return     | 524      |
| Real Sto Return     | 472      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 399000   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 798      |
----------------------------------
2025-02-01 15:04:29.901011 Eastern Standard Time
| Itration            | 799      |
| Real Det Return     | 524      |
| Real Sto Return     | 482      |
| Reward Loss         | -29.4    |
| Running Env Steps   | 399500   |
| Running Forward KL  | -3.93    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 799      |
----------------------------------
2025-02-01 15:04:45.602053 Eastern Standard Time
| Itration            | 800      |
| Real Det Return     | 543      |
| Real Sto Return     | 494      |
| Reward Loss         | -19.9    |
| Running Env Steps   | 400000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 800      |
----------------------------------
2025-02-01 15:05:01.347237 Eastern Standard Time
| Itration            | 801      |
| Real Det Return     | 528      |
| Real Sto Return     | 481      |
| Reward Loss         | -30      |
| Running Env Steps   | 400500   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 801      |
----------------------------------
2025-02-01 15:05:17.194452 Eastern Standard Time
| Itration            | 802      |
| Real Det Return     | 530      |
| Real Sto Return     | 480      |
| Reward Loss         | -19.2    |
| Running Env Steps   | 401000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 802      |
----------------------------------
2025-02-01 15:05:32.875288 Eastern Standard Time
| Itration            | 803      |
| Real Det Return     | 527      |
| Real Sto Return     | 472      |
| Reward Loss         | -32.2    |
| Running Env Steps   | 401500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 803      |
----------------------------------
2025-02-01 15:05:48.533406 Eastern Standard Time
| Itration            | 804      |
| Real Det Return     | 537      |
| Real Sto Return     | 480      |
| Reward Loss         | -20.6    |
| Running Env Steps   | 402000   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 804      |
----------------------------------
2025-02-01 15:06:04.212929 Eastern Standard Time
| Itration            | 805      |
| Real Det Return     | 536      |
| Real Sto Return     | 484      |
| Reward Loss         | -36      |
| Running Env Steps   | 402500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 805      |
----------------------------------
2025-02-01 15:06:19.808080 Eastern Standard Time
| Itration            | 806      |
| Real Det Return     | 492      |
| Real Sto Return     | 460      |
| Reward Loss         | -35.6    |
| Running Env Steps   | 403000   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 806      |
----------------------------------
2025-02-01 15:06:35.562498 Eastern Standard Time
| Itration            | 807      |
| Real Det Return     | 530      |
| Real Sto Return     | 471      |
| Reward Loss         | -32.2    |
| Running Env Steps   | 403500   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 807      |
----------------------------------
2025-02-01 15:06:51.262592 Eastern Standard Time
| Itration            | 808      |
| Real Det Return     | 533      |
| Real Sto Return     | 473      |
| Reward Loss         | -14.4    |
| Running Env Steps   | 404000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 808      |
----------------------------------
2025-02-01 15:07:06.943786 Eastern Standard Time
| Itration            | 809      |
| Real Det Return     | 535      |
| Real Sto Return     | 477      |
| Reward Loss         | -18.3    |
| Running Env Steps   | 404500   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 3.85     |
| Running Update Time | 809      |
----------------------------------
2025-02-01 15:07:22.710746 Eastern Standard Time
| Itration            | 810      |
| Real Det Return     | 527      |
| Real Sto Return     | 479      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 405000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 3.83     |
| Running Update Time | 810      |
----------------------------------
2025-02-01 15:07:38.406248 Eastern Standard Time
| Itration            | 811      |
| Real Det Return     | 531      |
| Real Sto Return     | 481      |
| Reward Loss         | -21.5    |
| Running Env Steps   | 405500   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 811      |
----------------------------------
2025-02-01 15:07:54.064602 Eastern Standard Time
| Itration            | 812      |
| Real Det Return     | 540      |
| Real Sto Return     | 483      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 406000   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 812      |
----------------------------------
2025-02-01 15:08:09.729449 Eastern Standard Time
| Itration            | 813      |
| Real Det Return     | 526      |
| Real Sto Return     | 477      |
| Reward Loss         | -32.1    |
| Running Env Steps   | 406500   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.05     |
| Running Update Time | 813      |
----------------------------------
2025-02-01 15:08:25.409230 Eastern Standard Time
| Itration            | 814      |
| Real Det Return     | 533      |
| Real Sto Return     | 480      |
| Reward Loss         | -26.3    |
| Running Env Steps   | 407000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 3.71     |
| Running Update Time | 814      |
----------------------------------
2025-02-01 15:08:41.089447 Eastern Standard Time
| Itration            | 815      |
| Real Det Return     | 537      |
| Real Sto Return     | 479      |
| Reward Loss         | -35      |
| Running Env Steps   | 407500   |
| Running Forward KL  | -4.07    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 815      |
----------------------------------
2025-02-01 15:08:56.763623 Eastern Standard Time
| Itration            | 816      |
| Real Det Return     | 531      |
| Real Sto Return     | 483      |
| Reward Loss         | -26.9    |
| Running Env Steps   | 408000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 816      |
----------------------------------
2025-02-01 15:09:12.871350 Eastern Standard Time
| Itration            | 817      |
| Real Det Return     | 534      |
| Real Sto Return     | 482      |
| Reward Loss         | -32.5    |
| Running Env Steps   | 408500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 3.84     |
| Running Update Time | 817      |
----------------------------------
2025-02-01 15:09:28.576010 Eastern Standard Time
| Itration            | 818      |
| Real Det Return     | 540      |
| Real Sto Return     | 490      |
| Reward Loss         | -15      |
| Running Env Steps   | 409000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 3.82     |
| Running Update Time | 818      |
----------------------------------
2025-02-01 15:09:44.152177 Eastern Standard Time
| Itration            | 819      |
| Real Det Return     | 539      |
| Real Sto Return     | 487      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 409500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 819      |
----------------------------------
2025-02-01 15:09:59.850306 Eastern Standard Time
| Itration            | 820      |
| Real Det Return     | 528      |
| Real Sto Return     | 487      |
| Reward Loss         | -19.5    |
| Running Env Steps   | 410000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 820      |
----------------------------------
2025-02-01 15:10:15.573731 Eastern Standard Time
| Itration            | 821      |
| Real Det Return     | 530      |
| Real Sto Return     | 478      |
| Reward Loss         | -28      |
| Running Env Steps   | 410500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 821      |
----------------------------------
2025-02-01 15:10:31.293591 Eastern Standard Time
| Itration            | 822      |
| Real Det Return     | 546      |
| Real Sto Return     | 484      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 411000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 822      |
----------------------------------
2025-02-01 15:10:46.861992 Eastern Standard Time
| Itration            | 823      |
| Real Det Return     | 545      |
| Real Sto Return     | 484      |
| Reward Loss         | -19.1    |
| Running Env Steps   | 411500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 823      |
----------------------------------
2025-02-01 15:11:02.502482 Eastern Standard Time
| Itration            | 824      |
| Real Det Return     | 530      |
| Real Sto Return     | 486      |
| Reward Loss         | -32.8    |
| Running Env Steps   | 412000   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 824      |
----------------------------------
2025-02-01 15:11:18.144578 Eastern Standard Time
| Itration            | 825      |
| Real Det Return     | 522      |
| Real Sto Return     | 464      |
| Reward Loss         | -18.7    |
| Running Env Steps   | 412500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 825      |
----------------------------------
2025-02-01 15:11:33.870324 Eastern Standard Time
| Itration            | 826      |
| Real Det Return     | 526      |
| Real Sto Return     | 482      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 413000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 826      |
----------------------------------
2025-02-01 15:11:49.549857 Eastern Standard Time
| Itration            | 827      |
| Real Det Return     | 534      |
| Real Sto Return     | 478      |
| Reward Loss         | -8.45    |
| Running Env Steps   | 413500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 827      |
----------------------------------
2025-02-01 15:12:05.285267 Eastern Standard Time
| Itration            | 828      |
| Real Det Return     | 511      |
| Real Sto Return     | 478      |
| Reward Loss         | -22.2    |
| Running Env Steps   | 414000   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 828      |
----------------------------------
2025-02-01 15:12:20.918599 Eastern Standard Time
| Itration            | 829      |
| Real Det Return     | 527      |
| Real Sto Return     | 492      |
| Reward Loss         | -24.7    |
| Running Env Steps   | 414500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 829      |
----------------------------------
2025-02-01 15:12:36.592289 Eastern Standard Time
| Itration            | 830      |
| Real Det Return     | 541      |
| Real Sto Return     | 493      |
| Reward Loss         | -13.8    |
| Running Env Steps   | 415000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 830      |
----------------------------------
2025-02-01 15:12:52.330208 Eastern Standard Time
| Itration            | 831      |
| Real Det Return     | 534      |
| Real Sto Return     | 482      |
| Reward Loss         | -22      |
| Running Env Steps   | 415500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.31     |
| Running Update Time | 831      |
----------------------------------
2025-02-01 15:13:08.080282 Eastern Standard Time
| Itration            | 832      |
| Real Det Return     | 530      |
| Real Sto Return     | 495      |
| Reward Loss         | -10.9    |
| Running Env Steps   | 416000   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 832      |
----------------------------------
2025-02-01 15:13:23.721856 Eastern Standard Time
| Itration            | 833      |
| Real Det Return     | 533      |
| Real Sto Return     | 483      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 416500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 833      |
----------------------------------
2025-02-01 15:13:39.440508 Eastern Standard Time
| Itration            | 834      |
| Real Det Return     | 530      |
| Real Sto Return     | 480      |
| Reward Loss         | -3.64    |
| Running Env Steps   | 417000   |
| Running Forward KL  | -4.29    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 834      |
----------------------------------
2025-02-01 15:13:55.105318 Eastern Standard Time
| Itration            | 835      |
| Real Det Return     | 543      |
| Real Sto Return     | 485      |
| Reward Loss         | -31.5    |
| Running Env Steps   | 417500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 835      |
----------------------------------
2025-02-01 15:14:10.766851 Eastern Standard Time
| Itration            | 836      |
| Real Det Return     | 541      |
| Real Sto Return     | 487      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 418000   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 836      |
----------------------------------
2025-02-01 15:14:26.409099 Eastern Standard Time
| Itration            | 837      |
| Real Det Return     | 523      |
| Real Sto Return     | 479      |
| Reward Loss         | -25      |
| Running Env Steps   | 418500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 837      |
----------------------------------
2025-02-01 15:14:42.057887 Eastern Standard Time
| Itration            | 838      |
| Real Det Return     | 515      |
| Real Sto Return     | 466      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 419000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 838      |
----------------------------------
2025-02-01 15:14:57.687181 Eastern Standard Time
| Itration            | 839      |
| Real Det Return     | 522      |
| Real Sto Return     | 480      |
| Reward Loss         | -21.8    |
| Running Env Steps   | 419500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 839      |
----------------------------------
2025-02-01 15:15:13.388912 Eastern Standard Time
| Itration            | 840      |
| Real Det Return     | 517      |
| Real Sto Return     | 474      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 420000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 840      |
----------------------------------
2025-02-01 15:15:29.138720 Eastern Standard Time
| Itration            | 841      |
| Real Det Return     | 532      |
| Real Sto Return     | 478      |
| Reward Loss         | -18.7    |
| Running Env Steps   | 420500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 841      |
----------------------------------
2025-02-01 15:15:44.844634 Eastern Standard Time
| Itration            | 842      |
| Real Det Return     | 534      |
| Real Sto Return     | 480      |
| Reward Loss         | -13.5    |
| Running Env Steps   | 421000   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 842      |
----------------------------------
2025-02-01 15:16:00.536885 Eastern Standard Time
| Itration            | 843      |
| Real Det Return     | 535      |
| Real Sto Return     | 480      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 421500   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 843      |
----------------------------------
2025-02-01 15:16:16.234733 Eastern Standard Time
| Itration            | 844      |
| Real Det Return     | 527      |
| Real Sto Return     | 478      |
| Reward Loss         | -17.1    |
| Running Env Steps   | 422000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 844      |
----------------------------------
2025-02-01 15:16:31.924509 Eastern Standard Time
| Itration            | 845      |
| Real Det Return     | 510      |
| Real Sto Return     | 471      |
| Reward Loss         | -25      |
| Running Env Steps   | 422500   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.63     |
| Running Update Time | 845      |
----------------------------------
2025-02-01 15:16:47.652595 Eastern Standard Time
| Itration            | 846      |
| Real Det Return     | 533      |
| Real Sto Return     | 488      |
| Reward Loss         | -18.4    |
| Running Env Steps   | 423000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 846      |
----------------------------------
2025-02-01 15:17:03.357065 Eastern Standard Time
| Itration            | 847      |
| Real Det Return     | 540      |
| Real Sto Return     | 482      |
| Reward Loss         | -22      |
| Running Env Steps   | 423500   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 847      |
----------------------------------
2025-02-01 15:17:19.079377 Eastern Standard Time
| Itration            | 848      |
| Real Det Return     | 538      |
| Real Sto Return     | 482      |
| Reward Loss         | -15.6    |
| Running Env Steps   | 424000   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 848      |
----------------------------------
2025-02-01 15:17:34.746949 Eastern Standard Time
| Itration            | 849      |
| Real Det Return     | 547      |
| Real Sto Return     | 484      |
| Reward Loss         | -11.8    |
| Running Env Steps   | 424500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 849      |
----------------------------------
2025-02-01 15:17:50.425968 Eastern Standard Time
| Itration            | 850      |
| Real Det Return     | 528      |
| Real Sto Return     | 473      |
| Reward Loss         | -27.8    |
| Running Env Steps   | 425000   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 850      |
----------------------------------
2025-02-01 15:18:06.129209 Eastern Standard Time
| Itration            | 851      |
| Real Det Return     | 537      |
| Real Sto Return     | 479      |
| Reward Loss         | -18.2    |
| Running Env Steps   | 425500   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 851      |
----------------------------------
2025-02-01 15:18:21.753058 Eastern Standard Time
| Itration            | 852      |
| Real Det Return     | 534      |
| Real Sto Return     | 471      |
| Reward Loss         | -14.7    |
| Running Env Steps   | 426000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 852      |
----------------------------------
2025-02-01 15:18:37.457502 Eastern Standard Time
| Itration            | 853      |
| Real Det Return     | 528      |
| Real Sto Return     | 489      |
| Reward Loss         | -23.4    |
| Running Env Steps   | 426500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 853      |
----------------------------------
2025-02-01 15:18:53.210070 Eastern Standard Time
| Itration            | 854      |
| Real Det Return     | 521      |
| Real Sto Return     | 468      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 427000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.19     |
| Running Update Time | 854      |
----------------------------------
2025-02-01 15:19:08.878711 Eastern Standard Time
| Itration            | 855      |
| Real Det Return     | 535      |
| Real Sto Return     | 477      |
| Reward Loss         | -28.5    |
| Running Env Steps   | 427500   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 855      |
----------------------------------
2025-02-01 15:19:24.585064 Eastern Standard Time
| Itration            | 856      |
| Real Det Return     | 533      |
| Real Sto Return     | 475      |
| Reward Loss         | -32      |
| Running Env Steps   | 428000   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 856      |
----------------------------------
2025-02-01 15:19:40.275900 Eastern Standard Time
| Itration            | 857      |
| Real Det Return     | 533      |
| Real Sto Return     | 482      |
| Reward Loss         | -5.47    |
| Running Env Steps   | 428500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 857      |
----------------------------------
2025-02-01 15:19:55.987747 Eastern Standard Time
| Itration            | 858      |
| Real Det Return     | 514      |
| Real Sto Return     | 461      |
| Reward Loss         | -27.1    |
| Running Env Steps   | 429000   |
| Running Forward KL  | -4.23    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 858      |
----------------------------------
2025-02-01 15:20:11.763207 Eastern Standard Time
| Itration            | 859      |
| Real Det Return     | 526      |
| Real Sto Return     | 474      |
| Reward Loss         | -14.2    |
| Running Env Steps   | 429500   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 859      |
----------------------------------
2025-02-01 15:20:27.453212 Eastern Standard Time
| Itration            | 860      |
| Real Det Return     | 533      |
| Real Sto Return     | 480      |
| Reward Loss         | -26.7    |
| Running Env Steps   | 430000   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 860      |
----------------------------------
2025-02-01 15:20:43.130485 Eastern Standard Time
| Itration            | 861      |
| Real Det Return     | 532      |
| Real Sto Return     | 476      |
| Reward Loss         | -35      |
| Running Env Steps   | 430500   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.12     |
| Running Update Time | 861      |
----------------------------------
2025-02-01 15:20:58.813433 Eastern Standard Time
| Itration            | 862      |
| Real Det Return     | 531      |
| Real Sto Return     | 464      |
| Reward Loss         | -23      |
| Running Env Steps   | 431000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 862      |
----------------------------------
2025-02-01 15:21:14.492834 Eastern Standard Time
| Itration            | 863      |
| Real Det Return     | 531      |
| Real Sto Return     | 480      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 431500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 863      |
----------------------------------
2025-02-01 15:21:30.188327 Eastern Standard Time
| Itration            | 864      |
| Real Det Return     | 525      |
| Real Sto Return     | 483      |
| Reward Loss         | -22      |
| Running Env Steps   | 432000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 864      |
----------------------------------
2025-02-01 15:21:45.825243 Eastern Standard Time
| Itration            | 865      |
| Real Det Return     | 538      |
| Real Sto Return     | 486      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 432500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 865      |
----------------------------------
2025-02-01 15:22:01.465268 Eastern Standard Time
| Itration            | 866      |
| Real Det Return     | 524      |
| Real Sto Return     | 467      |
| Reward Loss         | -38.6    |
| Running Env Steps   | 433000   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 4.11     |
| Running Update Time | 866      |
----------------------------------
2025-02-01 15:22:17.096953 Eastern Standard Time
| Itration            | 867      |
| Real Det Return     | 538      |
| Real Sto Return     | 481      |
| Reward Loss         | -18.3    |
| Running Env Steps   | 433500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 867      |
----------------------------------
2025-02-01 15:22:32.717936 Eastern Standard Time
| Itration            | 868      |
| Real Det Return     | 541      |
| Real Sto Return     | 490      |
| Reward Loss         | -20.1    |
| Running Env Steps   | 434000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 868      |
----------------------------------
2025-02-01 15:22:48.348686 Eastern Standard Time
| Itration            | 869      |
| Real Det Return     | 515      |
| Real Sto Return     | 463      |
| Reward Loss         | -33.7    |
| Running Env Steps   | 434500   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 869      |
----------------------------------
2025-02-01 15:23:04.068360 Eastern Standard Time
| Itration            | 870      |
| Real Det Return     | 539      |
| Real Sto Return     | 475      |
| Reward Loss         | -25.8    |
| Running Env Steps   | 435000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 870      |
----------------------------------
2025-02-01 15:23:19.758187 Eastern Standard Time
| Itration            | 871      |
| Real Det Return     | 536      |
| Real Sto Return     | 475      |
| Reward Loss         | -37      |
| Running Env Steps   | 435500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 871      |
----------------------------------
2025-02-01 15:23:35.447145 Eastern Standard Time
| Itration            | 872      |
| Real Det Return     | 535      |
| Real Sto Return     | 473      |
| Reward Loss         | -25.3    |
| Running Env Steps   | 436000   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 872      |
----------------------------------
2025-02-01 15:23:51.182972 Eastern Standard Time
| Itration            | 873      |
| Real Det Return     | 534      |
| Real Sto Return     | 490      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 436500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 873      |
----------------------------------
2025-02-01 15:24:08.516237 Eastern Standard Time
| Itration            | 874      |
| Real Det Return     | 515      |
| Real Sto Return     | 460      |
| Reward Loss         | -26.6    |
| Running Env Steps   | 437000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 874      |
----------------------------------
2025-02-01 15:24:24.078337 Eastern Standard Time
| Itration            | 875      |
| Real Det Return     | 497      |
| Real Sto Return     | 460      |
| Reward Loss         | -35.3    |
| Running Env Steps   | 437500   |
| Running Forward KL  | -3.74    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 875      |
----------------------------------
2025-02-01 15:24:39.685055 Eastern Standard Time
| Itration            | 876      |
| Real Det Return     | 520      |
| Real Sto Return     | 473      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 438000   |
| Running Forward KL  | -4.2     |
| Running Reverse KL  | 4.74     |
| Running Update Time | 876      |
----------------------------------
2025-02-01 15:24:55.300187 Eastern Standard Time
| Itration            | 877      |
| Real Det Return     | 514      |
| Real Sto Return     | 462      |
| Reward Loss         | -34      |
| Running Env Steps   | 438500   |
| Running Forward KL  | -4.26    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 877      |
----------------------------------
2025-02-01 15:25:11.376369 Eastern Standard Time
| Itration            | 878      |
| Real Det Return     | 543      |
| Real Sto Return     | 485      |
| Reward Loss         | -30      |
| Running Env Steps   | 439000   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 878      |
----------------------------------
2025-02-01 15:25:27.713666 Eastern Standard Time
| Itration            | 879      |
| Real Det Return     | 509      |
| Real Sto Return     | 461      |
| Reward Loss         | -10.6    |
| Running Env Steps   | 439500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.45     |
| Running Update Time | 879      |
----------------------------------
2025-02-01 15:25:43.558778 Eastern Standard Time
| Itration            | 880      |
| Real Det Return     | 526      |
| Real Sto Return     | 473      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 440000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 3.99     |
| Running Update Time | 880      |
----------------------------------
2025-02-01 15:25:59.351757 Eastern Standard Time
| Itration            | 881      |
| Real Det Return     | 531      |
| Real Sto Return     | 475      |
| Reward Loss         | -15.1    |
| Running Env Steps   | 440500   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 5.27     |
| Running Update Time | 881      |
----------------------------------
2025-02-01 15:26:15.042246 Eastern Standard Time
| Itration            | 882      |
| Real Det Return     | 530      |
| Real Sto Return     | 490      |
| Reward Loss         | -25.9    |
| Running Env Steps   | 441000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 882      |
----------------------------------
2025-02-01 15:26:30.811131 Eastern Standard Time
| Itration            | 883      |
| Real Det Return     | 528      |
| Real Sto Return     | 482      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 441500   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 3.73     |
| Running Update Time | 883      |
----------------------------------
2025-02-01 15:26:47.486317 Eastern Standard Time
| Itration            | 884      |
| Real Det Return     | 533      |
| Real Sto Return     | 483      |
| Reward Loss         | -29.1    |
| Running Env Steps   | 442000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 884      |
----------------------------------
2025-02-01 15:27:03.448227 Eastern Standard Time
| Itration            | 885      |
| Real Det Return     | 519      |
| Real Sto Return     | 480      |
| Reward Loss         | -35.1    |
| Running Env Steps   | 442500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 885      |
----------------------------------
2025-02-01 15:27:22.136282 Eastern Standard Time
| Itration            | 886      |
| Real Det Return     | 534      |
| Real Sto Return     | 485      |
| Reward Loss         | -19.4    |
| Running Env Steps   | 443000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 886      |
----------------------------------
2025-02-01 15:27:38.704998 Eastern Standard Time
| Itration            | 887      |
| Real Det Return     | 520      |
| Real Sto Return     | 485      |
| Reward Loss         | -34.8    |
| Running Env Steps   | 443500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 887      |
----------------------------------
2025-02-01 15:27:57.482385 Eastern Standard Time
| Itration            | 888      |
| Real Det Return     | 543      |
| Real Sto Return     | 496      |
| Reward Loss         | -16      |
| Running Env Steps   | 444000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 888      |
----------------------------------
2025-02-01 15:28:15.272862 Eastern Standard Time
| Itration            | 889      |
| Real Det Return     | 537      |
| Real Sto Return     | 482      |
| Reward Loss         | -27.2    |
| Running Env Steps   | 444500   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 889      |
----------------------------------
2025-02-01 15:28:32.618556 Eastern Standard Time
| Itration            | 890      |
| Real Det Return     | 528      |
| Real Sto Return     | 479      |
| Reward Loss         | -23.5    |
| Running Env Steps   | 445000   |
| Running Forward KL  | -3.98    |
| Running Reverse KL  | 4.28     |
| Running Update Time | 890      |
----------------------------------
2025-02-01 15:28:48.643424 Eastern Standard Time
| Itration            | 891      |
| Real Det Return     | 527      |
| Real Sto Return     | 474      |
| Reward Loss         | -29.1    |
| Running Env Steps   | 445500   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 891      |
----------------------------------
2025-02-01 15:29:04.525078 Eastern Standard Time
| Itration            | 892      |
| Real Det Return     | 538      |
| Real Sto Return     | 481      |
| Reward Loss         | -19.5    |
| Running Env Steps   | 446000   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 892      |
----------------------------------
2025-02-01 15:29:20.405734 Eastern Standard Time
| Itration            | 893      |
| Real Det Return     | 521      |
| Real Sto Return     | 485      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 446500   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 893      |
----------------------------------
2025-02-01 15:29:36.066367 Eastern Standard Time
| Itration            | 894      |
| Real Det Return     | 544      |
| Real Sto Return     | 497      |
| Reward Loss         | -20.2    |
| Running Env Steps   | 447000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 894      |
----------------------------------
2025-02-01 15:29:52.037864 Eastern Standard Time
| Itration            | 895      |
| Real Det Return     | 540      |
| Real Sto Return     | 487      |
| Reward Loss         | -21.9    |
| Running Env Steps   | 447500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 895      |
----------------------------------
2025-02-01 15:30:07.707903 Eastern Standard Time
| Itration            | 896      |
| Real Det Return     | 526      |
| Real Sto Return     | 477      |
| Reward Loss         | -3.86    |
| Running Env Steps   | 448000   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 896      |
----------------------------------
2025-02-01 15:30:23.282890 Eastern Standard Time
| Itration            | 897      |
| Real Det Return     | 533      |
| Real Sto Return     | 490      |
| Reward Loss         | -15.6    |
| Running Env Steps   | 448500   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 897      |
----------------------------------
2025-02-01 15:30:38.887166 Eastern Standard Time
| Itration            | 898      |
| Real Det Return     | 534      |
| Real Sto Return     | 482      |
| Reward Loss         | -31.4    |
| Running Env Steps   | 449000   |
| Running Forward KL  | -4.45    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 898      |
----------------------------------
2025-02-01 15:30:54.464483 Eastern Standard Time
| Itration            | 899      |
| Real Det Return     | 519      |
| Real Sto Return     | 483      |
| Reward Loss         | -33.4    |
| Running Env Steps   | 449500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 3.95     |
| Running Update Time | 899      |
----------------------------------
2025-02-01 15:31:09.996770 Eastern Standard Time
| Itration            | 900      |
| Real Det Return     | 524      |
| Real Sto Return     | 476      |
| Reward Loss         | -31.7    |
| Running Env Steps   | 450000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 900      |
----------------------------------
2025-02-01 15:31:25.596003 Eastern Standard Time
| Itration            | 901      |
| Real Det Return     | 536      |
| Real Sto Return     | 482      |
| Reward Loss         | -25.8    |
| Running Env Steps   | 450500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.02     |
| Running Update Time | 901      |
----------------------------------
2025-02-01 15:31:41.164384 Eastern Standard Time
| Itration            | 902      |
| Real Det Return     | 519      |
| Real Sto Return     | 469      |
| Reward Loss         | -26.6    |
| Running Env Steps   | 451000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 902      |
----------------------------------
2025-02-01 15:31:56.691932 Eastern Standard Time
| Itration            | 903      |
| Real Det Return     | 538      |
| Real Sto Return     | 481      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 451500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 903      |
----------------------------------
2025-02-01 15:32:12.337855 Eastern Standard Time
| Itration            | 904      |
| Real Det Return     | 527      |
| Real Sto Return     | 477      |
| Reward Loss         | -21.2    |
| Running Env Steps   | 452000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 4.22     |
| Running Update Time | 904      |
----------------------------------
2025-02-01 15:32:27.844461 Eastern Standard Time
| Itration            | 905      |
| Real Det Return     | 515      |
| Real Sto Return     | 468      |
| Reward Loss         | -38.9    |
| Running Env Steps   | 452500   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 905      |
----------------------------------
2025-02-01 15:32:43.714857 Eastern Standard Time
| Itration            | 906      |
| Real Det Return     | 523      |
| Real Sto Return     | 487      |
| Reward Loss         | -17.5    |
| Running Env Steps   | 453000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 906      |
----------------------------------
2025-02-01 15:32:59.575329 Eastern Standard Time
| Itration            | 907      |
| Real Det Return     | 544      |
| Real Sto Return     | 503      |
| Reward Loss         | -18.2    |
| Running Env Steps   | 453500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.52     |
| Running Update Time | 907      |
----------------------------------
2025-02-01 15:33:15.255114 Eastern Standard Time
| Itration            | 908      |
| Real Det Return     | 536      |
| Real Sto Return     | 485      |
| Reward Loss         | -21.5    |
| Running Env Steps   | 454000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 3.93     |
| Running Update Time | 908      |
----------------------------------
2025-02-01 15:33:30.873990 Eastern Standard Time
| Itration            | 909      |
| Real Det Return     | 534      |
| Real Sto Return     | 471      |
| Reward Loss         | -15.8    |
| Running Env Steps   | 454500   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 909      |
----------------------------------
2025-02-01 15:33:46.504747 Eastern Standard Time
| Itration            | 910      |
| Real Det Return     | 538      |
| Real Sto Return     | 478      |
| Reward Loss         | -19.3    |
| Running Env Steps   | 455000   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.55     |
| Running Update Time | 910      |
----------------------------------
2025-02-01 15:34:02.093871 Eastern Standard Time
| Itration            | 911      |
| Real Det Return     | 532      |
| Real Sto Return     | 487      |
| Reward Loss         | -11.2    |
| Running Env Steps   | 455500   |
| Running Forward KL  | -4.11    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 911      |
----------------------------------
2025-02-01 15:34:17.748432 Eastern Standard Time
| Itration            | 912      |
| Real Det Return     | 523      |
| Real Sto Return     | 482      |
| Reward Loss         | -24      |
| Running Env Steps   | 456000   |
| Running Forward KL  | -4.22    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 912      |
----------------------------------
2025-02-01 15:34:33.337692 Eastern Standard Time
| Itration            | 913      |
| Real Det Return     | 543      |
| Real Sto Return     | 487      |
| Reward Loss         | -20.4    |
| Running Env Steps   | 456500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 913      |
----------------------------------
2025-02-01 15:34:48.949397 Eastern Standard Time
| Itration            | 914      |
| Real Det Return     | 542      |
| Real Sto Return     | 479      |
| Reward Loss         | -23.2    |
| Running Env Steps   | 457000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 914      |
----------------------------------
2025-02-01 15:35:04.535856 Eastern Standard Time
| Itration            | 915      |
| Real Det Return     | 517      |
| Real Sto Return     | 484      |
| Reward Loss         | -22.2    |
| Running Env Steps   | 457500   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 915      |
----------------------------------
2025-02-01 15:35:20.202644 Eastern Standard Time
| Itration            | 916      |
| Real Det Return     | 523      |
| Real Sto Return     | 483      |
| Reward Loss         | -25.9    |
| Running Env Steps   | 458000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 3.56     |
| Running Update Time | 916      |
----------------------------------
2025-02-01 15:35:35.793779 Eastern Standard Time
| Itration            | 917      |
| Real Det Return     | 518      |
| Real Sto Return     | 481      |
| Reward Loss         | -28.8    |
| Running Env Steps   | 458500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 917      |
----------------------------------
2025-02-01 15:35:51.451671 Eastern Standard Time
| Itration            | 918      |
| Real Det Return     | 529      |
| Real Sto Return     | 479      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 459000   |
| Running Forward KL  | -4.35    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 918      |
----------------------------------
2025-02-01 15:36:07.108212 Eastern Standard Time
| Itration            | 919      |
| Real Det Return     | 542      |
| Real Sto Return     | 498      |
| Reward Loss         | -22.1    |
| Running Env Steps   | 459500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 919      |
----------------------------------
2025-02-01 15:36:22.671037 Eastern Standard Time
| Itration            | 920      |
| Real Det Return     | 529      |
| Real Sto Return     | 475      |
| Reward Loss         | -36.3    |
| Running Env Steps   | 460000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 920      |
----------------------------------
2025-02-01 15:36:38.367878 Eastern Standard Time
| Itration            | 921      |
| Real Det Return     | 533      |
| Real Sto Return     | 491      |
| Reward Loss         | -24.5    |
| Running Env Steps   | 460500   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 921      |
----------------------------------
2025-02-01 15:36:54.046390 Eastern Standard Time
| Itration            | 922      |
| Real Det Return     | 529      |
| Real Sto Return     | 485      |
| Reward Loss         | -31.8    |
| Running Env Steps   | 461000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 922      |
----------------------------------
2025-02-01 15:37:09.712032 Eastern Standard Time
| Itration            | 923      |
| Real Det Return     | 535      |
| Real Sto Return     | 483      |
| Reward Loss         | -16.8    |
| Running Env Steps   | 461500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.31     |
| Running Update Time | 923      |
----------------------------------
2025-02-01 15:37:25.457721 Eastern Standard Time
| Itration            | 924      |
| Real Det Return     | 535      |
| Real Sto Return     | 484      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 462000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 924      |
----------------------------------
2025-02-01 15:37:41.109277 Eastern Standard Time
| Itration            | 925      |
| Real Det Return     | 540      |
| Real Sto Return     | 487      |
| Reward Loss         | -21.8    |
| Running Env Steps   | 462500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 925      |
----------------------------------
2025-02-01 15:37:56.808621 Eastern Standard Time
| Itration            | 926      |
| Real Det Return     | 543      |
| Real Sto Return     | 484      |
| Reward Loss         | -40.7    |
| Running Env Steps   | 463000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 926      |
----------------------------------
2025-02-01 15:38:12.436022 Eastern Standard Time
| Itration            | 927      |
| Real Det Return     | 553      |
| Real Sto Return     | 493      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 463500   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 927      |
----------------------------------
2025-02-01 15:38:28.164745 Eastern Standard Time
| Itration            | 928      |
| Real Det Return     | 527      |
| Real Sto Return     | 483      |
| Reward Loss         | -16.1    |
| Running Env Steps   | 464000   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 928      |
----------------------------------
2025-02-01 15:38:43.865411 Eastern Standard Time
| Itration            | 929      |
| Real Det Return     | 537      |
| Real Sto Return     | 500      |
| Reward Loss         | -19.4    |
| Running Env Steps   | 464500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.71     |
| Running Update Time | 929      |
----------------------------------
2025-02-01 15:38:59.527911 Eastern Standard Time
| Itration            | 930      |
| Real Det Return     | 522      |
| Real Sto Return     | 469      |
| Reward Loss         | -37.6    |
| Running Env Steps   | 465000   |
| Running Forward KL  | -4.28    |
| Running Reverse KL  | 3.56     |
| Running Update Time | 930      |
----------------------------------
2025-02-01 15:39:15.163459 Eastern Standard Time
| Itration            | 931      |
| Real Det Return     | 543      |
| Real Sto Return     | 491      |
| Reward Loss         | -27.4    |
| Running Env Steps   | 465500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 931      |
----------------------------------
2025-02-01 15:39:30.833072 Eastern Standard Time
| Itration            | 932      |
| Real Det Return     | 513      |
| Real Sto Return     | 464      |
| Reward Loss         | -31.4    |
| Running Env Steps   | 466000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.12     |
| Running Update Time | 932      |
----------------------------------
2025-02-01 15:39:46.426584 Eastern Standard Time
| Itration            | 933      |
| Real Det Return     | 516      |
| Real Sto Return     | 480      |
| Reward Loss         | -23.5    |
| Running Env Steps   | 466500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 933      |
----------------------------------
2025-02-01 15:40:01.999597 Eastern Standard Time
| Itration            | 934      |
| Real Det Return     | 556      |
| Real Sto Return     | 494      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 467000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 934      |
----------------------------------
2025-02-01 15:40:17.859945 Eastern Standard Time
| Itration            | 935      |
| Real Det Return     | 532      |
| Real Sto Return     | 494      |
| Reward Loss         | -20.6    |
| Running Env Steps   | 467500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 935      |
----------------------------------
2025-02-01 15:40:33.770552 Eastern Standard Time
| Itration            | 936      |
| Real Det Return     | 514      |
| Real Sto Return     | 466      |
| Reward Loss         | -29      |
| Running Env Steps   | 468000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 936      |
----------------------------------
2025-02-01 15:40:49.670181 Eastern Standard Time
| Itration            | 937      |
| Real Det Return     | 537      |
| Real Sto Return     | 489      |
| Reward Loss         | -34.2    |
| Running Env Steps   | 468500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 937      |
----------------------------------
2025-02-01 15:41:05.709048 Eastern Standard Time
| Itration            | 938      |
| Real Det Return     | 526      |
| Real Sto Return     | 480      |
| Reward Loss         | -33.8    |
| Running Env Steps   | 469000   |
| Running Forward KL  | -4.27    |
| Running Reverse KL  | 3.87     |
| Running Update Time | 938      |
----------------------------------
2025-02-01 15:41:21.708999 Eastern Standard Time
| Itration            | 939      |
| Real Det Return     | 545      |
| Real Sto Return     | 479      |
| Reward Loss         | -25.6    |
| Running Env Steps   | 469500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 939      |
----------------------------------
2025-02-01 15:41:37.706958 Eastern Standard Time
| Itration            | 940      |
| Real Det Return     | 532      |
| Real Sto Return     | 493      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 470000   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 940      |
----------------------------------
2025-02-01 15:41:53.717019 Eastern Standard Time
| Itration            | 941      |
| Real Det Return     | 532      |
| Real Sto Return     | 477      |
| Reward Loss         | -18.9    |
| Running Env Steps   | 470500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4.41     |
| Running Update Time | 941      |
----------------------------------
2025-02-01 15:42:09.449314 Eastern Standard Time
| Itration            | 942      |
| Real Det Return     | 533      |
| Real Sto Return     | 486      |
| Reward Loss         | -43.1    |
| Running Env Steps   | 471000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 3.98     |
| Running Update Time | 942      |
----------------------------------
2025-02-01 15:42:25.136650 Eastern Standard Time
| Itration            | 943      |
| Real Det Return     | 539      |
| Real Sto Return     | 498      |
| Reward Loss         | -26.1    |
| Running Env Steps   | 471500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 943      |
----------------------------------
2025-02-01 15:42:40.853830 Eastern Standard Time
| Itration            | 944      |
| Real Det Return     | 534      |
| Real Sto Return     | 490      |
| Reward Loss         | -9.52    |
| Running Env Steps   | 472000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 944      |
----------------------------------
2025-02-01 15:42:56.710559 Eastern Standard Time
| Itration            | 945      |
| Real Det Return     | 539      |
| Real Sto Return     | 487      |
| Reward Loss         | -28.6    |
| Running Env Steps   | 472500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 945      |
----------------------------------
2025-02-01 15:43:12.351870 Eastern Standard Time
| Itration            | 946      |
| Real Det Return     | 525      |
| Real Sto Return     | 459      |
| Reward Loss         | -33.4    |
| Running Env Steps   | 473000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 946      |
----------------------------------
2025-02-01 15:43:28.047525 Eastern Standard Time
| Itration            | 947      |
| Real Det Return     | 536      |
| Real Sto Return     | 490      |
| Reward Loss         | -23.9    |
| Running Env Steps   | 473500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 947      |
----------------------------------
2025-02-01 15:43:43.573597 Eastern Standard Time
| Itration            | 948      |
| Real Det Return     | 532      |
| Real Sto Return     | 485      |
| Reward Loss         | -26.8    |
| Running Env Steps   | 474000   |
| Running Forward KL  | -4.31    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 948      |
----------------------------------
2025-02-01 15:43:59.193521 Eastern Standard Time
| Itration            | 949      |
| Real Det Return     | 541      |
| Real Sto Return     | 488      |
| Reward Loss         | -20      |
| Running Env Steps   | 474500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 949      |
----------------------------------
2025-02-01 15:44:14.830904 Eastern Standard Time
| Itration            | 950      |
| Real Det Return     | 547      |
| Real Sto Return     | 489      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 475000   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 950      |
----------------------------------
2025-02-01 15:44:30.479384 Eastern Standard Time
| Itration            | 951      |
| Real Det Return     | 528      |
| Real Sto Return     | 488      |
| Reward Loss         | -26.1    |
| Running Env Steps   | 475500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 951      |
----------------------------------
2025-02-01 15:44:46.120721 Eastern Standard Time
| Itration            | 952      |
| Real Det Return     | 541      |
| Real Sto Return     | 488      |
| Reward Loss         | -24.9    |
| Running Env Steps   | 476000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 952      |
----------------------------------
2025-02-01 15:45:01.819134 Eastern Standard Time
| Itration            | 953      |
| Real Det Return     | 535      |
| Real Sto Return     | 492      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 476500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4.68     |
| Running Update Time | 953      |
----------------------------------
2025-02-01 15:45:17.568325 Eastern Standard Time
| Itration            | 954      |
| Real Det Return     | 540      |
| Real Sto Return     | 477      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 477000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.41     |
| Running Update Time | 954      |
----------------------------------
2025-02-01 15:45:33.298413 Eastern Standard Time
| Itration            | 955      |
| Real Det Return     | 531      |
| Real Sto Return     | 484      |
| Reward Loss         | -46.4    |
| Running Env Steps   | 477500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 955      |
----------------------------------
2025-02-01 15:45:48.914843 Eastern Standard Time
| Itration            | 956      |
| Real Det Return     | 541      |
| Real Sto Return     | 485      |
| Reward Loss         | -25.8    |
| Running Env Steps   | 478000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 956      |
----------------------------------
2025-02-01 15:46:04.477147 Eastern Standard Time
| Itration            | 957      |
| Real Det Return     | 536      |
| Real Sto Return     | 498      |
| Reward Loss         | -28.1    |
| Running Env Steps   | 478500   |
| Running Forward KL  | -4.25    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 957      |
----------------------------------
2025-02-01 15:46:21.147464 Eastern Standard Time
| Itration            | 958      |
| Real Det Return     | 527      |
| Real Sto Return     | 475      |
| Reward Loss         | -15.1    |
| Running Env Steps   | 479000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 958      |
----------------------------------
2025-02-01 15:46:38.301772 Eastern Standard Time
| Itration            | 959      |
| Real Det Return     | 532      |
| Real Sto Return     | 479      |
| Reward Loss         | -30.9    |
| Running Env Steps   | 479500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 959      |
----------------------------------
2025-02-01 15:46:54.281221 Eastern Standard Time
| Itration            | 960      |
| Real Det Return     | 535      |
| Real Sto Return     | 493      |
| Reward Loss         | -14.1    |
| Running Env Steps   | 480000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 960      |
----------------------------------
2025-02-01 15:47:10.245703 Eastern Standard Time
| Itration            | 961      |
| Real Det Return     | 541      |
| Real Sto Return     | 491      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 480500   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 4.17     |
| Running Update Time | 961      |
----------------------------------
2025-02-01 15:47:26.341733 Eastern Standard Time
| Itration            | 962      |
| Real Det Return     | 519      |
| Real Sto Return     | 480      |
| Reward Loss         | -31.2    |
| Running Env Steps   | 481000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 962      |
----------------------------------
2025-02-01 15:47:41.863813 Eastern Standard Time
| Itration            | 963      |
| Real Det Return     | 537      |
| Real Sto Return     | 486      |
| Reward Loss         | -13.4    |
| Running Env Steps   | 481500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 963      |
----------------------------------
2025-02-01 15:47:57.393031 Eastern Standard Time
| Itration            | 964      |
| Real Det Return     | 532      |
| Real Sto Return     | 485      |
| Reward Loss         | -13      |
| Running Env Steps   | 482000   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.22     |
| Running Update Time | 964      |
----------------------------------
2025-02-01 15:48:13.183741 Eastern Standard Time
| Itration            | 965      |
| Real Det Return     | 549      |
| Real Sto Return     | 494      |
| Reward Loss         | -21      |
| Running Env Steps   | 482500   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 965      |
----------------------------------
2025-02-01 15:48:28.886264 Eastern Standard Time
| Itration            | 966      |
| Real Det Return     | 530      |
| Real Sto Return     | 484      |
| Reward Loss         | -19.9    |
| Running Env Steps   | 483000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 966      |
----------------------------------
2025-02-01 15:48:45.236328 Eastern Standard Time
| Itration            | 967      |
| Real Det Return     | 531      |
| Real Sto Return     | 494      |
| Reward Loss         | -33.4    |
| Running Env Steps   | 483500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 967      |
----------------------------------
2025-02-01 15:49:03.104645 Eastern Standard Time
| Itration            | 968      |
| Real Det Return     | 532      |
| Real Sto Return     | 490      |
| Reward Loss         | -30.5    |
| Running Env Steps   | 484000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 968      |
----------------------------------
2025-02-01 15:49:19.239232 Eastern Standard Time
| Itration            | 969      |
| Real Det Return     | 520      |
| Real Sto Return     | 466      |
| Reward Loss         | -37.4    |
| Running Env Steps   | 484500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 969      |
----------------------------------
2025-02-01 15:49:35.066365 Eastern Standard Time
| Itration            | 970      |
| Real Det Return     | 539      |
| Real Sto Return     | 486      |
| Reward Loss         | -23.8    |
| Running Env Steps   | 485000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 970      |
----------------------------------
2025-02-01 15:49:50.704494 Eastern Standard Time
| Itration            | 971      |
| Real Det Return     | 533      |
| Real Sto Return     | 485      |
| Reward Loss         | -30.4    |
| Running Env Steps   | 485500   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 971      |
----------------------------------
2025-02-01 15:50:06.464058 Eastern Standard Time
| Itration            | 972      |
| Real Det Return     | 535      |
| Real Sto Return     | 486      |
| Reward Loss         | -29.9    |
| Running Env Steps   | 486000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 3.94     |
| Running Update Time | 972      |
----------------------------------
2025-02-01 15:50:22.139922 Eastern Standard Time
| Itration            | 973      |
| Real Det Return     | 535      |
| Real Sto Return     | 478      |
| Reward Loss         | -25.3    |
| Running Env Steps   | 486500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.27     |
| Running Update Time | 973      |
----------------------------------
2025-02-01 15:50:37.852966 Eastern Standard Time
| Itration            | 974      |
| Real Det Return     | 535      |
| Real Sto Return     | 484      |
| Reward Loss         | -19.3    |
| Running Env Steps   | 487000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 974      |
----------------------------------
2025-02-01 15:50:53.649513 Eastern Standard Time
| Itration            | 975      |
| Real Det Return     | 542      |
| Real Sto Return     | 492      |
| Reward Loss         | -29.1    |
| Running Env Steps   | 487500   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.23     |
| Running Update Time | 975      |
----------------------------------
2025-02-01 15:51:09.381245 Eastern Standard Time
| Itration            | 976      |
| Real Det Return     | 533      |
| Real Sto Return     | 492      |
| Reward Loss         | -20.3    |
| Running Env Steps   | 488000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 976      |
----------------------------------
2025-02-01 15:51:25.062346 Eastern Standard Time
| Itration            | 977      |
| Real Det Return     | 530      |
| Real Sto Return     | 474      |
| Reward Loss         | -39      |
| Running Env Steps   | 488500   |
| Running Forward KL  | -4.32    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 977      |
----------------------------------
2025-02-01 15:51:40.836249 Eastern Standard Time
| Itration            | 978      |
| Real Det Return     | 537      |
| Real Sto Return     | 496      |
| Reward Loss         | -18      |
| Running Env Steps   | 489000   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 978      |
----------------------------------
2025-02-01 15:51:56.512652 Eastern Standard Time
| Itration            | 979      |
| Real Det Return     | 532      |
| Real Sto Return     | 491      |
| Reward Loss         | -32.4    |
| Running Env Steps   | 489500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 3.88     |
| Running Update Time | 979      |
----------------------------------
2025-02-01 15:52:12.239632 Eastern Standard Time
| Itration            | 980      |
| Real Det Return     | 543      |
| Real Sto Return     | 490      |
| Reward Loss         | -23.3    |
| Running Env Steps   | 490000   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 980      |
----------------------------------
2025-02-01 15:52:27.931422 Eastern Standard Time
| Itration            | 981      |
| Real Det Return     | 546      |
| Real Sto Return     | 498      |
| Reward Loss         | -20.4    |
| Running Env Steps   | 490500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 981      |
----------------------------------
2025-02-01 15:52:43.664626 Eastern Standard Time
| Itration            | 982      |
| Real Det Return     | 522      |
| Real Sto Return     | 472      |
| Reward Loss         | -23.5    |
| Running Env Steps   | 491000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 982      |
----------------------------------
2025-02-01 15:52:59.307911 Eastern Standard Time
| Itration            | 983      |
| Real Det Return     | 539      |
| Real Sto Return     | 488      |
| Reward Loss         | -17      |
| Running Env Steps   | 491500   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 983      |
----------------------------------
2025-02-01 15:53:14.953734 Eastern Standard Time
| Itration            | 984      |
| Real Det Return     | 546      |
| Real Sto Return     | 494      |
| Reward Loss         | -18.2    |
| Running Env Steps   | 492000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 984      |
----------------------------------
2025-02-01 15:53:30.654070 Eastern Standard Time
| Itration            | 985      |
| Real Det Return     | 511      |
| Real Sto Return     | 476      |
| Reward Loss         | -24.2    |
| Running Env Steps   | 492500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 985      |
----------------------------------
2025-02-01 15:53:46.452766 Eastern Standard Time
| Itration            | 986      |
| Real Det Return     | 535      |
| Real Sto Return     | 488      |
| Reward Loss         | -18.1    |
| Running Env Steps   | 493000   |
| Running Forward KL  | -4.52    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 986      |
----------------------------------
2025-02-01 15:54:02.127329 Eastern Standard Time
| Itration            | 987      |
| Real Det Return     | 537      |
| Real Sto Return     | 484      |
| Reward Loss         | -26.5    |
| Running Env Steps   | 493500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.58     |
| Running Update Time | 987      |
----------------------------------
2025-02-01 15:54:17.905396 Eastern Standard Time
| Itration            | 988      |
| Real Det Return     | 523      |
| Real Sto Return     | 475      |
| Reward Loss         | -35.8    |
| Running Env Steps   | 494000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 988      |
----------------------------------
2025-02-01 15:54:33.651574 Eastern Standard Time
| Itration            | 989      |
| Real Det Return     | 546      |
| Real Sto Return     | 490      |
| Reward Loss         | -31.7    |
| Running Env Steps   | 494500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 989      |
----------------------------------
2025-02-01 15:54:49.339092 Eastern Standard Time
| Itration            | 990      |
| Real Det Return     | 539      |
| Real Sto Return     | 485      |
| Reward Loss         | -27.6    |
| Running Env Steps   | 495000   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 990      |
----------------------------------
2025-02-01 15:55:05.082256 Eastern Standard Time
| Itration            | 991      |
| Real Det Return     | 550      |
| Real Sto Return     | 488      |
| Reward Loss         | -16      |
| Running Env Steps   | 495500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 991      |
----------------------------------
2025-02-01 15:55:20.809587 Eastern Standard Time
| Itration            | 992      |
| Real Det Return     | 527      |
| Real Sto Return     | 493      |
| Reward Loss         | -19.1    |
| Running Env Steps   | 496000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 992      |
----------------------------------
2025-02-01 15:55:36.406469 Eastern Standard Time
| Itration            | 993      |
| Real Det Return     | 528      |
| Real Sto Return     | 480      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 496500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 993      |
----------------------------------
2025-02-01 15:55:51.950068 Eastern Standard Time
| Itration            | 994      |
| Real Det Return     | 536      |
| Real Sto Return     | 482      |
| Reward Loss         | -15.9    |
| Running Env Steps   | 497000   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 994      |
----------------------------------
2025-02-01 15:56:07.462133 Eastern Standard Time
| Itration            | 995      |
| Real Det Return     | 526      |
| Real Sto Return     | 485      |
| Reward Loss         | -10.1    |
| Running Env Steps   | 497500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 995      |
----------------------------------
2025-02-01 15:56:22.970977 Eastern Standard Time
| Itration            | 996      |
| Real Det Return     | 539      |
| Real Sto Return     | 486      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 498000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 996      |
----------------------------------
2025-02-01 15:56:38.431546 Eastern Standard Time
| Itration            | 997      |
| Real Det Return     | 520      |
| Real Sto Return     | 475      |
| Reward Loss         | -23.7    |
| Running Env Steps   | 498500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 997      |
----------------------------------
2025-02-01 15:56:53.945379 Eastern Standard Time
| Itration            | 998      |
| Real Det Return     | 541      |
| Real Sto Return     | 481      |
| Reward Loss         | -19.7    |
| Running Env Steps   | 499000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 998      |
----------------------------------
2025-02-01 15:57:09.465450 Eastern Standard Time
| Itration            | 999      |
| Real Det Return     | 539      |
| Real Sto Return     | 492      |
| Reward Loss         | -13.2    |
| Running Env Steps   | 499500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 999      |
----------------------------------
2025-02-01 15:57:24.917001 Eastern Standard Time
| Itration            | 1000     |
| Real Det Return     | 517      |
| Real Sto Return     | 473      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 500000   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 1000     |
----------------------------------
2025-02-01 15:57:40.350135 Eastern Standard Time
| Itration            | 1001     |
| Real Det Return     | 544      |
| Real Sto Return     | 488      |
| Reward Loss         | -23.8    |
| Running Env Steps   | 500500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 1001     |
----------------------------------
2025-02-01 15:57:55.794961 Eastern Standard Time
| Itration            | 1002     |
| Real Det Return     | 540      |
| Real Sto Return     | 485      |
| Reward Loss         | -15.5    |
| Running Env Steps   | 501000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 4.1      |
| Running Update Time | 1002     |
----------------------------------
2025-02-01 15:58:11.274610 Eastern Standard Time
| Itration            | 1003     |
| Real Det Return     | 522      |
| Real Sto Return     | 480      |
| Reward Loss         | -11.7    |
| Running Env Steps   | 501500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 1003     |
----------------------------------
2025-02-01 15:58:26.846674 Eastern Standard Time
| Itration            | 1004     |
| Real Det Return     | 522      |
| Real Sto Return     | 477      |
| Reward Loss         | -16.3    |
| Running Env Steps   | 502000   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1004     |
----------------------------------
2025-02-01 15:58:42.358960 Eastern Standard Time
| Itration            | 1005     |
| Real Det Return     | 531      |
| Real Sto Return     | 490      |
| Reward Loss         | -27.7    |
| Running Env Steps   | 502500   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1005     |
----------------------------------
2025-02-01 15:58:57.790946 Eastern Standard Time
| Itration            | 1006     |
| Real Det Return     | 553      |
| Real Sto Return     | 490      |
| Reward Loss         | -15.6    |
| Running Env Steps   | 503000   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 1006     |
----------------------------------
2025-02-01 15:59:13.297344 Eastern Standard Time
| Itration            | 1007     |
| Real Det Return     | 536      |
| Real Sto Return     | 493      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 503500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 1007     |
----------------------------------
2025-02-01 15:59:28.915886 Eastern Standard Time
| Itration            | 1008     |
| Real Det Return     | 532      |
| Real Sto Return     | 483      |
| Reward Loss         | -31.8    |
| Running Env Steps   | 504000   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1008     |
----------------------------------
2025-02-01 15:59:44.427981 Eastern Standard Time
| Itration            | 1009     |
| Real Det Return     | 536      |
| Real Sto Return     | 478      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 504500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4.11     |
| Running Update Time | 1009     |
----------------------------------
2025-02-01 15:59:59.860493 Eastern Standard Time
| Itration            | 1010     |
| Real Det Return     | 494      |
| Real Sto Return     | 462      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 505000   |
| Running Forward KL  | -4.34    |
| Running Reverse KL  | 5.48     |
| Running Update Time | 1010     |
----------------------------------
2025-02-01 16:00:15.419215 Eastern Standard Time
| Itration            | 1011     |
| Real Det Return     | 536      |
| Real Sto Return     | 481      |
| Reward Loss         | -29.2    |
| Running Env Steps   | 505500   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 1011     |
----------------------------------
2025-02-01 16:00:30.975649 Eastern Standard Time
| Itration            | 1012     |
| Real Det Return     | 516      |
| Real Sto Return     | 472      |
| Reward Loss         | -9.11    |
| Running Env Steps   | 506000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1012     |
----------------------------------
2025-02-01 16:00:46.549888 Eastern Standard Time
| Itration            | 1013     |
| Real Det Return     | 533      |
| Real Sto Return     | 499      |
| Reward Loss         | -11      |
| Running Env Steps   | 506500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1013     |
----------------------------------
2025-02-01 16:01:02.013942 Eastern Standard Time
| Itration            | 1014     |
| Real Det Return     | 519      |
| Real Sto Return     | 483      |
| Reward Loss         | -15.2    |
| Running Env Steps   | 507000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 1014     |
----------------------------------
2025-02-01 16:01:17.506690 Eastern Standard Time
| Itration            | 1015     |
| Real Det Return     | 532      |
| Real Sto Return     | 481      |
| Reward Loss         | -31.9    |
| Running Env Steps   | 507500   |
| Running Forward KL  | -4.36    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 1015     |
----------------------------------
2025-02-01 16:01:32.978826 Eastern Standard Time
| Itration            | 1016     |
| Real Det Return     | 521      |
| Real Sto Return     | 472      |
| Reward Loss         | -26      |
| Running Env Steps   | 508000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 1016     |
----------------------------------
2025-02-01 16:01:48.503472 Eastern Standard Time
| Itration            | 1017     |
| Real Det Return     | 527      |
| Real Sto Return     | 479      |
| Reward Loss         | -17      |
| Running Env Steps   | 508500   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 1017     |
----------------------------------
2025-02-01 16:02:03.995223 Eastern Standard Time
| Itration            | 1018     |
| Real Det Return     | 519      |
| Real Sto Return     | 467      |
| Reward Loss         | -25.7    |
| Running Env Steps   | 509000   |
| Running Forward KL  | -4.24    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 1018     |
----------------------------------
2025-02-01 16:02:19.476907 Eastern Standard Time
| Itration            | 1019     |
| Real Det Return     | 522      |
| Real Sto Return     | 475      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 509500   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 1019     |
----------------------------------
2025-02-01 16:02:35.074843 Eastern Standard Time
| Itration            | 1020     |
| Real Det Return     | 528      |
| Real Sto Return     | 473      |
| Reward Loss         | -33      |
| Running Env Steps   | 510000   |
| Running Forward KL  | -4.39    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1020     |
----------------------------------
2025-02-01 16:02:50.559147 Eastern Standard Time
| Itration            | 1021     |
| Real Det Return     | 524      |
| Real Sto Return     | 492      |
| Reward Loss         | -36.1    |
| Running Env Steps   | 510500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.31     |
| Running Update Time | 1021     |
----------------------------------
2025-02-01 16:03:06.054763 Eastern Standard Time
| Itration            | 1022     |
| Real Det Return     | 526      |
| Real Sto Return     | 469      |
| Reward Loss         | -23      |
| Running Env Steps   | 511000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 1022     |
----------------------------------
2025-02-01 16:03:21.560088 Eastern Standard Time
| Itration            | 1023     |
| Real Det Return     | 517      |
| Real Sto Return     | 483      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 511500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 1023     |
----------------------------------
2025-02-01 16:03:37.014211 Eastern Standard Time
| Itration            | 1024     |
| Real Det Return     | 536      |
| Real Sto Return     | 499      |
| Reward Loss         | -17.2    |
| Running Env Steps   | 512000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1024     |
----------------------------------
2025-02-01 16:03:52.490964 Eastern Standard Time
| Itration            | 1025     |
| Real Det Return     | 524      |
| Real Sto Return     | 478      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 512500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1025     |
----------------------------------
2025-02-01 16:04:08.012002 Eastern Standard Time
| Itration            | 1026     |
| Real Det Return     | 535      |
| Real Sto Return     | 478      |
| Reward Loss         | -28.9    |
| Running Env Steps   | 513000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 1026     |
----------------------------------
2025-02-01 16:04:23.573742 Eastern Standard Time
| Itration            | 1027     |
| Real Det Return     | 536      |
| Real Sto Return     | 480      |
| Reward Loss         | -50.6    |
| Running Env Steps   | 513500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.24     |
| Running Update Time | 1027     |
----------------------------------
2025-02-01 16:04:39.131267 Eastern Standard Time
| Itration            | 1028     |
| Real Det Return     | 511      |
| Real Sto Return     | 476      |
| Reward Loss         | -16.7    |
| Running Env Steps   | 514000   |
| Running Forward KL  | -4.42    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1028     |
----------------------------------
2025-02-01 16:04:54.531830 Eastern Standard Time
| Itration            | 1029     |
| Real Det Return     | 538      |
| Real Sto Return     | 488      |
| Reward Loss         | -13.3    |
| Running Env Steps   | 514500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1029     |
----------------------------------
2025-02-01 16:05:10.101396 Eastern Standard Time
| Itration            | 1030     |
| Real Det Return     | 536      |
| Real Sto Return     | 489      |
| Reward Loss         | -19      |
| Running Env Steps   | 515000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 1030     |
----------------------------------
2025-02-01 16:05:25.542207 Eastern Standard Time
| Itration            | 1031     |
| Real Det Return     | 540      |
| Real Sto Return     | 484      |
| Reward Loss         | -18.5    |
| Running Env Steps   | 515500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 1031     |
----------------------------------
2025-02-01 16:05:41.013667 Eastern Standard Time
| Itration            | 1032     |
| Real Det Return     | 541      |
| Real Sto Return     | 491      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 516000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.52     |
| Running Update Time | 1032     |
----------------------------------
2025-02-01 16:05:56.458996 Eastern Standard Time
| Itration            | 1033     |
| Real Det Return     | 540      |
| Real Sto Return     | 492      |
| Reward Loss         | -14.4    |
| Running Env Steps   | 516500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1033     |
----------------------------------
2025-02-01 16:06:11.936903 Eastern Standard Time
| Itration            | 1034     |
| Real Det Return     | 540      |
| Real Sto Return     | 484      |
| Reward Loss         | -34.8    |
| Running Env Steps   | 517000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 1034     |
----------------------------------
2025-02-01 16:06:27.369323 Eastern Standard Time
| Itration            | 1035     |
| Real Det Return     | 550      |
| Real Sto Return     | 498      |
| Reward Loss         | -28.2    |
| Running Env Steps   | 517500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 1035     |
----------------------------------
2025-02-01 16:06:42.867764 Eastern Standard Time
| Itration            | 1036     |
| Real Det Return     | 501      |
| Real Sto Return     | 469      |
| Reward Loss         | -20.7    |
| Running Env Steps   | 518000   |
| Running Forward KL  | -3.92    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 1036     |
----------------------------------
2025-02-01 16:06:58.313738 Eastern Standard Time
| Itration            | 1037     |
| Real Det Return     | 511      |
| Real Sto Return     | 460      |
| Reward Loss         | -32      |
| Running Env Steps   | 518500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1037     |
----------------------------------
2025-02-01 16:07:13.870289 Eastern Standard Time
| Itration            | 1038     |
| Real Det Return     | 531      |
| Real Sto Return     | 485      |
| Reward Loss         | -21.3    |
| Running Env Steps   | 519000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1038     |
----------------------------------
2025-02-01 16:07:29.276061 Eastern Standard Time
| Itration            | 1039     |
| Real Det Return     | 535      |
| Real Sto Return     | 485      |
| Reward Loss         | -19.3    |
| Running Env Steps   | 519500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1039     |
----------------------------------
2025-02-01 16:07:44.770744 Eastern Standard Time
| Itration            | 1040     |
| Real Det Return     | 544      |
| Real Sto Return     | 483      |
| Reward Loss         | -33.5    |
| Running Env Steps   | 520000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 4.52     |
| Running Update Time | 1040     |
----------------------------------
2025-02-01 16:08:00.227196 Eastern Standard Time
| Itration            | 1041     |
| Real Det Return     | 538      |
| Real Sto Return     | 479      |
| Reward Loss         | -19.2    |
| Running Env Steps   | 520500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 1041     |
----------------------------------
2025-02-01 16:08:15.742027 Eastern Standard Time
| Itration            | 1042     |
| Real Det Return     | 538      |
| Real Sto Return     | 492      |
| Reward Loss         | -32.2    |
| Running Env Steps   | 521000   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1042     |
----------------------------------
2025-02-01 16:08:31.423307 Eastern Standard Time
| Itration            | 1043     |
| Real Det Return     | 518      |
| Real Sto Return     | 477      |
| Reward Loss         | -26.5    |
| Running Env Steps   | 521500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1043     |
----------------------------------
2025-02-01 16:08:47.062780 Eastern Standard Time
| Itration            | 1044     |
| Real Det Return     | 536      |
| Real Sto Return     | 488      |
| Reward Loss         | -11.2    |
| Running Env Steps   | 522000   |
| Running Forward KL  | -5.38    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 1044     |
----------------------------------
2025-02-01 16:09:02.521656 Eastern Standard Time
| Itration            | 1045     |
| Real Det Return     | 521      |
| Real Sto Return     | 470      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 522500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1045     |
----------------------------------
2025-02-01 16:09:18.009815 Eastern Standard Time
| Itration            | 1046     |
| Real Det Return     | 530      |
| Real Sto Return     | 476      |
| Reward Loss         | -19.8    |
| Running Env Steps   | 523000   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1046     |
----------------------------------
2025-02-01 16:09:33.755564 Eastern Standard Time
| Itration            | 1047     |
| Real Det Return     | 517      |
| Real Sto Return     | 472      |
| Reward Loss         | -20.9    |
| Running Env Steps   | 523500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1047     |
----------------------------------
2025-02-01 16:09:49.213155 Eastern Standard Time
| Itration            | 1048     |
| Real Det Return     | 534      |
| Real Sto Return     | 484      |
| Reward Loss         | -22.3    |
| Running Env Steps   | 524000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1048     |
----------------------------------
2025-02-01 16:10:04.707170 Eastern Standard Time
| Itration            | 1049     |
| Real Det Return     | 529      |
| Real Sto Return     | 483      |
| Reward Loss         | -23.4    |
| Running Env Steps   | 524500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 1049     |
----------------------------------
2025-02-01 16:10:20.260880 Eastern Standard Time
| Itration            | 1050     |
| Real Det Return     | 529      |
| Real Sto Return     | 483      |
| Reward Loss         | -30.1    |
| Running Env Steps   | 525000   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1050     |
----------------------------------
2025-02-01 16:10:35.720118 Eastern Standard Time
| Itration            | 1051     |
| Real Det Return     | 546      |
| Real Sto Return     | 490      |
| Reward Loss         | -28      |
| Running Env Steps   | 525500   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 1051     |
----------------------------------
2025-02-01 16:10:51.213299 Eastern Standard Time
| Itration            | 1052     |
| Real Det Return     | 516      |
| Real Sto Return     | 477      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 526000   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1052     |
----------------------------------
2025-02-01 16:11:06.622552 Eastern Standard Time
| Itration            | 1053     |
| Real Det Return     | 535      |
| Real Sto Return     | 491      |
| Reward Loss         | -23.5    |
| Running Env Steps   | 526500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1053     |
----------------------------------
2025-02-01 16:11:22.121278 Eastern Standard Time
| Itration            | 1054     |
| Real Det Return     | 540      |
| Real Sto Return     | 499      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 527000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1054     |
----------------------------------
2025-02-01 16:11:37.638141 Eastern Standard Time
| Itration            | 1055     |
| Real Det Return     | 516      |
| Real Sto Return     | 485      |
| Reward Loss         | -31      |
| Running Env Steps   | 527500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 1055     |
----------------------------------
2025-02-01 16:11:53.156421 Eastern Standard Time
| Itration            | 1056     |
| Real Det Return     | 524      |
| Real Sto Return     | 482      |
| Reward Loss         | -29.5    |
| Running Env Steps   | 528000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1056     |
----------------------------------
2025-02-01 16:12:08.695374 Eastern Standard Time
| Itration            | 1057     |
| Real Det Return     | 526      |
| Real Sto Return     | 480      |
| Reward Loss         | -25.4    |
| Running Env Steps   | 528500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 1057     |
----------------------------------
2025-02-01 16:12:24.208284 Eastern Standard Time
| Itration            | 1058     |
| Real Det Return     | 530      |
| Real Sto Return     | 487      |
| Reward Loss         | -32.7    |
| Running Env Steps   | 529000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 1058     |
----------------------------------
2025-02-01 16:12:39.669868 Eastern Standard Time
| Itration            | 1059     |
| Real Det Return     | 535      |
| Real Sto Return     | 491      |
| Reward Loss         | -27.6    |
| Running Env Steps   | 529500   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1059     |
----------------------------------
2025-02-01 16:12:55.225615 Eastern Standard Time
| Itration            | 1060     |
| Real Det Return     | 541      |
| Real Sto Return     | 498      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 530000   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 1060     |
----------------------------------
2025-02-01 16:13:10.843061 Eastern Standard Time
| Itration            | 1061     |
| Real Det Return     | 545      |
| Real Sto Return     | 495      |
| Reward Loss         | -28.9    |
| Running Env Steps   | 530500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.15     |
| Running Update Time | 1061     |
----------------------------------
2025-02-01 16:13:26.293739 Eastern Standard Time
| Itration            | 1062     |
| Real Det Return     | 533      |
| Real Sto Return     | 479      |
| Reward Loss         | -27.8    |
| Running Env Steps   | 531000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1062     |
----------------------------------
2025-02-01 16:13:41.757408 Eastern Standard Time
| Itration            | 1063     |
| Real Det Return     | 531      |
| Real Sto Return     | 467      |
| Reward Loss         | -32.4    |
| Running Env Steps   | 531500   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 3.92     |
| Running Update Time | 1063     |
----------------------------------
2025-02-01 16:13:57.189800 Eastern Standard Time
| Itration            | 1064     |
| Real Det Return     | 531      |
| Real Sto Return     | 484      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 532000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1064     |
----------------------------------
2025-02-01 16:14:12.698308 Eastern Standard Time
| Itration            | 1065     |
| Real Det Return     | 540      |
| Real Sto Return     | 489      |
| Reward Loss         | -18.9    |
| Running Env Steps   | 532500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 1065     |
----------------------------------
2025-02-01 16:14:28.213850 Eastern Standard Time
| Itration            | 1066     |
| Real Det Return     | 559      |
| Real Sto Return     | 499      |
| Reward Loss         | -41      |
| Running Env Steps   | 533000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.05     |
| Running Update Time | 1066     |
----------------------------------
2025-02-01 16:14:43.756691 Eastern Standard Time
| Itration            | 1067     |
| Real Det Return     | 519      |
| Real Sto Return     | 463      |
| Reward Loss         | -30.4    |
| Running Env Steps   | 533500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1067     |
----------------------------------
2025-02-01 16:14:59.211425 Eastern Standard Time
| Itration            | 1068     |
| Real Det Return     | 527      |
| Real Sto Return     | 487      |
| Reward Loss         | -8       |
| Running Env Steps   | 534000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 1068     |
----------------------------------
2025-02-01 16:15:14.667567 Eastern Standard Time
| Itration            | 1069     |
| Real Det Return     | 528      |
| Real Sto Return     | 483      |
| Reward Loss         | -15.3    |
| Running Env Steps   | 534500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 1069     |
----------------------------------
2025-02-01 16:15:30.163018 Eastern Standard Time
| Itration            | 1070     |
| Real Det Return     | 542      |
| Real Sto Return     | 494      |
| Reward Loss         | -22.8    |
| Running Env Steps   | 535000   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1070     |
----------------------------------
2025-02-01 16:15:45.620215 Eastern Standard Time
| Itration            | 1071     |
| Real Det Return     | 543      |
| Real Sto Return     | 498      |
| Reward Loss         | -8.51    |
| Running Env Steps   | 535500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1071     |
----------------------------------
2025-02-01 16:16:01.127458 Eastern Standard Time
| Itration            | 1072     |
| Real Det Return     | 530      |
| Real Sto Return     | 475      |
| Reward Loss         | -19.3    |
| Running Env Steps   | 536000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 1072     |
----------------------------------
2025-02-01 16:16:16.603759 Eastern Standard Time
| Itration            | 1073     |
| Real Det Return     | 541      |
| Real Sto Return     | 486      |
| Reward Loss         | -24      |
| Running Env Steps   | 536500   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 1073     |
----------------------------------
2025-02-01 16:16:32.028423 Eastern Standard Time
| Itration            | 1074     |
| Real Det Return     | 528      |
| Real Sto Return     | 473      |
| Reward Loss         | -32.1    |
| Running Env Steps   | 537000   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 1074     |
----------------------------------
2025-02-01 16:16:47.590259 Eastern Standard Time
| Itration            | 1075     |
| Real Det Return     | 514      |
| Real Sto Return     | 468      |
| Reward Loss         | -23.7    |
| Running Env Steps   | 537500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 1075     |
----------------------------------
2025-02-01 16:17:03.080070 Eastern Standard Time
| Itration            | 1076     |
| Real Det Return     | 546      |
| Real Sto Return     | 479      |
| Reward Loss         | -20.8    |
| Running Env Steps   | 538000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1076     |
----------------------------------
2025-02-01 16:17:18.736115 Eastern Standard Time
| Itration            | 1077     |
| Real Det Return     | 523      |
| Real Sto Return     | 474      |
| Reward Loss         | -32      |
| Running Env Steps   | 538500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 1077     |
----------------------------------
2025-02-01 16:17:34.367682 Eastern Standard Time
| Itration            | 1078     |
| Real Det Return     | 528      |
| Real Sto Return     | 476      |
| Reward Loss         | -21.1    |
| Running Env Steps   | 539000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1078     |
----------------------------------
2025-02-01 16:17:49.833987 Eastern Standard Time
| Itration            | 1079     |
| Real Det Return     | 511      |
| Real Sto Return     | 465      |
| Reward Loss         | -29.8    |
| Running Env Steps   | 539500   |
| Running Forward KL  | -4.43    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1079     |
----------------------------------
2025-02-01 16:18:05.514184 Eastern Standard Time
| Itration            | 1080     |
| Real Det Return     | 527      |
| Real Sto Return     | 481      |
| Reward Loss         | -25.4    |
| Running Env Steps   | 540000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1080     |
----------------------------------
2025-02-01 16:18:21.577563 Eastern Standard Time
| Itration            | 1081     |
| Real Det Return     | 507      |
| Real Sto Return     | 468      |
| Reward Loss         | -15      |
| Running Env Steps   | 540500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1081     |
----------------------------------
2025-02-01 16:18:37.222355 Eastern Standard Time
| Itration            | 1082     |
| Real Det Return     | 516      |
| Real Sto Return     | 460      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 541000   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 1082     |
----------------------------------
2025-02-01 16:18:53.331182 Eastern Standard Time
| Itration            | 1083     |
| Real Det Return     | 534      |
| Real Sto Return     | 485      |
| Reward Loss         | -24.7    |
| Running Env Steps   | 541500   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.3      |
| Running Update Time | 1083     |
----------------------------------
2025-02-01 16:19:09.242091 Eastern Standard Time
| Itration            | 1084     |
| Real Det Return     | 523      |
| Real Sto Return     | 485      |
| Reward Loss         | -27.2    |
| Running Env Steps   | 542000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 1084     |
----------------------------------
2025-02-01 16:19:24.732057 Eastern Standard Time
| Itration            | 1085     |
| Real Det Return     | 528      |
| Real Sto Return     | 483      |
| Reward Loss         | -26.7    |
| Running Env Steps   | 542500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.08     |
| Running Update Time | 1085     |
----------------------------------
2025-02-01 16:19:40.260431 Eastern Standard Time
| Itration            | 1086     |
| Real Det Return     | 534      |
| Real Sto Return     | 473      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 543000   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.28     |
| Running Update Time | 1086     |
----------------------------------
2025-02-01 16:19:55.781688 Eastern Standard Time
| Itration            | 1087     |
| Real Det Return     | 533      |
| Real Sto Return     | 493      |
| Reward Loss         | -35.6    |
| Running Env Steps   | 543500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 1087     |
----------------------------------
2025-02-01 16:20:11.807998 Eastern Standard Time
| Itration            | 1088     |
| Real Det Return     | 536      |
| Real Sto Return     | 476      |
| Reward Loss         | -24.5    |
| Running Env Steps   | 544000   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 1088     |
----------------------------------
2025-02-01 16:20:27.790566 Eastern Standard Time
| Itration            | 1089     |
| Real Det Return     | 530      |
| Real Sto Return     | 483      |
| Reward Loss         | -24.6    |
| Running Env Steps   | 544500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1089     |
----------------------------------
2025-02-01 16:20:43.452192 Eastern Standard Time
| Itration            | 1090     |
| Real Det Return     | 531      |
| Real Sto Return     | 470      |
| Reward Loss         | -29.3    |
| Running Env Steps   | 545000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1090     |
----------------------------------
2025-02-01 16:20:59.432223 Eastern Standard Time
| Itration            | 1091     |
| Real Det Return     | 520      |
| Real Sto Return     | 476      |
| Reward Loss         | -16.1    |
| Running Env Steps   | 545500   |
| Running Forward KL  | -4.44    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1091     |
----------------------------------
2025-02-01 16:21:15.130486 Eastern Standard Time
| Itration            | 1092     |
| Real Det Return     | 524      |
| Real Sto Return     | 467      |
| Reward Loss         | -24.7    |
| Running Env Steps   | 546000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1092     |
----------------------------------
2025-02-01 16:21:31.196307 Eastern Standard Time
| Itration            | 1093     |
| Real Det Return     | 517      |
| Real Sto Return     | 487      |
| Reward Loss         | -23.6    |
| Running Env Steps   | 546500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 1093     |
----------------------------------
2025-02-01 16:21:47.255471 Eastern Standard Time
| Itration            | 1094     |
| Real Det Return     | 505      |
| Real Sto Return     | 474      |
| Reward Loss         | -12.1    |
| Running Env Steps   | 547000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1094     |
----------------------------------
2025-02-01 16:22:03.395899 Eastern Standard Time
| Itration            | 1095     |
| Real Det Return     | 523      |
| Real Sto Return     | 481      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 547500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 1095     |
----------------------------------
2025-02-01 16:22:19.252591 Eastern Standard Time
| Itration            | 1096     |
| Real Det Return     | 521      |
| Real Sto Return     | 480      |
| Reward Loss         | -32.2    |
| Running Env Steps   | 548000   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.38     |
| Running Update Time | 1096     |
----------------------------------
2025-02-01 16:22:34.995947 Eastern Standard Time
| Itration            | 1097     |
| Real Det Return     | 545      |
| Real Sto Return     | 483      |
| Reward Loss         | -20.1    |
| Running Env Steps   | 548500   |
| Running Forward KL  | -4.51    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 1097     |
----------------------------------
2025-02-01 16:22:50.585680 Eastern Standard Time
| Itration            | 1098     |
| Real Det Return     | 526      |
| Real Sto Return     | 472      |
| Reward Loss         | -28.2    |
| Running Env Steps   | 549000   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1098     |
----------------------------------
2025-02-01 16:23:06.136931 Eastern Standard Time
| Itration            | 1099     |
| Real Det Return     | 526      |
| Real Sto Return     | 482      |
| Reward Loss         | -27.4    |
| Running Env Steps   | 549500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 1099     |
----------------------------------
2025-02-01 16:23:21.980838 Eastern Standard Time
| Itration            | 1100     |
| Real Det Return     | 526      |
| Real Sto Return     | 473      |
| Reward Loss         | -25.2    |
| Running Env Steps   | 550000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1100     |
----------------------------------
2025-02-01 16:23:37.991459 Eastern Standard Time
| Itration            | 1101     |
| Real Det Return     | 536      |
| Real Sto Return     | 473      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 550500   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1101     |
----------------------------------
2025-02-01 16:23:53.561601 Eastern Standard Time
| Itration            | 1102     |
| Real Det Return     | 522      |
| Real Sto Return     | 480      |
| Reward Loss         | -23.1    |
| Running Env Steps   | 551000   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1102     |
----------------------------------
2025-02-01 16:24:09.178186 Eastern Standard Time
| Itration            | 1103     |
| Real Det Return     | 522      |
| Real Sto Return     | 476      |
| Reward Loss         | -15.7    |
| Running Env Steps   | 551500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1103     |
----------------------------------
2025-02-01 16:24:24.812743 Eastern Standard Time
| Itration            | 1104     |
| Real Det Return     | 523      |
| Real Sto Return     | 482      |
| Reward Loss         | -21.9    |
| Running Env Steps   | 552000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1104     |
----------------------------------
2025-02-01 16:24:40.383069 Eastern Standard Time
| Itration            | 1105     |
| Real Det Return     | 510      |
| Real Sto Return     | 478      |
| Reward Loss         | -27.7    |
| Running Env Steps   | 552500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 5.33     |
| Running Update Time | 1105     |
----------------------------------
2025-02-01 16:24:55.965621 Eastern Standard Time
| Itration            | 1106     |
| Real Det Return     | 540      |
| Real Sto Return     | 495      |
| Reward Loss         | -22.8    |
| Running Env Steps   | 553000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.62     |
| Running Update Time | 1106     |
----------------------------------
2025-02-01 16:25:11.603992 Eastern Standard Time
| Itration            | 1107     |
| Real Det Return     | 539      |
| Real Sto Return     | 488      |
| Reward Loss         | -31.8    |
| Running Env Steps   | 553500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1107     |
----------------------------------
2025-02-01 16:25:27.129152 Eastern Standard Time
| Itration            | 1108     |
| Real Det Return     | 522      |
| Real Sto Return     | 487      |
| Reward Loss         | -18.4    |
| Running Env Steps   | 554000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1108     |
----------------------------------
2025-02-01 16:25:42.727970 Eastern Standard Time
| Itration            | 1109     |
| Real Det Return     | 511      |
| Real Sto Return     | 479      |
| Reward Loss         | -35.7    |
| Running Env Steps   | 554500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1109     |
----------------------------------
2025-02-01 16:25:58.587991 Eastern Standard Time
| Itration            | 1110     |
| Real Det Return     | 525      |
| Real Sto Return     | 479      |
| Reward Loss         | -30.1    |
| Running Env Steps   | 555000   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1110     |
----------------------------------
2025-02-01 16:26:14.160964 Eastern Standard Time
| Itration            | 1111     |
| Real Det Return     | 538      |
| Real Sto Return     | 489      |
| Reward Loss         | -23.2    |
| Running Env Steps   | 555500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1111     |
----------------------------------
2025-02-01 16:26:29.713898 Eastern Standard Time
| Itration            | 1112     |
| Real Det Return     | 530      |
| Real Sto Return     | 483      |
| Reward Loss         | -34.3    |
| Running Env Steps   | 556000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1112     |
----------------------------------
2025-02-01 16:26:45.371544 Eastern Standard Time
| Itration            | 1113     |
| Real Det Return     | 541      |
| Real Sto Return     | 495      |
| Reward Loss         | -23.3    |
| Running Env Steps   | 556500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1113     |
----------------------------------
2025-02-01 16:27:00.994949 Eastern Standard Time
| Itration            | 1114     |
| Real Det Return     | 535      |
| Real Sto Return     | 493      |
| Reward Loss         | -20.9    |
| Running Env Steps   | 557000   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 1114     |
----------------------------------
2025-02-01 16:27:17.453992 Eastern Standard Time
| Itration            | 1115     |
| Real Det Return     | 531      |
| Real Sto Return     | 502      |
| Reward Loss         | -25.7    |
| Running Env Steps   | 557500   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1115     |
----------------------------------
2025-02-01 16:27:33.138474 Eastern Standard Time
| Itration            | 1116     |
| Real Det Return     | 544      |
| Real Sto Return     | 486      |
| Reward Loss         | -23.4    |
| Running Env Steps   | 558000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 1116     |
----------------------------------
2025-02-01 16:27:49.070923 Eastern Standard Time
| Itration            | 1117     |
| Real Det Return     | 543      |
| Real Sto Return     | 497      |
| Reward Loss         | -34.3    |
| Running Env Steps   | 558500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 1117     |
----------------------------------
2025-02-01 16:28:05.121526 Eastern Standard Time
| Itration            | 1118     |
| Real Det Return     | 532      |
| Real Sto Return     | 495      |
| Reward Loss         | -29.1    |
| Running Env Steps   | 559000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 1118     |
----------------------------------
2025-02-01 16:28:21.136904 Eastern Standard Time
| Itration            | 1119     |
| Real Det Return     | 524      |
| Real Sto Return     | 492      |
| Reward Loss         | -18.5    |
| Running Env Steps   | 559500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1119     |
----------------------------------
2025-02-01 16:28:36.887241 Eastern Standard Time
| Itration            | 1120     |
| Real Det Return     | 544      |
| Real Sto Return     | 495      |
| Reward Loss         | -28.6    |
| Running Env Steps   | 560000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 1120     |
----------------------------------
2025-02-01 16:28:52.420311 Eastern Standard Time
| Itration            | 1121     |
| Real Det Return     | 536      |
| Real Sto Return     | 479      |
| Reward Loss         | -30.3    |
| Running Env Steps   | 560500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 1121     |
----------------------------------
2025-02-01 16:29:08.053348 Eastern Standard Time
| Itration            | 1122     |
| Real Det Return     | 525      |
| Real Sto Return     | 480      |
| Reward Loss         | -22.4    |
| Running Env Steps   | 561000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1122     |
----------------------------------
2025-02-01 16:29:23.673229 Eastern Standard Time
| Itration            | 1123     |
| Real Det Return     | 541      |
| Real Sto Return     | 489      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 561500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1123     |
----------------------------------
2025-02-01 16:29:39.165906 Eastern Standard Time
| Itration            | 1124     |
| Real Det Return     | 548      |
| Real Sto Return     | 495      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 562000   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1124     |
----------------------------------
2025-02-01 16:29:54.692734 Eastern Standard Time
| Itration            | 1125     |
| Real Det Return     | 533      |
| Real Sto Return     | 480      |
| Reward Loss         | -44.9    |
| Running Env Steps   | 562500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.2      |
| Running Update Time | 1125     |
----------------------------------
2025-02-01 16:30:10.217890 Eastern Standard Time
| Itration            | 1126     |
| Real Det Return     | 530      |
| Real Sto Return     | 489      |
| Reward Loss         | -28.6    |
| Running Env Steps   | 563000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1126     |
----------------------------------
2025-02-01 16:30:25.823454 Eastern Standard Time
| Itration            | 1127     |
| Real Det Return     | 531      |
| Real Sto Return     | 481      |
| Reward Loss         | -43.9    |
| Running Env Steps   | 563500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 1127     |
----------------------------------
2025-02-01 16:30:41.387546 Eastern Standard Time
| Itration            | 1128     |
| Real Det Return     | 537      |
| Real Sto Return     | 493      |
| Reward Loss         | -30      |
| Running Env Steps   | 564000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1128     |
----------------------------------
2025-02-01 16:30:57.052594 Eastern Standard Time
| Itration            | 1129     |
| Real Det Return     | 519      |
| Real Sto Return     | 471      |
| Reward Loss         | -34.8    |
| Running Env Steps   | 564500   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1129     |
----------------------------------
2025-02-01 16:31:12.547377 Eastern Standard Time
| Itration            | 1130     |
| Real Det Return     | 525      |
| Real Sto Return     | 487      |
| Reward Loss         | -38.7    |
| Running Env Steps   | 565000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 1130     |
----------------------------------
2025-02-01 16:31:31.381456 Eastern Standard Time
| Itration            | 1131     |
| Real Det Return     | 520      |
| Real Sto Return     | 478      |
| Reward Loss         | -33.8    |
| Running Env Steps   | 565500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1131     |
----------------------------------
2025-02-01 16:31:47.770867 Eastern Standard Time
| Itration            | 1132     |
| Real Det Return     | 538      |
| Real Sto Return     | 482      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 566000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 1132     |
----------------------------------
2025-02-01 16:32:03.999482 Eastern Standard Time
| Itration            | 1133     |
| Real Det Return     | 546      |
| Real Sto Return     | 489      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 566500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.45     |
| Running Update Time | 1133     |
----------------------------------
2025-02-01 16:32:19.972919 Eastern Standard Time
| Itration            | 1134     |
| Real Det Return     | 547      |
| Real Sto Return     | 491      |
| Reward Loss         | -24.6    |
| Running Env Steps   | 567000   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1134     |
----------------------------------
2025-02-01 16:32:35.521300 Eastern Standard Time
| Itration            | 1135     |
| Real Det Return     | 525      |
| Real Sto Return     | 483      |
| Reward Loss         | -28.1    |
| Running Env Steps   | 567500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 1135     |
----------------------------------
2025-02-01 16:32:51.392005 Eastern Standard Time
| Itration            | 1136     |
| Real Det Return     | 536      |
| Real Sto Return     | 482      |
| Reward Loss         | -31      |
| Running Env Steps   | 568000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1136     |
----------------------------------
2025-02-01 16:33:07.185596 Eastern Standard Time
| Itration            | 1137     |
| Real Det Return     | 538      |
| Real Sto Return     | 495      |
| Reward Loss         | -24      |
| Running Env Steps   | 568500   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 1137     |
----------------------------------
2025-02-01 16:33:22.933896 Eastern Standard Time
| Itration            | 1138     |
| Real Det Return     | 527      |
| Real Sto Return     | 467      |
| Reward Loss         | -25.1    |
| Running Env Steps   | 569000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1138     |
----------------------------------
2025-02-01 16:33:38.666280 Eastern Standard Time
| Itration            | 1139     |
| Real Det Return     | 520      |
| Real Sto Return     | 480      |
| Reward Loss         | -35.3    |
| Running Env Steps   | 569500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1139     |
----------------------------------
2025-02-01 16:33:54.776388 Eastern Standard Time
| Itration            | 1140     |
| Real Det Return     | 539      |
| Real Sto Return     | 494      |
| Reward Loss         | -20.2    |
| Running Env Steps   | 570000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 1140     |
----------------------------------
2025-02-01 16:34:10.463105 Eastern Standard Time
| Itration            | 1141     |
| Real Det Return     | 530      |
| Real Sto Return     | 486      |
| Reward Loss         | -27.8    |
| Running Env Steps   | 570500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 1141     |
----------------------------------
2025-02-01 16:34:26.572144 Eastern Standard Time
| Itration            | 1142     |
| Real Det Return     | 542      |
| Real Sto Return     | 492      |
| Reward Loss         | -19      |
| Running Env Steps   | 571000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 1142     |
----------------------------------
2025-02-01 16:34:42.221773 Eastern Standard Time
| Itration            | 1143     |
| Real Det Return     | 536      |
| Real Sto Return     | 495      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 571500   |
| Running Forward KL  | -5.72    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1143     |
----------------------------------
2025-02-01 16:34:58.276232 Eastern Standard Time
| Itration            | 1144     |
| Real Det Return     | 532      |
| Real Sto Return     | 484      |
| Reward Loss         | -24      |
| Running Env Steps   | 572000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1144     |
----------------------------------
2025-02-01 16:35:14.668055 Eastern Standard Time
| Itration            | 1145     |
| Real Det Return     | 520      |
| Real Sto Return     | 485      |
| Reward Loss         | -34      |
| Running Env Steps   | 572500   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1145     |
----------------------------------
2025-02-01 16:35:30.696950 Eastern Standard Time
| Itration            | 1146     |
| Real Det Return     | 541      |
| Real Sto Return     | 494      |
| Reward Loss         | -19.4    |
| Running Env Steps   | 573000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1146     |
----------------------------------
2025-02-01 16:35:46.759408 Eastern Standard Time
| Itration            | 1147     |
| Real Det Return     | 520      |
| Real Sto Return     | 484      |
| Reward Loss         | -29.8    |
| Running Env Steps   | 573500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 1147     |
----------------------------------
2025-02-01 16:36:02.451269 Eastern Standard Time
| Itration            | 1148     |
| Real Det Return     | 511      |
| Real Sto Return     | 487      |
| Reward Loss         | -30.9    |
| Running Env Steps   | 574000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.52     |
| Running Update Time | 1148     |
----------------------------------
2025-02-01 16:36:18.135818 Eastern Standard Time
| Itration            | 1149     |
| Real Det Return     | 528      |
| Real Sto Return     | 463      |
| Reward Loss         | -26.8    |
| Running Env Steps   | 574500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1149     |
----------------------------------
2025-02-01 16:36:33.821823 Eastern Standard Time
| Itration            | 1150     |
| Real Det Return     | 523      |
| Real Sto Return     | 493      |
| Reward Loss         | -36.8    |
| Running Env Steps   | 575000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 1150     |
----------------------------------
2025-02-01 16:36:49.719066 Eastern Standard Time
| Itration            | 1151     |
| Real Det Return     | 519      |
| Real Sto Return     | 473      |
| Reward Loss         | -19.6    |
| Running Env Steps   | 575500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1151     |
----------------------------------
2025-02-01 16:37:05.428517 Eastern Standard Time
| Itration            | 1152     |
| Real Det Return     | 534      |
| Real Sto Return     | 490      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 576000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 1152     |
----------------------------------
2025-02-01 16:37:21.078339 Eastern Standard Time
| Itration            | 1153     |
| Real Det Return     | 523      |
| Real Sto Return     | 491      |
| Reward Loss         | -24.8    |
| Running Env Steps   | 576500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1153     |
----------------------------------
2025-02-01 16:37:36.641793 Eastern Standard Time
| Itration            | 1154     |
| Real Det Return     | 535      |
| Real Sto Return     | 495      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 577000   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 1154     |
----------------------------------
2025-02-01 16:37:52.264957 Eastern Standard Time
| Itration            | 1155     |
| Real Det Return     | 536      |
| Real Sto Return     | 474      |
| Reward Loss         | -19.7    |
| Running Env Steps   | 577500   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1155     |
----------------------------------
2025-02-01 16:38:07.902225 Eastern Standard Time
| Itration            | 1156     |
| Real Det Return     | 530      |
| Real Sto Return     | 487      |
| Reward Loss         | -22.7    |
| Running Env Steps   | 578000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 1156     |
----------------------------------
2025-02-01 16:38:23.586090 Eastern Standard Time
| Itration            | 1157     |
| Real Det Return     | 543      |
| Real Sto Return     | 497      |
| Reward Loss         | -33      |
| Running Env Steps   | 578500   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1157     |
----------------------------------
2025-02-01 16:38:39.105063 Eastern Standard Time
| Itration            | 1158     |
| Real Det Return     | 527      |
| Real Sto Return     | 487      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 579000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1158     |
----------------------------------
2025-02-01 16:38:54.710860 Eastern Standard Time
| Itration            | 1159     |
| Real Det Return     | 533      |
| Real Sto Return     | 481      |
| Reward Loss         | -30.6    |
| Running Env Steps   | 579500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1159     |
----------------------------------
2025-02-01 16:39:10.325085 Eastern Standard Time
| Itration            | 1160     |
| Real Det Return     | 536      |
| Real Sto Return     | 476      |
| Reward Loss         | -32.2    |
| Running Env Steps   | 580000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1160     |
----------------------------------
2025-02-01 16:39:25.884619 Eastern Standard Time
| Itration            | 1161     |
| Real Det Return     | 548      |
| Real Sto Return     | 491      |
| Reward Loss         | -35      |
| Running Env Steps   | 580500   |
| Running Forward KL  | -4.54    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1161     |
----------------------------------
2025-02-01 16:39:41.442233 Eastern Standard Time
| Itration            | 1162     |
| Real Det Return     | 550      |
| Real Sto Return     | 503      |
| Reward Loss         | -20.8    |
| Running Env Steps   | 581000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1162     |
----------------------------------
2025-02-01 16:39:57.028731 Eastern Standard Time
| Itration            | 1163     |
| Real Det Return     | 533      |
| Real Sto Return     | 491      |
| Reward Loss         | -23.2    |
| Running Env Steps   | 581500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1163     |
----------------------------------
2025-02-01 16:40:12.754211 Eastern Standard Time
| Itration            | 1164     |
| Real Det Return     | 545      |
| Real Sto Return     | 486      |
| Reward Loss         | -31.3    |
| Running Env Steps   | 582000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1164     |
----------------------------------
2025-02-01 16:40:28.412601 Eastern Standard Time
| Itration            | 1165     |
| Real Det Return     | 530      |
| Real Sto Return     | 482      |
| Reward Loss         | -20.7    |
| Running Env Steps   | 582500   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 4.09     |
| Running Update Time | 1165     |
----------------------------------
2025-02-01 16:40:44.119593 Eastern Standard Time
| Itration            | 1166     |
| Real Det Return     | 523      |
| Real Sto Return     | 470      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 583000   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1166     |
----------------------------------
2025-02-01 16:41:00.300866 Eastern Standard Time
| Itration            | 1167     |
| Real Det Return     | 535      |
| Real Sto Return     | 482      |
| Reward Loss         | -44.1    |
| Running Env Steps   | 583500   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1167     |
----------------------------------
2025-02-01 16:41:15.966606 Eastern Standard Time
| Itration            | 1168     |
| Real Det Return     | 548      |
| Real Sto Return     | 485      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 584000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 1168     |
----------------------------------
2025-02-01 16:41:32.119992 Eastern Standard Time
| Itration            | 1169     |
| Real Det Return     | 542      |
| Real Sto Return     | 479      |
| Reward Loss         | -29.5    |
| Running Env Steps   | 584500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1169     |
----------------------------------
2025-02-01 16:41:48.321793 Eastern Standard Time
| Itration            | 1170     |
| Real Det Return     | 515      |
| Real Sto Return     | 482      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 585000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.45     |
| Running Update Time | 1170     |
----------------------------------
2025-02-01 16:42:04.063515 Eastern Standard Time
| Itration            | 1171     |
| Real Det Return     | 521      |
| Real Sto Return     | 486      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 585500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.01     |
| Running Update Time | 1171     |
----------------------------------
2025-02-01 16:42:19.806513 Eastern Standard Time
| Itration            | 1172     |
| Real Det Return     | 534      |
| Real Sto Return     | 481      |
| Reward Loss         | -38.9    |
| Running Env Steps   | 586000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 1172     |
----------------------------------
2025-02-01 16:42:35.542352 Eastern Standard Time
| Itration            | 1173     |
| Real Det Return     | 516      |
| Real Sto Return     | 476      |
| Reward Loss         | -39.4    |
| Running Env Steps   | 586500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 1173     |
----------------------------------
2025-02-01 16:42:51.892971 Eastern Standard Time
| Itration            | 1174     |
| Real Det Return     | 548      |
| Real Sto Return     | 501      |
| Reward Loss         | -23.1    |
| Running Env Steps   | 587000   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1174     |
----------------------------------
2025-02-01 16:43:07.743204 Eastern Standard Time
| Itration            | 1175     |
| Real Det Return     | 538      |
| Real Sto Return     | 479      |
| Reward Loss         | -14.4    |
| Running Env Steps   | 587500   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1175     |
----------------------------------
2025-02-01 16:43:23.669786 Eastern Standard Time
| Itration            | 1176     |
| Real Det Return     | 519      |
| Real Sto Return     | 466      |
| Reward Loss         | -35.2    |
| Running Env Steps   | 588000   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 1176     |
----------------------------------
2025-02-01 16:43:39.619200 Eastern Standard Time
| Itration            | 1177     |
| Real Det Return     | 534      |
| Real Sto Return     | 481      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 588500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1177     |
----------------------------------
2025-02-01 16:43:55.491233 Eastern Standard Time
| Itration            | 1178     |
| Real Det Return     | 536      |
| Real Sto Return     | 489      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 589000   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1178     |
----------------------------------
2025-02-01 16:44:11.286858 Eastern Standard Time
| Itration            | 1179     |
| Real Det Return     | 520      |
| Real Sto Return     | 465      |
| Reward Loss         | -29.9    |
| Running Env Steps   | 589500   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1179     |
----------------------------------
2025-02-01 16:44:26.965634 Eastern Standard Time
| Itration            | 1180     |
| Real Det Return     | 547      |
| Real Sto Return     | 485      |
| Reward Loss         | -34      |
| Running Env Steps   | 590000   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.13     |
| Running Update Time | 1180     |
----------------------------------
2025-02-01 16:44:43.531908 Eastern Standard Time
| Itration            | 1181     |
| Real Det Return     | 536      |
| Real Sto Return     | 486      |
| Reward Loss         | -27.9    |
| Running Env Steps   | 590500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1181     |
----------------------------------
2025-02-01 16:44:59.292657 Eastern Standard Time
| Itration            | 1182     |
| Real Det Return     | 541      |
| Real Sto Return     | 485      |
| Reward Loss         | -37.3    |
| Running Env Steps   | 591000   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4.22     |
| Running Update Time | 1182     |
----------------------------------
2025-02-01 16:45:14.970285 Eastern Standard Time
| Itration            | 1183     |
| Real Det Return     | 541      |
| Real Sto Return     | 485      |
| Reward Loss         | -26.2    |
| Running Env Steps   | 591500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1183     |
----------------------------------
2025-02-01 16:45:30.771272 Eastern Standard Time
| Itration            | 1184     |
| Real Det Return     | 510      |
| Real Sto Return     | 471      |
| Reward Loss         | -28.9    |
| Running Env Steps   | 592000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1184     |
----------------------------------
2025-02-01 16:45:46.532228 Eastern Standard Time
| Itration            | 1185     |
| Real Det Return     | 527      |
| Real Sto Return     | 470      |
| Reward Loss         | -22.3    |
| Running Env Steps   | 592500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 1185     |
----------------------------------
2025-02-01 16:46:02.389644 Eastern Standard Time
| Itration            | 1186     |
| Real Det Return     | 525      |
| Real Sto Return     | 489      |
| Reward Loss         | -24.4    |
| Running Env Steps   | 593000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1186     |
----------------------------------
2025-02-01 16:46:18.066550 Eastern Standard Time
| Itration            | 1187     |
| Real Det Return     | 527      |
| Real Sto Return     | 480      |
| Reward Loss         | -26.6    |
| Running Env Steps   | 593500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1187     |
----------------------------------
2025-02-01 16:46:33.636032 Eastern Standard Time
| Itration            | 1188     |
| Real Det Return     | 535      |
| Real Sto Return     | 482      |
| Reward Loss         | -19.5    |
| Running Env Steps   | 594000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1188     |
----------------------------------
2025-02-01 16:46:49.378926 Eastern Standard Time
| Itration            | 1189     |
| Real Det Return     | 539      |
| Real Sto Return     | 471      |
| Reward Loss         | -14.6    |
| Running Env Steps   | 594500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1189     |
----------------------------------
2025-02-01 16:47:05.223509 Eastern Standard Time
| Itration            | 1190     |
| Real Det Return     | 527      |
| Real Sto Return     | 487      |
| Reward Loss         | -25.8    |
| Running Env Steps   | 595000   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1190     |
----------------------------------
2025-02-01 16:47:20.900426 Eastern Standard Time
| Itration            | 1191     |
| Real Det Return     | 524      |
| Real Sto Return     | 484      |
| Reward Loss         | -26.1    |
| Running Env Steps   | 595500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1191     |
----------------------------------
2025-02-01 16:47:37.247667 Eastern Standard Time
| Itration            | 1192     |
| Real Det Return     | 519      |
| Real Sto Return     | 475      |
| Reward Loss         | -35.6    |
| Running Env Steps   | 596000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1192     |
----------------------------------
2025-02-01 16:47:53.034376 Eastern Standard Time
| Itration            | 1193     |
| Real Det Return     | 529      |
| Real Sto Return     | 489      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 596500   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1193     |
----------------------------------
2025-02-01 16:48:08.856836 Eastern Standard Time
| Itration            | 1194     |
| Real Det Return     | 517      |
| Real Sto Return     | 478      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 597000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.52     |
| Running Update Time | 1194     |
----------------------------------
2025-02-01 16:48:24.593259 Eastern Standard Time
| Itration            | 1195     |
| Real Det Return     | 507      |
| Real Sto Return     | 471      |
| Reward Loss         | -47.6    |
| Running Env Steps   | 597500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1195     |
----------------------------------
2025-02-01 16:48:40.401317 Eastern Standard Time
| Itration            | 1196     |
| Real Det Return     | 541      |
| Real Sto Return     | 488      |
| Reward Loss         | -33.2    |
| Running Env Steps   | 598000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 1196     |
----------------------------------
2025-02-01 16:48:56.186627 Eastern Standard Time
| Itration            | 1197     |
| Real Det Return     | 507      |
| Real Sto Return     | 476      |
| Reward Loss         | -39.4    |
| Running Env Steps   | 598500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1197     |
----------------------------------
2025-02-01 16:49:12.110428 Eastern Standard Time
| Itration            | 1198     |
| Real Det Return     | 529      |
| Real Sto Return     | 487      |
| Reward Loss         | -33.2    |
| Running Env Steps   | 599000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.06     |
| Running Update Time | 1198     |
----------------------------------
2025-02-01 16:49:27.949291 Eastern Standard Time
| Itration            | 1199     |
| Real Det Return     | 547      |
| Real Sto Return     | 490      |
| Reward Loss         | -37.5    |
| Running Env Steps   | 599500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1199     |
----------------------------------
2025-02-01 16:49:43.617032 Eastern Standard Time
| Itration            | 1200     |
| Real Det Return     | 521      |
| Real Sto Return     | 475      |
| Reward Loss         | -29.9    |
| Running Env Steps   | 600000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1200     |
----------------------------------
2025-02-01 16:49:59.437495 Eastern Standard Time
| Itration            | 1201     |
| Real Det Return     | 511      |
| Real Sto Return     | 466      |
| Reward Loss         | -36.2    |
| Running Env Steps   | 600500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 5        |
| Running Update Time | 1201     |
----------------------------------
2025-02-01 16:50:15.617664 Eastern Standard Time
| Itration            | 1202     |
| Real Det Return     | 549      |
| Real Sto Return     | 490      |
| Reward Loss         | -33.1    |
| Running Env Steps   | 601000   |
| Running Forward KL  | -4.56    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1202     |
----------------------------------
2025-02-01 16:50:31.201744 Eastern Standard Time
| Itration            | 1203     |
| Real Det Return     | 535      |
| Real Sto Return     | 485      |
| Reward Loss         | -32.8    |
| Running Env Steps   | 601500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1203     |
----------------------------------
2025-02-01 16:50:47.127373 Eastern Standard Time
| Itration            | 1204     |
| Real Det Return     | 518      |
| Real Sto Return     | 476      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 602000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1204     |
----------------------------------
2025-02-01 16:51:03.636468 Eastern Standard Time
| Itration            | 1205     |
| Real Det Return     | 528      |
| Real Sto Return     | 494      |
| Reward Loss         | -18.7    |
| Running Env Steps   | 602500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1205     |
----------------------------------
2025-02-01 16:51:19.229481 Eastern Standard Time
| Itration            | 1206     |
| Real Det Return     | 545      |
| Real Sto Return     | 501      |
| Reward Loss         | -26.1    |
| Running Env Steps   | 603000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 1206     |
----------------------------------
2025-02-01 16:51:34.930355 Eastern Standard Time
| Itration            | 1207     |
| Real Det Return     | 526      |
| Real Sto Return     | 495      |
| Reward Loss         | -29.3    |
| Running Env Steps   | 603500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.05     |
| Running Update Time | 1207     |
----------------------------------
2025-02-01 16:51:50.562188 Eastern Standard Time
| Itration            | 1208     |
| Real Det Return     | 549      |
| Real Sto Return     | 504      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 604000   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 1208     |
----------------------------------
2025-02-01 16:52:06.149569 Eastern Standard Time
| Itration            | 1209     |
| Real Det Return     | 534      |
| Real Sto Return     | 481      |
| Reward Loss         | -30.6    |
| Running Env Steps   | 604500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.21     |
| Running Update Time | 1209     |
----------------------------------
2025-02-01 16:52:21.902589 Eastern Standard Time
| Itration            | 1210     |
| Real Det Return     | 550      |
| Real Sto Return     | 480      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 605000   |
| Running Forward KL  | -4.49    |
| Running Reverse KL  | 5        |
| Running Update Time | 1210     |
----------------------------------
2025-02-01 16:52:38.259161 Eastern Standard Time
| Itration            | 1211     |
| Real Det Return     | 534      |
| Real Sto Return     | 482      |
| Reward Loss         | -24.5    |
| Running Env Steps   | 605500   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1211     |
----------------------------------
2025-02-01 16:52:54.629512 Eastern Standard Time
| Itration            | 1212     |
| Real Det Return     | 528      |
| Real Sto Return     | 477      |
| Reward Loss         | -38.4    |
| Running Env Steps   | 606000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1212     |
----------------------------------
2025-02-01 16:53:10.593130 Eastern Standard Time
| Itration            | 1213     |
| Real Det Return     | 536      |
| Real Sto Return     | 495      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 606500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 1213     |
----------------------------------
2025-02-01 16:53:26.898599 Eastern Standard Time
| Itration            | 1214     |
| Real Det Return     | 534      |
| Real Sto Return     | 493      |
| Reward Loss         | -25.1    |
| Running Env Steps   | 607000   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1214     |
----------------------------------
2025-02-01 16:53:42.573314 Eastern Standard Time
| Itration            | 1215     |
| Real Det Return     | 546      |
| Real Sto Return     | 492      |
| Reward Loss         | -31.5    |
| Running Env Steps   | 607500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 4.19     |
| Running Update Time | 1215     |
----------------------------------
2025-02-01 16:53:58.583070 Eastern Standard Time
| Itration            | 1216     |
| Real Det Return     | 538      |
| Real Sto Return     | 485      |
| Reward Loss         | -35      |
| Running Env Steps   | 608000   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1216     |
----------------------------------
2025-02-01 16:54:14.888068 Eastern Standard Time
| Itration            | 1217     |
| Real Det Return     | 522      |
| Real Sto Return     | 489      |
| Reward Loss         | -34.3    |
| Running Env Steps   | 608500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1217     |
----------------------------------
2025-02-01 16:54:30.829991 Eastern Standard Time
| Itration            | 1218     |
| Real Det Return     | 545      |
| Real Sto Return     | 493      |
| Reward Loss         | -24.5    |
| Running Env Steps   | 609000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1218     |
----------------------------------
2025-02-01 16:54:46.494679 Eastern Standard Time
| Itration            | 1219     |
| Real Det Return     | 504      |
| Real Sto Return     | 470      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 609500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1219     |
----------------------------------
2025-02-01 16:55:02.127306 Eastern Standard Time
| Itration            | 1220     |
| Real Det Return     | 525      |
| Real Sto Return     | 478      |
| Reward Loss         | -26.4    |
| Running Env Steps   | 610000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1220     |
----------------------------------
2025-02-01 16:55:18.416552 Eastern Standard Time
| Itration            | 1221     |
| Real Det Return     | 546      |
| Real Sto Return     | 489      |
| Reward Loss         | -13.5    |
| Running Env Steps   | 610500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1221     |
----------------------------------
2025-02-01 16:55:34.049772 Eastern Standard Time
| Itration            | 1222     |
| Real Det Return     | 529      |
| Real Sto Return     | 486      |
| Reward Loss         | -22      |
| Running Env Steps   | 611000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 5.22     |
| Running Update Time | 1222     |
----------------------------------
2025-02-01 16:55:49.827053 Eastern Standard Time
| Itration            | 1223     |
| Real Det Return     | 508      |
| Real Sto Return     | 474      |
| Reward Loss         | -27      |
| Running Env Steps   | 611500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1223     |
----------------------------------
2025-02-01 16:56:06.273866 Eastern Standard Time
| Itration            | 1224     |
| Real Det Return     | 543      |
| Real Sto Return     | 491      |
| Reward Loss         | -24.1    |
| Running Env Steps   | 612000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1224     |
----------------------------------
2025-02-01 16:56:22.060775 Eastern Standard Time
| Itration            | 1225     |
| Real Det Return     | 538      |
| Real Sto Return     | 482      |
| Reward Loss         | -21.3    |
| Running Env Steps   | 612500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1225     |
----------------------------------
2025-02-01 16:56:37.601850 Eastern Standard Time
| Itration            | 1226     |
| Real Det Return     | 521      |
| Real Sto Return     | 490      |
| Reward Loss         | -34      |
| Running Env Steps   | 613000   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1226     |
----------------------------------
2025-02-01 16:56:54.078874 Eastern Standard Time
| Itration            | 1227     |
| Real Det Return     | 535      |
| Real Sto Return     | 478      |
| Reward Loss         | -37.3    |
| Running Env Steps   | 613500   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1227     |
----------------------------------
2025-02-01 16:57:10.052239 Eastern Standard Time
| Itration            | 1228     |
| Real Det Return     | 524      |
| Real Sto Return     | 486      |
| Reward Loss         | -27.5    |
| Running Env Steps   | 614000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1228     |
----------------------------------
2025-02-01 16:57:26.245232 Eastern Standard Time
| Itration            | 1229     |
| Real Det Return     | 534      |
| Real Sto Return     | 476      |
| Reward Loss         | -21.4    |
| Running Env Steps   | 614500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1229     |
----------------------------------
2025-02-01 16:57:42.325101 Eastern Standard Time
| Itration            | 1230     |
| Real Det Return     | 537      |
| Real Sto Return     | 473      |
| Reward Loss         | -23.7    |
| Running Env Steps   | 615000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1230     |
----------------------------------
2025-02-01 16:57:58.123731 Eastern Standard Time
| Itration            | 1231     |
| Real Det Return     | 530      |
| Real Sto Return     | 473      |
| Reward Loss         | -34.1    |
| Running Env Steps   | 615500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1231     |
----------------------------------
2025-02-01 16:58:13.850156 Eastern Standard Time
| Itration            | 1232     |
| Real Det Return     | 538      |
| Real Sto Return     | 496      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 616000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1232     |
----------------------------------
2025-02-01 16:58:29.743317 Eastern Standard Time
| Itration            | 1233     |
| Real Det Return     | 529      |
| Real Sto Return     | 485      |
| Reward Loss         | -30.2    |
| Running Env Steps   | 616500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.03     |
| Running Update Time | 1233     |
----------------------------------
2025-02-01 16:58:46.226699 Eastern Standard Time
| Itration            | 1234     |
| Real Det Return     | 538      |
| Real Sto Return     | 479      |
| Reward Loss         | -25.9    |
| Running Env Steps   | 617000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1234     |
----------------------------------
2025-02-01 16:59:01.978871 Eastern Standard Time
| Itration            | 1235     |
| Real Det Return     | 522      |
| Real Sto Return     | 482      |
| Reward Loss         | -30.1    |
| Running Env Steps   | 617500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.29     |
| Running Update Time | 1235     |
----------------------------------
2025-02-01 16:59:17.728412 Eastern Standard Time
| Itration            | 1236     |
| Real Det Return     | 518      |
| Real Sto Return     | 478      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 618000   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1236     |
----------------------------------
2025-02-01 16:59:34.092239 Eastern Standard Time
| Itration            | 1237     |
| Real Det Return     | 528      |
| Real Sto Return     | 479      |
| Reward Loss         | -35.7    |
| Running Env Steps   | 618500   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 1237     |
----------------------------------
2025-02-01 16:59:49.666336 Eastern Standard Time
| Itration            | 1238     |
| Real Det Return     | 544      |
| Real Sto Return     | 493      |
| Reward Loss         | -37.7    |
| Running Env Steps   | 619000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 1238     |
----------------------------------
2025-02-01 17:00:05.227737 Eastern Standard Time
| Itration            | 1239     |
| Real Det Return     | 532      |
| Real Sto Return     | 481      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 619500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1239     |
----------------------------------
2025-02-01 17:00:20.750498 Eastern Standard Time
| Itration            | 1240     |
| Real Det Return     | 534      |
| Real Sto Return     | 480      |
| Reward Loss         | -32.8    |
| Running Env Steps   | 620000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 1240     |
----------------------------------
2025-02-01 17:00:36.417348 Eastern Standard Time
| Itration            | 1241     |
| Real Det Return     | 522      |
| Real Sto Return     | 478      |
| Reward Loss         | -36.7    |
| Running Env Steps   | 620500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1241     |
----------------------------------
2025-02-01 17:00:52.077211 Eastern Standard Time
| Itration            | 1242     |
| Real Det Return     | 542      |
| Real Sto Return     | 498      |
| Reward Loss         | -20.8    |
| Running Env Steps   | 621000   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1242     |
----------------------------------
2025-02-01 17:01:07.699503 Eastern Standard Time
| Itration            | 1243     |
| Real Det Return     | 519      |
| Real Sto Return     | 481      |
| Reward Loss         | -24.7    |
| Running Env Steps   | 621500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1243     |
----------------------------------
2025-02-01 17:01:23.305314 Eastern Standard Time
| Itration            | 1244     |
| Real Det Return     | 531      |
| Real Sto Return     | 493      |
| Reward Loss         | -31.3    |
| Running Env Steps   | 622000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1244     |
----------------------------------
2025-02-01 17:01:38.867266 Eastern Standard Time
| Itration            | 1245     |
| Real Det Return     | 535      |
| Real Sto Return     | 482      |
| Reward Loss         | -21.6    |
| Running Env Steps   | 622500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 5.79     |
| Running Update Time | 1245     |
----------------------------------
2025-02-01 17:01:54.380473 Eastern Standard Time
| Itration            | 1246     |
| Real Det Return     | 525      |
| Real Sto Return     | 472      |
| Reward Loss         | -25.9    |
| Running Env Steps   | 623000   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 3.96     |
| Running Update Time | 1246     |
----------------------------------
2025-02-01 17:02:10.186105 Eastern Standard Time
| Itration            | 1247     |
| Real Det Return     | 534      |
| Real Sto Return     | 470      |
| Reward Loss         | -33.5    |
| Running Env Steps   | 623500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1247     |
----------------------------------
2025-02-01 17:02:25.740559 Eastern Standard Time
| Itration            | 1248     |
| Real Det Return     | 516      |
| Real Sto Return     | 477      |
| Reward Loss         | -33.3    |
| Running Env Steps   | 624000   |
| Running Forward KL  | -5.36    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1248     |
----------------------------------
2025-02-01 17:02:41.400338 Eastern Standard Time
| Itration            | 1249     |
| Real Det Return     | 537      |
| Real Sto Return     | 475      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 624500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 1249     |
----------------------------------
2025-02-01 17:02:56.918057 Eastern Standard Time
| Itration            | 1250     |
| Real Det Return     | 518      |
| Real Sto Return     | 474      |
| Reward Loss         | -27.4    |
| Running Env Steps   | 625000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1250     |
----------------------------------
2025-02-01 17:03:12.512385 Eastern Standard Time
| Itration            | 1251     |
| Real Det Return     | 546      |
| Real Sto Return     | 495      |
| Reward Loss         | -24.2    |
| Running Env Steps   | 625500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 1251     |
----------------------------------
2025-02-01 17:03:28.554509 Eastern Standard Time
| Itration            | 1252     |
| Real Det Return     | 547      |
| Real Sto Return     | 488      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 626000   |
| Running Forward KL  | -4.3     |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1252     |
----------------------------------
2025-02-01 17:03:44.201717 Eastern Standard Time
| Itration            | 1253     |
| Real Det Return     | 516      |
| Real Sto Return     | 465      |
| Reward Loss         | -29.9    |
| Running Env Steps   | 626500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1253     |
----------------------------------
2025-02-01 17:03:59.851219 Eastern Standard Time
| Itration            | 1254     |
| Real Det Return     | 522      |
| Real Sto Return     | 483      |
| Reward Loss         | -24.9    |
| Running Env Steps   | 627000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1254     |
----------------------------------
2025-02-01 17:04:15.527603 Eastern Standard Time
| Itration            | 1255     |
| Real Det Return     | 529      |
| Real Sto Return     | 491      |
| Reward Loss         | -24.3    |
| Running Env Steps   | 627500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1255     |
----------------------------------
2025-02-01 17:04:31.164398 Eastern Standard Time
| Itration            | 1256     |
| Real Det Return     | 537      |
| Real Sto Return     | 486      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 628000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1256     |
----------------------------------
2025-02-01 17:04:46.891824 Eastern Standard Time
| Itration            | 1257     |
| Real Det Return     | 529      |
| Real Sto Return     | 471      |
| Reward Loss         | -33.6    |
| Running Env Steps   | 628500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1257     |
----------------------------------
2025-02-01 17:05:03.345971 Eastern Standard Time
| Itration            | 1258     |
| Real Det Return     | 539      |
| Real Sto Return     | 483      |
| Reward Loss         | -50.4    |
| Running Env Steps   | 629000   |
| Running Forward KL  | -4.69    |
| Running Reverse KL  | 4.33     |
| Running Update Time | 1258     |
----------------------------------
2025-02-01 17:05:19.213429 Eastern Standard Time
| Itration            | 1259     |
| Real Det Return     | 534      |
| Real Sto Return     | 485      |
| Reward Loss         | -42.7    |
| Running Env Steps   | 629500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 1259     |
----------------------------------
2025-02-01 17:05:34.917200 Eastern Standard Time
| Itration            | 1260     |
| Real Det Return     | 523      |
| Real Sto Return     | 480      |
| Reward Loss         | -32.5    |
| Running Env Steps   | 630000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1260     |
----------------------------------
2025-02-01 17:05:50.356038 Eastern Standard Time
| Itration            | 1261     |
| Real Det Return     | 514      |
| Real Sto Return     | 484      |
| Reward Loss         | -26.8    |
| Running Env Steps   | 630500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 1261     |
----------------------------------
2025-02-01 17:06:05.867767 Eastern Standard Time
| Itration            | 1262     |
| Real Det Return     | 522      |
| Real Sto Return     | 462      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 631000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1262     |
----------------------------------
2025-02-01 17:06:21.349000 Eastern Standard Time
| Itration            | 1263     |
| Real Det Return     | 519      |
| Real Sto Return     | 472      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 631500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 1263     |
----------------------------------
2025-02-01 17:06:36.796750 Eastern Standard Time
| Itration            | 1264     |
| Real Det Return     | 526      |
| Real Sto Return     | 475      |
| Reward Loss         | -25      |
| Running Env Steps   | 632000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1264     |
----------------------------------
2025-02-01 17:06:52.258022 Eastern Standard Time
| Itration            | 1265     |
| Real Det Return     | 532      |
| Real Sto Return     | 478      |
| Reward Loss         | -44.5    |
| Running Env Steps   | 632500   |
| Running Forward KL  | -4.76    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1265     |
----------------------------------
2025-02-01 17:07:07.759824 Eastern Standard Time
| Itration            | 1266     |
| Real Det Return     | 520      |
| Real Sto Return     | 473      |
| Reward Loss         | -34      |
| Running Env Steps   | 633000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1266     |
----------------------------------
2025-02-01 17:07:23.575686 Eastern Standard Time
| Itration            | 1267     |
| Real Det Return     | 522      |
| Real Sto Return     | 474      |
| Reward Loss         | -41      |
| Running Env Steps   | 633500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1267     |
----------------------------------
2025-02-01 17:07:39.102011 Eastern Standard Time
| Itration            | 1268     |
| Real Det Return     | 525      |
| Real Sto Return     | 478      |
| Reward Loss         | -31.3    |
| Running Env Steps   | 634000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1268     |
----------------------------------
2025-02-01 17:07:54.555851 Eastern Standard Time
| Itration            | 1269     |
| Real Det Return     | 536      |
| Real Sto Return     | 497      |
| Reward Loss         | -28      |
| Running Env Steps   | 634500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1269     |
----------------------------------
2025-02-01 17:08:10.056057 Eastern Standard Time
| Itration            | 1270     |
| Real Det Return     | 530      |
| Real Sto Return     | 485      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 635000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1270     |
----------------------------------
2025-02-01 17:08:25.582273 Eastern Standard Time
| Itration            | 1271     |
| Real Det Return     | 530      |
| Real Sto Return     | 477      |
| Reward Loss         | -47.2    |
| Running Env Steps   | 635500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1271     |
----------------------------------
2025-02-01 17:08:41.096519 Eastern Standard Time
| Itration            | 1272     |
| Real Det Return     | 546      |
| Real Sto Return     | 486      |
| Reward Loss         | -43.4    |
| Running Env Steps   | 636000   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1272     |
----------------------------------
2025-02-01 17:08:56.556683 Eastern Standard Time
| Itration            | 1273     |
| Real Det Return     | 544      |
| Real Sto Return     | 496      |
| Reward Loss         | -16      |
| Running Env Steps   | 636500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1273     |
----------------------------------
2025-02-01 17:09:12.104480 Eastern Standard Time
| Itration            | 1274     |
| Real Det Return     | 539      |
| Real Sto Return     | 496      |
| Reward Loss         | -15      |
| Running Env Steps   | 637000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 1274     |
----------------------------------
2025-02-01 17:09:27.520772 Eastern Standard Time
| Itration            | 1275     |
| Real Det Return     | 523      |
| Real Sto Return     | 483      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 637500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1275     |
----------------------------------
2025-02-01 17:09:43.060095 Eastern Standard Time
| Itration            | 1276     |
| Real Det Return     | 545      |
| Real Sto Return     | 488      |
| Reward Loss         | -27.3    |
| Running Env Steps   | 638000   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1276     |
----------------------------------
2025-02-01 17:09:58.773670 Eastern Standard Time
| Itration            | 1277     |
| Real Det Return     | 546      |
| Real Sto Return     | 495      |
| Reward Loss         | -42.8    |
| Running Env Steps   | 638500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1277     |
----------------------------------
2025-02-01 17:10:14.283283 Eastern Standard Time
| Itration            | 1278     |
| Real Det Return     | 548      |
| Real Sto Return     | 504      |
| Reward Loss         | -26.6    |
| Running Env Steps   | 639000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 5        |
| Running Update Time | 1278     |
----------------------------------
2025-02-01 17:10:29.802199 Eastern Standard Time
| Itration            | 1279     |
| Real Det Return     | 534      |
| Real Sto Return     | 473      |
| Reward Loss         | -32.3    |
| Running Env Steps   | 639500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1279     |
----------------------------------
2025-02-01 17:10:45.272877 Eastern Standard Time
| Itration            | 1280     |
| Real Det Return     | 531      |
| Real Sto Return     | 479      |
| Reward Loss         | -29.9    |
| Running Env Steps   | 640000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.19     |
| Running Update Time | 1280     |
----------------------------------
2025-02-01 17:11:00.933410 Eastern Standard Time
| Itration            | 1281     |
| Real Det Return     | 555      |
| Real Sto Return     | 493      |
| Reward Loss         | -27.9    |
| Running Env Steps   | 640500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.28     |
| Running Update Time | 1281     |
----------------------------------
2025-02-01 17:11:16.429362 Eastern Standard Time
| Itration            | 1282     |
| Real Det Return     | 546      |
| Real Sto Return     | 506      |
| Reward Loss         | -18.6    |
| Running Env Steps   | 641000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 1282     |
----------------------------------
2025-02-01 17:11:32.001474 Eastern Standard Time
| Itration            | 1283     |
| Real Det Return     | 536      |
| Real Sto Return     | 488      |
| Reward Loss         | -36.2    |
| Running Env Steps   | 641500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1283     |
----------------------------------
2025-02-01 17:11:47.511002 Eastern Standard Time
| Itration            | 1284     |
| Real Det Return     | 519      |
| Real Sto Return     | 479      |
| Reward Loss         | -32.5    |
| Running Env Steps   | 642000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1284     |
----------------------------------
2025-02-01 17:12:03.026573 Eastern Standard Time
| Itration            | 1285     |
| Real Det Return     | 521      |
| Real Sto Return     | 473      |
| Reward Loss         | -26      |
| Running Env Steps   | 642500   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1285     |
----------------------------------
2025-02-01 17:12:18.402624 Eastern Standard Time
| Itration            | 1286     |
| Real Det Return     | 540      |
| Real Sto Return     | 487      |
| Reward Loss         | -34.3    |
| Running Env Steps   | 643000   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1286     |
----------------------------------
2025-02-01 17:12:33.870359 Eastern Standard Time
| Itration            | 1287     |
| Real Det Return     | 530      |
| Real Sto Return     | 475      |
| Reward Loss         | -41.1    |
| Running Env Steps   | 643500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1287     |
----------------------------------
2025-02-01 17:12:49.354188 Eastern Standard Time
| Itration            | 1288     |
| Real Det Return     | 526      |
| Real Sto Return     | 483      |
| Reward Loss         | -37.5    |
| Running Env Steps   | 644000   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1288     |
----------------------------------
2025-02-01 17:13:04.864205 Eastern Standard Time
| Itration            | 1289     |
| Real Det Return     | 536      |
| Real Sto Return     | 489      |
| Reward Loss         | -36.1    |
| Running Env Steps   | 644500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1289     |
----------------------------------
2025-02-01 17:13:20.374167 Eastern Standard Time
| Itration            | 1290     |
| Real Det Return     | 546      |
| Real Sto Return     | 485      |
| Reward Loss         | -29.2    |
| Running Env Steps   | 645000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1290     |
----------------------------------
2025-02-01 17:13:35.978222 Eastern Standard Time
| Itration            | 1291     |
| Real Det Return     | 539      |
| Real Sto Return     | 484      |
| Reward Loss         | -32.8    |
| Running Env Steps   | 645500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1291     |
----------------------------------
2025-02-01 17:13:51.450036 Eastern Standard Time
| Itration            | 1292     |
| Real Det Return     | 528      |
| Real Sto Return     | 488      |
| Reward Loss         | -48.4    |
| Running Env Steps   | 646000   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 5.34     |
| Running Update Time | 1292     |
----------------------------------
2025-02-01 17:14:06.938936 Eastern Standard Time
| Itration            | 1293     |
| Real Det Return     | 515      |
| Real Sto Return     | 480      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 646500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1293     |
----------------------------------
2025-02-01 17:14:22.386813 Eastern Standard Time
| Itration            | 1294     |
| Real Det Return     | 526      |
| Real Sto Return     | 487      |
| Reward Loss         | -26.2    |
| Running Env Steps   | 647000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 5.55     |
| Running Update Time | 1294     |
----------------------------------
2025-02-01 17:14:37.897499 Eastern Standard Time
| Itration            | 1295     |
| Real Det Return     | 530      |
| Real Sto Return     | 476      |
| Reward Loss         | -35.2    |
| Running Env Steps   | 647500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.29     |
| Running Update Time | 1295     |
----------------------------------
2025-02-01 17:14:53.346353 Eastern Standard Time
| Itration            | 1296     |
| Real Det Return     | 530      |
| Real Sto Return     | 484      |
| Reward Loss         | -17.5    |
| Running Env Steps   | 648000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1296     |
----------------------------------
2025-02-01 17:15:08.840671 Eastern Standard Time
| Itration            | 1297     |
| Real Det Return     | 542      |
| Real Sto Return     | 487      |
| Reward Loss         | -20.4    |
| Running Env Steps   | 648500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1297     |
----------------------------------
2025-02-01 17:15:24.345232 Eastern Standard Time
| Itration            | 1298     |
| Real Det Return     | 526      |
| Real Sto Return     | 497      |
| Reward Loss         | -38.3    |
| Running Env Steps   | 649000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1298     |
----------------------------------
2025-02-01 17:15:39.779358 Eastern Standard Time
| Itration            | 1299     |
| Real Det Return     | 522      |
| Real Sto Return     | 472      |
| Reward Loss         | -27.5    |
| Running Env Steps   | 649500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1299     |
----------------------------------
2025-02-01 17:15:55.217690 Eastern Standard Time
| Itration            | 1300     |
| Real Det Return     | 529      |
| Real Sto Return     | 485      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 650000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1300     |
----------------------------------
2025-02-01 17:16:10.759229 Eastern Standard Time
| Itration            | 1301     |
| Real Det Return     | 546      |
| Real Sto Return     | 488      |
| Reward Loss         | -23.6    |
| Running Env Steps   | 650500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1301     |
----------------------------------
2025-02-01 17:16:26.197250 Eastern Standard Time
| Itration            | 1302     |
| Real Det Return     | 547      |
| Real Sto Return     | 499      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 651000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1302     |
----------------------------------
2025-02-01 17:16:41.708849 Eastern Standard Time
| Itration            | 1303     |
| Real Det Return     | 548      |
| Real Sto Return     | 496      |
| Reward Loss         | -25.1    |
| Running Env Steps   | 651500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1303     |
----------------------------------
2025-02-01 17:16:57.252165 Eastern Standard Time
| Itration            | 1304     |
| Real Det Return     | 541      |
| Real Sto Return     | 487      |
| Reward Loss         | -25      |
| Running Env Steps   | 652000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 1304     |
----------------------------------
2025-02-01 17:17:12.824901 Eastern Standard Time
| Itration            | 1305     |
| Real Det Return     | 509      |
| Real Sto Return     | 456      |
| Reward Loss         | -44.1    |
| Running Env Steps   | 652500   |
| Running Forward KL  | -4.41    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1305     |
----------------------------------
2025-02-01 17:17:28.301819 Eastern Standard Time
| Itration            | 1306     |
| Real Det Return     | 520      |
| Real Sto Return     | 479      |
| Reward Loss         | -31.6    |
| Running Env Steps   | 653000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 1306     |
----------------------------------
2025-02-01 17:17:43.756658 Eastern Standard Time
| Itration            | 1307     |
| Real Det Return     | 532      |
| Real Sto Return     | 471      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 653500   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 1307     |
----------------------------------
2025-02-01 17:17:59.199199 Eastern Standard Time
| Itration            | 1308     |
| Real Det Return     | 541      |
| Real Sto Return     | 486      |
| Reward Loss         | -27.6    |
| Running Env Steps   | 654000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 1308     |
----------------------------------
2025-02-01 17:18:14.724480 Eastern Standard Time
| Itration            | 1309     |
| Real Det Return     | 541      |
| Real Sto Return     | 497      |
| Reward Loss         | -18.5    |
| Running Env Steps   | 654500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1309     |
----------------------------------
2025-02-01 17:18:30.321984 Eastern Standard Time
| Itration            | 1310     |
| Real Det Return     | 539      |
| Real Sto Return     | 486      |
| Reward Loss         | -40.3    |
| Running Env Steps   | 655000   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1310     |
----------------------------------
2025-02-01 17:18:45.800980 Eastern Standard Time
| Itration            | 1311     |
| Real Det Return     | 549      |
| Real Sto Return     | 501      |
| Reward Loss         | -39.2    |
| Running Env Steps   | 655500   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.08     |
| Running Update Time | 1311     |
----------------------------------
2025-02-01 17:19:01.317126 Eastern Standard Time
| Itration            | 1312     |
| Real Det Return     | 527      |
| Real Sto Return     | 480      |
| Reward Loss         | -34.9    |
| Running Env Steps   | 656000   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1312     |
----------------------------------
2025-02-01 17:19:16.872212 Eastern Standard Time
| Itration            | 1313     |
| Real Det Return     | 526      |
| Real Sto Return     | 484      |
| Reward Loss         | -32.2    |
| Running Env Steps   | 656500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1313     |
----------------------------------
2025-02-01 17:19:32.380661 Eastern Standard Time
| Itration            | 1314     |
| Real Det Return     | 539      |
| Real Sto Return     | 481      |
| Reward Loss         | -29.3    |
| Running Env Steps   | 657000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1314     |
----------------------------------
2025-02-01 17:19:47.887097 Eastern Standard Time
| Itration            | 1315     |
| Real Det Return     | 533      |
| Real Sto Return     | 479      |
| Reward Loss         | -41      |
| Running Env Steps   | 657500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1315     |
----------------------------------
2025-02-01 17:20:03.361378 Eastern Standard Time
| Itration            | 1316     |
| Real Det Return     | 536      |
| Real Sto Return     | 470      |
| Reward Loss         | -44      |
| Running Env Steps   | 658000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1316     |
----------------------------------
2025-02-01 17:20:18.855048 Eastern Standard Time
| Itration            | 1317     |
| Real Det Return     | 506      |
| Real Sto Return     | 478      |
| Reward Loss         | -44.8    |
| Running Env Steps   | 658500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1317     |
----------------------------------
2025-02-01 17:20:34.382299 Eastern Standard Time
| Itration            | 1318     |
| Real Det Return     | 534      |
| Real Sto Return     | 482      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 659000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1318     |
----------------------------------
2025-02-01 17:20:49.872671 Eastern Standard Time
| Itration            | 1319     |
| Real Det Return     | 530      |
| Real Sto Return     | 488      |
| Reward Loss         | -29.8    |
| Running Env Steps   | 659500   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 4.17     |
| Running Update Time | 1319     |
----------------------------------
2025-02-01 17:21:05.365588 Eastern Standard Time
| Itration            | 1320     |
| Real Det Return     | 538      |
| Real Sto Return     | 488      |
| Reward Loss         | -35.5    |
| Running Env Steps   | 660000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1320     |
----------------------------------
2025-02-01 17:21:20.856697 Eastern Standard Time
| Itration            | 1321     |
| Real Det Return     | 529      |
| Real Sto Return     | 486      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 660500   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1321     |
----------------------------------
2025-02-01 17:21:36.277401 Eastern Standard Time
| Itration            | 1322     |
| Real Det Return     | 533      |
| Real Sto Return     | 481      |
| Reward Loss         | -27.5    |
| Running Env Steps   | 661000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1322     |
----------------------------------
2025-02-01 17:21:51.895522 Eastern Standard Time
| Itration            | 1323     |
| Real Det Return     | 538      |
| Real Sto Return     | 484      |
| Reward Loss         | -40.9    |
| Running Env Steps   | 661500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 1323     |
----------------------------------
2025-02-01 17:22:07.570295 Eastern Standard Time
| Itration            | 1324     |
| Real Det Return     | 550      |
| Real Sto Return     | 491      |
| Reward Loss         | -27.5    |
| Running Env Steps   | 662000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1324     |
----------------------------------
2025-02-01 17:22:23.430635 Eastern Standard Time
| Itration            | 1325     |
| Real Det Return     | 538      |
| Real Sto Return     | 493      |
| Reward Loss         | -33.7    |
| Running Env Steps   | 662500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 1325     |
----------------------------------
2025-02-01 17:22:39.173618 Eastern Standard Time
| Itration            | 1326     |
| Real Det Return     | 532      |
| Real Sto Return     | 474      |
| Reward Loss         | -21.9    |
| Running Env Steps   | 663000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 5.44     |
| Running Update Time | 1326     |
----------------------------------
2025-02-01 17:22:54.901441 Eastern Standard Time
| Itration            | 1327     |
| Real Det Return     | 517      |
| Real Sto Return     | 477      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 663500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1327     |
----------------------------------
2025-02-01 17:23:11.051540 Eastern Standard Time
| Itration            | 1328     |
| Real Det Return     | 519      |
| Real Sto Return     | 476      |
| Reward Loss         | -20.3    |
| Running Env Steps   | 664000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1328     |
----------------------------------
2025-02-01 17:23:26.687803 Eastern Standard Time
| Itration            | 1329     |
| Real Det Return     | 548      |
| Real Sto Return     | 495      |
| Reward Loss         | -26.9    |
| Running Env Steps   | 664500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1329     |
----------------------------------
2025-02-01 17:23:42.272519 Eastern Standard Time
| Itration            | 1330     |
| Real Det Return     | 515      |
| Real Sto Return     | 480      |
| Reward Loss         | -31.1    |
| Running Env Steps   | 665000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1330     |
----------------------------------
2025-02-01 17:23:58.336201 Eastern Standard Time
| Itration            | 1331     |
| Real Det Return     | 533      |
| Real Sto Return     | 491      |
| Reward Loss         | -38.7    |
| Running Env Steps   | 665500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1331     |
----------------------------------
2025-02-01 17:24:13.877598 Eastern Standard Time
| Itration            | 1332     |
| Real Det Return     | 542      |
| Real Sto Return     | 499      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 666000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1332     |
----------------------------------
2025-02-01 17:24:29.419647 Eastern Standard Time
| Itration            | 1333     |
| Real Det Return     | 519      |
| Real Sto Return     | 482      |
| Reward Loss         | -41.6    |
| Running Env Steps   | 666500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1333     |
----------------------------------
2025-02-01 17:24:45.133365 Eastern Standard Time
| Itration            | 1334     |
| Real Det Return     | 531      |
| Real Sto Return     | 482      |
| Reward Loss         | -29      |
| Running Env Steps   | 667000   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 1334     |
----------------------------------
2025-02-01 17:25:00.927031 Eastern Standard Time
| Itration            | 1335     |
| Real Det Return     | 531      |
| Real Sto Return     | 488      |
| Reward Loss         | -39.2    |
| Running Env Steps   | 667500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 5.44     |
| Running Update Time | 1335     |
----------------------------------
2025-02-01 17:25:16.627895 Eastern Standard Time
| Itration            | 1336     |
| Real Det Return     | 536      |
| Real Sto Return     | 481      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 668000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1336     |
----------------------------------
2025-02-01 17:25:32.213077 Eastern Standard Time
| Itration            | 1337     |
| Real Det Return     | 528      |
| Real Sto Return     | 480      |
| Reward Loss         | -42      |
| Running Env Steps   | 668500   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 4.42     |
| Running Update Time | 1337     |
----------------------------------
2025-02-01 17:25:47.857940 Eastern Standard Time
| Itration            | 1338     |
| Real Det Return     | 543      |
| Real Sto Return     | 493      |
| Reward Loss         | -28.9    |
| Running Env Steps   | 669000   |
| Running Forward KL  | -5.39    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1338     |
----------------------------------
2025-02-01 17:26:03.942785 Eastern Standard Time
| Itration            | 1339     |
| Real Det Return     | 520      |
| Real Sto Return     | 477      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 669500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1339     |
----------------------------------
2025-02-01 17:26:19.575152 Eastern Standard Time
| Itration            | 1340     |
| Real Det Return     | 543      |
| Real Sto Return     | 488      |
| Reward Loss         | -35.8    |
| Running Env Steps   | 670000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1340     |
----------------------------------
2025-02-01 17:26:35.337492 Eastern Standard Time
| Itration            | 1341     |
| Real Det Return     | 534      |
| Real Sto Return     | 477      |
| Reward Loss         | -30.6    |
| Running Env Steps   | 670500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1341     |
----------------------------------
2025-02-01 17:26:51.207341 Eastern Standard Time
| Itration            | 1342     |
| Real Det Return     | 546      |
| Real Sto Return     | 485      |
| Reward Loss         | -40.3    |
| Running Env Steps   | 671000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1342     |
----------------------------------
2025-02-01 17:27:07.055456 Eastern Standard Time
| Itration            | 1343     |
| Real Det Return     | 541      |
| Real Sto Return     | 488      |
| Reward Loss         | -35.1    |
| Running Env Steps   | 671500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1343     |
----------------------------------
2025-02-01 17:27:23.333711 Eastern Standard Time
| Itration            | 1344     |
| Real Det Return     | 521      |
| Real Sto Return     | 470      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 672000   |
| Running Forward KL  | -4.48    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1344     |
----------------------------------
2025-02-01 17:27:39.092403 Eastern Standard Time
| Itration            | 1345     |
| Real Det Return     | 539      |
| Real Sto Return     | 486      |
| Reward Loss         | -31.8    |
| Running Env Steps   | 672500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1345     |
----------------------------------
2025-02-01 17:27:54.846822 Eastern Standard Time
| Itration            | 1346     |
| Real Det Return     | 519      |
| Real Sto Return     | 474      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 673000   |
| Running Forward KL  | -4.55    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 1346     |
----------------------------------
2025-02-01 17:28:10.697596 Eastern Standard Time
| Itration            | 1347     |
| Real Det Return     | 509      |
| Real Sto Return     | 468      |
| Reward Loss         | -28.8    |
| Running Env Steps   | 673500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1347     |
----------------------------------
2025-02-01 17:28:26.940018 Eastern Standard Time
| Itration            | 1348     |
| Real Det Return     | 542      |
| Real Sto Return     | 482      |
| Reward Loss         | -45.6    |
| Running Env Steps   | 674000   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 1348     |
----------------------------------
2025-02-01 17:28:43.087637 Eastern Standard Time
| Itration            | 1349     |
| Real Det Return     | 546      |
| Real Sto Return     | 482      |
| Reward Loss         | -34.6    |
| Running Env Steps   | 674500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1349     |
----------------------------------
2025-02-01 17:28:58.809518 Eastern Standard Time
| Itration            | 1350     |
| Real Det Return     | 520      |
| Real Sto Return     | 487      |
| Reward Loss         | -31.5    |
| Running Env Steps   | 675000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1350     |
----------------------------------
2025-02-01 17:29:14.634736 Eastern Standard Time
| Itration            | 1351     |
| Real Det Return     | 540      |
| Real Sto Return     | 489      |
| Reward Loss         | -23.9    |
| Running Env Steps   | 675500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 1351     |
----------------------------------
2025-02-01 17:29:30.775045 Eastern Standard Time
| Itration            | 1352     |
| Real Det Return     | 538      |
| Real Sto Return     | 497      |
| Reward Loss         | -28.1    |
| Running Env Steps   | 676000   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1352     |
----------------------------------
2025-02-01 17:29:46.462749 Eastern Standard Time
| Itration            | 1353     |
| Real Det Return     | 548      |
| Real Sto Return     | 496      |
| Reward Loss         | -21.8    |
| Running Env Steps   | 676500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.41     |
| Running Update Time | 1353     |
----------------------------------
2025-02-01 17:30:02.708676 Eastern Standard Time
| Itration            | 1354     |
| Real Det Return     | 534      |
| Real Sto Return     | 480      |
| Reward Loss         | -36      |
| Running Env Steps   | 677000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.18     |
| Running Update Time | 1354     |
----------------------------------
2025-02-01 17:30:18.824519 Eastern Standard Time
| Itration            | 1355     |
| Real Det Return     | 532      |
| Real Sto Return     | 476      |
| Reward Loss         | -28.9    |
| Running Env Steps   | 677500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1355     |
----------------------------------
2025-02-01 17:30:34.505142 Eastern Standard Time
| Itration            | 1356     |
| Real Det Return     | 534      |
| Real Sto Return     | 470      |
| Reward Loss         | -17.7    |
| Running Env Steps   | 678000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1356     |
----------------------------------
2025-02-01 17:30:50.699643 Eastern Standard Time
| Itration            | 1357     |
| Real Det Return     | 540      |
| Real Sto Return     | 488      |
| Reward Loss         | -31.8    |
| Running Env Steps   | 678500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 1357     |
----------------------------------
2025-02-01 17:31:06.836116 Eastern Standard Time
| Itration            | 1358     |
| Real Det Return     | 542      |
| Real Sto Return     | 485      |
| Reward Loss         | -37.8    |
| Running Env Steps   | 679000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1358     |
----------------------------------
2025-02-01 17:31:22.530886 Eastern Standard Time
| Itration            | 1359     |
| Real Det Return     | 542      |
| Real Sto Return     | 483      |
| Reward Loss         | -44.6    |
| Running Env Steps   | 679500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1359     |
----------------------------------
2025-02-01 17:31:38.631811 Eastern Standard Time
| Itration            | 1360     |
| Real Det Return     | 533      |
| Real Sto Return     | 476      |
| Reward Loss         | -38.8    |
| Running Env Steps   | 680000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1360     |
----------------------------------
2025-02-01 17:31:54.410356 Eastern Standard Time
| Itration            | 1361     |
| Real Det Return     | 528      |
| Real Sto Return     | 485      |
| Reward Loss         | -33.4    |
| Running Env Steps   | 680500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1361     |
----------------------------------
2025-02-01 17:32:10.357877 Eastern Standard Time
| Itration            | 1362     |
| Real Det Return     | 521      |
| Real Sto Return     | 480      |
| Reward Loss         | -42.9    |
| Running Env Steps   | 681000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.2      |
| Running Update Time | 1362     |
----------------------------------
2025-02-01 17:32:25.937770 Eastern Standard Time
| Itration            | 1363     |
| Real Det Return     | 528      |
| Real Sto Return     | 466      |
| Reward Loss         | -52.5    |
| Running Env Steps   | 681500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 1363     |
----------------------------------
2025-02-01 17:32:41.509461 Eastern Standard Time
| Itration            | 1364     |
| Real Det Return     | 525      |
| Real Sto Return     | 478      |
| Reward Loss         | -38      |
| Running Env Steps   | 682000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1364     |
----------------------------------
2025-02-01 17:32:57.055056 Eastern Standard Time
| Itration            | 1365     |
| Real Det Return     | 537      |
| Real Sto Return     | 484      |
| Reward Loss         | -42.3    |
| Running Env Steps   | 682500   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1365     |
----------------------------------
2025-02-01 17:33:12.588744 Eastern Standard Time
| Itration            | 1366     |
| Real Det Return     | 538      |
| Real Sto Return     | 491      |
| Reward Loss         | -30.1    |
| Running Env Steps   | 683000   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 1366     |
----------------------------------
2025-02-01 17:33:28.806124 Eastern Standard Time
| Itration            | 1367     |
| Real Det Return     | 556      |
| Real Sto Return     | 502      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 683500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1367     |
----------------------------------
2025-02-01 17:33:44.606065 Eastern Standard Time
| Itration            | 1368     |
| Real Det Return     | 532      |
| Real Sto Return     | 489      |
| Reward Loss         | -18.6    |
| Running Env Steps   | 684000   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 1368     |
----------------------------------
2025-02-01 17:34:00.398654 Eastern Standard Time
| Itration            | 1369     |
| Real Det Return     | 531      |
| Real Sto Return     | 479      |
| Reward Loss         | -41.1    |
| Running Env Steps   | 684500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1369     |
----------------------------------
2025-02-01 17:34:16.664295 Eastern Standard Time
| Itration            | 1370     |
| Real Det Return     | 528      |
| Real Sto Return     | 485      |
| Reward Loss         | -33.3    |
| Running Env Steps   | 685000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1370     |
----------------------------------
2025-02-01 17:34:32.420712 Eastern Standard Time
| Itration            | 1371     |
| Real Det Return     | 538      |
| Real Sto Return     | 485      |
| Reward Loss         | -30.9    |
| Running Env Steps   | 685500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 1371     |
----------------------------------
2025-02-01 17:34:48.123150 Eastern Standard Time
| Itration            | 1372     |
| Real Det Return     | 527      |
| Real Sto Return     | 485      |
| Reward Loss         | -33.9    |
| Running Env Steps   | 686000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1372     |
----------------------------------
2025-02-01 17:35:04.201718 Eastern Standard Time
| Itration            | 1373     |
| Real Det Return     | 540      |
| Real Sto Return     | 490      |
| Reward Loss         | -30.2    |
| Running Env Steps   | 686500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 1373     |
----------------------------------
2025-02-01 17:35:19.905147 Eastern Standard Time
| Itration            | 1374     |
| Real Det Return     | 527      |
| Real Sto Return     | 478      |
| Reward Loss         | -23.9    |
| Running Env Steps   | 687000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1374     |
----------------------------------
2025-02-01 17:35:35.498674 Eastern Standard Time
| Itration            | 1375     |
| Real Det Return     | 531      |
| Real Sto Return     | 475      |
| Reward Loss         | -30.7    |
| Running Env Steps   | 687500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1375     |
----------------------------------
2025-02-01 17:35:51.024146 Eastern Standard Time
| Itration            | 1376     |
| Real Det Return     | 508      |
| Real Sto Return     | 469      |
| Reward Loss         | -47.6    |
| Running Env Steps   | 688000   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1376     |
----------------------------------
2025-02-01 17:36:06.597197 Eastern Standard Time
| Itration            | 1377     |
| Real Det Return     | 531      |
| Real Sto Return     | 488      |
| Reward Loss         | -46.6    |
| Running Env Steps   | 688500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1377     |
----------------------------------
2025-02-01 17:36:22.213227 Eastern Standard Time
| Itration            | 1378     |
| Real Det Return     | 517      |
| Real Sto Return     | 471      |
| Reward Loss         | -25.4    |
| Running Env Steps   | 689000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1378     |
----------------------------------
2025-02-01 17:36:37.867336 Eastern Standard Time
| Itration            | 1379     |
| Real Det Return     | 531      |
| Real Sto Return     | 484      |
| Reward Loss         | -19.5    |
| Running Env Steps   | 689500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1379     |
----------------------------------
2025-02-01 17:36:53.432609 Eastern Standard Time
| Itration            | 1380     |
| Real Det Return     | 510      |
| Real Sto Return     | 483      |
| Reward Loss         | -46      |
| Running Env Steps   | 690000   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1380     |
----------------------------------
2025-02-01 17:37:08.966561 Eastern Standard Time
| Itration            | 1381     |
| Real Det Return     | 540      |
| Real Sto Return     | 484      |
| Reward Loss         | -37      |
| Running Env Steps   | 690500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1381     |
----------------------------------
2025-02-01 17:37:24.497712 Eastern Standard Time
| Itration            | 1382     |
| Real Det Return     | 530      |
| Real Sto Return     | 479      |
| Reward Loss         | -32.5    |
| Running Env Steps   | 691000   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1382     |
----------------------------------
2025-02-01 17:37:40.057322 Eastern Standard Time
| Itration            | 1383     |
| Real Det Return     | 531      |
| Real Sto Return     | 489      |
| Reward Loss         | -35.1    |
| Running Env Steps   | 691500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1383     |
----------------------------------
2025-02-01 17:37:55.669318 Eastern Standard Time
| Itration            | 1384     |
| Real Det Return     | 532      |
| Real Sto Return     | 493      |
| Reward Loss         | -38.8    |
| Running Env Steps   | 692000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.34     |
| Running Update Time | 1384     |
----------------------------------
2025-02-01 17:38:11.291387 Eastern Standard Time
| Itration            | 1385     |
| Real Det Return     | 463      |
| Real Sto Return     | 432      |
| Reward Loss         | -62.9    |
| Running Env Steps   | 692500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1385     |
----------------------------------
2025-02-01 17:38:26.845875 Eastern Standard Time
| Itration            | 1386     |
| Real Det Return     | 539      |
| Real Sto Return     | 491      |
| Reward Loss         | -28.1    |
| Running Env Steps   | 693000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1386     |
----------------------------------
2025-02-01 17:38:42.556642 Eastern Standard Time
| Itration            | 1387     |
| Real Det Return     | 542      |
| Real Sto Return     | 487      |
| Reward Loss         | -35.7    |
| Running Env Steps   | 693500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1387     |
----------------------------------
2025-02-01 17:38:58.224512 Eastern Standard Time
| Itration            | 1388     |
| Real Det Return     | 537      |
| Real Sto Return     | 486      |
| Reward Loss         | -41.2    |
| Running Env Steps   | 694000   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1388     |
----------------------------------
2025-02-01 17:39:14.191704 Eastern Standard Time
| Itration            | 1389     |
| Real Det Return     | 518      |
| Real Sto Return     | 478      |
| Reward Loss         | -39.6    |
| Running Env Steps   | 694500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 1389     |
----------------------------------
2025-02-01 17:39:29.701543 Eastern Standard Time
| Itration            | 1390     |
| Real Det Return     | 548      |
| Real Sto Return     | 493      |
| Reward Loss         | -28      |
| Running Env Steps   | 695000   |
| Running Forward KL  | -4.87    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1390     |
----------------------------------
2025-02-01 17:39:45.264535 Eastern Standard Time
| Itration            | 1391     |
| Real Det Return     | 538      |
| Real Sto Return     | 488      |
| Reward Loss         | -27      |
| Running Env Steps   | 695500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1391     |
----------------------------------
2025-02-01 17:40:00.922649 Eastern Standard Time
| Itration            | 1392     |
| Real Det Return     | 530      |
| Real Sto Return     | 480      |
| Reward Loss         | -35.5    |
| Running Env Steps   | 696000   |
| Running Forward KL  | -4.37    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1392     |
----------------------------------
2025-02-01 17:40:16.739895 Eastern Standard Time
| Itration            | 1393     |
| Real Det Return     | 543      |
| Real Sto Return     | 490      |
| Reward Loss         | -35.5    |
| Running Env Steps   | 696500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1393     |
----------------------------------
2025-02-01 17:40:32.477141 Eastern Standard Time
| Itration            | 1394     |
| Real Det Return     | 516      |
| Real Sto Return     | 467      |
| Reward Loss         | -46.8    |
| Running Env Steps   | 697000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1394     |
----------------------------------
2025-02-01 17:40:48.061674 Eastern Standard Time
| Itration            | 1395     |
| Real Det Return     | 545      |
| Real Sto Return     | 495      |
| Reward Loss         | -31      |
| Running Env Steps   | 697500   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 5.1      |
| Running Update Time | 1395     |
----------------------------------
2025-02-01 17:41:03.712776 Eastern Standard Time
| Itration            | 1396     |
| Real Det Return     | 528      |
| Real Sto Return     | 478      |
| Reward Loss         | -47.9    |
| Running Env Steps   | 698000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1396     |
----------------------------------
2025-02-01 17:41:19.693621 Eastern Standard Time
| Itration            | 1397     |
| Real Det Return     | 528      |
| Real Sto Return     | 482      |
| Reward Loss         | -45      |
| Running Env Steps   | 698500   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1397     |
----------------------------------
2025-02-01 17:41:35.335684 Eastern Standard Time
| Itration            | 1398     |
| Real Det Return     | 538      |
| Real Sto Return     | 490      |
| Reward Loss         | -25.4    |
| Running Env Steps   | 699000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1398     |
----------------------------------
2025-02-01 17:41:51.258754 Eastern Standard Time
| Itration            | 1399     |
| Real Det Return     | 537      |
| Real Sto Return     | 492      |
| Reward Loss         | -35.8    |
| Running Env Steps   | 699500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 1399     |
----------------------------------
2025-02-01 17:42:07.169983 Eastern Standard Time
| Itration            | 1400     |
| Real Det Return     | 536      |
| Real Sto Return     | 496      |
| Reward Loss         | -35.3    |
| Running Env Steps   | 700000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1400     |
----------------------------------
2025-02-01 17:42:23.127103 Eastern Standard Time
| Itration            | 1401     |
| Real Det Return     | 538      |
| Real Sto Return     | 482      |
| Reward Loss         | -41.8    |
| Running Env Steps   | 700500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1401     |
----------------------------------
2025-02-01 17:42:38.951195 Eastern Standard Time
| Itration            | 1402     |
| Real Det Return     | 535      |
| Real Sto Return     | 481      |
| Reward Loss         | -36      |
| Running Env Steps   | 701000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1402     |
----------------------------------
2025-02-01 17:42:54.697960 Eastern Standard Time
| Itration            | 1403     |
| Real Det Return     | 529      |
| Real Sto Return     | 472      |
| Reward Loss         | -34.9    |
| Running Env Steps   | 701500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1403     |
----------------------------------
2025-02-01 17:43:10.449849 Eastern Standard Time
| Itration            | 1404     |
| Real Det Return     | 545      |
| Real Sto Return     | 494      |
| Reward Loss         | -16      |
| Running Env Steps   | 702000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1404     |
----------------------------------
2025-02-01 17:43:26.174214 Eastern Standard Time
| Itration            | 1405     |
| Real Det Return     | 539      |
| Real Sto Return     | 490      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 702500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1405     |
----------------------------------
2025-02-01 17:43:41.854488 Eastern Standard Time
| Itration            | 1406     |
| Real Det Return     | 528      |
| Real Sto Return     | 487      |
| Reward Loss         | -39.1    |
| Running Env Steps   | 703000   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1406     |
----------------------------------
2025-02-01 17:43:57.595881 Eastern Standard Time
| Itration            | 1407     |
| Real Det Return     | 522      |
| Real Sto Return     | 487      |
| Reward Loss         | -33.1    |
| Running Env Steps   | 703500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1407     |
----------------------------------
2025-02-01 17:44:13.354311 Eastern Standard Time
| Itration            | 1408     |
| Real Det Return     | 544      |
| Real Sto Return     | 496      |
| Reward Loss         | -31      |
| Running Env Steps   | 704000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1408     |
----------------------------------
2025-02-01 17:44:29.526972 Eastern Standard Time
| Itration            | 1409     |
| Real Det Return     | 537      |
| Real Sto Return     | 482      |
| Reward Loss         | -40.1    |
| Running Env Steps   | 704500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 1409     |
----------------------------------
2025-02-01 17:44:45.127598 Eastern Standard Time
| Itration            | 1410     |
| Real Det Return     | 533      |
| Real Sto Return     | 484      |
| Reward Loss         | -24.8    |
| Running Env Steps   | 705000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1410     |
----------------------------------
2025-02-01 17:45:00.797100 Eastern Standard Time
| Itration            | 1411     |
| Real Det Return     | 541      |
| Real Sto Return     | 477      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 705500   |
| Running Forward KL  | -5.49    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1411     |
----------------------------------
2025-02-01 17:45:16.335792 Eastern Standard Time
| Itration            | 1412     |
| Real Det Return     | 534      |
| Real Sto Return     | 491      |
| Reward Loss         | -19.9    |
| Running Env Steps   | 706000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1412     |
----------------------------------
2025-02-01 17:45:31.902231 Eastern Standard Time
| Itration            | 1413     |
| Real Det Return     | 536      |
| Real Sto Return     | 488      |
| Reward Loss         | -30.8    |
| Running Env Steps   | 706500   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 1413     |
----------------------------------
2025-02-01 17:45:47.497558 Eastern Standard Time
| Itration            | 1414     |
| Real Det Return     | 527      |
| Real Sto Return     | 478      |
| Reward Loss         | -46.5    |
| Running Env Steps   | 707000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1414     |
----------------------------------
2025-02-01 17:46:03.166249 Eastern Standard Time
| Itration            | 1415     |
| Real Det Return     | 546      |
| Real Sto Return     | 504      |
| Reward Loss         | -28.9    |
| Running Env Steps   | 707500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.1      |
| Running Update Time | 1415     |
----------------------------------
2025-02-01 17:46:18.980740 Eastern Standard Time
| Itration            | 1416     |
| Real Det Return     | 535      |
| Real Sto Return     | 489      |
| Reward Loss         | -29.9    |
| Running Env Steps   | 708000   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1416     |
----------------------------------
2025-02-01 17:46:35.247419 Eastern Standard Time
| Itration            | 1417     |
| Real Det Return     | 535      |
| Real Sto Return     | 479      |
| Reward Loss         | -32.9    |
| Running Env Steps   | 708500   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1417     |
----------------------------------
2025-02-01 17:46:50.897771 Eastern Standard Time
| Itration            | 1418     |
| Real Det Return     | 533      |
| Real Sto Return     | 490      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 709000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 1418     |
----------------------------------
2025-02-01 17:47:06.560811 Eastern Standard Time
| Itration            | 1419     |
| Real Det Return     | 523      |
| Real Sto Return     | 486      |
| Reward Loss         | -43.4    |
| Running Env Steps   | 709500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 4.13     |
| Running Update Time | 1419     |
----------------------------------
2025-02-01 17:47:22.139129 Eastern Standard Time
| Itration            | 1420     |
| Real Det Return     | 542      |
| Real Sto Return     | 491      |
| Reward Loss         | -33.3    |
| Running Env Steps   | 710000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 5.27     |
| Running Update Time | 1420     |
----------------------------------
2025-02-01 17:47:37.761197 Eastern Standard Time
| Itration            | 1421     |
| Real Det Return     | 541      |
| Real Sto Return     | 485      |
| Reward Loss         | -42.3    |
| Running Env Steps   | 710500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1421     |
----------------------------------
2025-02-01 17:47:53.647946 Eastern Standard Time
| Itration            | 1422     |
| Real Det Return     | 518      |
| Real Sto Return     | 486      |
| Reward Loss         | -33.2    |
| Running Env Steps   | 711000   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1422     |
----------------------------------
2025-02-01 17:48:09.344994 Eastern Standard Time
| Itration            | 1423     |
| Real Det Return     | 539      |
| Real Sto Return     | 481      |
| Reward Loss         | -32.6    |
| Running Env Steps   | 711500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 5.56     |
| Running Update Time | 1423     |
----------------------------------
2025-02-01 17:48:25.138038 Eastern Standard Time
| Itration            | 1424     |
| Real Det Return     | 505      |
| Real Sto Return     | 446      |
| Reward Loss         | -44      |
| Running Env Steps   | 712000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1424     |
----------------------------------
2025-02-01 17:48:41.164378 Eastern Standard Time
| Itration            | 1425     |
| Real Det Return     | 526      |
| Real Sto Return     | 486      |
| Reward Loss         | -32.8    |
| Running Env Steps   | 712500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1425     |
----------------------------------
2025-02-01 17:48:56.751671 Eastern Standard Time
| Itration            | 1426     |
| Real Det Return     | 527      |
| Real Sto Return     | 477      |
| Reward Loss         | -25.2    |
| Running Env Steps   | 713000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1426     |
----------------------------------
2025-02-01 17:49:12.322773 Eastern Standard Time
| Itration            | 1427     |
| Real Det Return     | 531      |
| Real Sto Return     | 471      |
| Reward Loss         | -39.6    |
| Running Env Steps   | 713500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1427     |
----------------------------------
2025-02-01 17:49:27.958013 Eastern Standard Time
| Itration            | 1428     |
| Real Det Return     | 536      |
| Real Sto Return     | 484      |
| Reward Loss         | -39      |
| Running Env Steps   | 714000   |
| Running Forward KL  | -4.47    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1428     |
----------------------------------
2025-02-01 17:49:43.575431 Eastern Standard Time
| Itration            | 1429     |
| Real Det Return     | 506      |
| Real Sto Return     | 461      |
| Reward Loss         | -41      |
| Running Env Steps   | 714500   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1429     |
----------------------------------
2025-02-01 17:49:59.355307 Eastern Standard Time
| Itration            | 1430     |
| Real Det Return     | 529      |
| Real Sto Return     | 486      |
| Reward Loss         | -39.2    |
| Running Env Steps   | 715000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1430     |
----------------------------------
2025-02-01 17:50:15.216447 Eastern Standard Time
| Itration            | 1431     |
| Real Det Return     | 527      |
| Real Sto Return     | 493      |
| Reward Loss         | -45.2    |
| Running Env Steps   | 715500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 1431     |
----------------------------------
2025-02-01 17:50:31.084950 Eastern Standard Time
| Itration            | 1432     |
| Real Det Return     | 527      |
| Real Sto Return     | 474      |
| Reward Loss         | -41.7    |
| Running Env Steps   | 716000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 5        |
| Running Update Time | 1432     |
----------------------------------
2025-02-01 17:50:47.104454 Eastern Standard Time
| Itration            | 1433     |
| Real Det Return     | 522      |
| Real Sto Return     | 476      |
| Reward Loss         | -24.4    |
| Running Env Steps   | 716500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 5.47     |
| Running Update Time | 1433     |
----------------------------------
2025-02-01 17:51:02.724890 Eastern Standard Time
| Itration            | 1434     |
| Real Det Return     | 523      |
| Real Sto Return     | 471      |
| Reward Loss         | -45.8    |
| Running Env Steps   | 717000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 4.14     |
| Running Update Time | 1434     |
----------------------------------
2025-02-01 17:51:18.381036 Eastern Standard Time
| Itration            | 1435     |
| Real Det Return     | 534      |
| Real Sto Return     | 486      |
| Reward Loss         | -40.9    |
| Running Env Steps   | 717500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1435     |
----------------------------------
2025-02-01 17:51:34.301158 Eastern Standard Time
| Itration            | 1436     |
| Real Det Return     | 532      |
| Real Sto Return     | 480      |
| Reward Loss         | -36.5    |
| Running Env Steps   | 718000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1436     |
----------------------------------
2025-02-01 17:51:49.856926 Eastern Standard Time
| Itration            | 1437     |
| Real Det Return     | 523      |
| Real Sto Return     | 483      |
| Reward Loss         | -33.8    |
| Running Env Steps   | 718500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1437     |
----------------------------------
2025-02-01 17:52:05.544047 Eastern Standard Time
| Itration            | 1438     |
| Real Det Return     | 539      |
| Real Sto Return     | 493      |
| Reward Loss         | -37.8    |
| Running Env Steps   | 719000   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 1438     |
----------------------------------
2025-02-01 17:52:21.243786 Eastern Standard Time
| Itration            | 1439     |
| Real Det Return     | 539      |
| Real Sto Return     | 496      |
| Reward Loss         | -44.7    |
| Running Env Steps   | 719500   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 1439     |
----------------------------------
2025-02-01 17:52:37.041984 Eastern Standard Time
| Itration            | 1440     |
| Real Det Return     | 525      |
| Real Sto Return     | 471      |
| Reward Loss         | -56.2    |
| Running Env Steps   | 720000   |
| Running Forward KL  | -4.4     |
| Running Reverse KL  | 4.36     |
| Running Update Time | 1440     |
----------------------------------
2025-02-01 17:52:53.279128 Eastern Standard Time
| Itration            | 1441     |
| Real Det Return     | 522      |
| Real Sto Return     | 484      |
| Reward Loss         | -41      |
| Running Env Steps   | 720500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1441     |
----------------------------------
2025-02-01 17:53:09.538791 Eastern Standard Time
| Itration            | 1442     |
| Real Det Return     | 547      |
| Real Sto Return     | 497      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 721000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1442     |
----------------------------------
2025-02-01 17:53:25.373145 Eastern Standard Time
| Itration            | 1443     |
| Real Det Return     | 533      |
| Real Sto Return     | 484      |
| Reward Loss         | -28.8    |
| Running Env Steps   | 721500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 1443     |
----------------------------------
2025-02-01 17:53:41.107006 Eastern Standard Time
| Itration            | 1444     |
| Real Det Return     | 539      |
| Real Sto Return     | 492      |
| Reward Loss         | -46.3    |
| Running Env Steps   | 722000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1444     |
----------------------------------
2025-02-01 17:53:56.877888 Eastern Standard Time
| Itration            | 1445     |
| Real Det Return     | 527      |
| Real Sto Return     | 476      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 722500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1445     |
----------------------------------
2025-02-01 17:54:12.720247 Eastern Standard Time
| Itration            | 1446     |
| Real Det Return     | 535      |
| Real Sto Return     | 481      |
| Reward Loss         | -35.3    |
| Running Env Steps   | 723000   |
| Running Forward KL  | -5.38    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1446     |
----------------------------------
2025-02-01 17:54:28.441186 Eastern Standard Time
| Itration            | 1447     |
| Real Det Return     | 547      |
| Real Sto Return     | 487      |
| Reward Loss         | -41.3    |
| Running Env Steps   | 723500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 5.22     |
| Running Update Time | 1447     |
----------------------------------
2025-02-01 17:54:44.242067 Eastern Standard Time
| Itration            | 1448     |
| Real Det Return     | 543      |
| Real Sto Return     | 495      |
| Reward Loss         | -29.6    |
| Running Env Steps   | 724000   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 1448     |
----------------------------------
2025-02-01 17:54:59.923325 Eastern Standard Time
| Itration            | 1449     |
| Real Det Return     | 539      |
| Real Sto Return     | 484      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 724500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1449     |
----------------------------------
2025-02-01 17:55:15.785539 Eastern Standard Time
| Itration            | 1450     |
| Real Det Return     | 544      |
| Real Sto Return     | 495      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 725000   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 1450     |
----------------------------------
2025-02-01 17:55:31.539906 Eastern Standard Time
| Itration            | 1451     |
| Real Det Return     | 537      |
| Real Sto Return     | 484      |
| Reward Loss         | -26.9    |
| Running Env Steps   | 725500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1451     |
----------------------------------
2025-02-01 17:55:47.189686 Eastern Standard Time
| Itration            | 1452     |
| Real Det Return     | 523      |
| Real Sto Return     | 487      |
| Reward Loss         | -29.1    |
| Running Env Steps   | 726000   |
| Running Forward KL  | -5.5     |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1452     |
----------------------------------
2025-02-01 17:56:02.908121 Eastern Standard Time
| Itration            | 1453     |
| Real Det Return     | 518      |
| Real Sto Return     | 484      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 726500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 1453     |
----------------------------------
2025-02-01 17:56:19.001443 Eastern Standard Time
| Itration            | 1454     |
| Real Det Return     | 550      |
| Real Sto Return     | 506      |
| Reward Loss         | -38.8    |
| Running Env Steps   | 727000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.34     |
| Running Update Time | 1454     |
----------------------------------
2025-02-01 17:56:34.747482 Eastern Standard Time
| Itration            | 1455     |
| Real Det Return     | 508      |
| Real Sto Return     | 484      |
| Reward Loss         | -26.2    |
| Running Env Steps   | 727500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1455     |
----------------------------------
2025-02-01 17:56:50.334529 Eastern Standard Time
| Itration            | 1456     |
| Real Det Return     | 541      |
| Real Sto Return     | 486      |
| Reward Loss         | -40.3    |
| Running Env Steps   | 728000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1456     |
----------------------------------
2025-02-01 17:57:05.902042 Eastern Standard Time
| Itration            | 1457     |
| Real Det Return     | 523      |
| Real Sto Return     | 466      |
| Reward Loss         | -21.2    |
| Running Env Steps   | 728500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1457     |
----------------------------------
2025-02-01 17:57:21.525269 Eastern Standard Time
| Itration            | 1458     |
| Real Det Return     | 541      |
| Real Sto Return     | 490      |
| Reward Loss         | -45.2    |
| Running Env Steps   | 729000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1458     |
----------------------------------
2025-02-01 17:57:37.142152 Eastern Standard Time
| Itration            | 1459     |
| Real Det Return     | 546      |
| Real Sto Return     | 487      |
| Reward Loss         | -42.8    |
| Running Env Steps   | 729500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1459     |
----------------------------------
2025-02-01 17:57:52.765690 Eastern Standard Time
| Itration            | 1460     |
| Real Det Return     | 533      |
| Real Sto Return     | 481      |
| Reward Loss         | -49.2    |
| Running Env Steps   | 730000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1460     |
----------------------------------
2025-02-01 17:58:08.303111 Eastern Standard Time
| Itration            | 1461     |
| Real Det Return     | 547      |
| Real Sto Return     | 482      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 730500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1461     |
----------------------------------
2025-02-01 17:58:23.831216 Eastern Standard Time
| Itration            | 1462     |
| Real Det Return     | 533      |
| Real Sto Return     | 486      |
| Reward Loss         | -30.1    |
| Running Env Steps   | 731000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1462     |
----------------------------------
2025-02-01 17:58:39.703329 Eastern Standard Time
| Itration            | 1463     |
| Real Det Return     | 529      |
| Real Sto Return     | 465      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 731500   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1463     |
----------------------------------
2025-02-01 17:58:55.539814 Eastern Standard Time
| Itration            | 1464     |
| Real Det Return     | 527      |
| Real Sto Return     | 483      |
| Reward Loss         | -30.6    |
| Running Env Steps   | 732000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1464     |
----------------------------------
2025-02-01 17:59:11.348440 Eastern Standard Time
| Itration            | 1465     |
| Real Det Return     | 532      |
| Real Sto Return     | 490      |
| Reward Loss         | -39.6    |
| Running Env Steps   | 732500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1465     |
----------------------------------
2025-02-01 17:59:27.836120 Eastern Standard Time
| Itration            | 1466     |
| Real Det Return     | 534      |
| Real Sto Return     | 486      |
| Reward Loss         | -44.8    |
| Running Env Steps   | 733000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1466     |
----------------------------------
2025-02-01 17:59:43.669672 Eastern Standard Time
| Itration            | 1467     |
| Real Det Return     | 525      |
| Real Sto Return     | 482      |
| Reward Loss         | -39.7    |
| Running Env Steps   | 733500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1467     |
----------------------------------
2025-02-01 17:59:59.355213 Eastern Standard Time
| Itration            | 1468     |
| Real Det Return     | 538      |
| Real Sto Return     | 475      |
| Reward Loss         | -29.3    |
| Running Env Steps   | 734000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1468     |
----------------------------------
2025-02-01 18:00:15.105211 Eastern Standard Time
| Itration            | 1469     |
| Real Det Return     | 534      |
| Real Sto Return     | 485      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 734500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1469     |
----------------------------------
2025-02-01 18:00:30.884808 Eastern Standard Time
| Itration            | 1470     |
| Real Det Return     | 525      |
| Real Sto Return     | 485      |
| Reward Loss         | -34.2    |
| Running Env Steps   | 735000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 1470     |
----------------------------------
2025-02-01 18:00:46.616595 Eastern Standard Time
| Itration            | 1471     |
| Real Det Return     | 520      |
| Real Sto Return     | 477      |
| Reward Loss         | -47      |
| Running Env Steps   | 735500   |
| Running Forward KL  | -4.38    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1471     |
----------------------------------
2025-02-01 18:01:02.359829 Eastern Standard Time
| Itration            | 1472     |
| Real Det Return     | 538      |
| Real Sto Return     | 493      |
| Reward Loss         | -22.9    |
| Running Env Steps   | 736000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1472     |
----------------------------------
2025-02-01 18:01:18.107097 Eastern Standard Time
| Itration            | 1473     |
| Real Det Return     | 519      |
| Real Sto Return     | 477      |
| Reward Loss         | -36.6    |
| Running Env Steps   | 736500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 1473     |
----------------------------------
2025-02-01 18:01:33.857593 Eastern Standard Time
| Itration            | 1474     |
| Real Det Return     | 531      |
| Real Sto Return     | 488      |
| Reward Loss         | -34.2    |
| Running Env Steps   | 737000   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1474     |
----------------------------------
2025-02-01 18:01:49.631049 Eastern Standard Time
| Itration            | 1475     |
| Real Det Return     | 527      |
| Real Sto Return     | 464      |
| Reward Loss         | -36.1    |
| Running Env Steps   | 737500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 1475     |
----------------------------------
2025-02-01 18:02:05.481368 Eastern Standard Time
| Itration            | 1476     |
| Real Det Return     | 527      |
| Real Sto Return     | 476      |
| Reward Loss         | -44.9    |
| Running Env Steps   | 738000   |
| Running Forward KL  | -4.61    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1476     |
----------------------------------
2025-02-01 18:02:21.294216 Eastern Standard Time
| Itration            | 1477     |
| Real Det Return     | 543      |
| Real Sto Return     | 485      |
| Reward Loss         | -32.8    |
| Running Env Steps   | 738500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1477     |
----------------------------------
2025-02-01 18:02:37.032688 Eastern Standard Time
| Itration            | 1478     |
| Real Det Return     | 537      |
| Real Sto Return     | 479      |
| Reward Loss         | -53.4    |
| Running Env Steps   | 739000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1478     |
----------------------------------
2025-02-01 18:02:52.870025 Eastern Standard Time
| Itration            | 1479     |
| Real Det Return     | 532      |
| Real Sto Return     | 490      |
| Reward Loss         | -33.9    |
| Running Env Steps   | 739500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1479     |
----------------------------------
2025-02-01 18:03:08.967545 Eastern Standard Time
| Itration            | 1480     |
| Real Det Return     | 542      |
| Real Sto Return     | 495      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 740000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 5.34     |
| Running Update Time | 1480     |
----------------------------------
2025-02-01 18:03:24.727178 Eastern Standard Time
| Itration            | 1481     |
| Real Det Return     | 535      |
| Real Sto Return     | 494      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 740500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.38     |
| Running Update Time | 1481     |
----------------------------------
2025-02-01 18:03:40.561986 Eastern Standard Time
| Itration            | 1482     |
| Real Det Return     | 530      |
| Real Sto Return     | 498      |
| Reward Loss         | -28.6    |
| Running Env Steps   | 741000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1482     |
----------------------------------
2025-02-01 18:03:56.249584 Eastern Standard Time
| Itration            | 1483     |
| Real Det Return     | 520      |
| Real Sto Return     | 473      |
| Reward Loss         | -61.4    |
| Running Env Steps   | 741500   |
| Running Forward KL  | -4.75    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1483     |
----------------------------------
2025-02-01 18:04:11.994189 Eastern Standard Time
| Itration            | 1484     |
| Real Det Return     | 518      |
| Real Sto Return     | 489      |
| Reward Loss         | -37.2    |
| Running Env Steps   | 742000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.16     |
| Running Update Time | 1484     |
----------------------------------
2025-02-01 18:04:27.792630 Eastern Standard Time
| Itration            | 1485     |
| Real Det Return     | 542      |
| Real Sto Return     | 480      |
| Reward Loss         | -31.9    |
| Running Env Steps   | 742500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 1485     |
----------------------------------
2025-02-01 18:04:43.770571 Eastern Standard Time
| Itration            | 1486     |
| Real Det Return     | 531      |
| Real Sto Return     | 480      |
| Reward Loss         | -36.8    |
| Running Env Steps   | 743000   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1486     |
----------------------------------
2025-02-01 18:04:59.558350 Eastern Standard Time
| Itration            | 1487     |
| Real Det Return     | 538      |
| Real Sto Return     | 492      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 743500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1487     |
----------------------------------
2025-02-01 18:05:15.308634 Eastern Standard Time
| Itration            | 1488     |
| Real Det Return     | 529      |
| Real Sto Return     | 491      |
| Reward Loss         | -24.8    |
| Running Env Steps   | 744000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1488     |
----------------------------------
2025-02-01 18:05:31.059070 Eastern Standard Time
| Itration            | 1489     |
| Real Det Return     | 536      |
| Real Sto Return     | 477      |
| Reward Loss         | -37.2    |
| Running Env Steps   | 744500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1489     |
----------------------------------
2025-02-01 18:05:46.820934 Eastern Standard Time
| Itration            | 1490     |
| Real Det Return     | 541      |
| Real Sto Return     | 488      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 745000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1490     |
----------------------------------
2025-02-01 18:06:02.571020 Eastern Standard Time
| Itration            | 1491     |
| Real Det Return     | 547      |
| Real Sto Return     | 497      |
| Reward Loss         | -38.9    |
| Running Env Steps   | 745500   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1491     |
----------------------------------
2025-02-01 18:06:18.331766 Eastern Standard Time
| Itration            | 1492     |
| Real Det Return     | 542      |
| Real Sto Return     | 477      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 746000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1492     |
----------------------------------
2025-02-01 18:06:34.119928 Eastern Standard Time
| Itration            | 1493     |
| Real Det Return     | 522      |
| Real Sto Return     | 479      |
| Reward Loss         | -47.1    |
| Running Env Steps   | 746500   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1493     |
----------------------------------
2025-02-01 18:06:49.840674 Eastern Standard Time
| Itration            | 1494     |
| Real Det Return     | 525      |
| Real Sto Return     | 477      |
| Reward Loss         | -45.2    |
| Running Env Steps   | 747000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1494     |
----------------------------------
2025-02-01 18:07:05.587119 Eastern Standard Time
| Itration            | 1495     |
| Real Det Return     | 545      |
| Real Sto Return     | 473      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 747500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 1495     |
----------------------------------
2025-02-01 18:07:21.326576 Eastern Standard Time
| Itration            | 1496     |
| Real Det Return     | 518      |
| Real Sto Return     | 467      |
| Reward Loss         | -47      |
| Running Env Steps   | 748000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1496     |
----------------------------------
2025-02-01 18:07:37.073148 Eastern Standard Time
| Itration            | 1497     |
| Real Det Return     | 543      |
| Real Sto Return     | 473      |
| Reward Loss         | -59.3    |
| Running Env Steps   | 748500   |
| Running Forward KL  | -4.57    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 1497     |
----------------------------------
2025-02-01 18:07:52.921861 Eastern Standard Time
| Itration            | 1498     |
| Real Det Return     | 527      |
| Real Sto Return     | 489      |
| Reward Loss         | -24.2    |
| Running Env Steps   | 749000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1498     |
----------------------------------
2025-02-01 18:08:08.745279 Eastern Standard Time
| Itration            | 1499     |
| Real Det Return     | 527      |
| Real Sto Return     | 477      |
| Reward Loss         | -32.7    |
| Running Env Steps   | 749500   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1499     |
----------------------------------
2025-02-01 18:08:24.475318 Eastern Standard Time
| Itration            | 1500     |
| Real Det Return     | 528      |
| Real Sto Return     | 478      |
| Reward Loss         | -37.1    |
| Running Env Steps   | 750000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 1500     |
----------------------------------
2025-02-01 18:08:42.140783 Eastern Standard Time
| Itration            | 1501     |
| Real Det Return     | 532      |
| Real Sto Return     | 482      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 750500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 4.47     |
| Running Update Time | 1501     |
----------------------------------
2025-02-01 18:08:58.251175 Eastern Standard Time
| Itration            | 1502     |
| Real Det Return     | 526      |
| Real Sto Return     | 476      |
| Reward Loss         | -49.9    |
| Running Env Steps   | 751000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1502     |
----------------------------------
2025-02-01 18:09:14.239539 Eastern Standard Time
| Itration            | 1503     |
| Real Det Return     | 537      |
| Real Sto Return     | 487      |
| Reward Loss         | -26.4    |
| Running Env Steps   | 751500   |
| Running Forward KL  | -4.58    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 1503     |
----------------------------------
2025-02-01 18:09:30.465532 Eastern Standard Time
| Itration            | 1504     |
| Real Det Return     | 525      |
| Real Sto Return     | 476      |
| Reward Loss         | -36.7    |
| Running Env Steps   | 752000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.36     |
| Running Update Time | 1504     |
----------------------------------
2025-02-01 18:09:46.262785 Eastern Standard Time
| Itration            | 1505     |
| Real Det Return     | 530      |
| Real Sto Return     | 483      |
| Reward Loss         | -47.4    |
| Running Env Steps   | 752500   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1505     |
----------------------------------
2025-02-01 18:10:02.151899 Eastern Standard Time
| Itration            | 1506     |
| Real Det Return     | 527      |
| Real Sto Return     | 476      |
| Reward Loss         | -32.5    |
| Running Env Steps   | 753000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 1506     |
----------------------------------
2025-02-01 18:10:18.027547 Eastern Standard Time
| Itration            | 1507     |
| Real Det Return     | 539      |
| Real Sto Return     | 476      |
| Reward Loss         | -35.9    |
| Running Env Steps   | 753500   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1507     |
----------------------------------
2025-02-01 18:10:33.526282 Eastern Standard Time
| Itration            | 1508     |
| Real Det Return     | 530      |
| Real Sto Return     | 481      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 754000   |
| Running Forward KL  | -4.53    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1508     |
----------------------------------
2025-02-01 18:10:49.110333 Eastern Standard Time
| Itration            | 1509     |
| Real Det Return     | 523      |
| Real Sto Return     | 478      |
| Reward Loss         | -45.2    |
| Running Env Steps   | 754500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.6      |
| Running Update Time | 1509     |
----------------------------------
2025-02-01 18:11:04.728317 Eastern Standard Time
| Itration            | 1510     |
| Real Det Return     | 526      |
| Real Sto Return     | 472      |
| Reward Loss         | -41      |
| Running Env Steps   | 755000   |
| Running Forward KL  | -4.5     |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1510     |
----------------------------------
2025-02-01 18:11:20.294146 Eastern Standard Time
| Itration            | 1511     |
| Real Det Return     | 529      |
| Real Sto Return     | 480      |
| Reward Loss         | -39.7    |
| Running Env Steps   | 755500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 1511     |
----------------------------------
2025-02-01 18:11:35.901784 Eastern Standard Time
| Itration            | 1512     |
| Real Det Return     | 540      |
| Real Sto Return     | 477      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 756000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 1512     |
----------------------------------
2025-02-01 18:11:51.466309 Eastern Standard Time
| Itration            | 1513     |
| Real Det Return     | 548      |
| Real Sto Return     | 478      |
| Reward Loss         | -44.9    |
| Running Env Steps   | 756500   |
| Running Forward KL  | -4.74    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1513     |
----------------------------------
2025-02-01 18:12:07.066901 Eastern Standard Time
| Itration            | 1514     |
| Real Det Return     | 530      |
| Real Sto Return     | 483      |
| Reward Loss         | -29      |
| Running Env Steps   | 757000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 1514     |
----------------------------------
2025-02-01 18:12:22.632059 Eastern Standard Time
| Itration            | 1515     |
| Real Det Return     | 536      |
| Real Sto Return     | 497      |
| Reward Loss         | -30.5    |
| Running Env Steps   | 757500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1515     |
----------------------------------
2025-02-01 18:12:38.181872 Eastern Standard Time
| Itration            | 1516     |
| Real Det Return     | 537      |
| Real Sto Return     | 487      |
| Reward Loss         | -28.7    |
| Running Env Steps   | 758000   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 5.22     |
| Running Update Time | 1516     |
----------------------------------
2025-02-01 18:12:53.739670 Eastern Standard Time
| Itration            | 1517     |
| Real Det Return     | 533      |
| Real Sto Return     | 481      |
| Reward Loss         | -46.2    |
| Running Env Steps   | 758500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1517     |
----------------------------------
2025-02-01 18:13:09.393201 Eastern Standard Time
| Itration            | 1518     |
| Real Det Return     | 535      |
| Real Sto Return     | 480      |
| Reward Loss         | -33.8    |
| Running Env Steps   | 759000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1518     |
----------------------------------
2025-02-01 18:13:24.967026 Eastern Standard Time
| Itration            | 1519     |
| Real Det Return     | 538      |
| Real Sto Return     | 475      |
| Reward Loss         | -37.3    |
| Running Env Steps   | 759500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 5.44     |
| Running Update Time | 1519     |
----------------------------------
2025-02-01 18:13:40.588021 Eastern Standard Time
| Itration            | 1520     |
| Real Det Return     | 538      |
| Real Sto Return     | 481      |
| Reward Loss         | -51.1    |
| Running Env Steps   | 760000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1520     |
----------------------------------
2025-02-01 18:13:56.185620 Eastern Standard Time
| Itration            | 1521     |
| Real Det Return     | 531      |
| Real Sto Return     | 486      |
| Reward Loss         | -35.3    |
| Running Env Steps   | 760500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1521     |
----------------------------------
2025-02-01 18:14:12.445961 Eastern Standard Time
| Itration            | 1522     |
| Real Det Return     | 526      |
| Real Sto Return     | 486      |
| Reward Loss         | -39.8    |
| Running Env Steps   | 761000   |
| Running Forward KL  | -4.6     |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1522     |
----------------------------------
2025-02-01 18:14:28.108997 Eastern Standard Time
| Itration            | 1523     |
| Real Det Return     | 536      |
| Real Sto Return     | 487      |
| Reward Loss         | -52.9    |
| Running Env Steps   | 761500   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1523     |
----------------------------------
2025-02-01 18:14:43.682956 Eastern Standard Time
| Itration            | 1524     |
| Real Det Return     | 524      |
| Real Sto Return     | 472      |
| Reward Loss         | -41.8    |
| Running Env Steps   | 762000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1524     |
----------------------------------
2025-02-01 18:14:59.219556 Eastern Standard Time
| Itration            | 1525     |
| Real Det Return     | 515      |
| Real Sto Return     | 470      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 762500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1525     |
----------------------------------
2025-02-01 18:15:14.777641 Eastern Standard Time
| Itration            | 1526     |
| Real Det Return     | 516      |
| Real Sto Return     | 475      |
| Reward Loss         | -42      |
| Running Env Steps   | 763000   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 1526     |
----------------------------------
2025-02-01 18:15:30.353168 Eastern Standard Time
| Itration            | 1527     |
| Real Det Return     | 504      |
| Real Sto Return     | 460      |
| Reward Loss         | -45.7    |
| Running Env Steps   | 763500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1527     |
----------------------------------
2025-02-01 18:15:45.939352 Eastern Standard Time
| Itration            | 1528     |
| Real Det Return     | 531      |
| Real Sto Return     | 480      |
| Reward Loss         | -46.4    |
| Running Env Steps   | 764000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1528     |
----------------------------------
2025-02-01 18:16:01.505233 Eastern Standard Time
| Itration            | 1529     |
| Real Det Return     | 537      |
| Real Sto Return     | 485      |
| Reward Loss         | -42.3    |
| Running Env Steps   | 764500   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1529     |
----------------------------------
2025-02-01 18:16:17.096180 Eastern Standard Time
| Itration            | 1530     |
| Real Det Return     | 537      |
| Real Sto Return     | 479      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 765000   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1530     |
----------------------------------
2025-02-01 18:16:32.626337 Eastern Standard Time
| Itration            | 1531     |
| Real Det Return     | 526      |
| Real Sto Return     | 482      |
| Reward Loss         | -52.8    |
| Running Env Steps   | 765500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.03     |
| Running Update Time | 1531     |
----------------------------------
2025-02-01 18:16:48.229157 Eastern Standard Time
| Itration            | 1532     |
| Real Det Return     | 529      |
| Real Sto Return     | 470      |
| Reward Loss         | -36      |
| Running Env Steps   | 766000   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 5.36     |
| Running Update Time | 1532     |
----------------------------------
2025-02-01 18:17:03.817091 Eastern Standard Time
| Itration            | 1533     |
| Real Det Return     | 524      |
| Real Sto Return     | 481      |
| Reward Loss         | -44.6    |
| Running Env Steps   | 766500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 4.3      |
| Running Update Time | 1533     |
----------------------------------
2025-02-01 18:17:19.405347 Eastern Standard Time
| Itration            | 1534     |
| Real Det Return     | 528      |
| Real Sto Return     | 489      |
| Reward Loss         | -41.6    |
| Running Env Steps   | 767000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1534     |
----------------------------------
2025-02-01 18:17:34.962202 Eastern Standard Time
| Itration            | 1535     |
| Real Det Return     | 530      |
| Real Sto Return     | 487      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 767500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 1535     |
----------------------------------
2025-02-01 18:17:50.528793 Eastern Standard Time
| Itration            | 1536     |
| Real Det Return     | 530      |
| Real Sto Return     | 480      |
| Reward Loss         | -38.5    |
| Running Env Steps   | 768000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1536     |
----------------------------------
2025-02-01 18:18:06.090985 Eastern Standard Time
| Itration            | 1537     |
| Real Det Return     | 497      |
| Real Sto Return     | 466      |
| Reward Loss         | -44.4    |
| Running Env Steps   | 768500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1537     |
----------------------------------
2025-02-01 18:18:21.650813 Eastern Standard Time
| Itration            | 1538     |
| Real Det Return     | 523      |
| Real Sto Return     | 473      |
| Reward Loss         | -34.5    |
| Running Env Steps   | 769000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1538     |
----------------------------------
2025-02-01 18:18:37.255610 Eastern Standard Time
| Itration            | 1539     |
| Real Det Return     | 537      |
| Real Sto Return     | 477      |
| Reward Loss         | -44.7    |
| Running Env Steps   | 769500   |
| Running Forward KL  | -4.46    |
| Running Reverse KL  | 4.52     |
| Running Update Time | 1539     |
----------------------------------
2025-02-01 18:18:52.805655 Eastern Standard Time
| Itration            | 1540     |
| Real Det Return     | 539      |
| Real Sto Return     | 490      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 770000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1540     |
----------------------------------
2025-02-01 18:19:08.375716 Eastern Standard Time
| Itration            | 1541     |
| Real Det Return     | 526      |
| Real Sto Return     | 479      |
| Reward Loss         | -38.7    |
| Running Env Steps   | 770500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.43     |
| Running Update Time | 1541     |
----------------------------------
2025-02-01 18:19:23.980012 Eastern Standard Time
| Itration            | 1542     |
| Real Det Return     | 534      |
| Real Sto Return     | 470      |
| Reward Loss         | -28.3    |
| Running Env Steps   | 771000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 1542     |
----------------------------------
2025-02-01 18:19:39.491583 Eastern Standard Time
| Itration            | 1543     |
| Real Det Return     | 538      |
| Real Sto Return     | 487      |
| Reward Loss         | -43      |
| Running Env Steps   | 771500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1543     |
----------------------------------
2025-02-01 18:19:55.060859 Eastern Standard Time
| Itration            | 1544     |
| Real Det Return     | 545      |
| Real Sto Return     | 492      |
| Reward Loss         | -38.2    |
| Running Env Steps   | 772000   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1544     |
----------------------------------
2025-02-01 18:20:10.653484 Eastern Standard Time
| Itration            | 1545     |
| Real Det Return     | 528      |
| Real Sto Return     | 481      |
| Reward Loss         | -45.5    |
| Running Env Steps   | 772500   |
| Running Forward KL  | -4.87    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1545     |
----------------------------------
2025-02-01 18:20:26.308029 Eastern Standard Time
| Itration            | 1546     |
| Real Det Return     | 526      |
| Real Sto Return     | 471      |
| Reward Loss         | -37.4    |
| Running Env Steps   | 773000   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1546     |
----------------------------------
2025-02-01 18:20:41.873918 Eastern Standard Time
| Itration            | 1547     |
| Real Det Return     | 537      |
| Real Sto Return     | 505      |
| Reward Loss         | -34.1    |
| Running Env Steps   | 773500   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1547     |
----------------------------------
2025-02-01 18:20:57.442139 Eastern Standard Time
| Itration            | 1548     |
| Real Det Return     | 512      |
| Real Sto Return     | 471      |
| Reward Loss         | -22      |
| Running Env Steps   | 774000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 5.82     |
| Running Update Time | 1548     |
----------------------------------
2025-02-01 18:21:12.929237 Eastern Standard Time
| Itration            | 1549     |
| Real Det Return     | 528      |
| Real Sto Return     | 475      |
| Reward Loss         | -47      |
| Running Env Steps   | 774500   |
| Running Forward KL  | -4.71    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1549     |
----------------------------------
2025-02-01 18:21:28.464339 Eastern Standard Time
| Itration            | 1550     |
| Real Det Return     | 521      |
| Real Sto Return     | 490      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 775000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 5.34     |
| Running Update Time | 1550     |
----------------------------------
2025-02-01 18:21:44.009636 Eastern Standard Time
| Itration            | 1551     |
| Real Det Return     | 552      |
| Real Sto Return     | 492      |
| Reward Loss         | -26.1    |
| Running Env Steps   | 775500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1551     |
----------------------------------
2025-02-01 18:21:59.627812 Eastern Standard Time
| Itration            | 1552     |
| Real Det Return     | 544      |
| Real Sto Return     | 489      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 776000   |
| Running Forward KL  | -4.8     |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1552     |
----------------------------------
2025-02-01 18:22:15.181624 Eastern Standard Time
| Itration            | 1553     |
| Real Det Return     | 538      |
| Real Sto Return     | 484      |
| Reward Loss         | -29.3    |
| Running Env Steps   | 776500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1553     |
----------------------------------
2025-02-01 18:22:30.782992 Eastern Standard Time
| Itration            | 1554     |
| Real Det Return     | 532      |
| Real Sto Return     | 486      |
| Reward Loss         | -35      |
| Running Env Steps   | 777000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1554     |
----------------------------------
2025-02-01 18:22:46.400504 Eastern Standard Time
| Itration            | 1555     |
| Real Det Return     | 538      |
| Real Sto Return     | 492      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 777500   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1555     |
----------------------------------
2025-02-01 18:23:02.020583 Eastern Standard Time
| Itration            | 1556     |
| Real Det Return     | 542      |
| Real Sto Return     | 488      |
| Reward Loss         | -35.1    |
| Running Env Steps   | 778000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 5.24     |
| Running Update Time | 1556     |
----------------------------------
2025-02-01 18:23:17.497603 Eastern Standard Time
| Itration            | 1557     |
| Real Det Return     | 542      |
| Real Sto Return     | 485      |
| Reward Loss         | -28.4    |
| Running Env Steps   | 778500   |
| Running Forward KL  | -4.66    |
| Running Reverse KL  | 5.57     |
| Running Update Time | 1557     |
----------------------------------
2025-02-01 18:23:33.054105 Eastern Standard Time
| Itration            | 1558     |
| Real Det Return     | 545      |
| Real Sto Return     | 481      |
| Reward Loss         | -48.5    |
| Running Env Steps   | 779000   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1558     |
----------------------------------
2025-02-01 18:23:48.823150 Eastern Standard Time
| Itration            | 1559     |
| Real Det Return     | 541      |
| Real Sto Return     | 492      |
| Reward Loss         | -27.2    |
| Running Env Steps   | 779500   |
| Running Forward KL  | -5.49    |
| Running Reverse KL  | 5.44     |
| Running Update Time | 1559     |
----------------------------------
2025-02-01 18:24:04.420144 Eastern Standard Time
| Itration            | 1560     |
| Real Det Return     | 531      |
| Real Sto Return     | 485      |
| Reward Loss         | -47.9    |
| Running Env Steps   | 780000   |
| Running Forward KL  | -4.68    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1560     |
----------------------------------
2025-02-01 18:24:19.964688 Eastern Standard Time
| Itration            | 1561     |
| Real Det Return     | 522      |
| Real Sto Return     | 475      |
| Reward Loss         | -43.9    |
| Running Env Steps   | 780500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1561     |
----------------------------------
2025-02-01 18:24:35.506571 Eastern Standard Time
| Itration            | 1562     |
| Real Det Return     | 515      |
| Real Sto Return     | 477      |
| Reward Loss         | -49.4    |
| Running Env Steps   | 781000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 1562     |
----------------------------------
2025-02-01 18:24:51.070835 Eastern Standard Time
| Itration            | 1563     |
| Real Det Return     | 516      |
| Real Sto Return     | 480      |
| Reward Loss         | -49.3    |
| Running Env Steps   | 781500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1563     |
----------------------------------
2025-02-01 18:25:06.699719 Eastern Standard Time
| Itration            | 1564     |
| Real Det Return     | 519      |
| Real Sto Return     | 465      |
| Reward Loss         | -49.7    |
| Running Env Steps   | 782000   |
| Running Forward KL  | -5.43    |
| Running Reverse KL  | 4.84     |
| Running Update Time | 1564     |
----------------------------------
2025-02-01 18:25:22.255398 Eastern Standard Time
| Itration            | 1565     |
| Real Det Return     | 534      |
| Real Sto Return     | 494      |
| Reward Loss         | -39.2    |
| Running Env Steps   | 782500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1565     |
----------------------------------
2025-02-01 18:25:37.787676 Eastern Standard Time
| Itration            | 1566     |
| Real Det Return     | 530      |
| Real Sto Return     | 478      |
| Reward Loss         | -36      |
| Running Env Steps   | 783000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 1566     |
----------------------------------
2025-02-01 18:25:53.315277 Eastern Standard Time
| Itration            | 1567     |
| Real Det Return     | 525      |
| Real Sto Return     | 493      |
| Reward Loss         | -38      |
| Running Env Steps   | 783500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1567     |
----------------------------------
2025-02-01 18:26:08.856011 Eastern Standard Time
| Itration            | 1568     |
| Real Det Return     | 528      |
| Real Sto Return     | 480      |
| Reward Loss         | -44.3    |
| Running Env Steps   | 784000   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1568     |
----------------------------------
2025-02-01 18:26:24.338841 Eastern Standard Time
| Itration            | 1569     |
| Real Det Return     | 528      |
| Real Sto Return     | 488      |
| Reward Loss         | -39      |
| Running Env Steps   | 784500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1569     |
----------------------------------
2025-02-01 18:26:39.877009 Eastern Standard Time
| Itration            | 1570     |
| Real Det Return     | 533      |
| Real Sto Return     | 480      |
| Reward Loss         | -43.5    |
| Running Env Steps   | 785000   |
| Running Forward KL  | -5.41    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1570     |
----------------------------------
2025-02-01 18:26:55.448043 Eastern Standard Time
| Itration            | 1571     |
| Real Det Return     | 546      |
| Real Sto Return     | 488      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 785500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1571     |
----------------------------------
2025-02-01 18:27:10.987715 Eastern Standard Time
| Itration            | 1572     |
| Real Det Return     | 553      |
| Real Sto Return     | 492      |
| Reward Loss         | -39.8    |
| Running Env Steps   | 786000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1572     |
----------------------------------
2025-02-01 18:27:26.512369 Eastern Standard Time
| Itration            | 1573     |
| Real Det Return     | 543      |
| Real Sto Return     | 490      |
| Reward Loss         | -33.1    |
| Running Env Steps   | 786500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1573     |
----------------------------------
2025-02-01 18:27:42.044811 Eastern Standard Time
| Itration            | 1574     |
| Real Det Return     | 533      |
| Real Sto Return     | 480      |
| Reward Loss         | -48.8    |
| Running Env Steps   | 787000   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.45     |
| Running Update Time | 1574     |
----------------------------------
2025-02-01 18:27:57.523885 Eastern Standard Time
| Itration            | 1575     |
| Real Det Return     | 539      |
| Real Sto Return     | 489      |
| Reward Loss         | -53.8    |
| Running Env Steps   | 787500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1575     |
----------------------------------
2025-02-01 18:28:13.078683 Eastern Standard Time
| Itration            | 1576     |
| Real Det Return     | 529      |
| Real Sto Return     | 481      |
| Reward Loss         | -40.4    |
| Running Env Steps   | 788000   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1576     |
----------------------------------
2025-02-01 18:28:28.674366 Eastern Standard Time
| Itration            | 1577     |
| Real Det Return     | 539      |
| Real Sto Return     | 487      |
| Reward Loss         | -41.5    |
| Running Env Steps   | 788500   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 1577     |
----------------------------------
2025-02-01 18:28:44.242740 Eastern Standard Time
| Itration            | 1578     |
| Real Det Return     | 517      |
| Real Sto Return     | 467      |
| Reward Loss         | -48.7    |
| Running Env Steps   | 789000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1578     |
----------------------------------
2025-02-01 18:28:59.851583 Eastern Standard Time
| Itration            | 1579     |
| Real Det Return     | 531      |
| Real Sto Return     | 485      |
| Reward Loss         | -40.2    |
| Running Env Steps   | 789500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 5        |
| Running Update Time | 1579     |
----------------------------------
2025-02-01 18:29:15.683234 Eastern Standard Time
| Itration            | 1580     |
| Real Det Return     | 548      |
| Real Sto Return     | 490      |
| Reward Loss         | -46.4    |
| Running Env Steps   | 790000   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1580     |
----------------------------------
2025-02-01 18:29:31.318917 Eastern Standard Time
| Itration            | 1581     |
| Real Det Return     | 531      |
| Real Sto Return     | 485      |
| Reward Loss         | -52.4    |
| Running Env Steps   | 790500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1581     |
----------------------------------
2025-02-01 18:29:46.969809 Eastern Standard Time
| Itration            | 1582     |
| Real Det Return     | 528      |
| Real Sto Return     | 490      |
| Reward Loss         | -47      |
| Running Env Steps   | 791000   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1582     |
----------------------------------
2025-02-01 18:30:02.484098 Eastern Standard Time
| Itration            | 1583     |
| Real Det Return     | 545      |
| Real Sto Return     | 495      |
| Reward Loss         | -38.4    |
| Running Env Steps   | 791500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1583     |
----------------------------------
2025-02-01 18:30:18.070475 Eastern Standard Time
| Itration            | 1584     |
| Real Det Return     | 540      |
| Real Sto Return     | 472      |
| Reward Loss         | -47.3    |
| Running Env Steps   | 792000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1584     |
----------------------------------
2025-02-01 18:30:33.561987 Eastern Standard Time
| Itration            | 1585     |
| Real Det Return     | 537      |
| Real Sto Return     | 471      |
| Reward Loss         | -44.8    |
| Running Env Steps   | 792500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.37     |
| Running Update Time | 1585     |
----------------------------------
2025-02-01 18:30:49.119153 Eastern Standard Time
| Itration            | 1586     |
| Real Det Return     | 515      |
| Real Sto Return     | 485      |
| Reward Loss         | -31.2    |
| Running Env Steps   | 793000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1586     |
----------------------------------
2025-02-01 18:31:04.668368 Eastern Standard Time
| Itration            | 1587     |
| Real Det Return     | 541      |
| Real Sto Return     | 480      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 793500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 5.17     |
| Running Update Time | 1587     |
----------------------------------
2025-02-01 18:31:20.134881 Eastern Standard Time
| Itration            | 1588     |
| Real Det Return     | 540      |
| Real Sto Return     | 490      |
| Reward Loss         | -41.3    |
| Running Env Steps   | 794000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1588     |
----------------------------------
2025-02-01 18:31:35.732069 Eastern Standard Time
| Itration            | 1589     |
| Real Det Return     | 533      |
| Real Sto Return     | 482      |
| Reward Loss         | -40.7    |
| Running Env Steps   | 794500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1589     |
----------------------------------
2025-02-01 18:31:51.357256 Eastern Standard Time
| Itration            | 1590     |
| Real Det Return     | 536      |
| Real Sto Return     | 479      |
| Reward Loss         | -41.2    |
| Running Env Steps   | 795000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 1590     |
----------------------------------
2025-02-01 18:32:06.914630 Eastern Standard Time
| Itration            | 1591     |
| Real Det Return     | 542      |
| Real Sto Return     | 492      |
| Reward Loss         | -43.3    |
| Running Env Steps   | 795500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1591     |
----------------------------------
2025-02-01 18:32:22.457783 Eastern Standard Time
| Itration            | 1592     |
| Real Det Return     | 535      |
| Real Sto Return     | 487      |
| Reward Loss         | -55.8    |
| Running Env Steps   | 796000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1592     |
----------------------------------
2025-02-01 18:32:37.992510 Eastern Standard Time
| Itration            | 1593     |
| Real Det Return     | 524      |
| Real Sto Return     | 486      |
| Reward Loss         | -39.5    |
| Running Env Steps   | 796500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 5.1      |
| Running Update Time | 1593     |
----------------------------------
2025-02-01 18:32:53.453542 Eastern Standard Time
| Itration            | 1594     |
| Real Det Return     | 546      |
| Real Sto Return     | 485      |
| Reward Loss         | -38.8    |
| Running Env Steps   | 797000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1594     |
----------------------------------
2025-02-01 18:33:09.087225 Eastern Standard Time
| Itration            | 1595     |
| Real Det Return     | 531      |
| Real Sto Return     | 495      |
| Reward Loss         | -31.7    |
| Running Env Steps   | 797500   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 1595     |
----------------------------------
2025-02-01 18:33:24.640912 Eastern Standard Time
| Itration            | 1596     |
| Real Det Return     | 531      |
| Real Sto Return     | 471      |
| Reward Loss         | -45.5    |
| Running Env Steps   | 798000   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 4.45     |
| Running Update Time | 1596     |
----------------------------------
2025-02-01 18:33:40.260854 Eastern Standard Time
| Itration            | 1597     |
| Real Det Return     | 533      |
| Real Sto Return     | 497      |
| Reward Loss         | -52      |
| Running Env Steps   | 798500   |
| Running Forward KL  | -5.38    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1597     |
----------------------------------
2025-02-01 18:33:55.847634 Eastern Standard Time
| Itration            | 1598     |
| Real Det Return     | 535      |
| Real Sto Return     | 484      |
| Reward Loss         | -48.3    |
| Running Env Steps   | 799000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 5.4      |
| Running Update Time | 1598     |
----------------------------------
2025-02-01 18:34:11.432519 Eastern Standard Time
| Itration            | 1599     |
| Real Det Return     | 532      |
| Real Sto Return     | 478      |
| Reward Loss         | -46.1    |
| Running Env Steps   | 799500   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 4.39     |
| Running Update Time | 1599     |
----------------------------------
2025-02-01 18:34:26.998745 Eastern Standard Time
| Itration            | 1600     |
| Real Det Return     | 526      |
| Real Sto Return     | 487      |
| Reward Loss         | -41.4    |
| Running Env Steps   | 800000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1600     |
----------------------------------
2025-02-01 18:34:42.543522 Eastern Standard Time
| Itration            | 1601     |
| Real Det Return     | 526      |
| Real Sto Return     | 473      |
| Reward Loss         | -51      |
| Running Env Steps   | 800500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 1601     |
----------------------------------
2025-02-01 18:34:58.117865 Eastern Standard Time
| Itration            | 1602     |
| Real Det Return     | 525      |
| Real Sto Return     | 473      |
| Reward Loss         | -48.6    |
| Running Env Steps   | 801000   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1602     |
----------------------------------
2025-02-01 18:35:13.704961 Eastern Standard Time
| Itration            | 1603     |
| Real Det Return     | 536      |
| Real Sto Return     | 484      |
| Reward Loss         | -25.5    |
| Running Env Steps   | 801500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 5.85     |
| Running Update Time | 1603     |
----------------------------------
2025-02-01 18:35:29.447810 Eastern Standard Time
| Itration            | 1604     |
| Real Det Return     | 529      |
| Real Sto Return     | 476      |
| Reward Loss         | -49.6    |
| Running Env Steps   | 802000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 5.38     |
| Running Update Time | 1604     |
----------------------------------
2025-02-01 18:35:45.042511 Eastern Standard Time
| Itration            | 1605     |
| Real Det Return     | 537      |
| Real Sto Return     | 479      |
| Reward Loss         | -38.2    |
| Running Env Steps   | 802500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 4.46     |
| Running Update Time | 1605     |
----------------------------------
2025-02-01 18:36:00.656380 Eastern Standard Time
| Itration            | 1606     |
| Real Det Return     | 528      |
| Real Sto Return     | 481      |
| Reward Loss         | -53.1    |
| Running Env Steps   | 803000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 4.49     |
| Running Update Time | 1606     |
----------------------------------
2025-02-01 18:36:16.225122 Eastern Standard Time
| Itration            | 1607     |
| Real Det Return     | 534      |
| Real Sto Return     | 489      |
| Reward Loss         | -50.7    |
| Running Env Steps   | 803500   |
| Running Forward KL  | -5.41    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1607     |
----------------------------------
2025-02-01 18:36:31.827966 Eastern Standard Time
| Itration            | 1608     |
| Real Det Return     | 532      |
| Real Sto Return     | 485      |
| Reward Loss         | -45.8    |
| Running Env Steps   | 804000   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1608     |
----------------------------------
2025-02-01 18:36:47.326552 Eastern Standard Time
| Itration            | 1609     |
| Real Det Return     | 538      |
| Real Sto Return     | 473      |
| Reward Loss         | -37.8    |
| Running Env Steps   | 804500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1609     |
----------------------------------
2025-02-01 18:37:02.880214 Eastern Standard Time
| Itration            | 1610     |
| Real Det Return     | 523      |
| Real Sto Return     | 485      |
| Reward Loss         | -48      |
| Running Env Steps   | 805000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1610     |
----------------------------------
2025-02-01 18:37:18.425041 Eastern Standard Time
| Itration            | 1611     |
| Real Det Return     | 518      |
| Real Sto Return     | 474      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 805500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1611     |
----------------------------------
2025-02-01 18:37:33.970379 Eastern Standard Time
| Itration            | 1612     |
| Real Det Return     | 534      |
| Real Sto Return     | 486      |
| Reward Loss         | -43.1    |
| Running Env Steps   | 806000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 1612     |
----------------------------------
2025-02-01 18:37:49.556118 Eastern Standard Time
| Itration            | 1613     |
| Real Det Return     | 529      |
| Real Sto Return     | 474      |
| Reward Loss         | -57.7    |
| Running Env Steps   | 806500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.41     |
| Running Update Time | 1613     |
----------------------------------
2025-02-01 18:38:05.105156 Eastern Standard Time
| Itration            | 1614     |
| Real Det Return     | 542      |
| Real Sto Return     | 497      |
| Reward Loss         | -41      |
| Running Env Steps   | 807000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1614     |
----------------------------------
2025-02-01 18:38:20.666741 Eastern Standard Time
| Itration            | 1615     |
| Real Det Return     | 541      |
| Real Sto Return     | 489      |
| Reward Loss         | -42      |
| Running Env Steps   | 807500   |
| Running Forward KL  | -5.5     |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1615     |
----------------------------------
2025-02-01 18:38:36.354188 Eastern Standard Time
| Itration            | 1616     |
| Real Det Return     | 547      |
| Real Sto Return     | 493      |
| Reward Loss         | -40.3    |
| Running Env Steps   | 808000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1616     |
----------------------------------
2025-02-01 18:38:51.967888 Eastern Standard Time
| Itration            | 1617     |
| Real Det Return     | 527      |
| Real Sto Return     | 468      |
| Reward Loss         | -44.8    |
| Running Env Steps   | 808500   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1617     |
----------------------------------
2025-02-01 18:39:07.538697 Eastern Standard Time
| Itration            | 1618     |
| Real Det Return     | 534      |
| Real Sto Return     | 487      |
| Reward Loss         | -34.4    |
| Running Env Steps   | 809000   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.32     |
| Running Update Time | 1618     |
----------------------------------
2025-02-01 18:39:23.114788 Eastern Standard Time
| Itration            | 1619     |
| Real Det Return     | 534      |
| Real Sto Return     | 480      |
| Reward Loss         | -46.1    |
| Running Env Steps   | 809500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1619     |
----------------------------------
2025-02-01 18:39:38.696410 Eastern Standard Time
| Itration            | 1620     |
| Real Det Return     | 542      |
| Real Sto Return     | 497      |
| Reward Loss         | -35.2    |
| Running Env Steps   | 810000   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1620     |
----------------------------------
2025-02-01 18:39:54.282793 Eastern Standard Time
| Itration            | 1621     |
| Real Det Return     | 527      |
| Real Sto Return     | 478      |
| Reward Loss         | -37.7    |
| Running Env Steps   | 810500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1621     |
----------------------------------
2025-02-01 18:40:09.787478 Eastern Standard Time
| Itration            | 1622     |
| Real Det Return     | 535      |
| Real Sto Return     | 492      |
| Reward Loss         | -40.3    |
| Running Env Steps   | 811000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1622     |
----------------------------------
2025-02-01 18:40:25.356010 Eastern Standard Time
| Itration            | 1623     |
| Real Det Return     | 512      |
| Real Sto Return     | 469      |
| Reward Loss         | -48.9    |
| Running Env Steps   | 811500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 5.1      |
| Running Update Time | 1623     |
----------------------------------
2025-02-01 18:40:40.883645 Eastern Standard Time
| Itration            | 1624     |
| Real Det Return     | 532      |
| Real Sto Return     | 483      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 812000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 1624     |
----------------------------------
2025-02-01 18:40:56.563376 Eastern Standard Time
| Itration            | 1625     |
| Real Det Return     | 521      |
| Real Sto Return     | 469      |
| Reward Loss         | -47.1    |
| Running Env Steps   | 812500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.07     |
| Running Update Time | 1625     |
----------------------------------
2025-02-01 18:41:12.123571 Eastern Standard Time
| Itration            | 1626     |
| Real Det Return     | 519      |
| Real Sto Return     | 480      |
| Reward Loss         | -47.1    |
| Running Env Steps   | 813000   |
| Running Forward KL  | -4.62    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1626     |
----------------------------------
2025-02-01 18:41:27.747687 Eastern Standard Time
| Itration            | 1627     |
| Real Det Return     | 538      |
| Real Sto Return     | 492      |
| Reward Loss         | -44.9    |
| Running Env Steps   | 813500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1627     |
----------------------------------
2025-02-01 18:41:43.304717 Eastern Standard Time
| Itration            | 1628     |
| Real Det Return     | 539      |
| Real Sto Return     | 501      |
| Reward Loss         | -36.9    |
| Running Env Steps   | 814000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.55     |
| Running Update Time | 1628     |
----------------------------------
2025-02-01 18:41:58.860572 Eastern Standard Time
| Itration            | 1629     |
| Real Det Return     | 522      |
| Real Sto Return     | 465      |
| Reward Loss         | -40.1    |
| Running Env Steps   | 814500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 5.1      |
| Running Update Time | 1629     |
----------------------------------
2025-02-01 18:42:14.461647 Eastern Standard Time
| Itration            | 1630     |
| Real Det Return     | 550      |
| Real Sto Return     | 495      |
| Reward Loss         | -43.5    |
| Running Env Steps   | 815000   |
| Running Forward KL  | -5.49    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1630     |
----------------------------------
2025-02-01 18:42:30.075409 Eastern Standard Time
| Itration            | 1631     |
| Real Det Return     | 540      |
| Real Sto Return     | 486      |
| Reward Loss         | -55.6    |
| Running Env Steps   | 815500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1631     |
----------------------------------
2025-02-01 18:42:45.732527 Eastern Standard Time
| Itration            | 1632     |
| Real Det Return     | 538      |
| Real Sto Return     | 482      |
| Reward Loss         | -48.9    |
| Running Env Steps   | 816000   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 4.35     |
| Running Update Time | 1632     |
----------------------------------
2025-02-01 18:43:01.306944 Eastern Standard Time
| Itration            | 1633     |
| Real Det Return     | 539      |
| Real Sto Return     | 490      |
| Reward Loss         | -51.1    |
| Running Env Steps   | 816500   |
| Running Forward KL  | -5.38    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1633     |
----------------------------------
2025-02-01 18:43:16.869469 Eastern Standard Time
| Itration            | 1634     |
| Real Det Return     | 551      |
| Real Sto Return     | 496      |
| Reward Loss         | -39.4    |
| Running Env Steps   | 817000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1634     |
----------------------------------
2025-02-01 18:43:32.434691 Eastern Standard Time
| Itration            | 1635     |
| Real Det Return     | 525      |
| Real Sto Return     | 474      |
| Reward Loss         | -47.8    |
| Running Env Steps   | 817500   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1635     |
----------------------------------
2025-02-01 18:43:48.044598 Eastern Standard Time
| Itration            | 1636     |
| Real Det Return     | 542      |
| Real Sto Return     | 484      |
| Reward Loss         | -39.6    |
| Running Env Steps   | 818000   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 1636     |
----------------------------------
2025-02-01 18:44:03.603205 Eastern Standard Time
| Itration            | 1637     |
| Real Det Return     | 548      |
| Real Sto Return     | 470      |
| Reward Loss         | -45.3    |
| Running Env Steps   | 818500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1637     |
----------------------------------
2025-02-01 18:44:19.428871 Eastern Standard Time
| Itration            | 1638     |
| Real Det Return     | 534      |
| Real Sto Return     | 485      |
| Reward Loss         | -36.3    |
| Running Env Steps   | 819000   |
| Running Forward KL  | -5.55    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1638     |
----------------------------------
2025-02-01 18:44:35.049200 Eastern Standard Time
| Itration            | 1639     |
| Real Det Return     | 536      |
| Real Sto Return     | 486      |
| Reward Loss         | -51      |
| Running Env Steps   | 819500   |
| Running Forward KL  | -5.36    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1639     |
----------------------------------
2025-02-01 18:44:50.615501 Eastern Standard Time
| Itration            | 1640     |
| Real Det Return     | 544      |
| Real Sto Return     | 483      |
| Reward Loss         | -37.4    |
| Running Env Steps   | 820000   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 5.55     |
| Running Update Time | 1640     |
----------------------------------
2025-02-01 18:45:06.185747 Eastern Standard Time
| Itration            | 1641     |
| Real Det Return     | 544      |
| Real Sto Return     | 494      |
| Reward Loss         | -39.6    |
| Running Env Steps   | 820500   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 1641     |
----------------------------------
2025-02-01 18:45:21.793739 Eastern Standard Time
| Itration            | 1642     |
| Real Det Return     | 550      |
| Real Sto Return     | 485      |
| Reward Loss         | -50.1    |
| Running Env Steps   | 821000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1642     |
----------------------------------
2025-02-01 18:45:37.377142 Eastern Standard Time
| Itration            | 1643     |
| Real Det Return     | 522      |
| Real Sto Return     | 476      |
| Reward Loss         | -44.6    |
| Running Env Steps   | 821500   |
| Running Forward KL  | -4.83    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 1643     |
----------------------------------
2025-02-01 18:45:53.013292 Eastern Standard Time
| Itration            | 1644     |
| Real Det Return     | 519      |
| Real Sto Return     | 481      |
| Reward Loss         | -53.6    |
| Running Env Steps   | 822000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.48     |
| Running Update Time | 1644     |
----------------------------------
2025-02-01 18:46:08.597667 Eastern Standard Time
| Itration            | 1645     |
| Real Det Return     | 533      |
| Real Sto Return     | 484      |
| Reward Loss         | -42.5    |
| Running Env Steps   | 822500   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1645     |
----------------------------------
2025-02-01 18:46:24.118106 Eastern Standard Time
| Itration            | 1646     |
| Real Det Return     | 530      |
| Real Sto Return     | 473      |
| Reward Loss         | -49.2    |
| Running Env Steps   | 823000   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1646     |
----------------------------------
2025-02-01 18:46:39.674146 Eastern Standard Time
| Itration            | 1647     |
| Real Det Return     | 544      |
| Real Sto Return     | 474      |
| Reward Loss         | -44.5    |
| Running Env Steps   | 823500   |
| Running Forward KL  | -4.89    |
| Running Reverse KL  | 5.24     |
| Running Update Time | 1647     |
----------------------------------
2025-02-01 18:46:55.202406 Eastern Standard Time
| Itration            | 1648     |
| Real Det Return     | 536      |
| Real Sto Return     | 469      |
| Reward Loss         | -57      |
| Running Env Steps   | 824000   |
| Running Forward KL  | -4.7     |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1648     |
----------------------------------
2025-02-01 18:47:10.848202 Eastern Standard Time
| Itration            | 1649     |
| Real Det Return     | 511      |
| Real Sto Return     | 475      |
| Reward Loss         | -43.1    |
| Running Env Steps   | 824500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1649     |
----------------------------------
2025-02-01 18:47:26.493791 Eastern Standard Time
| Itration            | 1650     |
| Real Det Return     | 529      |
| Real Sto Return     | 491      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 825000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1650     |
----------------------------------
2025-02-01 18:47:42.091318 Eastern Standard Time
| Itration            | 1651     |
| Real Det Return     | 515      |
| Real Sto Return     | 471      |
| Reward Loss         | -46.2    |
| Running Env Steps   | 825500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 1651     |
----------------------------------
2025-02-01 18:47:57.651314 Eastern Standard Time
| Itration            | 1652     |
| Real Det Return     | 524      |
| Real Sto Return     | 478      |
| Reward Loss         | -51.1    |
| Running Env Steps   | 826000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1652     |
----------------------------------
2025-02-01 18:48:13.274583 Eastern Standard Time
| Itration            | 1653     |
| Real Det Return     | 541      |
| Real Sto Return     | 490      |
| Reward Loss         | -54.2    |
| Running Env Steps   | 826500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1653     |
----------------------------------
2025-02-01 18:48:28.923612 Eastern Standard Time
| Itration            | 1654     |
| Real Det Return     | 546      |
| Real Sto Return     | 485      |
| Reward Loss         | -45.6    |
| Running Env Steps   | 827000   |
| Running Forward KL  | -5.41    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1654     |
----------------------------------
2025-02-01 18:48:44.529068 Eastern Standard Time
| Itration            | 1655     |
| Real Det Return     | 537      |
| Real Sto Return     | 482      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 827500   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1655     |
----------------------------------
2025-02-01 18:49:00.200720 Eastern Standard Time
| Itration            | 1656     |
| Real Det Return     | 530      |
| Real Sto Return     | 485      |
| Reward Loss         | -37.9    |
| Running Env Steps   | 828000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1656     |
----------------------------------
2025-02-01 18:49:15.791649 Eastern Standard Time
| Itration            | 1657     |
| Real Det Return     | 534      |
| Real Sto Return     | 484      |
| Reward Loss         | -34.3    |
| Running Env Steps   | 828500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 5.42     |
| Running Update Time | 1657     |
----------------------------------
2025-02-01 18:49:31.464762 Eastern Standard Time
| Itration            | 1658     |
| Real Det Return     | 526      |
| Real Sto Return     | 479      |
| Reward Loss         | -46      |
| Running Env Steps   | 829000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 5.52     |
| Running Update Time | 1658     |
----------------------------------
2025-02-01 18:49:46.995721 Eastern Standard Time
| Itration            | 1659     |
| Real Det Return     | 512      |
| Real Sto Return     | 468      |
| Reward Loss         | -56      |
| Running Env Steps   | 829500   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 4.4      |
| Running Update Time | 1659     |
----------------------------------
2025-02-01 18:50:02.538435 Eastern Standard Time
| Itration            | 1660     |
| Real Det Return     | 523      |
| Real Sto Return     | 473      |
| Reward Loss         | -51      |
| Running Env Steps   | 830000   |
| Running Forward KL  | -4.67    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1660     |
----------------------------------
2025-02-01 18:50:18.147132 Eastern Standard Time
| Itration            | 1661     |
| Real Det Return     | 511      |
| Real Sto Return     | 475      |
| Reward Loss         | -47.8    |
| Running Env Steps   | 830500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1661     |
----------------------------------
2025-02-01 18:50:33.767200 Eastern Standard Time
| Itration            | 1662     |
| Real Det Return     | 532      |
| Real Sto Return     | 482      |
| Reward Loss         | -45.7    |
| Running Env Steps   | 831000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1662     |
----------------------------------
2025-02-01 18:50:49.385138 Eastern Standard Time
| Itration            | 1663     |
| Real Det Return     | 529      |
| Real Sto Return     | 480      |
| Reward Loss         | -37      |
| Running Env Steps   | 831500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1663     |
----------------------------------
2025-02-01 18:51:04.963749 Eastern Standard Time
| Itration            | 1664     |
| Real Det Return     | 532      |
| Real Sto Return     | 484      |
| Reward Loss         | -42.3    |
| Running Env Steps   | 832000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1664     |
----------------------------------
2025-02-01 18:51:20.521781 Eastern Standard Time
| Itration            | 1665     |
| Real Det Return     | 521      |
| Real Sto Return     | 471      |
| Reward Loss         | -36.5    |
| Running Env Steps   | 832500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1665     |
----------------------------------
2025-02-01 18:51:36.114211 Eastern Standard Time
| Itration            | 1666     |
| Real Det Return     | 520      |
| Real Sto Return     | 483      |
| Reward Loss         | -58.3    |
| Running Env Steps   | 833000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1666     |
----------------------------------
2025-02-01 18:51:51.692151 Eastern Standard Time
| Itration            | 1667     |
| Real Det Return     | 510      |
| Real Sto Return     | 467      |
| Reward Loss         | -50.8    |
| Running Env Steps   | 833500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1667     |
----------------------------------
2025-02-01 18:52:07.340047 Eastern Standard Time
| Itration            | 1668     |
| Real Det Return     | 540      |
| Real Sto Return     | 482      |
| Reward Loss         | -50.1    |
| Running Env Steps   | 834000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1668     |
----------------------------------
2025-02-01 18:52:22.949437 Eastern Standard Time
| Itration            | 1669     |
| Real Det Return     | 520      |
| Real Sto Return     | 471      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 834500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1669     |
----------------------------------
2025-02-01 18:52:38.540874 Eastern Standard Time
| Itration            | 1670     |
| Real Det Return     | 542      |
| Real Sto Return     | 483      |
| Reward Loss         | -46.3    |
| Running Env Steps   | 835000   |
| Running Forward KL  | -5.75    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1670     |
----------------------------------
2025-02-01 18:52:54.242068 Eastern Standard Time
| Itration            | 1671     |
| Real Det Return     | 513      |
| Real Sto Return     | 465      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 835500   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 5.24     |
| Running Update Time | 1671     |
----------------------------------
2025-02-01 18:53:09.869677 Eastern Standard Time
| Itration            | 1672     |
| Real Det Return     | 541      |
| Real Sto Return     | 493      |
| Reward Loss         | -50.3    |
| Running Env Steps   | 836000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1672     |
----------------------------------
2025-02-01 18:53:25.502130 Eastern Standard Time
| Itration            | 1673     |
| Real Det Return     | 531      |
| Real Sto Return     | 488      |
| Reward Loss         | -41.7    |
| Running Env Steps   | 836500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1673     |
----------------------------------
2025-02-01 18:53:41.136040 Eastern Standard Time
| Itration            | 1674     |
| Real Det Return     | 527      |
| Real Sto Return     | 490      |
| Reward Loss         | -48.3    |
| Running Env Steps   | 837000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1674     |
----------------------------------
2025-02-01 18:53:56.701604 Eastern Standard Time
| Itration            | 1675     |
| Real Det Return     | 518      |
| Real Sto Return     | 472      |
| Reward Loss         | -45.7    |
| Running Env Steps   | 837500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1675     |
----------------------------------
2025-02-01 18:54:12.263764 Eastern Standard Time
| Itration            | 1676     |
| Real Det Return     | 513      |
| Real Sto Return     | 473      |
| Reward Loss         | -57.7    |
| Running Env Steps   | 838000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 5        |
| Running Update Time | 1676     |
----------------------------------
2025-02-01 18:54:27.836947 Eastern Standard Time
| Itration            | 1677     |
| Real Det Return     | 527      |
| Real Sto Return     | 485      |
| Reward Loss         | -53.9    |
| Running Env Steps   | 838500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1677     |
----------------------------------
2025-02-01 18:54:43.466322 Eastern Standard Time
| Itration            | 1678     |
| Real Det Return     | 536      |
| Real Sto Return     | 481      |
| Reward Loss         | -40.7    |
| Running Env Steps   | 839000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1678     |
----------------------------------
2025-02-01 18:54:59.011002 Eastern Standard Time
| Itration            | 1679     |
| Real Det Return     | 545      |
| Real Sto Return     | 488      |
| Reward Loss         | -45.7    |
| Running Env Steps   | 839500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1679     |
----------------------------------
2025-02-01 18:55:14.590447 Eastern Standard Time
| Itration            | 1680     |
| Real Det Return     | 525      |
| Real Sto Return     | 480      |
| Reward Loss         | -43.8    |
| Running Env Steps   | 840000   |
| Running Forward KL  | -5.36    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1680     |
----------------------------------
2025-02-01 18:55:30.155681 Eastern Standard Time
| Itration            | 1681     |
| Real Det Return     | 545      |
| Real Sto Return     | 477      |
| Reward Loss         | -48.2    |
| Running Env Steps   | 840500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 1681     |
----------------------------------
2025-02-01 18:55:45.778548 Eastern Standard Time
| Itration            | 1682     |
| Real Det Return     | 532      |
| Real Sto Return     | 477      |
| Reward Loss         | -40.8    |
| Running Env Steps   | 841000   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1682     |
----------------------------------
2025-02-01 18:56:01.339401 Eastern Standard Time
| Itration            | 1683     |
| Real Det Return     | 531      |
| Real Sto Return     | 478      |
| Reward Loss         | -50.9    |
| Running Env Steps   | 841500   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 1683     |
----------------------------------
2025-02-01 18:56:16.981715 Eastern Standard Time
| Itration            | 1684     |
| Real Det Return     | 524      |
| Real Sto Return     | 483      |
| Reward Loss         | -44.9    |
| Running Env Steps   | 842000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1684     |
----------------------------------
2025-02-01 18:56:32.522842 Eastern Standard Time
| Itration            | 1685     |
| Real Det Return     | 536      |
| Real Sto Return     | 477      |
| Reward Loss         | -47.5    |
| Running Env Steps   | 842500   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1685     |
----------------------------------
2025-02-01 18:56:48.137477 Eastern Standard Time
| Itration            | 1686     |
| Real Det Return     | 532      |
| Real Sto Return     | 480      |
| Reward Loss         | -47.3    |
| Running Env Steps   | 843000   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 5.51     |
| Running Update Time | 1686     |
----------------------------------
2025-02-01 18:57:03.680692 Eastern Standard Time
| Itration            | 1687     |
| Real Det Return     | 532      |
| Real Sto Return     | 478      |
| Reward Loss         | -45.4    |
| Running Env Steps   | 843500   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1687     |
----------------------------------
2025-02-01 18:57:19.211303 Eastern Standard Time
| Itration            | 1688     |
| Real Det Return     | 533      |
| Real Sto Return     | 488      |
| Reward Loss         | -44      |
| Running Env Steps   | 844000   |
| Running Forward KL  | -5.43    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1688     |
----------------------------------
2025-02-01 18:57:34.819504 Eastern Standard Time
| Itration            | 1689     |
| Real Det Return     | 538      |
| Real Sto Return     | 490      |
| Reward Loss         | -45.8    |
| Running Env Steps   | 844500   |
| Running Forward KL  | -5.61    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1689     |
----------------------------------
2025-02-01 18:57:50.434475 Eastern Standard Time
| Itration            | 1690     |
| Real Det Return     | 539      |
| Real Sto Return     | 490      |
| Reward Loss         | -50.7    |
| Running Env Steps   | 845000   |
| Running Forward KL  | -4.72    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1690     |
----------------------------------
2025-02-01 18:58:05.995389 Eastern Standard Time
| Itration            | 1691     |
| Real Det Return     | 518      |
| Real Sto Return     | 474      |
| Reward Loss         | -54.8    |
| Running Env Steps   | 845500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 1691     |
----------------------------------
2025-02-01 18:58:21.545456 Eastern Standard Time
| Itration            | 1692     |
| Real Det Return     | 543      |
| Real Sto Return     | 485      |
| Reward Loss         | -54.4    |
| Running Env Steps   | 846000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.25     |
| Running Update Time | 1692     |
----------------------------------
2025-02-01 18:58:37.260795 Eastern Standard Time
| Itration            | 1693     |
| Real Det Return     | 541      |
| Real Sto Return     | 498      |
| Reward Loss         | -36.6    |
| Running Env Steps   | 846500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1693     |
----------------------------------
2025-02-01 18:58:52.843938 Eastern Standard Time
| Itration            | 1694     |
| Real Det Return     | 517      |
| Real Sto Return     | 471      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 847000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1694     |
----------------------------------
2025-02-01 18:59:08.392513 Eastern Standard Time
| Itration            | 1695     |
| Real Det Return     | 530      |
| Real Sto Return     | 473      |
| Reward Loss         | -48.8    |
| Running Env Steps   | 847500   |
| Running Forward KL  | -4.9     |
| Running Reverse KL  | 4.61     |
| Running Update Time | 1695     |
----------------------------------
2025-02-01 18:59:24.256378 Eastern Standard Time
| Itration            | 1696     |
| Real Det Return     | 530      |
| Real Sto Return     | 473      |
| Reward Loss         | -54.1    |
| Running Env Steps   | 848000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1696     |
----------------------------------
2025-02-01 18:59:39.965012 Eastern Standard Time
| Itration            | 1697     |
| Real Det Return     | 515      |
| Real Sto Return     | 480      |
| Reward Loss         | -48      |
| Running Env Steps   | 848500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 1697     |
----------------------------------
2025-02-01 18:59:55.556910 Eastern Standard Time
| Itration            | 1698     |
| Real Det Return     | 534      |
| Real Sto Return     | 486      |
| Reward Loss         | -36.5    |
| Running Env Steps   | 849000   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 5.98     |
| Running Update Time | 1698     |
----------------------------------
2025-02-01 19:00:11.083769 Eastern Standard Time
| Itration            | 1699     |
| Real Det Return     | 529      |
| Real Sto Return     | 485      |
| Reward Loss         | -42      |
| Running Env Steps   | 849500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 5.46     |
| Running Update Time | 1699     |
----------------------------------
2025-02-01 19:00:26.668565 Eastern Standard Time
| Itration            | 1700     |
| Real Det Return     | 528      |
| Real Sto Return     | 489      |
| Reward Loss         | -49.7    |
| Running Env Steps   | 850000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1700     |
----------------------------------
2025-02-01 19:00:42.368569 Eastern Standard Time
| Itration            | 1701     |
| Real Det Return     | 537      |
| Real Sto Return     | 479      |
| Reward Loss         | -41      |
| Running Env Steps   | 850500   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1701     |
----------------------------------
2025-02-01 19:00:57.930768 Eastern Standard Time
| Itration            | 1702     |
| Real Det Return     | 536      |
| Real Sto Return     | 475      |
| Reward Loss         | -58.9    |
| Running Env Steps   | 851000   |
| Running Forward KL  | -4.82    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 1702     |
----------------------------------
2025-02-01 19:01:13.554398 Eastern Standard Time
| Itration            | 1703     |
| Real Det Return     | 534      |
| Real Sto Return     | 491      |
| Reward Loss         | -46.6    |
| Running Env Steps   | 851500   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1703     |
----------------------------------
2025-02-01 19:01:29.191310 Eastern Standard Time
| Itration            | 1704     |
| Real Det Return     | 535      |
| Real Sto Return     | 487      |
| Reward Loss         | -55.4    |
| Running Env Steps   | 852000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 5.15     |
| Running Update Time | 1704     |
----------------------------------
2025-02-01 19:01:44.807011 Eastern Standard Time
| Itration            | 1705     |
| Real Det Return     | 537      |
| Real Sto Return     | 480      |
| Reward Loss         | -39.1    |
| Running Env Steps   | 852500   |
| Running Forward KL  | -5.58    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1705     |
----------------------------------
2025-02-01 19:02:00.370823 Eastern Standard Time
| Itration            | 1706     |
| Real Det Return     | 537      |
| Real Sto Return     | 491      |
| Reward Loss         | -42.7    |
| Running Env Steps   | 853000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.44     |
| Running Update Time | 1706     |
----------------------------------
2025-02-01 19:02:16.011416 Eastern Standard Time
| Itration            | 1707     |
| Real Det Return     | 532      |
| Real Sto Return     | 484      |
| Reward Loss         | -41.9    |
| Running Env Steps   | 853500   |
| Running Forward KL  | -5.51    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1707     |
----------------------------------
2025-02-01 19:02:31.623159 Eastern Standard Time
| Itration            | 1708     |
| Real Det Return     | 541      |
| Real Sto Return     | 483      |
| Reward Loss         | -47.1    |
| Running Env Steps   | 854000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1708     |
----------------------------------
2025-02-01 19:02:47.200455 Eastern Standard Time
| Itration            | 1709     |
| Real Det Return     | 533      |
| Real Sto Return     | 484      |
| Reward Loss         | -41.8    |
| Running Env Steps   | 854500   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1709     |
----------------------------------
2025-02-01 19:03:02.819554 Eastern Standard Time
| Itration            | 1710     |
| Real Det Return     | 526      |
| Real Sto Return     | 481      |
| Reward Loss         | -40      |
| Running Env Steps   | 855000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1710     |
----------------------------------
2025-02-01 19:03:18.419450 Eastern Standard Time
| Itration            | 1711     |
| Real Det Return     | 525      |
| Real Sto Return     | 465      |
| Reward Loss         | -61.7    |
| Running Env Steps   | 855500   |
| Running Forward KL  | -5.43    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1711     |
----------------------------------
2025-02-01 19:03:34.014378 Eastern Standard Time
| Itration            | 1712     |
| Real Det Return     | 555      |
| Real Sto Return     | 496      |
| Reward Loss         | -46.7    |
| Running Env Steps   | 856000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1712     |
----------------------------------
2025-02-01 19:03:49.556040 Eastern Standard Time
| Itration            | 1713     |
| Real Det Return     | 521      |
| Real Sto Return     | 472      |
| Reward Loss         | -41.4    |
| Running Env Steps   | 856500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1713     |
----------------------------------
2025-02-01 19:04:05.147816 Eastern Standard Time
| Itration            | 1714     |
| Real Det Return     | 522      |
| Real Sto Return     | 480      |
| Reward Loss         | -42.4    |
| Running Env Steps   | 857000   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1714     |
----------------------------------
2025-02-01 19:04:20.714223 Eastern Standard Time
| Itration            | 1715     |
| Real Det Return     | 536      |
| Real Sto Return     | 482      |
| Reward Loss         | -44.1    |
| Running Env Steps   | 857500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1715     |
----------------------------------
2025-02-01 19:04:36.298103 Eastern Standard Time
| Itration            | 1716     |
| Real Det Return     | 539      |
| Real Sto Return     | 481      |
| Reward Loss         | -49.5    |
| Running Env Steps   | 858000   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1716     |
----------------------------------
2025-02-01 19:04:51.917412 Eastern Standard Time
| Itration            | 1717     |
| Real Det Return     | 535      |
| Real Sto Return     | 492      |
| Reward Loss         | -42.5    |
| Running Env Steps   | 858500   |
| Running Forward KL  | -5.58    |
| Running Reverse KL  | 4.5      |
| Running Update Time | 1717     |
----------------------------------
2025-02-01 19:05:07.559959 Eastern Standard Time
| Itration            | 1718     |
| Real Det Return     | 529      |
| Real Sto Return     | 481      |
| Reward Loss         | -35.5    |
| Running Env Steps   | 859000   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1718     |
----------------------------------
2025-02-01 19:05:23.133029 Eastern Standard Time
| Itration            | 1719     |
| Real Det Return     | 523      |
| Real Sto Return     | 484      |
| Reward Loss         | -46.8    |
| Running Env Steps   | 859500   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1719     |
----------------------------------
2025-02-01 19:05:38.785256 Eastern Standard Time
| Itration            | 1720     |
| Real Det Return     | 530      |
| Real Sto Return     | 477      |
| Reward Loss         | -43.9    |
| Running Env Steps   | 860000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 1720     |
----------------------------------
2025-02-01 19:05:54.350957 Eastern Standard Time
| Itration            | 1721     |
| Real Det Return     | 531      |
| Real Sto Return     | 490      |
| Reward Loss         | -52.7    |
| Running Env Steps   | 860500   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1721     |
----------------------------------
2025-02-01 19:06:09.963272 Eastern Standard Time
| Itration            | 1722     |
| Real Det Return     | 540      |
| Real Sto Return     | 506      |
| Reward Loss         | -41.5    |
| Running Env Steps   | 861000   |
| Running Forward KL  | -5.52    |
| Running Reverse KL  | 5.63     |
| Running Update Time | 1722     |
----------------------------------
2025-02-01 19:06:25.541594 Eastern Standard Time
| Itration            | 1723     |
| Real Det Return     | 536      |
| Real Sto Return     | 489      |
| Reward Loss         | -51      |
| Running Env Steps   | 861500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1723     |
----------------------------------
2025-02-01 19:06:41.088781 Eastern Standard Time
| Itration            | 1724     |
| Real Det Return     | 531      |
| Real Sto Return     | 491      |
| Reward Loss         | -48.2    |
| Running Env Steps   | 862000   |
| Running Forward KL  | -5.63    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 1724     |
----------------------------------
2025-02-01 19:06:56.622878 Eastern Standard Time
| Itration            | 1725     |
| Real Det Return     | 540      |
| Real Sto Return     | 490      |
| Reward Loss         | -54.7    |
| Running Env Steps   | 862500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.53     |
| Running Update Time | 1725     |
----------------------------------
2025-02-01 19:07:12.228780 Eastern Standard Time
| Itration            | 1726     |
| Real Det Return     | 541      |
| Real Sto Return     | 495      |
| Reward Loss         | -35.6    |
| Running Env Steps   | 863000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1726     |
----------------------------------
2025-02-01 19:07:27.913574 Eastern Standard Time
| Itration            | 1727     |
| Real Det Return     | 553      |
| Real Sto Return     | 500      |
| Reward Loss         | -36.3    |
| Running Env Steps   | 863500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 1727     |
----------------------------------
2025-02-01 19:07:43.405051 Eastern Standard Time
| Itration            | 1728     |
| Real Det Return     | 542      |
| Real Sto Return     | 484      |
| Reward Loss         | -47      |
| Running Env Steps   | 864000   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1728     |
----------------------------------
2025-02-01 19:07:59.005851 Eastern Standard Time
| Itration            | 1729     |
| Real Det Return     | 542      |
| Real Sto Return     | 493      |
| Reward Loss         | -56.6    |
| Running Env Steps   | 864500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1729     |
----------------------------------
2025-02-01 19:08:14.603997 Eastern Standard Time
| Itration            | 1730     |
| Real Det Return     | 525      |
| Real Sto Return     | 480      |
| Reward Loss         | -50.8    |
| Running Env Steps   | 865000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1730     |
----------------------------------
2025-02-01 19:08:30.174139 Eastern Standard Time
| Itration            | 1731     |
| Real Det Return     | 540      |
| Real Sto Return     | 489      |
| Reward Loss         | -58.7    |
| Running Env Steps   | 865500   |
| Running Forward KL  | -4.59    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1731     |
----------------------------------
2025-02-01 19:08:45.796957 Eastern Standard Time
| Itration            | 1732     |
| Real Det Return     | 538      |
| Real Sto Return     | 487      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 866000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 4.26     |
| Running Update Time | 1732     |
----------------------------------
2025-02-01 19:09:01.382921 Eastern Standard Time
| Itration            | 1733     |
| Real Det Return     | 543      |
| Real Sto Return     | 493      |
| Reward Loss         | -34.7    |
| Running Env Steps   | 866500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 5.56     |
| Running Update Time | 1733     |
----------------------------------
2025-02-01 19:09:16.947940 Eastern Standard Time
| Itration            | 1734     |
| Real Det Return     | 535      |
| Real Sto Return     | 470      |
| Reward Loss         | -47.8    |
| Running Env Steps   | 867000   |
| Running Forward KL  | -5.66    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1734     |
----------------------------------
2025-02-01 19:09:32.588376 Eastern Standard Time
| Itration            | 1735     |
| Real Det Return     | 544      |
| Real Sto Return     | 502      |
| Reward Loss         | -51.2    |
| Running Env Steps   | 867500   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1735     |
----------------------------------
2025-02-01 19:09:48.152833 Eastern Standard Time
| Itration            | 1736     |
| Real Det Return     | 543      |
| Real Sto Return     | 497      |
| Reward Loss         | -42.2    |
| Running Env Steps   | 868000   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 5.33     |
| Running Update Time | 1736     |
----------------------------------
2025-02-01 19:10:03.849225 Eastern Standard Time
| Itration            | 1737     |
| Real Det Return     | 539      |
| Real Sto Return     | 488      |
| Reward Loss         | -47.4    |
| Running Env Steps   | 868500   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1737     |
----------------------------------
2025-02-01 19:10:19.385253 Eastern Standard Time
| Itration            | 1738     |
| Real Det Return     | 527      |
| Real Sto Return     | 476      |
| Reward Loss         | -45.5    |
| Running Env Steps   | 869000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1738     |
----------------------------------
2025-02-01 19:10:35.039807 Eastern Standard Time
| Itration            | 1739     |
| Real Det Return     | 537      |
| Real Sto Return     | 495      |
| Reward Loss         | -41.1    |
| Running Env Steps   | 869500   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1739     |
----------------------------------
2025-02-01 19:10:50.664037 Eastern Standard Time
| Itration            | 1740     |
| Real Det Return     | 521      |
| Real Sto Return     | 476      |
| Reward Loss         | -54.5    |
| Running Env Steps   | 870000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1740     |
----------------------------------
2025-02-01 19:11:06.293451 Eastern Standard Time
| Itration            | 1741     |
| Real Det Return     | 540      |
| Real Sto Return     | 479      |
| Reward Loss         | -36.6    |
| Running Env Steps   | 870500   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 1741     |
----------------------------------
2025-02-01 19:11:21.905787 Eastern Standard Time
| Itration            | 1742     |
| Real Det Return     | 530      |
| Real Sto Return     | 483      |
| Reward Loss         | -40.9    |
| Running Env Steps   | 871000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 5.49     |
| Running Update Time | 1742     |
----------------------------------
2025-02-01 19:11:37.579335 Eastern Standard Time
| Itration            | 1743     |
| Real Det Return     | 541      |
| Real Sto Return     | 489      |
| Reward Loss         | -46.2    |
| Running Env Steps   | 871500   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1743     |
----------------------------------
2025-02-01 19:11:53.197508 Eastern Standard Time
| Itration            | 1744     |
| Real Det Return     | 542      |
| Real Sto Return     | 482      |
| Reward Loss         | -49.1    |
| Running Env Steps   | 872000   |
| Running Forward KL  | -4.79    |
| Running Reverse KL  | 5.7      |
| Running Update Time | 1744     |
----------------------------------
2025-02-01 19:12:08.891307 Eastern Standard Time
| Itration            | 1745     |
| Real Det Return     | 540      |
| Real Sto Return     | 497      |
| Reward Loss         | -51.8    |
| Running Env Steps   | 872500   |
| Running Forward KL  | -4.84    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1745     |
----------------------------------
2025-02-01 19:12:24.445347 Eastern Standard Time
| Itration            | 1746     |
| Real Det Return     | 511      |
| Real Sto Return     | 471      |
| Reward Loss         | -52      |
| Running Env Steps   | 873000   |
| Running Forward KL  | -5.64    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 1746     |
----------------------------------
2025-02-01 19:12:40.027965 Eastern Standard Time
| Itration            | 1747     |
| Real Det Return     | 545      |
| Real Sto Return     | 486      |
| Reward Loss         | -43      |
| Running Env Steps   | 873500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1747     |
----------------------------------
2025-02-01 19:12:55.574263 Eastern Standard Time
| Itration            | 1748     |
| Real Det Return     | 523      |
| Real Sto Return     | 481      |
| Reward Loss         | -61      |
| Running Env Steps   | 874000   |
| Running Forward KL  | -5.06    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1748     |
----------------------------------
2025-02-01 19:13:11.183953 Eastern Standard Time
| Itration            | 1749     |
| Real Det Return     | 540      |
| Real Sto Return     | 490      |
| Reward Loss         | -47.7    |
| Running Env Steps   | 874500   |
| Running Forward KL  | -5.67    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1749     |
----------------------------------
2025-02-01 19:13:26.761361 Eastern Standard Time
| Itration            | 1750     |
| Real Det Return     | 521      |
| Real Sto Return     | 468      |
| Reward Loss         | -50.6    |
| Running Env Steps   | 875000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1750     |
----------------------------------
2025-02-01 19:13:42.356882 Eastern Standard Time
| Itration            | 1751     |
| Real Det Return     | 534      |
| Real Sto Return     | 477      |
| Reward Loss         | -62.2    |
| Running Env Steps   | 875500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1751     |
----------------------------------
2025-02-01 19:13:57.977072 Eastern Standard Time
| Itration            | 1752     |
| Real Det Return     | 543      |
| Real Sto Return     | 491      |
| Reward Loss         | -41.1    |
| Running Env Steps   | 876000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1752     |
----------------------------------
2025-02-01 19:14:14.034052 Eastern Standard Time
| Itration            | 1753     |
| Real Det Return     | 525      |
| Real Sto Return     | 472      |
| Reward Loss         | -51.3    |
| Running Env Steps   | 876500   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 5.3      |
| Running Update Time | 1753     |
----------------------------------
2025-02-01 19:14:29.835452 Eastern Standard Time
| Itration            | 1754     |
| Real Det Return     | 523      |
| Real Sto Return     | 477      |
| Reward Loss         | -46.6    |
| Running Env Steps   | 877000   |
| Running Forward KL  | -5.02    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1754     |
----------------------------------
2025-02-01 19:14:45.400796 Eastern Standard Time
| Itration            | 1755     |
| Real Det Return     | 508      |
| Real Sto Return     | 485      |
| Reward Loss         | -49.6    |
| Running Env Steps   | 877500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 5        |
| Running Update Time | 1755     |
----------------------------------
2025-02-01 19:15:01.040905 Eastern Standard Time
| Itration            | 1756     |
| Real Det Return     | 527      |
| Real Sto Return     | 478      |
| Reward Loss         | -52.8    |
| Running Env Steps   | 878000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 5.1      |
| Running Update Time | 1756     |
----------------------------------
2025-02-01 19:15:16.557498 Eastern Standard Time
| Itration            | 1757     |
| Real Det Return     | 542      |
| Real Sto Return     | 486      |
| Reward Loss         | -57      |
| Running Env Steps   | 878500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.84     |
| Running Update Time | 1757     |
----------------------------------
2025-02-01 19:15:32.104415 Eastern Standard Time
| Itration            | 1758     |
| Real Det Return     | 531      |
| Real Sto Return     | 474      |
| Reward Loss         | -52.9    |
| Running Env Steps   | 879000   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 4.75     |
| Running Update Time | 1758     |
----------------------------------
2025-02-01 19:15:47.685357 Eastern Standard Time
| Itration            | 1759     |
| Real Det Return     | 539      |
| Real Sto Return     | 485      |
| Reward Loss         | -56.4    |
| Running Env Steps   | 879500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1759     |
----------------------------------
2025-02-01 19:16:03.224962 Eastern Standard Time
| Itration            | 1760     |
| Real Det Return     | 521      |
| Real Sto Return     | 474      |
| Reward Loss         | -43.8    |
| Running Env Steps   | 880000   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 5.82     |
| Running Update Time | 1760     |
----------------------------------
2025-02-01 19:16:18.748010 Eastern Standard Time
| Itration            | 1761     |
| Real Det Return     | 518      |
| Real Sto Return     | 482      |
| Reward Loss         | -59.5    |
| Running Env Steps   | 880500   |
| Running Forward KL  | -5.61    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1761     |
----------------------------------
2025-02-01 19:16:34.414562 Eastern Standard Time
| Itration            | 1762     |
| Real Det Return     | 533      |
| Real Sto Return     | 483      |
| Reward Loss         | -52.7    |
| Running Env Steps   | 881000   |
| Running Forward KL  | -4.65    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1762     |
----------------------------------
2025-02-01 19:16:49.999684 Eastern Standard Time
| Itration            | 1763     |
| Real Det Return     | 546      |
| Real Sto Return     | 481      |
| Reward Loss         | -68.1    |
| Running Env Steps   | 881500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1763     |
----------------------------------
2025-02-01 19:17:05.728975 Eastern Standard Time
| Itration            | 1764     |
| Real Det Return     | 534      |
| Real Sto Return     | 484      |
| Reward Loss         | -56.7    |
| Running Env Steps   | 882000   |
| Running Forward KL  | -4.94    |
| Running Reverse KL  | 4.74     |
| Running Update Time | 1764     |
----------------------------------
2025-02-01 19:17:21.427904 Eastern Standard Time
| Itration            | 1765     |
| Real Det Return     | 530      |
| Real Sto Return     | 477      |
| Reward Loss         | -46.4    |
| Running Env Steps   | 882500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1765     |
----------------------------------
2025-02-01 19:17:37.075728 Eastern Standard Time
| Itration            | 1766     |
| Real Det Return     | 538      |
| Real Sto Return     | 479      |
| Reward Loss         | -54.8    |
| Running Env Steps   | 883000   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 1766     |
----------------------------------
2025-02-01 19:17:52.621752 Eastern Standard Time
| Itration            | 1767     |
| Real Det Return     | 531      |
| Real Sto Return     | 474      |
| Reward Loss         | -49.7    |
| Running Env Steps   | 883500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 1767     |
----------------------------------
2025-02-01 19:18:08.274039 Eastern Standard Time
| Itration            | 1768     |
| Real Det Return     | 520      |
| Real Sto Return     | 466      |
| Reward Loss         | -49.7    |
| Running Env Steps   | 884000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1768     |
----------------------------------
2025-02-01 19:18:23.831023 Eastern Standard Time
| Itration            | 1769     |
| Real Det Return     | 538      |
| Real Sto Return     | 492      |
| Reward Loss         | -43.6    |
| Running Env Steps   | 884500   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 1769     |
----------------------------------
2025-02-01 19:18:39.438041 Eastern Standard Time
| Itration            | 1770     |
| Real Det Return     | 538      |
| Real Sto Return     | 486      |
| Reward Loss         | -50.4    |
| Running Env Steps   | 885000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1770     |
----------------------------------
2025-02-01 19:18:55.101063 Eastern Standard Time
| Itration            | 1771     |
| Real Det Return     | 527      |
| Real Sto Return     | 489      |
| Reward Loss         | -41.9    |
| Running Env Steps   | 885500   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 1771     |
----------------------------------
2025-02-01 19:19:10.781359 Eastern Standard Time
| Itration            | 1772     |
| Real Det Return     | 534      |
| Real Sto Return     | 475      |
| Reward Loss         | -44.5    |
| Running Env Steps   | 886000   |
| Running Forward KL  | -4.91    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 1772     |
----------------------------------
2025-02-01 19:19:26.356242 Eastern Standard Time
| Itration            | 1773     |
| Real Det Return     | 532      |
| Real Sto Return     | 488      |
| Reward Loss         | -53.9    |
| Running Env Steps   | 886500   |
| Running Forward KL  | -4.86    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1773     |
----------------------------------
2025-02-01 19:19:41.949307 Eastern Standard Time
| Itration            | 1774     |
| Real Det Return     | 530      |
| Real Sto Return     | 485      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 887000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 1774     |
----------------------------------
2025-02-01 19:19:57.540280 Eastern Standard Time
| Itration            | 1775     |
| Real Det Return     | 524      |
| Real Sto Return     | 482      |
| Reward Loss         | -56.8    |
| Running Env Steps   | 887500   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 4.57     |
| Running Update Time | 1775     |
----------------------------------
2025-02-01 19:20:13.109903 Eastern Standard Time
| Itration            | 1776     |
| Real Det Return     | 524      |
| Real Sto Return     | 480      |
| Reward Loss         | -63.2    |
| Running Env Steps   | 888000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1776     |
----------------------------------
2025-02-01 19:20:28.790717 Eastern Standard Time
| Itration            | 1777     |
| Real Det Return     | 522      |
| Real Sto Return     | 479      |
| Reward Loss         | -50.6    |
| Running Env Steps   | 888500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.72     |
| Running Update Time | 1777     |
----------------------------------
2025-02-01 19:20:44.412227 Eastern Standard Time
| Itration            | 1778     |
| Real Det Return     | 522      |
| Real Sto Return     | 489      |
| Reward Loss         | -58      |
| Running Env Steps   | 889000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1778     |
----------------------------------
2025-02-01 19:21:00.017032 Eastern Standard Time
| Itration            | 1779     |
| Real Det Return     | 533      |
| Real Sto Return     | 494      |
| Reward Loss         | -47.2    |
| Running Env Steps   | 889500   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1779     |
----------------------------------
2025-02-01 19:21:15.648073 Eastern Standard Time
| Itration            | 1780     |
| Real Det Return     | 536      |
| Real Sto Return     | 487      |
| Reward Loss         | -46.6    |
| Running Env Steps   | 890000   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 1780     |
----------------------------------
2025-02-01 19:21:31.292320 Eastern Standard Time
| Itration            | 1781     |
| Real Det Return     | 521      |
| Real Sto Return     | 476      |
| Reward Loss         | -48.3    |
| Running Env Steps   | 890500   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1781     |
----------------------------------
2025-02-01 19:21:46.934288 Eastern Standard Time
| Itration            | 1782     |
| Real Det Return     | 536      |
| Real Sto Return     | 476      |
| Reward Loss         | -48.1    |
| Running Env Steps   | 891000   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1782     |
----------------------------------
2025-02-01 19:22:02.508210 Eastern Standard Time
| Itration            | 1783     |
| Real Det Return     | 533      |
| Real Sto Return     | 483      |
| Reward Loss         | -40.7    |
| Running Env Steps   | 891500   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 5.22     |
| Running Update Time | 1783     |
----------------------------------
2025-02-01 19:22:18.126178 Eastern Standard Time
| Itration            | 1784     |
| Real Det Return     | 530      |
| Real Sto Return     | 470      |
| Reward Loss         | -61.1    |
| Running Env Steps   | 892000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.45     |
| Running Update Time | 1784     |
----------------------------------
2025-02-01 19:22:33.730128 Eastern Standard Time
| Itration            | 1785     |
| Real Det Return     | 531      |
| Real Sto Return     | 491      |
| Reward Loss         | -48.8    |
| Running Env Steps   | 892500   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 5.31     |
| Running Update Time | 1785     |
----------------------------------
2025-02-01 19:22:49.306838 Eastern Standard Time
| Itration            | 1786     |
| Real Det Return     | 539      |
| Real Sto Return     | 491      |
| Reward Loss         | -46.7    |
| Running Env Steps   | 893000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 5.52     |
| Running Update Time | 1786     |
----------------------------------
2025-02-01 19:23:04.899565 Eastern Standard Time
| Itration            | 1787     |
| Real Det Return     | 518      |
| Real Sto Return     | 475      |
| Reward Loss         | -45      |
| Running Env Steps   | 893500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1787     |
----------------------------------
2025-02-01 19:23:20.512078 Eastern Standard Time
| Itration            | 1788     |
| Real Det Return     | 524      |
| Real Sto Return     | 473      |
| Reward Loss         | -54.6    |
| Running Env Steps   | 894000   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1788     |
----------------------------------
2025-02-01 19:23:36.072607 Eastern Standard Time
| Itration            | 1789     |
| Real Det Return     | 536      |
| Real Sto Return     | 486      |
| Reward Loss         | -46.8    |
| Running Env Steps   | 894500   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1789     |
----------------------------------
2025-02-01 19:23:51.675469 Eastern Standard Time
| Itration            | 1790     |
| Real Det Return     | 538      |
| Real Sto Return     | 484      |
| Reward Loss         | -63.4    |
| Running Env Steps   | 895000   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 4.45     |
| Running Update Time | 1790     |
----------------------------------
2025-02-01 19:24:07.275485 Eastern Standard Time
| Itration            | 1791     |
| Real Det Return     | 534      |
| Real Sto Return     | 479      |
| Reward Loss         | -50.5    |
| Running Env Steps   | 895500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1791     |
----------------------------------
2025-02-01 19:24:22.872046 Eastern Standard Time
| Itration            | 1792     |
| Real Det Return     | 537      |
| Real Sto Return     | 481      |
| Reward Loss         | -50.7    |
| Running Env Steps   | 896000   |
| Running Forward KL  | -5.51    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 1792     |
----------------------------------
2025-02-01 19:24:38.466091 Eastern Standard Time
| Itration            | 1793     |
| Real Det Return     | 544      |
| Real Sto Return     | 487      |
| Reward Loss         | -48.2    |
| Running Env Steps   | 896500   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1793     |
----------------------------------
2025-02-01 19:24:54.030057 Eastern Standard Time
| Itration            | 1794     |
| Real Det Return     | 514      |
| Real Sto Return     | 466      |
| Reward Loss         | -62.9    |
| Running Env Steps   | 897000   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1794     |
----------------------------------
2025-02-01 19:25:09.573882 Eastern Standard Time
| Itration            | 1795     |
| Real Det Return     | 540      |
| Real Sto Return     | 481      |
| Reward Loss         | -50.4    |
| Running Env Steps   | 897500   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 1795     |
----------------------------------
2025-02-01 19:25:25.306138 Eastern Standard Time
| Itration            | 1796     |
| Real Det Return     | 539      |
| Real Sto Return     | 486      |
| Reward Loss         | -41.6    |
| Running Env Steps   | 898000   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 4.8      |
| Running Update Time | 1796     |
----------------------------------
2025-02-01 19:25:40.879484 Eastern Standard Time
| Itration            | 1797     |
| Real Det Return     | 511      |
| Real Sto Return     | 472      |
| Reward Loss         | -65.7    |
| Running Env Steps   | 898500   |
| Running Forward KL  | -4.73    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1797     |
----------------------------------
2025-02-01 19:25:56.526600 Eastern Standard Time
| Itration            | 1798     |
| Real Det Return     | 525      |
| Real Sto Return     | 471      |
| Reward Loss         | -51.2    |
| Running Env Steps   | 899000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 1798     |
----------------------------------
2025-02-01 19:26:12.168298 Eastern Standard Time
| Itration            | 1799     |
| Real Det Return     | 542      |
| Real Sto Return     | 486      |
| Reward Loss         | -40.5    |
| Running Env Steps   | 899500   |
| Running Forward KL  | -5.36    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 1799     |
----------------------------------
2025-02-01 19:26:27.672652 Eastern Standard Time
| Itration            | 1800     |
| Real Det Return     | 530      |
| Real Sto Return     | 481      |
| Reward Loss         | -41.9    |
| Running Env Steps   | 900000   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1800     |
----------------------------------
2025-02-01 19:26:43.257476 Eastern Standard Time
| Itration            | 1801     |
| Real Det Return     | 516      |
| Real Sto Return     | 468      |
| Reward Loss         | -54.4    |
| Running Env Steps   | 900500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.69     |
| Running Update Time | 1801     |
----------------------------------
2025-02-01 19:26:58.881294 Eastern Standard Time
| Itration            | 1802     |
| Real Det Return     | 520      |
| Real Sto Return     | 473      |
| Reward Loss         | -52.8    |
| Running Env Steps   | 901000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 4.84     |
| Running Update Time | 1802     |
----------------------------------
2025-02-01 19:27:14.467356 Eastern Standard Time
| Itration            | 1803     |
| Real Det Return     | 536      |
| Real Sto Return     | 490      |
| Reward Loss         | -45.8    |
| Running Env Steps   | 901500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1803     |
----------------------------------
2025-02-01 19:27:30.102393 Eastern Standard Time
| Itration            | 1804     |
| Real Det Return     | 534      |
| Real Sto Return     | 483      |
| Reward Loss         | -55.2    |
| Running Env Steps   | 902000   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1804     |
----------------------------------
2025-02-01 19:27:45.674858 Eastern Standard Time
| Itration            | 1805     |
| Real Det Return     | 522      |
| Real Sto Return     | 479      |
| Reward Loss         | -43.4    |
| Running Env Steps   | 902500   |
| Running Forward KL  | -5.39    |
| Running Reverse KL  | 5.91     |
| Running Update Time | 1805     |
----------------------------------
2025-02-01 19:28:01.259110 Eastern Standard Time
| Itration            | 1806     |
| Real Det Return     | 500      |
| Real Sto Return     | 466      |
| Reward Loss         | -55.1    |
| Running Env Steps   | 903000   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1806     |
----------------------------------
2025-02-01 19:28:16.851230 Eastern Standard Time
| Itration            | 1807     |
| Real Det Return     | 514      |
| Real Sto Return     | 478      |
| Reward Loss         | -54.5    |
| Running Env Steps   | 903500   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1807     |
----------------------------------
2025-02-01 19:28:32.483973 Eastern Standard Time
| Itration            | 1808     |
| Real Det Return     | 532      |
| Real Sto Return     | 486      |
| Reward Loss         | -47.6    |
| Running Env Steps   | 904000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 1808     |
----------------------------------
2025-02-01 19:28:48.078428 Eastern Standard Time
| Itration            | 1809     |
| Real Det Return     | 540      |
| Real Sto Return     | 484      |
| Reward Loss         | -36.4    |
| Running Env Steps   | 904500   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 5.51     |
| Running Update Time | 1809     |
----------------------------------
2025-02-01 19:29:03.704817 Eastern Standard Time
| Itration            | 1810     |
| Real Det Return     | 536      |
| Real Sto Return     | 479      |
| Reward Loss         | -54.1    |
| Running Env Steps   | 905000   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1810     |
----------------------------------
2025-02-01 19:29:19.271785 Eastern Standard Time
| Itration            | 1811     |
| Real Det Return     | 531      |
| Real Sto Return     | 477      |
| Reward Loss         | -46.7    |
| Running Env Steps   | 905500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1811     |
----------------------------------
2025-02-01 19:29:34.967952 Eastern Standard Time
| Itration            | 1812     |
| Real Det Return     | 541      |
| Real Sto Return     | 491      |
| Reward Loss         | -49      |
| Running Env Steps   | 906000   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 5.68     |
| Running Update Time | 1812     |
----------------------------------
2025-02-01 19:29:50.544275 Eastern Standard Time
| Itration            | 1813     |
| Real Det Return     | 526      |
| Real Sto Return     | 485      |
| Reward Loss         | -54.1    |
| Running Env Steps   | 906500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1813     |
----------------------------------
2025-02-01 19:30:06.102964 Eastern Standard Time
| Itration            | 1814     |
| Real Det Return     | 541      |
| Real Sto Return     | 489      |
| Reward Loss         | -54.5    |
| Running Env Steps   | 907000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1814     |
----------------------------------
2025-02-01 19:30:21.704544 Eastern Standard Time
| Itration            | 1815     |
| Real Det Return     | 544      |
| Real Sto Return     | 481      |
| Reward Loss         | -52.4    |
| Running Env Steps   | 907500   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 1815     |
----------------------------------
2025-02-01 19:30:37.231891 Eastern Standard Time
| Itration            | 1816     |
| Real Det Return     | 532      |
| Real Sto Return     | 482      |
| Reward Loss         | -59      |
| Running Env Steps   | 908000   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1816     |
----------------------------------
2025-02-01 19:30:52.814005 Eastern Standard Time
| Itration            | 1817     |
| Real Det Return     | 544      |
| Real Sto Return     | 479      |
| Reward Loss         | -47.7    |
| Running Env Steps   | 908500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 1817     |
----------------------------------
2025-02-01 19:31:08.476396 Eastern Standard Time
| Itration            | 1818     |
| Real Det Return     | 529      |
| Real Sto Return     | 483      |
| Reward Loss         | -45.5    |
| Running Env Steps   | 909000   |
| Running Forward KL  | -4.88    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1818     |
----------------------------------
2025-02-01 19:31:24.140007 Eastern Standard Time
| Itration            | 1819     |
| Real Det Return     | 510      |
| Real Sto Return     | 467      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 909500   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1819     |
----------------------------------
2025-02-01 19:31:39.766288 Eastern Standard Time
| Itration            | 1820     |
| Real Det Return     | 523      |
| Real Sto Return     | 469      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 910000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1820     |
----------------------------------
2025-02-01 19:31:55.335814 Eastern Standard Time
| Itration            | 1821     |
| Real Det Return     | 539      |
| Real Sto Return     | 491      |
| Reward Loss         | -55.8    |
| Running Env Steps   | 910500   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 5        |
| Running Update Time | 1821     |
----------------------------------
2025-02-01 19:32:10.948936 Eastern Standard Time
| Itration            | 1822     |
| Real Det Return     | 541      |
| Real Sto Return     | 483      |
| Reward Loss         | -55.5    |
| Running Env Steps   | 911000   |
| Running Forward KL  | -4.97    |
| Running Reverse KL  | 4.99     |
| Running Update Time | 1822     |
----------------------------------
2025-02-01 19:32:26.497214 Eastern Standard Time
| Itration            | 1823     |
| Real Det Return     | 536      |
| Real Sto Return     | 482      |
| Reward Loss         | -49      |
| Running Env Steps   | 911500   |
| Running Forward KL  | -5.65    |
| Running Reverse KL  | 5.41     |
| Running Update Time | 1823     |
----------------------------------
2025-02-01 19:32:42.027380 Eastern Standard Time
| Itration            | 1824     |
| Real Det Return     | 537      |
| Real Sto Return     | 489      |
| Reward Loss         | -59.2    |
| Running Env Steps   | 912000   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1824     |
----------------------------------
2025-02-01 19:32:57.652077 Eastern Standard Time
| Itration            | 1825     |
| Real Det Return     | 524      |
| Real Sto Return     | 478      |
| Reward Loss         | -41.2    |
| Running Env Steps   | 912500   |
| Running Forward KL  | -5.61    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 1825     |
----------------------------------
2025-02-01 19:33:13.196043 Eastern Standard Time
| Itration            | 1826     |
| Real Det Return     | 536      |
| Real Sto Return     | 482      |
| Reward Loss         | -52.5    |
| Running Env Steps   | 913000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1826     |
----------------------------------
2025-02-01 19:33:28.771162 Eastern Standard Time
| Itration            | 1827     |
| Real Det Return     | 533      |
| Real Sto Return     | 474      |
| Reward Loss         | -54.3    |
| Running Env Steps   | 913500   |
| Running Forward KL  | -5.2     |
| Running Reverse KL  | 4.71     |
| Running Update Time | 1827     |
----------------------------------
2025-02-01 19:33:44.372998 Eastern Standard Time
| Itration            | 1828     |
| Real Det Return     | 527      |
| Real Sto Return     | 479      |
| Reward Loss         | -53.1    |
| Running Env Steps   | 914000   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 5.54     |
| Running Update Time | 1828     |
----------------------------------
2025-02-01 19:33:59.963578 Eastern Standard Time
| Itration            | 1829     |
| Real Det Return     | 530      |
| Real Sto Return     | 488      |
| Reward Loss         | -56.1    |
| Running Env Steps   | 914500   |
| Running Forward KL  | -5.36    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1829     |
----------------------------------
2025-02-01 19:34:15.587412 Eastern Standard Time
| Itration            | 1830     |
| Real Det Return     | 534      |
| Real Sto Return     | 487      |
| Reward Loss         | -55.7    |
| Running Env Steps   | 915000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1830     |
----------------------------------
2025-02-01 19:34:31.115779 Eastern Standard Time
| Itration            | 1831     |
| Real Det Return     | 530      |
| Real Sto Return     | 489      |
| Reward Loss         | -55.5    |
| Running Env Steps   | 915500   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1831     |
----------------------------------
2025-02-01 19:34:46.667254 Eastern Standard Time
| Itration            | 1832     |
| Real Det Return     | 532      |
| Real Sto Return     | 480      |
| Reward Loss         | -46.5    |
| Running Env Steps   | 916000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 5.35     |
| Running Update Time | 1832     |
----------------------------------
2025-02-01 19:35:02.288189 Eastern Standard Time
| Itration            | 1833     |
| Real Det Return     | 528      |
| Real Sto Return     | 473      |
| Reward Loss         | -55.1    |
| Running Env Steps   | 916500   |
| Running Forward KL  | -4.81    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1833     |
----------------------------------
2025-02-01 19:35:17.911458 Eastern Standard Time
| Itration            | 1834     |
| Real Det Return     | 533      |
| Real Sto Return     | 485      |
| Reward Loss         | -48.2    |
| Running Env Steps   | 917000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1834     |
----------------------------------
2025-02-01 19:35:33.607942 Eastern Standard Time
| Itration            | 1835     |
| Real Det Return     | 529      |
| Real Sto Return     | 475      |
| Reward Loss         | -55.3    |
| Running Env Steps   | 917500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1835     |
----------------------------------
2025-02-01 19:35:49.245447 Eastern Standard Time
| Itration            | 1836     |
| Real Det Return     | 543      |
| Real Sto Return     | 483      |
| Reward Loss         | -53.4    |
| Running Env Steps   | 918000   |
| Running Forward KL  | -5.49    |
| Running Reverse KL  | 4.7      |
| Running Update Time | 1836     |
----------------------------------
2025-02-01 19:36:04.861736 Eastern Standard Time
| Itration            | 1837     |
| Real Det Return     | 543      |
| Real Sto Return     | 478      |
| Reward Loss         | -59      |
| Running Env Steps   | 918500   |
| Running Forward KL  | -5.68    |
| Running Reverse KL  | 4.42     |
| Running Update Time | 1837     |
----------------------------------
2025-02-01 19:36:20.478026 Eastern Standard Time
| Itration            | 1838     |
| Real Det Return     | 543      |
| Real Sto Return     | 485      |
| Reward Loss         | -49.2    |
| Running Env Steps   | 919000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1838     |
----------------------------------
2025-02-01 19:36:36.086569 Eastern Standard Time
| Itration            | 1839     |
| Real Det Return     | 525      |
| Real Sto Return     | 479      |
| Reward Loss         | -54.2    |
| Running Env Steps   | 919500   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 5.04     |
| Running Update Time | 1839     |
----------------------------------
2025-02-01 19:36:51.643205 Eastern Standard Time
| Itration            | 1840     |
| Real Det Return     | 537      |
| Real Sto Return     | 480      |
| Reward Loss         | -55.2    |
| Running Env Steps   | 920000   |
| Running Forward KL  | -5.41    |
| Running Reverse KL  | 5.23     |
| Running Update Time | 1840     |
----------------------------------
2025-02-01 19:37:07.181515 Eastern Standard Time
| Itration            | 1841     |
| Real Det Return     | 528      |
| Real Sto Return     | 483      |
| Reward Loss         | -43.3    |
| Running Env Steps   | 920500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.27     |
| Running Update Time | 1841     |
----------------------------------
2025-02-01 19:37:22.779386 Eastern Standard Time
| Itration            | 1842     |
| Real Det Return     | 527      |
| Real Sto Return     | 468      |
| Reward Loss         | -54.6    |
| Running Env Steps   | 921000   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1842     |
----------------------------------
2025-02-01 19:37:38.342368 Eastern Standard Time
| Itration            | 1843     |
| Real Det Return     | 522      |
| Real Sto Return     | 481      |
| Reward Loss         | -51.9    |
| Running Env Steps   | 921500   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1843     |
----------------------------------
2025-02-01 19:37:53.906649 Eastern Standard Time
| Itration            | 1844     |
| Real Det Return     | 530      |
| Real Sto Return     | 475      |
| Reward Loss         | -51.5    |
| Running Env Steps   | 922000   |
| Running Forward KL  | -5.12    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1844     |
----------------------------------
2025-02-01 19:38:09.493614 Eastern Standard Time
| Itration            | 1845     |
| Real Det Return     | 547      |
| Real Sto Return     | 482      |
| Reward Loss         | -63.5    |
| Running Env Steps   | 922500   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 4.58     |
| Running Update Time | 1845     |
----------------------------------
2025-02-01 19:38:25.133599 Eastern Standard Time
| Itration            | 1846     |
| Real Det Return     | 540      |
| Real Sto Return     | 494      |
| Reward Loss         | -45.2    |
| Running Env Steps   | 923000   |
| Running Forward KL  | -5.39    |
| Running Reverse KL  | 5.6      |
| Running Update Time | 1846     |
----------------------------------
2025-02-01 19:38:40.848997 Eastern Standard Time
| Itration            | 1847     |
| Real Det Return     | 529      |
| Real Sto Return     | 483      |
| Reward Loss         | -46      |
| Running Env Steps   | 923500   |
| Running Forward KL  | -5.58    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1847     |
----------------------------------
2025-02-01 19:38:56.479123 Eastern Standard Time
| Itration            | 1848     |
| Real Det Return     | 535      |
| Real Sto Return     | 479      |
| Reward Loss         | -50.3    |
| Running Env Steps   | 924000   |
| Running Forward KL  | -5.36    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1848     |
----------------------------------
2025-02-01 19:39:12.103753 Eastern Standard Time
| Itration            | 1849     |
| Real Det Return     | 546      |
| Real Sto Return     | 487      |
| Reward Loss         | -66.3    |
| Running Env Steps   | 924500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.82     |
| Running Update Time | 1849     |
----------------------------------
2025-02-01 19:39:27.699154 Eastern Standard Time
| Itration            | 1850     |
| Real Det Return     | 533      |
| Real Sto Return     | 473      |
| Reward Loss         | -60      |
| Running Env Steps   | 925000   |
| Running Forward KL  | -4.64    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1850     |
----------------------------------
2025-02-01 19:39:43.247482 Eastern Standard Time
| Itration            | 1851     |
| Real Det Return     | 530      |
| Real Sto Return     | 479      |
| Reward Loss         | -57.8    |
| Running Env Steps   | 925500   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1851     |
----------------------------------
2025-02-01 19:39:58.841584 Eastern Standard Time
| Itration            | 1852     |
| Real Det Return     | 523      |
| Real Sto Return     | 470      |
| Reward Loss         | -66.8    |
| Running Env Steps   | 926000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1852     |
----------------------------------
2025-02-01 19:40:14.466784 Eastern Standard Time
| Itration            | 1853     |
| Real Det Return     | 520      |
| Real Sto Return     | 455      |
| Reward Loss         | -47.3    |
| Running Env Steps   | 926500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.88     |
| Running Update Time | 1853     |
----------------------------------
2025-02-01 19:40:29.981859 Eastern Standard Time
| Itration            | 1854     |
| Real Det Return     | 549      |
| Real Sto Return     | 484      |
| Reward Loss         | -41.3    |
| Running Env Steps   | 927000   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1854     |
----------------------------------
2025-02-01 19:40:45.523151 Eastern Standard Time
| Itration            | 1855     |
| Real Det Return     | 528      |
| Real Sto Return     | 485      |
| Reward Loss         | -59.6    |
| Running Env Steps   | 927500   |
| Running Forward KL  | -5.32    |
| Running Reverse KL  | 4.61     |
| Running Update Time | 1855     |
----------------------------------
2025-02-01 19:41:02.333097 Eastern Standard Time
| Itration            | 1856     |
| Real Det Return     | 539      |
| Real Sto Return     | 495      |
| Reward Loss         | -56.8    |
| Running Env Steps   | 928000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1856     |
----------------------------------
2025-02-01 19:41:17.979456 Eastern Standard Time
| Itration            | 1857     |
| Real Det Return     | 534      |
| Real Sto Return     | 472      |
| Reward Loss         | -41.3    |
| Running Env Steps   | 928500   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1857     |
----------------------------------
2025-02-01 19:41:33.559434 Eastern Standard Time
| Itration            | 1858     |
| Real Det Return     | 530      |
| Real Sto Return     | 472      |
| Reward Loss         | -55.6    |
| Running Env Steps   | 929000   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1858     |
----------------------------------
2025-02-01 19:41:49.152542 Eastern Standard Time
| Itration            | 1859     |
| Real Det Return     | 530      |
| Real Sto Return     | 473      |
| Reward Loss         | -56.1    |
| Running Env Steps   | 929500   |
| Running Forward KL  | -5.56    |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1859     |
----------------------------------
2025-02-01 19:42:04.739397 Eastern Standard Time
| Itration            | 1860     |
| Real Det Return     | 536      |
| Real Sto Return     | 480      |
| Reward Loss         | -54.7    |
| Running Env Steps   | 930000   |
| Running Forward KL  | -5.41    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 1860     |
----------------------------------
2025-02-01 19:42:20.337831 Eastern Standard Time
| Itration            | 1861     |
| Real Det Return     | 534      |
| Real Sto Return     | 476      |
| Reward Loss         | -58.7    |
| Running Env Steps   | 930500   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 5.36     |
| Running Update Time | 1861     |
----------------------------------
2025-02-01 19:42:35.900030 Eastern Standard Time
| Itration            | 1862     |
| Real Det Return     | 526      |
| Real Sto Return     | 489      |
| Reward Loss         | -49.8    |
| Running Env Steps   | 931000   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 5.07     |
| Running Update Time | 1862     |
----------------------------------
2025-02-01 19:42:51.482180 Eastern Standard Time
| Itration            | 1863     |
| Real Det Return     | 538      |
| Real Sto Return     | 487      |
| Reward Loss         | -48.5    |
| Running Env Steps   | 931500   |
| Running Forward KL  | -5.61    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1863     |
----------------------------------
2025-02-01 19:43:07.115697 Eastern Standard Time
| Itration            | 1864     |
| Real Det Return     | 519      |
| Real Sto Return     | 477      |
| Reward Loss         | -66.8    |
| Running Env Steps   | 932000   |
| Running Forward KL  | -5       |
| Running Reverse KL  | 4.65     |
| Running Update Time | 1864     |
----------------------------------
2025-02-01 19:43:22.751416 Eastern Standard Time
| Itration            | 1865     |
| Real Det Return     | 533      |
| Real Sto Return     | 475      |
| Reward Loss         | -57.7    |
| Running Env Steps   | 932500   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 5.46     |
| Running Update Time | 1865     |
----------------------------------
2025-02-01 19:43:38.355589 Eastern Standard Time
| Itration            | 1866     |
| Real Det Return     | 535      |
| Real Sto Return     | 476      |
| Reward Loss         | -56      |
| Running Env Steps   | 933000   |
| Running Forward KL  | -5.38    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 1866     |
----------------------------------
2025-02-01 19:43:54.004957 Eastern Standard Time
| Itration            | 1867     |
| Real Det Return     | 523      |
| Real Sto Return     | 478      |
| Reward Loss         | -56.8    |
| Running Env Steps   | 933500   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1867     |
----------------------------------
2025-02-01 19:44:09.558680 Eastern Standard Time
| Itration            | 1868     |
| Real Det Return     | 514      |
| Real Sto Return     | 458      |
| Reward Loss         | -63.4    |
| Running Env Steps   | 934000   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1868     |
----------------------------------
2025-02-01 19:44:25.161604 Eastern Standard Time
| Itration            | 1869     |
| Real Det Return     | 531      |
| Real Sto Return     | 478      |
| Reward Loss         | -45.8    |
| Running Env Steps   | 934500   |
| Running Forward KL  | -5.66    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 1869     |
----------------------------------
2025-02-01 19:44:40.774153 Eastern Standard Time
| Itration            | 1870     |
| Real Det Return     | 516      |
| Real Sto Return     | 464      |
| Reward Loss         | -63.6    |
| Running Env Steps   | 935000   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 4.66     |
| Running Update Time | 1870     |
----------------------------------
2025-02-01 19:44:56.410300 Eastern Standard Time
| Itration            | 1871     |
| Real Det Return     | 525      |
| Real Sto Return     | 486      |
| Reward Loss         | -57.6    |
| Running Env Steps   | 935500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 4.67     |
| Running Update Time | 1871     |
----------------------------------
2025-02-01 19:45:12.054640 Eastern Standard Time
| Itration            | 1872     |
| Real Det Return     | 535      |
| Real Sto Return     | 465      |
| Reward Loss         | -67.9    |
| Running Env Steps   | 936000   |
| Running Forward KL  | -4.85    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1872     |
----------------------------------
2025-02-01 19:45:27.652015 Eastern Standard Time
| Itration            | 1873     |
| Real Det Return     | 542      |
| Real Sto Return     | 488      |
| Reward Loss         | -61.2    |
| Running Env Steps   | 936500   |
| Running Forward KL  | -5.57    |
| Running Reverse KL  | 4.77     |
| Running Update Time | 1873     |
----------------------------------
2025-02-01 19:45:43.384857 Eastern Standard Time
| Itration            | 1874     |
| Real Det Return     | 542      |
| Real Sto Return     | 474      |
| Reward Loss         | -58.5    |
| Running Env Steps   | 937000   |
| Running Forward KL  | -5.13    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1874     |
----------------------------------
2025-02-01 19:45:59.053348 Eastern Standard Time
| Itration            | 1875     |
| Real Det Return     | 536      |
| Real Sto Return     | 479      |
| Reward Loss         | -53.8    |
| Running Env Steps   | 937500   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.64     |
| Running Update Time | 1875     |
----------------------------------
2025-02-01 19:46:14.652419 Eastern Standard Time
| Itration            | 1876     |
| Real Det Return     | 541      |
| Real Sto Return     | 481      |
| Reward Loss         | -53.5    |
| Running Env Steps   | 938000   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 5.06     |
| Running Update Time | 1876     |
----------------------------------
2025-02-01 19:46:30.221599 Eastern Standard Time
| Itration            | 1877     |
| Real Det Return     | 524      |
| Real Sto Return     | 470      |
| Reward Loss         | -54.3    |
| Running Env Steps   | 938500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.64     |
| Running Update Time | 1877     |
----------------------------------
2025-02-01 19:46:45.915909 Eastern Standard Time
| Itration            | 1878     |
| Real Det Return     | 532      |
| Real Sto Return     | 477      |
| Reward Loss         | -56.9    |
| Running Env Steps   | 939000   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1878     |
----------------------------------
2025-02-01 19:47:01.586736 Eastern Standard Time
| Itration            | 1879     |
| Real Det Return     | 509      |
| Real Sto Return     | 468      |
| Reward Loss         | -46.6    |
| Running Env Steps   | 939500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1879     |
----------------------------------
2025-02-01 19:47:17.271820 Eastern Standard Time
| Itration            | 1880     |
| Real Det Return     | 531      |
| Real Sto Return     | 469      |
| Reward Loss         | -64.5    |
| Running Env Steps   | 940000   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 5.1      |
| Running Update Time | 1880     |
----------------------------------
2025-02-01 19:47:32.889137 Eastern Standard Time
| Itration            | 1881     |
| Real Det Return     | 525      |
| Real Sto Return     | 459      |
| Reward Loss         | -77.7    |
| Running Env Steps   | 940500   |
| Running Forward KL  | -5.43    |
| Running Reverse KL  | 4.93     |
| Running Update Time | 1881     |
----------------------------------
2025-02-01 19:47:48.420654 Eastern Standard Time
| Itration            | 1882     |
| Real Det Return     | 544      |
| Real Sto Return     | 484      |
| Reward Loss         | -62      |
| Running Env Steps   | 941000   |
| Running Forward KL  | -5.6     |
| Running Reverse KL  | 4.33     |
| Running Update Time | 1882     |
----------------------------------
2025-02-01 19:48:04.101475 Eastern Standard Time
| Itration            | 1883     |
| Real Det Return     | 542      |
| Real Sto Return     | 480      |
| Reward Loss         | -48.9    |
| Running Env Steps   | 941500   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 4.62     |
| Running Update Time | 1883     |
----------------------------------
2025-02-01 19:48:19.794255 Eastern Standard Time
| Itration            | 1884     |
| Real Det Return     | 504      |
| Real Sto Return     | 465      |
| Reward Loss         | -58      |
| Running Env Steps   | 942000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1884     |
----------------------------------
2025-02-01 19:48:35.415696 Eastern Standard Time
| Itration            | 1885     |
| Real Det Return     | 533      |
| Real Sto Return     | 467      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 942500   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 4.68     |
| Running Update Time | 1885     |
----------------------------------
2025-02-01 19:48:51.026149 Eastern Standard Time
| Itration            | 1886     |
| Real Det Return     | 518      |
| Real Sto Return     | 473      |
| Reward Loss         | -63.7    |
| Running Env Steps   | 943000   |
| Running Forward KL  | -5.51    |
| Running Reverse KL  | 4.84     |
| Running Update Time | 1886     |
----------------------------------
2025-02-01 19:49:06.651042 Eastern Standard Time
| Itration            | 1887     |
| Real Det Return     | 532      |
| Real Sto Return     | 480      |
| Reward Loss         | -60.4    |
| Running Env Steps   | 943500   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 5.46     |
| Running Update Time | 1887     |
----------------------------------
2025-02-01 19:49:22.141121 Eastern Standard Time
| Itration            | 1888     |
| Real Det Return     | 535      |
| Real Sto Return     | 469      |
| Reward Loss         | -53.7    |
| Running Env Steps   | 944000   |
| Running Forward KL  | -5.62    |
| Running Reverse KL  | 5.15     |
| Running Update Time | 1888     |
----------------------------------
2025-02-01 19:49:37.780179 Eastern Standard Time
| Itration            | 1889     |
| Real Det Return     | 548      |
| Real Sto Return     | 486      |
| Reward Loss         | -53.8    |
| Running Env Steps   | 944500   |
| Running Forward KL  | -5.14    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1889     |
----------------------------------
2025-02-01 19:49:53.399952 Eastern Standard Time
| Itration            | 1890     |
| Real Det Return     | 524      |
| Real Sto Return     | 479      |
| Reward Loss         | -60.6    |
| Running Env Steps   | 945000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 5.24     |
| Running Update Time | 1890     |
----------------------------------
2025-02-01 19:50:08.955705 Eastern Standard Time
| Itration            | 1891     |
| Real Det Return     | 533      |
| Real Sto Return     | 480      |
| Reward Loss         | -43.9    |
| Running Env Steps   | 945500   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 5.31     |
| Running Update Time | 1891     |
----------------------------------
2025-02-01 19:50:24.608520 Eastern Standard Time
| Itration            | 1892     |
| Real Det Return     | 522      |
| Real Sto Return     | 478      |
| Reward Loss         | -49.7    |
| Running Env Steps   | 946000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 5.32     |
| Running Update Time | 1892     |
----------------------------------
2025-02-01 19:50:40.283201 Eastern Standard Time
| Itration            | 1893     |
| Real Det Return     | 529      |
| Real Sto Return     | 469      |
| Reward Loss         | -51.9    |
| Running Env Steps   | 946500   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 5.28     |
| Running Update Time | 1893     |
----------------------------------
2025-02-01 19:50:55.852584 Eastern Standard Time
| Itration            | 1894     |
| Real Det Return     | 534      |
| Real Sto Return     | 480      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 947000   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1894     |
----------------------------------
2025-02-01 19:51:11.429911 Eastern Standard Time
| Itration            | 1895     |
| Real Det Return     | 543      |
| Real Sto Return     | 478      |
| Reward Loss         | -48.8    |
| Running Env Steps   | 947500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 5.66     |
| Running Update Time | 1895     |
----------------------------------
2025-02-01 19:51:27.016250 Eastern Standard Time
| Itration            | 1896     |
| Real Det Return     | 527      |
| Real Sto Return     | 482      |
| Reward Loss         | -58.4    |
| Running Env Steps   | 948000   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1896     |
----------------------------------
2025-02-01 19:51:42.587185 Eastern Standard Time
| Itration            | 1897     |
| Real Det Return     | 542      |
| Real Sto Return     | 493      |
| Reward Loss         | -53      |
| Running Env Steps   | 948500   |
| Running Forward KL  | -5.7     |
| Running Reverse KL  | 4.73     |
| Running Update Time | 1897     |
----------------------------------
2025-02-01 19:51:58.216498 Eastern Standard Time
| Itration            | 1898     |
| Real Det Return     | 536      |
| Real Sto Return     | 486      |
| Reward Loss         | -53.4    |
| Running Env Steps   | 949000   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 5.36     |
| Running Update Time | 1898     |
----------------------------------
2025-02-01 19:52:13.856647 Eastern Standard Time
| Itration            | 1899     |
| Real Det Return     | 527      |
| Real Sto Return     | 475      |
| Reward Loss         | -55.4    |
| Running Env Steps   | 949500   |
| Running Forward KL  | -5.62    |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1899     |
----------------------------------
2025-02-01 19:52:29.419963 Eastern Standard Time
| Itration            | 1900     |
| Real Det Return     | 535      |
| Real Sto Return     | 482      |
| Reward Loss         | -54.7    |
| Running Env Steps   | 950000   |
| Running Forward KL  | -5.52    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1900     |
----------------------------------
2025-02-01 19:52:45.070609 Eastern Standard Time
| Itration            | 1901     |
| Real Det Return     | 537      |
| Real Sto Return     | 480      |
| Reward Loss         | -44.7    |
| Running Env Steps   | 950500   |
| Running Forward KL  | -5.27    |
| Running Reverse KL  | 5.39     |
| Running Update Time | 1901     |
----------------------------------
2025-02-01 19:53:00.829426 Eastern Standard Time
| Itration            | 1902     |
| Real Det Return     | 546      |
| Real Sto Return     | 481      |
| Reward Loss         | -56.5    |
| Running Env Steps   | 951000   |
| Running Forward KL  | -5.01    |
| Running Reverse KL  | 5.2      |
| Running Update Time | 1902     |
----------------------------------
2025-02-01 19:53:16.429535 Eastern Standard Time
| Itration            | 1903     |
| Real Det Return     | 521      |
| Real Sto Return     | 487      |
| Reward Loss         | -64.1    |
| Running Env Steps   | 951500   |
| Running Forward KL  | -4.95    |
| Running Reverse KL  | 5.43     |
| Running Update Time | 1903     |
----------------------------------
2025-02-01 19:53:31.995841 Eastern Standard Time
| Itration            | 1904     |
| Real Det Return     | 530      |
| Real Sto Return     | 478      |
| Reward Loss         | -49.2    |
| Running Env Steps   | 952000   |
| Running Forward KL  | -5.87    |
| Running Reverse KL  | 5.41     |
| Running Update Time | 1904     |
----------------------------------
2025-02-01 19:53:47.587262 Eastern Standard Time
| Itration            | 1905     |
| Real Det Return     | 533      |
| Real Sto Return     | 470      |
| Reward Loss         | -49.5    |
| Running Env Steps   | 952500   |
| Running Forward KL  | -5.55    |
| Running Reverse KL  | 5.57     |
| Running Update Time | 1905     |
----------------------------------
2025-02-01 19:54:03.169310 Eastern Standard Time
| Itration            | 1906     |
| Real Det Return     | 546      |
| Real Sto Return     | 495      |
| Reward Loss         | -40.3    |
| Running Env Steps   | 953000   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 5.44     |
| Running Update Time | 1906     |
----------------------------------
2025-02-01 19:54:18.768262 Eastern Standard Time
| Itration            | 1907     |
| Real Det Return     | 543      |
| Real Sto Return     | 485      |
| Reward Loss         | -53.6    |
| Running Env Steps   | 953500   |
| Running Forward KL  | -4.93    |
| Running Reverse KL  | 5.72     |
| Running Update Time | 1907     |
----------------------------------
2025-02-01 19:54:34.295570 Eastern Standard Time
| Itration            | 1908     |
| Real Det Return     | 538      |
| Real Sto Return     | 487      |
| Reward Loss         | -54.2    |
| Running Env Steps   | 954000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 4.89     |
| Running Update Time | 1908     |
----------------------------------
2025-02-01 19:54:49.857235 Eastern Standard Time
| Itration            | 1909     |
| Real Det Return     | 538      |
| Real Sto Return     | 489      |
| Reward Loss         | -45.1    |
| Running Env Steps   | 954500   |
| Running Forward KL  | -5.63    |
| Running Reverse KL  | 5.16     |
| Running Update Time | 1909     |
----------------------------------
2025-02-01 19:55:05.447790 Eastern Standard Time
| Itration            | 1910     |
| Real Det Return     | 532      |
| Real Sto Return     | 481      |
| Reward Loss         | -44.8    |
| Running Env Steps   | 955000   |
| Running Forward KL  | -5.63    |
| Running Reverse KL  | 5.7      |
| Running Update Time | 1910     |
----------------------------------
2025-02-01 19:55:21.056990 Eastern Standard Time
| Itration            | 1911     |
| Real Det Return     | 528      |
| Real Sto Return     | 482      |
| Reward Loss         | -56.8    |
| Running Env Steps   | 955500   |
| Running Forward KL  | -5.38    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1911     |
----------------------------------
2025-02-01 19:55:36.664583 Eastern Standard Time
| Itration            | 1912     |
| Real Det Return     | 523      |
| Real Sto Return     | 473      |
| Reward Loss         | -52.1    |
| Running Env Steps   | 956000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 5.36     |
| Running Update Time | 1912     |
----------------------------------
2025-02-01 19:55:52.172644 Eastern Standard Time
| Itration            | 1913     |
| Real Det Return     | 541      |
| Real Sto Return     | 493      |
| Reward Loss         | -35.4    |
| Running Env Steps   | 956500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1913     |
----------------------------------
2025-02-01 19:56:07.728838 Eastern Standard Time
| Itration            | 1914     |
| Real Det Return     | 524      |
| Real Sto Return     | 466      |
| Reward Loss         | -55.8    |
| Running Env Steps   | 957000   |
| Running Forward KL  | -5.41    |
| Running Reverse KL  | 5.08     |
| Running Update Time | 1914     |
----------------------------------
2025-02-01 19:56:23.339988 Eastern Standard Time
| Itration            | 1915     |
| Real Det Return     | 534      |
| Real Sto Return     | 476      |
| Reward Loss         | -46.8    |
| Running Env Steps   | 957500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1915     |
----------------------------------
2025-02-01 19:56:38.866770 Eastern Standard Time
| Itration            | 1916     |
| Real Det Return     | 530      |
| Real Sto Return     | 488      |
| Reward Loss         | -58.2    |
| Running Env Steps   | 958000   |
| Running Forward KL  | -5.77    |
| Running Reverse KL  | 5        |
| Running Update Time | 1916     |
----------------------------------
2025-02-01 19:56:54.412894 Eastern Standard Time
| Itration            | 1917     |
| Real Det Return     | 524      |
| Real Sto Return     | 483      |
| Reward Loss         | -56.1    |
| Running Env Steps   | 958500   |
| Running Forward KL  | -5.22    |
| Running Reverse KL  | 5.24     |
| Running Update Time | 1917     |
----------------------------------
2025-02-01 19:57:10.045625 Eastern Standard Time
| Itration            | 1918     |
| Real Det Return     | 532      |
| Real Sto Return     | 469      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 959000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1918     |
----------------------------------
2025-02-01 19:57:25.639378 Eastern Standard Time
| Itration            | 1919     |
| Real Det Return     | 527      |
| Real Sto Return     | 475      |
| Reward Loss         | -60.7    |
| Running Env Steps   | 959500   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 5.37     |
| Running Update Time | 1919     |
----------------------------------
2025-02-01 19:57:41.277145 Eastern Standard Time
| Itration            | 1920     |
| Real Det Return     | 532      |
| Real Sto Return     | 481      |
| Reward Loss         | -51.4    |
| Running Env Steps   | 960000   |
| Running Forward KL  | -4.92    |
| Running Reverse KL  | 5.54     |
| Running Update Time | 1920     |
----------------------------------
2025-02-01 19:57:56.871758 Eastern Standard Time
| Itration            | 1921     |
| Real Det Return     | 524      |
| Real Sto Return     | 474      |
| Reward Loss         | -35.6    |
| Running Env Steps   | 960500   |
| Running Forward KL  | -5.36    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1921     |
----------------------------------
2025-02-01 19:58:12.462859 Eastern Standard Time
| Itration            | 1922     |
| Real Det Return     | 525      |
| Real Sto Return     | 484      |
| Reward Loss         | -69.6    |
| Running Env Steps   | 961000   |
| Running Forward KL  | -4.77    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1922     |
----------------------------------
2025-02-01 19:58:28.283574 Eastern Standard Time
| Itration            | 1923     |
| Real Det Return     | 526      |
| Real Sto Return     | 481      |
| Reward Loss         | -50.5    |
| Running Env Steps   | 961500   |
| Running Forward KL  | -5.53    |
| Running Reverse KL  | 5.15     |
| Running Update Time | 1923     |
----------------------------------
2025-02-01 19:58:43.976087 Eastern Standard Time
| Itration            | 1924     |
| Real Det Return     | 529      |
| Real Sto Return     | 480      |
| Reward Loss         | -42.1    |
| Running Env Steps   | 962000   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 5.3      |
| Running Update Time | 1924     |
----------------------------------
2025-02-01 19:58:59.596504 Eastern Standard Time
| Itration            | 1925     |
| Real Det Return     | 512      |
| Real Sto Return     | 476      |
| Reward Loss         | -74.4    |
| Running Env Steps   | 962500   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 4.92     |
| Running Update Time | 1925     |
----------------------------------
2025-02-01 19:59:15.293873 Eastern Standard Time
| Itration            | 1926     |
| Real Det Return     | 532      |
| Real Sto Return     | 484      |
| Reward Loss         | -53      |
| Running Env Steps   | 963000   |
| Running Forward KL  | -5.84    |
| Running Reverse KL  | 4.97     |
| Running Update Time | 1926     |
----------------------------------
2025-02-01 19:59:31.057524 Eastern Standard Time
| Itration            | 1927     |
| Real Det Return     | 537      |
| Real Sto Return     | 492      |
| Reward Loss         | -60.4    |
| Running Env Steps   | 963500   |
| Running Forward KL  | -5.45    |
| Running Reverse KL  | 5.56     |
| Running Update Time | 1927     |
----------------------------------
2025-02-01 19:59:46.681362 Eastern Standard Time
| Itration            | 1928     |
| Real Det Return     | 532      |
| Real Sto Return     | 483      |
| Reward Loss         | -55.7    |
| Running Env Steps   | 964000   |
| Running Forward KL  | -5.05    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1928     |
----------------------------------
2025-02-01 20:00:02.273692 Eastern Standard Time
| Itration            | 1929     |
| Real Det Return     | 517      |
| Real Sto Return     | 472      |
| Reward Loss         | -61.2    |
| Running Env Steps   | 964500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1929     |
----------------------------------
2025-02-01 20:00:17.900372 Eastern Standard Time
| Itration            | 1930     |
| Real Det Return     | 525      |
| Real Sto Return     | 482      |
| Reward Loss         | -55.5    |
| Running Env Steps   | 965000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1930     |
----------------------------------
2025-02-01 20:00:33.572576 Eastern Standard Time
| Itration            | 1931     |
| Real Det Return     | 544      |
| Real Sto Return     | 479      |
| Reward Loss         | -55.5    |
| Running Env Steps   | 965500   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 4.86     |
| Running Update Time | 1931     |
----------------------------------
2025-02-01 20:00:49.231993 Eastern Standard Time
| Itration            | 1932     |
| Real Det Return     | 547      |
| Real Sto Return     | 489      |
| Reward Loss         | -55.5    |
| Running Env Steps   | 966000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 1932     |
----------------------------------
2025-02-01 20:01:04.885294 Eastern Standard Time
| Itration            | 1933     |
| Real Det Return     | 534      |
| Real Sto Return     | 489      |
| Reward Loss         | -48.5    |
| Running Env Steps   | 966500   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 1933     |
----------------------------------
2025-02-01 20:01:20.481208 Eastern Standard Time
| Itration            | 1934     |
| Real Det Return     | 537      |
| Real Sto Return     | 481      |
| Reward Loss         | -52.9    |
| Running Env Steps   | 967000   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1934     |
----------------------------------
2025-02-01 20:01:36.067433 Eastern Standard Time
| Itration            | 1935     |
| Real Det Return     | 524      |
| Real Sto Return     | 490      |
| Reward Loss         | -52      |
| Running Env Steps   | 967500   |
| Running Forward KL  | -5.85    |
| Running Reverse KL  | 5.05     |
| Running Update Time | 1935     |
----------------------------------
2025-02-01 20:01:51.730984 Eastern Standard Time
| Itration            | 1936     |
| Real Det Return     | 519      |
| Real Sto Return     | 479      |
| Reward Loss         | -58.3    |
| Running Env Steps   | 968000   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1936     |
----------------------------------
2025-02-01 20:02:07.690548 Eastern Standard Time
| Itration            | 1937     |
| Real Det Return     | 530      |
| Real Sto Return     | 483      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 968500   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 1937     |
----------------------------------
2025-02-01 20:02:23.400029 Eastern Standard Time
| Itration            | 1938     |
| Real Det Return     | 533      |
| Real Sto Return     | 491      |
| Reward Loss         | -50.3    |
| Running Env Steps   | 969000   |
| Running Forward KL  | -5.33    |
| Running Reverse KL  | 5.56     |
| Running Update Time | 1938     |
----------------------------------
2025-02-01 20:02:39.109433 Eastern Standard Time
| Itration            | 1939     |
| Real Det Return     | 525      |
| Real Sto Return     | 472      |
| Reward Loss         | -57.1    |
| Running Env Steps   | 969500   |
| Running Forward KL  | -5.16    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 1939     |
----------------------------------
2025-02-01 20:02:54.730168 Eastern Standard Time
| Itration            | 1940     |
| Real Det Return     | 530      |
| Real Sto Return     | 487      |
| Reward Loss         | -54.9    |
| Running Env Steps   | 970000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 5.48     |
| Running Update Time | 1940     |
----------------------------------
2025-02-01 20:03:10.389134 Eastern Standard Time
| Itration            | 1941     |
| Real Det Return     | 525      |
| Real Sto Return     | 480      |
| Reward Loss         | -54.3    |
| Running Env Steps   | 970500   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 5.41     |
| Running Update Time | 1941     |
----------------------------------
2025-02-01 20:03:25.967595 Eastern Standard Time
| Itration            | 1942     |
| Real Det Return     | 537      |
| Real Sto Return     | 480      |
| Reward Loss         | -51.6    |
| Running Env Steps   | 971000   |
| Running Forward KL  | -5.24    |
| Running Reverse KL  | 5.18     |
| Running Update Time | 1942     |
----------------------------------
2025-02-01 20:03:41.616418 Eastern Standard Time
| Itration            | 1943     |
| Real Det Return     | 524      |
| Real Sto Return     | 472      |
| Reward Loss         | -67.2    |
| Running Env Steps   | 971500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1943     |
----------------------------------
2025-02-01 20:03:57.191635 Eastern Standard Time
| Itration            | 1944     |
| Real Det Return     | 526      |
| Real Sto Return     | 473      |
| Reward Loss         | -57.3    |
| Running Env Steps   | 972000   |
| Running Forward KL  | -4.78    |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1944     |
----------------------------------
2025-02-01 20:04:12.821067 Eastern Standard Time
| Itration            | 1945     |
| Real Det Return     | 536      |
| Real Sto Return     | 486      |
| Reward Loss         | -58.5    |
| Running Env Steps   | 972500   |
| Running Forward KL  | -5.18    |
| Running Reverse KL  | 4.91     |
| Running Update Time | 1945     |
----------------------------------
2025-02-01 20:04:28.387682 Eastern Standard Time
| Itration            | 1946     |
| Real Det Return     | 538      |
| Real Sto Return     | 483      |
| Reward Loss         | -56.6    |
| Running Env Steps   | 973000   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 4.54     |
| Running Update Time | 1946     |
----------------------------------
2025-02-01 20:04:43.913529 Eastern Standard Time
| Itration            | 1947     |
| Real Det Return     | 523      |
| Real Sto Return     | 473      |
| Reward Loss         | -60.5    |
| Running Env Steps   | 973500   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 5.55     |
| Running Update Time | 1947     |
----------------------------------
2025-02-01 20:04:59.439289 Eastern Standard Time
| Itration            | 1948     |
| Real Det Return     | 532      |
| Real Sto Return     | 475      |
| Reward Loss         | -69      |
| Running Env Steps   | 974000   |
| Running Forward KL  | -5.38    |
| Running Reverse KL  | 4.95     |
| Running Update Time | 1948     |
----------------------------------
2025-02-01 20:05:14.981778 Eastern Standard Time
| Itration            | 1949     |
| Real Det Return     | 542      |
| Real Sto Return     | 486      |
| Reward Loss         | -58      |
| Running Env Steps   | 974500   |
| Running Forward KL  | -5.17    |
| Running Reverse KL  | 5.34     |
| Running Update Time | 1949     |
----------------------------------
2025-02-01 20:05:30.530923 Eastern Standard Time
| Itration            | 1950     |
| Real Det Return     | 521      |
| Real Sto Return     | 482      |
| Reward Loss         | -54.9    |
| Running Env Steps   | 975000   |
| Running Forward KL  | -5.54    |
| Running Reverse KL  | 4.81     |
| Running Update Time | 1950     |
----------------------------------
2025-02-01 20:05:46.169451 Eastern Standard Time
| Itration            | 1951     |
| Real Det Return     | 531      |
| Real Sto Return     | 480      |
| Reward Loss         | -58.1    |
| Running Env Steps   | 975500   |
| Running Forward KL  | -5.28    |
| Running Reverse KL  | 5.13     |
| Running Update Time | 1951     |
----------------------------------
2025-02-01 20:06:01.844764 Eastern Standard Time
| Itration            | 1952     |
| Real Det Return     | 535      |
| Real Sto Return     | 488      |
| Reward Loss         | -64.5    |
| Running Env Steps   | 976000   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 5.11     |
| Running Update Time | 1952     |
----------------------------------
2025-02-01 20:06:17.525944 Eastern Standard Time
| Itration            | 1953     |
| Real Det Return     | 524      |
| Real Sto Return     | 473      |
| Reward Loss         | -62.4    |
| Running Env Steps   | 976500   |
| Running Forward KL  | -5.11    |
| Running Reverse KL  | 4.83     |
| Running Update Time | 1953     |
----------------------------------
2025-02-01 20:06:33.127980 Eastern Standard Time
| Itration            | 1954     |
| Real Det Return     | 536      |
| Real Sto Return     | 485      |
| Reward Loss         | -52.2    |
| Running Env Steps   | 977000   |
| Running Forward KL  | -5.63    |
| Running Reverse KL  | 4.51     |
| Running Update Time | 1954     |
----------------------------------
2025-02-01 20:06:48.808865 Eastern Standard Time
| Itration            | 1955     |
| Real Det Return     | 517      |
| Real Sto Return     | 468      |
| Reward Loss         | -53.2    |
| Running Env Steps   | 977500   |
| Running Forward KL  | -5.08    |
| Running Reverse KL  | 5.63     |
| Running Update Time | 1955     |
----------------------------------
2025-02-01 20:07:04.528624 Eastern Standard Time
| Itration            | 1956     |
| Real Det Return     | 527      |
| Real Sto Return     | 476      |
| Reward Loss         | -61.8    |
| Running Env Steps   | 978000   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1956     |
----------------------------------
2025-02-01 20:07:20.212655 Eastern Standard Time
| Itration            | 1957     |
| Real Det Return     | 534      |
| Real Sto Return     | 481      |
| Reward Loss         | -56      |
| Running Env Steps   | 978500   |
| Running Forward KL  | -5.09    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 1957     |
----------------------------------
2025-02-01 20:07:35.903156 Eastern Standard Time
| Itration            | 1958     |
| Real Det Return     | 527      |
| Real Sto Return     | 474      |
| Reward Loss         | -62.1    |
| Running Env Steps   | 979000   |
| Running Forward KL  | -5.23    |
| Running Reverse KL  | 5.37     |
| Running Update Time | 1958     |
----------------------------------
2025-02-01 20:07:51.556850 Eastern Standard Time
| Itration            | 1959     |
| Real Det Return     | 531      |
| Real Sto Return     | 476      |
| Reward Loss         | -68.2    |
| Running Env Steps   | 979500   |
| Running Forward KL  | -5.03    |
| Running Reverse KL  | 4.56     |
| Running Update Time | 1959     |
----------------------------------
2025-02-01 20:08:07.272830 Eastern Standard Time
| Itration            | 1960     |
| Real Det Return     | 534      |
| Real Sto Return     | 493      |
| Reward Loss         | -65.8    |
| Running Env Steps   | 980000   |
| Running Forward KL  | -5.29    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1960     |
----------------------------------
2025-02-01 20:08:22.947426 Eastern Standard Time
| Itration            | 1961     |
| Real Det Return     | 540      |
| Real Sto Return     | 468      |
| Reward Loss         | -62.5    |
| Running Env Steps   | 980500   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1961     |
----------------------------------
2025-02-01 20:08:38.705865 Eastern Standard Time
| Itration            | 1962     |
| Real Det Return     | 543      |
| Real Sto Return     | 485      |
| Reward Loss         | -55.7    |
| Running Env Steps   | 981000   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1962     |
----------------------------------
2025-02-01 20:08:54.417609 Eastern Standard Time
| Itration            | 1963     |
| Real Det Return     | 534      |
| Real Sto Return     | 474      |
| Reward Loss         | -59.9    |
| Running Env Steps   | 981500   |
| Running Forward KL  | -5.39    |
| Running Reverse KL  | 5.03     |
| Running Update Time | 1963     |
----------------------------------
2025-02-01 20:09:10.068264 Eastern Standard Time
| Itration            | 1964     |
| Real Det Return     | 528      |
| Real Sto Return     | 469      |
| Reward Loss         | -64.5    |
| Running Env Steps   | 982000   |
| Running Forward KL  | -5.25    |
| Running Reverse KL  | 4.98     |
| Running Update Time | 1964     |
----------------------------------
2025-02-01 20:09:25.817707 Eastern Standard Time
| Itration            | 1965     |
| Real Det Return     | 541      |
| Real Sto Return     | 482      |
| Reward Loss         | -54.2    |
| Running Env Steps   | 982500   |
| Running Forward KL  | -5.47    |
| Running Reverse KL  | 5.25     |
| Running Update Time | 1965     |
----------------------------------
2025-02-01 20:09:41.478608 Eastern Standard Time
| Itration            | 1966     |
| Real Det Return     | 520      |
| Real Sto Return     | 475      |
| Reward Loss         | -66.8    |
| Running Env Steps   | 983000   |
| Running Forward KL  | -4.99    |
| Running Reverse KL  | 4.59     |
| Running Update Time | 1966     |
----------------------------------
2025-02-01 20:09:57.182135 Eastern Standard Time
| Itration            | 1967     |
| Real Det Return     | 514      |
| Real Sto Return     | 474      |
| Reward Loss         | -67.4    |
| Running Env Steps   | 983500   |
| Running Forward KL  | -4.63    |
| Running Reverse KL  | 5.62     |
| Running Update Time | 1967     |
----------------------------------
2025-02-01 20:10:12.946798 Eastern Standard Time
| Itration            | 1968     |
| Real Det Return     | 533      |
| Real Sto Return     | 485      |
| Reward Loss         | -56.5    |
| Running Env Steps   | 984000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1968     |
----------------------------------
2025-02-01 20:10:28.666606 Eastern Standard Time
| Itration            | 1969     |
| Real Det Return     | 527      |
| Real Sto Return     | 474      |
| Reward Loss         | -69.9    |
| Running Env Steps   | 984500   |
| Running Forward KL  | -5.3     |
| Running Reverse KL  | 5.09     |
| Running Update Time | 1969     |
----------------------------------
2025-02-01 20:10:44.339815 Eastern Standard Time
| Itration            | 1970     |
| Real Det Return     | 536      |
| Real Sto Return     | 471      |
| Reward Loss         | -57.5    |
| Running Env Steps   | 985000   |
| Running Forward KL  | -5.4     |
| Running Reverse KL  | 5.39     |
| Running Update Time | 1970     |
----------------------------------
2025-02-01 20:11:00.028666 Eastern Standard Time
| Itration            | 1971     |
| Real Det Return     | 530      |
| Real Sto Return     | 482      |
| Reward Loss         | -66.5    |
| Running Env Steps   | 985500   |
| Running Forward KL  | -4.87    |
| Running Reverse KL  | 4.87     |
| Running Update Time | 1971     |
----------------------------------
2025-02-01 20:11:15.746335 Eastern Standard Time
| Itration            | 1972     |
| Real Det Return     | 541      |
| Real Sto Return     | 489      |
| Reward Loss         | -51.4    |
| Running Env Steps   | 986000   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 4.96     |
| Running Update Time | 1972     |
----------------------------------
2025-02-01 20:11:31.377674 Eastern Standard Time
| Itration            | 1973     |
| Real Det Return     | 553      |
| Real Sto Return     | 494      |
| Reward Loss         | -56      |
| Running Env Steps   | 986500   |
| Running Forward KL  | -5.34    |
| Running Reverse KL  | 5.02     |
| Running Update Time | 1973     |
----------------------------------
2025-02-01 20:11:47.040870 Eastern Standard Time
| Itration            | 1974     |
| Real Det Return     | 538      |
| Real Sto Return     | 485      |
| Reward Loss         | -62.3    |
| Running Env Steps   | 987000   |
| Running Forward KL  | -5.53    |
| Running Reverse KL  | 4.76     |
| Running Update Time | 1974     |
----------------------------------
2025-02-01 20:12:02.641412 Eastern Standard Time
| Itration            | 1975     |
| Real Det Return     | 545      |
| Real Sto Return     | 489      |
| Reward Loss         | -60.9    |
| Running Env Steps   | 987500   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 4.63     |
| Running Update Time | 1975     |
----------------------------------
2025-02-01 20:12:18.385276 Eastern Standard Time
| Itration            | 1976     |
| Real Det Return     | 535      |
| Real Sto Return     | 489      |
| Reward Loss         | -47.1    |
| Running Env Steps   | 988000   |
| Running Forward KL  | -5.44    |
| Running Reverse KL  | 4.94     |
| Running Update Time | 1976     |
----------------------------------
2025-02-01 20:12:34.044297 Eastern Standard Time
| Itration            | 1977     |
| Real Det Return     | 542      |
| Real Sto Return     | 488      |
| Reward Loss         | -54.7    |
| Running Env Steps   | 988500   |
| Running Forward KL  | -5.53    |
| Running Reverse KL  | 4.79     |
| Running Update Time | 1977     |
----------------------------------
2025-02-01 20:12:49.666727 Eastern Standard Time
| Itration            | 1978     |
| Real Det Return     | 543      |
| Real Sto Return     | 484      |
| Reward Loss         | -53.1    |
| Running Env Steps   | 989000   |
| Running Forward KL  | -5.21    |
| Running Reverse KL  | 5.19     |
| Running Update Time | 1978     |
----------------------------------
2025-02-01 20:13:05.419787 Eastern Standard Time
| Itration            | 1979     |
| Real Det Return     | 532      |
| Real Sto Return     | 495      |
| Reward Loss         | -54.6    |
| Running Env Steps   | 989500   |
| Running Forward KL  | -5.74    |
| Running Reverse KL  | 5.5      |
| Running Update Time | 1979     |
----------------------------------
2025-02-01 20:13:21.120303 Eastern Standard Time
| Itration            | 1980     |
| Real Det Return     | 516      |
| Real Sto Return     | 471      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 990000   |
| Running Forward KL  | -5.26    |
| Running Reverse KL  | 5.21     |
| Running Update Time | 1980     |
----------------------------------
2025-02-01 20:13:36.806756 Eastern Standard Time
| Itration            | 1981     |
| Real Det Return     | 538      |
| Real Sto Return     | 473      |
| Reward Loss         | -48.7    |
| Running Env Steps   | 990500   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 4.84     |
| Running Update Time | 1981     |
----------------------------------
2025-02-01 20:13:52.430155 Eastern Standard Time
| Itration            | 1982     |
| Real Det Return     | 532      |
| Real Sto Return     | 490      |
| Reward Loss         | -54.9    |
| Running Env Steps   | 991000   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 5.17     |
| Running Update Time | 1982     |
----------------------------------
2025-02-01 20:14:08.135729 Eastern Standard Time
| Itration            | 1983     |
| Real Det Return     | 551      |
| Real Sto Return     | 493      |
| Reward Loss         | -60      |
| Running Env Steps   | 991500   |
| Running Forward KL  | -5.42    |
| Running Reverse KL  | 5.52     |
| Running Update Time | 1983     |
----------------------------------
2025-02-01 20:14:23.909688 Eastern Standard Time
| Itration            | 1984     |
| Real Det Return     | 559      |
| Real Sto Return     | 481      |
| Reward Loss         | -54.8    |
| Running Env Steps   | 992000   |
| Running Forward KL  | -5.78    |
| Running Reverse KL  | 5.38     |
| Running Update Time | 1984     |
----------------------------------
2025-02-01 20:14:39.592934 Eastern Standard Time
| Itration            | 1985     |
| Real Det Return     | 550      |
| Real Sto Return     | 488      |
| Reward Loss         | -57.6    |
| Running Env Steps   | 992500   |
| Running Forward KL  | -5.07    |
| Running Reverse KL  | 4.78     |
| Running Update Time | 1985     |
----------------------------------
2025-02-01 20:14:55.173572 Eastern Standard Time
| Itration            | 1986     |
| Real Det Return     | 531      |
| Real Sto Return     | 477      |
| Reward Loss         | -63.8    |
| Running Env Steps   | 993000   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 5.3      |
| Running Update Time | 1986     |
----------------------------------
2025-02-01 20:15:10.978014 Eastern Standard Time
| Itration            | 1987     |
| Real Det Return     | 532      |
| Real Sto Return     | 484      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 993500   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 4.85     |
| Running Update Time | 1987     |
----------------------------------
2025-02-01 20:15:26.691965 Eastern Standard Time
| Itration            | 1988     |
| Real Det Return     | 537      |
| Real Sto Return     | 493      |
| Reward Loss         | -66.9    |
| Running Env Steps   | 994000   |
| Running Forward KL  | -5.31    |
| Running Reverse KL  | 4.9      |
| Running Update Time | 1988     |
----------------------------------
2025-02-01 20:15:42.379806 Eastern Standard Time
| Itration            | 1989     |
| Real Det Return     | 540      |
| Real Sto Return     | 485      |
| Reward Loss         | -73.3    |
| Running Env Steps   | 994500   |
| Running Forward KL  | -4.98    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1989     |
----------------------------------
2025-02-01 20:15:58.116368 Eastern Standard Time
| Itration            | 1990     |
| Real Det Return     | 537      |
| Real Sto Return     | 480      |
| Reward Loss         | -59.4    |
| Running Env Steps   | 995000   |
| Running Forward KL  | -5.15    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 1990     |
----------------------------------
2025-02-01 20:16:13.767082 Eastern Standard Time
| Itration            | 1991     |
| Real Det Return     | 547      |
| Real Sto Return     | 496      |
| Reward Loss         | -47.8    |
| Running Env Steps   | 995500   |
| Running Forward KL  | -5.1     |
| Running Reverse KL  | 5.42     |
| Running Update Time | 1991     |
----------------------------------
2025-02-01 20:16:29.404026 Eastern Standard Time
| Itration            | 1992     |
| Real Det Return     | 541      |
| Real Sto Return     | 475      |
| Reward Loss         | -69.4    |
| Running Env Steps   | 996000   |
| Running Forward KL  | -5.35    |
| Running Reverse KL  | 5.29     |
| Running Update Time | 1992     |
----------------------------------
2025-02-01 20:16:45.115427 Eastern Standard Time
| Itration            | 1993     |
| Real Det Return     | 533      |
| Real Sto Return     | 488      |
| Reward Loss         | -67.8    |
| Running Env Steps   | 996500   |
| Running Forward KL  | -5.37    |
| Running Reverse KL  | 5.14     |
| Running Update Time | 1993     |
----------------------------------
2025-02-01 20:17:00.813485 Eastern Standard Time
| Itration            | 1994     |
| Real Det Return     | 533      |
| Real Sto Return     | 489      |
| Reward Loss         | -62.4    |
| Running Env Steps   | 997000   |
| Running Forward KL  | -4.96    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1994     |
----------------------------------
2025-02-01 20:17:16.578483 Eastern Standard Time
| Itration            | 1995     |
| Real Det Return     | 540      |
| Real Sto Return     | 481      |
| Reward Loss         | -64.9    |
| Running Env Steps   | 997500   |
| Running Forward KL  | -5.48    |
| Running Reverse KL  | 5        |
| Running Update Time | 1995     |
----------------------------------
2025-02-01 20:17:32.245067 Eastern Standard Time
| Itration            | 1996     |
| Real Det Return     | 543      |
| Real Sto Return     | 492      |
| Reward Loss         | -57.9    |
| Running Env Steps   | 998000   |
| Running Forward KL  | -5.88    |
| Running Reverse KL  | 5.12     |
| Running Update Time | 1996     |
----------------------------------
2025-02-01 20:17:48.007688 Eastern Standard Time
| Itration            | 1997     |
| Real Det Return     | 545      |
| Real Sto Return     | 504      |
| Reward Loss         | -53.6    |
| Running Env Steps   | 998500   |
| Running Forward KL  | -5.04    |
| Running Reverse KL  | 5.34     |
| Running Update Time | 1997     |
----------------------------------
2025-02-01 20:18:03.842321 Eastern Standard Time
| Itration            | 1998     |
| Real Det Return     | 539      |
| Real Sto Return     | 485      |
| Reward Loss         | -59.1    |
| Running Env Steps   | 999000   |
| Running Forward KL  | -5.19    |
| Running Reverse KL  | 5.01     |
| Running Update Time | 1998     |
----------------------------------
2025-02-01 20:18:19.514182 Eastern Standard Time
| Itration            | 1999     |
| Real Det Return     | 538      |
| Real Sto Return     | 486      |
| Reward Loss         | -58      |
| Running Env Steps   | 999500   |
| Running Forward KL  | -5.57    |
| Running Reverse KL  | 5.26     |
| Running Update Time | 1999     |
----------------------------------
