2025-09-16 08:57:25,901 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.000-delay_9
2025-09-16 08:57:25,901 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.000-delay_9
2025-09-16 08:57:25,901 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x149e607d0690>}
2025-09-16 08:57:25,901 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-16 08:57:25,905 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-16 08:57:25,922 baseline-bpql-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=71, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-16 08:57:25,922 baseline-bpql-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 08:57:26,733 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-16 08:57:26,734 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-16 08:59:03,449 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 08:59:12,642 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -356.16385 ± 88.119
2025-09-16 08:59:12,642 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-421.1918, -146.57863, -447.5194, -439.13437, -415.48264, -390.51617, -303.5323, -335.55026, -280.51044, -381.62225]
2025-09-16 08:59:12,642 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 08:59:12,643 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-356.16) for latency 9
2025-09-16 08:59:12,649 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 54 minutes, 45 seconds)
2025-09-16 09:00:54,406 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:01:03,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -153.21019 ± 40.635
2025-09-16 09:01:03,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-171.66133, -119.134415, -106.80705, -223.8395, -154.36923, -187.80597, -142.87839, -85.731445, -196.87051, -143.00414]
2025-09-16 09:01:03,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:01:03,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-153.21) for latency 9
2025-09-16 09:01:03,559 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 57 minutes, 4 seconds)
2025-09-16 09:02:45,469 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:02:54,651 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -130.05380 ± 59.968
2025-09-16 09:02:54,651 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-98.02303, -181.53337, -210.57954, -99.58393, -96.96728, -236.56895, -26.251886, -138.94374, -120.625084, -91.46126]
2025-09-16 09:02:54,651 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:02:54,651 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-130.05) for latency 9
2025-09-16 09:02:54,665 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 56 minutes, 43 seconds)
2025-09-16 09:04:35,667 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:04:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 104.94202 ± 99.320
2025-09-16 09:04:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [56.95219, 25.333466, 250.89075, 185.75488, 66.8837, 138.69073, -119.28309, 197.77684, 119.86749, 126.55324]
2025-09-16 09:04:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:04:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (104.94) for latency 9
2025-09-16 09:04:44,888 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 55 minutes, 15 seconds)
2025-09-16 09:06:25,859 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:06:35,013 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1559.06213 ± 118.985
2025-09-16 09:06:35,013 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1334.9785, 1580.6005, 1399.9545, 1571.5438, 1527.4297, 1725.5674, 1663.6115, 1650.847, 1475.3491, 1660.7383]
2025-09-16 09:06:35,013 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:06:35,013 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1559.06) for latency 9
2025-09-16 09:06:35,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 53 minutes, 37 seconds)
2025-09-16 09:08:16,791 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:08:25,900 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 530.85278 ± 437.823
2025-09-16 09:08:25,900 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1804.9324, 400.8711, 530.1251, 362.3459, 323.20724, 411.11993, 258.44565, 194.19505, 475.664, 547.62164]
2025-09-16 09:08:25,900 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:08:25,905 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 53 minutes, 21 seconds)
2025-09-16 09:10:07,007 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:10:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1733.60999 ± 682.190
2025-09-16 09:10:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [379.3511, 1464.2517, 2644.4893, 2070.5671, 1543.3931, 2252.0854, 2076.2642, 2306.3403, 717.8715, 1881.4861]
2025-09-16 09:10:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:10:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1733.61) for latency 9
2025-09-16 09:10:17,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 51 minutes, 43 seconds)
2025-09-16 09:11:59,666 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:12:09,927 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 929.87140 ± 616.235
2025-09-16 09:12:09,927 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [465.46844, 969.31885, 728.1835, 1739.3923, 434.42654, 395.04535, 438.16708, 1050.1876, 2361.9885, 716.5364]
2025-09-16 09:12:09,927 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:12:09,934 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 50 minutes, 16 seconds)
2025-09-16 09:13:50,671 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:13:59,709 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1716.62427 ± 508.169
2025-09-16 09:13:59,709 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1417.8501, 2112.5632, 885.7172, 2138.453, 2276.7817, 2050.3572, 1809.7261, 1166.4296, 1052.6678, 2255.6968]
2025-09-16 09:13:59,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:13:59,713 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 48 minutes, 17 seconds)
2025-09-16 09:15:39,995 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:15:49,115 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1958.22656 ± 456.690
2025-09-16 09:15:49,115 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2222.8582, 2267.1184, 2037.6283, 1099.521, 1053.7098, 2361.395, 2347.7388, 2024.1085, 2020.4869, 2147.701]
2025-09-16 09:15:49,115 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:15:49,115 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1958.23) for latency 9
2025-09-16 09:15:49,119 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 46 minutes, 13 seconds)
2025-09-16 09:17:30,113 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:17:39,282 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1865.31604 ± 668.067
2025-09-16 09:17:39,283 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2643.838, 2642.564, 1105.281, 971.15717, 2442.055, 2573.8882, 2199.7817, 1111.4534, 1663.7072, 1299.4344]
2025-09-16 09:17:39,283 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:17:39,287 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 44 minutes, 10 seconds)
2025-09-16 09:19:19,706 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:19:28,784 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1820.08752 ± 659.138
2025-09-16 09:19:28,784 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2558.7783, 995.45844, 2216.144, 953.5237, 1097.8712, 2373.9668, 2202.866, 2791.5017, 1722.8031, 1287.9634]
2025-09-16 09:19:28,784 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:19:28,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 41 minutes, 43 seconds)
2025-09-16 09:21:09,973 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:21:18,991 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2053.18604 ± 661.534
2025-09-16 09:21:18,991 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2593.9543, 1335.2969, 1352.0553, 2939.0269, 2854.2344, 1363.8617, 1957.3042, 2890.1926, 1841.3466, 1404.5891]
2025-09-16 09:21:18,991 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:21:18,992 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2053.19) for latency 9
2025-09-16 09:21:18,998 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 39 minutes, 13 seconds)
2025-09-16 09:22:59,350 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:23:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2142.53271 ± 383.524
2025-09-16 09:23:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2164.6829, 2551.25, 2680.775, 1976.4281, 1500.3519, 1794.6522, 2622.6604, 2122.8342, 1702.2198, 2309.4712]
2025-09-16 09:23:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:23:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2142.53) for latency 9
2025-09-16 09:23:08,534 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 37 minutes, 19 seconds)
2025-09-16 09:24:49,272 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:24:59,694 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2653.25269 ± 519.374
2025-09-16 09:24:59,694 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2669.7761, 2623.2017, 2653.8972, 2830.472, 2742.0413, 3000.6543, 2739.1868, 3057.3804, 3048.955, 1166.9635]
2025-09-16 09:24:59,694 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:24:59,694 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2653.25) for latency 9
2025-09-16 09:24:59,698 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 35 minutes, 59 seconds)
2025-09-16 09:26:40,141 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:26:49,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2135.49438 ± 704.263
2025-09-16 09:26:49,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1619.3312, 2475.1616, 3392.664, 2078.4817, 2763.1545, 1292.1558, 1857.3893, 1558.2771, 3018.6755, 1299.6528]
2025-09-16 09:26:49,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:26:49,343 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 34 minutes)
2025-09-16 09:28:29,611 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:28:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2410.66943 ± 695.347
2025-09-16 09:28:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2882.6335, 1986.8988, 1444.9742, 2893.0876, 1288.0018, 2403.881, 1784.1964, 3232.3284, 3064.8164, 3125.876]
2025-09-16 09:28:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:28:38,831 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 32 minutes, 10 seconds)
2025-09-16 09:30:19,829 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:30:30,265 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2507.72510 ± 537.736
2025-09-16 09:30:30,265 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2990.7185, 2865.7266, 3170.865, 2420.4119, 2783.1257, 2799.0671, 1846.0332, 2819.0857, 1527.4828, 1854.7347]
2025-09-16 09:30:30,266 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:30:30,272 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 30 minutes, 40 seconds)
2025-09-16 09:32:10,394 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:32:19,428 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2786.81006 ± 543.235
2025-09-16 09:32:19,428 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3130.0117, 1643.2146, 3307.228, 3283.2917, 2612.6533, 3189.4592, 1986.103, 3176.2869, 2875.4304, 2664.4214]
2025-09-16 09:32:19,428 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:32:19,428 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2786.81) for latency 9
2025-09-16 09:32:19,434 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 28 minutes, 44 seconds)
2025-09-16 09:33:58,817 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:34:07,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2549.74854 ± 648.062
2025-09-16 09:34:07,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1911.6401, 2968.144, 3142.3079, 3205.8342, 3282.8071, 1519.4406, 1968.7473, 2921.567, 1729.5687, 2847.4304]
2025-09-16 09:34:07,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:34:07,890 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 26 minutes, 11 seconds)
2025-09-16 09:35:46,521 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:35:55,461 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2374.81763 ± 745.900
2025-09-16 09:35:55,462 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3785.03, 3012.598, 2887.996, 1673.0054, 2954.4653, 1566.78, 1633.1785, 1609.8007, 1938.2317, 2687.0896]
2025-09-16 09:35:55,462 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:35:55,465 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 23 minutes, 48 seconds)
2025-09-16 09:37:36,421 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:37:46,791 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2527.63599 ± 901.010
2025-09-16 09:37:46,791 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3024.499, 3275.0444, 1552.5544, 1483.7181, 2816.2942, 1512.8904, 3356.8938, 1272.7905, 3347.128, 3634.5461]
2025-09-16 09:37:46,792 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:37:46,805 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 22 minutes, 28 seconds)
2025-09-16 09:39:25,961 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:39:36,386 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3389.57227 ± 213.022
2025-09-16 09:39:36,387 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3409.7402, 3551.4624, 3652.7996, 3074.3264, 3565.1135, 3281.354, 3384.038, 3341.9158, 2998.6755, 3636.2979]
2025-09-16 09:39:36,387 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:39:36,387 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (3389.57) for latency 9
2025-09-16 09:39:36,394 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 20 minutes, 10 seconds)
2025-09-16 09:41:15,000 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:41:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3084.48999 ± 779.946
2025-09-16 09:41:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3717.3965, 3440.6206, 3463.153, 2566.895, 3651.592, 3580.6548, 3451.8813, 1807.7172, 3656.6375, 1508.3542]
2025-09-16 09:41:25,456 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:41:25,463 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 18 minutes, 19 seconds)
2025-09-16 09:43:04,643 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:43:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2976.54639 ± 1000.894
2025-09-16 09:43:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1777.7318, 3909.849, 3963.3013, 3638.6184, 3847.9475, 3342.7827, 3975.735, 1831.6094, 1864.6711, 1613.2169]
2025-09-16 09:43:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:43:15,064 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 16 minutes, 47 seconds)
2025-09-16 09:44:54,863 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:45:03,853 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3387.75732 ± 541.463
2025-09-16 09:45:03,853 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3744.1267, 3336.6665, 2596.6458, 3430.0508, 3693.0913, 3683.2231, 3721.9004, 2129.012, 3729.3794, 3813.482]
2025-09-16 09:45:03,853 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:45:03,864 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 15 minutes, 16 seconds)
2025-09-16 09:46:42,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:46:53,103 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4661.34424 ± 71.361
2025-09-16 09:46:53,103 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4724.319, 4657.156, 4603.291, 4718.9043, 4510.8433, 4675.5186, 4692.1904, 4594.535, 4665.933, 4770.7515]
2025-09-16 09:46:53,103 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:46:53,103 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4661.34) for latency 9
2025-09-16 09:46:53,114 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 12 minutes, 56 seconds)
2025-09-16 09:48:33,245 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:48:42,400 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3756.10083 ± 742.613
2025-09-16 09:48:42,400 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3545.8057, 3728.2273, 3070.548, 4233.9873, 2274.7283, 3922.0347, 4499.6123, 4793.066, 3067.8657, 4425.1323]
2025-09-16 09:48:42,400 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:48:42,405 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 11 minutes, 2 seconds)
2025-09-16 09:50:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:50:30,860 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4434.64453 ± 192.035
2025-09-16 09:50:30,860 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4339.985, 4578.7847, 4105.9453, 4190.44, 4442.451, 4534.3086, 4461.336, 4705.8037, 4293.5044, 4693.8853]
2025-09-16 09:50:30,860 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:50:30,864 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 9 minutes, 4 seconds)
2025-09-16 09:52:10,563 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:52:21,029 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4168.15234 ± 796.610
2025-09-16 09:52:21,029 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4507.2915, 4155.0083, 4697.3906, 4007.793, 4851.717, 3911.6458, 4602.641, 4496.4385, 1937.5416, 4514.053]
2025-09-16 09:52:21,029 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:52:21,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 7 minutes, 23 seconds)
2025-09-16 09:54:00,814 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:54:09,908 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4690.20996 ± 165.747
2025-09-16 09:54:09,908 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4892.132, 4716.35, 4641.137, 4370.4683, 4753.719, 4465.105, 4902.916, 4760.5156, 4591.3804, 4808.3784]
2025-09-16 09:54:09,908 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:54:09,908 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4690.21) for latency 9
2025-09-16 09:54:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 5 minutes, 35 seconds)
2025-09-16 09:55:49,686 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:56:00,138 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4508.90430 ± 372.048
2025-09-16 09:56:00,138 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4623.5215, 3465.4412, 4696.205, 4267.758, 4718.322, 4674.446, 4734.725, 4556.6865, 4610.1963, 4741.743]
2025-09-16 09:56:00,138 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:56:00,156 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 3 minutes, 59 seconds)
2025-09-16 09:57:39,404 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:57:48,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4804.01709 ± 221.938
2025-09-16 09:57:48,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4372.3394, 4969.1064, 5023.7036, 4929.092, 4823.9404, 4669.7476, 4449.0215, 5043.851, 4851.049, 4908.32]
2025-09-16 09:57:48,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:57:48,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4804.02) for latency 9
2025-09-16 09:57:48,547 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 1 minute, 58 seconds)
2025-09-16 09:59:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 09:59:36,173 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4888.95020 ± 597.870
2025-09-16 09:59:36,173 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4872.596, 3151.462, 5300.146, 5111.9634, 5005.0186, 5249.7437, 5084.6436, 4809.0195, 5084.1133, 5220.791]
2025-09-16 09:59:36,173 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:59:36,173 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4888.95) for latency 9
2025-09-16 09:59:36,194 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 59 minutes, 58 seconds)
2025-09-16 10:01:16,170 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:01:26,583 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4983.06201 ± 136.024
2025-09-16 10:01:26,583 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4754.0225, 4983.842, 5065.004, 5108.519, 4713.2407, 5062.0903, 5090.9326, 4995.0635, 4942.152, 5115.7554]
2025-09-16 10:01:26,583 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:01:26,583 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4983.06) for latency 9
2025-09-16 10:01:26,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 58 minutes, 12 seconds)
2025-09-16 10:03:05,941 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:03:15,015 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5183.75928 ± 77.243
2025-09-16 10:03:15,015 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5224.4604, 5203.4146, 5225.8296, 5254.6113, 5111.903, 5164.82, 5194.0566, 4984.8735, 5233.6626, 5239.961]
2025-09-16 10:03:15,015 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:03:15,015 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5183.76) for latency 9
2025-09-16 10:03:15,035 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 56 minutes, 17 seconds)
2025-09-16 10:04:53,274 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:05:02,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5003.21973 ± 98.599
2025-09-16 10:05:02,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5061.335, 5013.2925, 4777.043, 5056.6865, 5109.0107, 4976.3115, 4894.755, 5124.0527, 5035.8774, 4983.8345]
2025-09-16 10:05:02,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:05:02,343 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 53 minutes, 51 seconds)
2025-09-16 10:06:41,027 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:06:50,054 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4994.94922 ± 400.094
2025-09-16 10:06:50,055 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5349.034, 3887.8447, 5168.475, 5181.551, 5010.9043, 5218.164, 4910.538, 4814.206, 5144.547, 5264.2256]
2025-09-16 10:06:50,055 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:06:50,062 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 51 minutes, 54 seconds)
2025-09-16 10:08:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:08:35,562 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5334.07666 ± 157.088
2025-09-16 10:08:35,562 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5520.94, 5375.554, 5430.396, 5227.4375, 5319.3975, 5142.046, 5456.142, 5455.3584, 5419.293, 4994.198]
2025-09-16 10:08:35,562 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:08:35,562 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5334.08) for latency 9
2025-09-16 10:08:35,570 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 49 minutes, 40 seconds)
2025-09-16 10:10:13,178 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:10:22,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4800.37842 ± 1012.046
2025-09-16 10:10:22,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5257.3667, 1952.7014, 4140.6045, 5043.282, 5072.855, 5356.761, 5073.959, 5402.5337, 5336.8003, 5366.923]
2025-09-16 10:10:22,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:10:22,189 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 47 minutes, 6 seconds)
2025-09-16 10:11:59,729 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:12:08,671 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5053.19336 ± 680.520
2025-09-16 10:12:08,671 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5390.7617, 5350.797, 5200.742, 5399.116, 5150.3613, 5103.331, 5327.002, 5175.605, 5397.601, 3036.6123]
2025-09-16 10:12:08,671 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:12:08,690 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 44 minutes, 57 seconds)
2025-09-16 10:13:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:13:55,026 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5361.05371 ± 69.500
2025-09-16 10:13:55,026 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5434.4546, 5213.3423, 5362.206, 5354.8647, 5365.8203, 5358.413, 5313.624, 5339.8853, 5371.927, 5496.002]
2025-09-16 10:13:55,026 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:13:55,026 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5361.05) for latency 9
2025-09-16 10:13:55,031 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 42 minutes, 59 seconds)
2025-09-16 10:15:32,430 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:15:42,601 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5045.47656 ± 625.032
2025-09-16 10:15:42,601 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5478.979, 5265.794, 5513.666, 5192.22, 4170.7163, 5429.9966, 5379.223, 5250.7593, 3522.153, 5251.256]
2025-09-16 10:15:42,601 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:15:42,625 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 41 minutes, 11 seconds)
2025-09-16 10:17:19,715 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:17:28,779 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5419.74316 ± 59.765
2025-09-16 10:17:28,779 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5431.6763, 5446.111, 5395.751, 5566.796, 5347.647, 5365.614, 5385.6167, 5457.088, 5425.652, 5375.484]
2025-09-16 10:17:28,779 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:17:28,779 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5419.74) for latency 9
2025-09-16 10:17:28,784 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 39 minutes, 31 seconds)
2025-09-16 10:19:05,826 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:19:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5069.10645 ± 663.461
2025-09-16 10:19:15,975 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5334.74, 5396.4727, 5358.602, 5416.9727, 3411.221, 5430.786, 5420.0464, 5370.818, 5389.505, 4161.902]
2025-09-16 10:19:15,976 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:19:15,980 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 37 minutes, 51 seconds)
2025-09-16 10:20:52,907 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:21:03,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5511.17188 ± 157.158
2025-09-16 10:21:03,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5516.085, 5595.2476, 5539.363, 5676.2407, 5128.5713, 5553.832, 5583.4453, 5596.2563, 5613.357, 5309.323]
2025-09-16 10:21:03,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:21:03,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5511.17) for latency 9
2025-09-16 10:21:03,041 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 36 minutes, 10 seconds)
2025-09-16 10:22:39,125 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:22:49,239 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5534.84619 ± 102.191
2025-09-16 10:22:49,239 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5597.857, 5510.1562, 5658.3857, 5529.8364, 5560.0327, 5589.118, 5265.1074, 5608.9985, 5483.6855, 5545.2803]
2025-09-16 10:22:49,239 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:22:49,239 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5534.85) for latency 9
2025-09-16 10:22:49,265 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 34 minutes, 22 seconds)
2025-09-16 10:24:25,423 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:24:34,367 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5484.86133 ± 27.054
2025-09-16 10:24:34,367 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5482.1396, 5546.1533, 5479.242, 5458.3203, 5452.705, 5510.438, 5500.4624, 5456.933, 5474.8125, 5487.4077]
2025-09-16 10:24:34,367 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:24:34,373 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 32 minutes, 10 seconds)
2025-09-16 10:26:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:26:20,252 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5417.82275 ± 118.086
2025-09-16 10:26:20,252 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5535.782, 5478.787, 5398.71, 5306.6494, 5515.4297, 5377.1177, 5133.4834, 5541.1196, 5433.1035, 5458.0474]
2025-09-16 10:26:20,252 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:26:20,261 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 30 minutes, 21 seconds)
2025-09-16 10:27:57,495 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:28:06,456 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5440.96777 ± 169.104
2025-09-16 10:28:06,456 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5570.799, 5388.9175, 5552.933, 5152.9287, 5610.3, 5616.293, 5167.8643, 5543.042, 5294.585, 5512.016]
2025-09-16 10:28:06,456 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:28:06,461 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 28 minutes, 24 seconds)
2025-09-16 10:29:42,981 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:29:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5567.55957 ± 88.081
2025-09-16 10:29:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5571.424, 5599.1665, 5639.9326, 5603.4775, 5611.455, 5463.4736, 5659.617, 5669.1694, 5434.0205, 5423.8647]
2025-09-16 10:29:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:29:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5567.56) for latency 9
2025-09-16 10:29:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 26 minutes, 22 seconds)
2025-09-16 10:31:29,120 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:31:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5578.53027 ± 65.735
2025-09-16 10:31:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5566.2896, 5473.509, 5526.235, 5652.594, 5553.7563, 5585.502, 5717.568, 5604.5713, 5582.6255, 5522.652]
2025-09-16 10:31:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:31:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5578.53) for latency 9
2025-09-16 10:31:39,208 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 24 minutes, 47 seconds)
2025-09-16 10:33:15,972 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:33:24,831 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5552.62256 ± 168.320
2025-09-16 10:33:24,832 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5655.8364, 5602.893, 5706.164, 5634.7295, 5671.751, 5417.8467, 5504.429, 5505.21, 5700.29, 5127.08]
2025-09-16 10:33:24,832 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:33:24,861 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 23 minutes, 6 seconds)
2025-09-16 10:35:01,858 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:35:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5635.30176 ± 125.467
2025-09-16 10:35:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5747.8193, 5699.2144, 5616.5835, 5643.9053, 5615.507, 5610.7095, 5296.0767, 5734.4526, 5755.074, 5633.679]
2025-09-16 10:35:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:35:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5635.30) for latency 9
2025-09-16 10:35:10,701 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 21 minutes, 20 seconds)
2025-09-16 10:36:47,357 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:36:56,237 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5503.04932 ± 164.061
2025-09-16 10:36:56,237 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5415.169, 5158.4707, 5584.3633, 5721.864, 5651.48, 5591.325, 5620.43, 5565.93, 5352.249, 5369.2095]
2025-09-16 10:36:56,237 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:36:56,247 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 28 seconds)
2025-09-16 10:38:33,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:38:42,457 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5201.32178 ± 1324.919
2025-09-16 10:38:42,457 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5625.025, 5714.9976, 5630.467, 5640.1616, 5652.8174, 5669.8994, 5685.294, 5489.7705, 1230.2737, 5674.51]
2025-09-16 10:38:42,457 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:38:42,464 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 17 minutes, 48 seconds)
2025-09-16 10:40:19,467 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:40:29,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5673.35938 ± 92.967
2025-09-16 10:40:29,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5626.621, 5695.2817, 5591.1196, 5792.1504, 5611.137, 5689.553, 5751.0327, 5476.357, 5782.6685, 5717.6763]
2025-09-16 10:40:29,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:40:29,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5673.36) for latency 9
2025-09-16 10:40:29,507 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 16 minutes)
2025-09-16 10:42:06,744 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:42:16,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5106.60889 ± 981.030
2025-09-16 10:42:16,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5715.296, 5594.193, 4017.3271, 5628.9634, 2541.0093, 5588.225, 5738.572, 5217.146, 5490.951, 5534.401]
2025-09-16 10:42:16,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:42:16,776 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 14 minutes, 28 seconds)
2025-09-16 10:43:53,138 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:44:02,025 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5692.97949 ± 87.792
2025-09-16 10:44:02,025 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5728.384, 5743.932, 5688.675, 5718.859, 5760.475, 5788.7446, 5650.523, 5466.537, 5638.317, 5745.347]
2025-09-16 10:44:02,025 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:44:02,025 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5692.98) for latency 9
2025-09-16 10:44:02,030 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 36 seconds)
2025-09-16 10:45:37,909 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:45:47,971 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4894.62695 ± 1385.636
2025-09-16 10:45:47,971 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5752.3154, 5326.046, 5737.8403, 5704.345, 2655.3745, 1724.3188, 5563.2954, 5753.6587, 5695.9653, 5033.1094]
2025-09-16 10:45:47,972 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:45:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 10 minutes, 53 seconds)
2025-09-16 10:47:23,806 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:47:33,836 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5573.71191 ± 248.941
2025-09-16 10:47:33,836 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5065.341, 5108.351, 5640.0537, 5728.749, 5675.073, 5792.277, 5697.1367, 5732.7124, 5705.5835, 5591.842]
2025-09-16 10:47:33,836 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:47:33,844 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 9 minutes, 4 seconds)
2025-09-16 10:49:11,536 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:49:21,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5640.24023 ± 74.841
2025-09-16 10:49:21,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5604.956, 5538.2935, 5546.903, 5541.645, 5701.129, 5706.3257, 5700.4956, 5755.2017, 5635.616, 5671.8374]
2025-09-16 10:49:21,637 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:49:21,645 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 24 seconds)
2025-09-16 10:50:58,753 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:51:08,883 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5417.33594 ± 492.049
2025-09-16 10:51:08,883 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5552.6978, 5639.075, 5563.6333, 5640.629, 5728.751, 4091.4087, 5730.739, 4940.608, 5637.8213, 5647.997]
2025-09-16 10:51:08,883 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:51:08,892 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 5 minutes, 37 seconds)
2025-09-16 10:52:45,530 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:52:54,471 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5201.56006 ± 958.569
2025-09-16 10:52:54,471 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5602.195, 5115.18, 5424.6826, 5731.7256, 2406.765, 5712.0845, 5627.233, 5495.5044, 5787.0415, 5113.193]
2025-09-16 10:52:54,471 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:52:54,491 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 53 seconds)
2025-09-16 10:54:29,532 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:54:38,315 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4927.59131 ± 1437.465
2025-09-16 10:54:38,316 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5740.4683, 5491.6465, 5425.131, 5721.0654, 1325.694, 5489.5923, 5609.1514, 5755.548, 2994.9158, 5722.702]
2025-09-16 10:54:38,316 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:54:38,321 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 1 minute, 52 seconds)
2025-09-16 10:56:14,930 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:56:23,819 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5525.81787 ± 243.011
2025-09-16 10:56:23,819 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5411.304, 5631.1445, 5643.926, 5789.7236, 5638.423, 5475.059, 5687.2305, 4878.368, 5653.7773, 5449.222]
2025-09-16 10:56:23,819 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:56:23,836 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 3 seconds)
2025-09-16 10:58:00,181 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:58:10,207 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5612.26074 ± 445.266
2025-09-16 10:58:10,207 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5862.181, 5718.536, 5759.957, 5754.5967, 5746.7104, 5731.599, 5864.55, 5744.6616, 5651.105, 4288.7104]
2025-09-16 10:58:10,207 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:58:10,212 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 58 minutes, 8 seconds)
2025-09-16 10:59:47,378 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 10:59:56,188 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5220.17969 ± 797.079
2025-09-16 10:59:56,188 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5588.681, 5718.395, 5403.534, 5570.85, 3116.1265, 4352.051, 5526.377, 5695.6133, 5628.665, 5601.506]
2025-09-16 10:59:56,188 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:59:56,193 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 14 seconds)
2025-09-16 11:01:32,346 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:01:41,179 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3883.56006 ± 1916.902
2025-09-16 11:01:41,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1322.8549, 1860.6803, 4707.3022, 1767.3937, 5653.7715, 5244.5835, 5799.451, 1327.4303, 5674.8916, 5477.2437]
2025-09-16 11:01:41,180 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:01:41,188 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 25 seconds)
2025-09-16 11:03:18,400 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:03:27,219 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5674.02783 ± 66.703
2025-09-16 11:03:27,219 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5672.2437, 5662.711, 5571.0044, 5683.078, 5766.297, 5636.8994, 5710.929, 5557.992, 5752.9526, 5726.1685]
2025-09-16 11:03:27,219 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:03:27,225 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 53 seconds)
2025-09-16 11:05:03,820 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:05:12,505 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5591.49512 ± 133.466
2025-09-16 11:05:12,505 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5675.283, 5400.536, 5772.3477, 5721.549, 5576.159, 5683.27, 5472.605, 5530.0493, 5705.0884, 5378.063]
2025-09-16 11:05:12,505 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:05:12,511 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 6 seconds)
2025-09-16 11:06:49,247 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:06:59,267 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4518.25488 ± 1445.027
2025-09-16 11:06:59,267 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2102.2947, 5467.0596, 1329.1337, 5422.8237, 5190.234, 5150.863, 5020.4497, 5815.0103, 5124.8457, 4559.8384]
2025-09-16 11:06:59,267 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:06:59,274 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 22 seconds)
2025-09-16 11:08:36,707 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:08:46,750 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4971.98926 ± 1336.115
2025-09-16 11:08:46,750 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1299.4368, 5388.268, 4594.5186, 5599.724, 5619.6606, 5760.017, 5777.436, 4140.9507, 5744.6685, 5795.212]
2025-09-16 11:08:46,750 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:08:46,756 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 45 seconds)
2025-09-16 11:10:22,179 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:10:31,049 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5638.68457 ± 110.076
2025-09-16 11:10:31,049 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5597.4736, 5703.1357, 5593.9385, 5722.2036, 5673.7905, 5689.609, 5656.8906, 5337.7476, 5670.345, 5741.709]
2025-09-16 11:10:31,049 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:10:31,066 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 55 seconds)
2025-09-16 11:12:08,800 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:12:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5724.12061 ± 53.531
2025-09-16 11:12:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5741.382, 5753.89, 5615.6997, 5720.4067, 5801.9683, 5712.921, 5728.4985, 5727.4604, 5650.7715, 5788.2056]
2025-09-16 11:12:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:12:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5724.12) for latency 9
2025-09-16 11:12:18,905 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 18 seconds)
2025-09-16 11:13:55,969 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:14:04,895 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4342.50342 ± 2010.652
2025-09-16 11:14:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5732.6426, 5381.6416, 5680.1274, 1280.6361, 5514.4473, 5700.579, 5804.603, 1275.2439, 1273.8127, 5781.3022]
2025-09-16 11:14:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:14:04,903 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 35 seconds)
2025-09-16 11:15:40,420 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:15:49,320 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5778.10693 ± 43.850
2025-09-16 11:15:49,320 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5827.5117, 5846.1436, 5793.9116, 5758.3433, 5685.7856, 5736.9775, 5812.1226, 5774.0728, 5779.4155, 5766.788]
2025-09-16 11:15:49,320 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:15:49,320 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5778.11) for latency 9
2025-09-16 11:15:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 38 seconds)
2025-09-16 11:17:26,676 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:17:35,451 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5723.48145 ± 46.049
2025-09-16 11:17:35,451 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5768.369, 5691.552, 5700.3066, 5659.33, 5692.4194, 5777.8022, 5737.75, 5801.6284, 5737.7505, 5667.904]
2025-09-16 11:17:35,451 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:17:35,462 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 46 seconds)
2025-09-16 11:19:13,473 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:19:22,387 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5703.03613 ± 41.531
2025-09-16 11:19:22,387 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5653.9424, 5711.984, 5655.6133, 5698.7427, 5704.249, 5690.881, 5683.264, 5806.643, 5689.7607, 5735.2783]
2025-09-16 11:19:22,387 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:19:22,406 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 11 seconds)
2025-09-16 11:20:59,436 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:21:08,547 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5381.73682 ± 986.700
2025-09-16 11:21:08,547 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5670.0283, 5708.2183, 5781.2314, 5674.2793, 5773.5527, 5711.929, 2424.0352, 5698.178, 5651.415, 5724.5005]
2025-09-16 11:21:08,547 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:21:08,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 18 seconds)
2025-09-16 11:22:45,106 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:22:53,890 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5710.59570 ± 68.254
2025-09-16 11:22:53,890 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5778.2617, 5676.0864, 5768.4683, 5538.8647, 5702.3706, 5735.8525, 5697.472, 5677.739, 5767.2314, 5763.613]
2025-09-16 11:22:53,890 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:22:53,897 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 30 seconds)
2025-09-16 11:24:29,944 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:24:40,025 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5690.23926 ± 54.787
2025-09-16 11:24:40,025 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5738.0874, 5551.0522, 5738.1943, 5731.464, 5676.34, 5684.4165, 5648.368, 5714.9863, 5732.1196, 5687.364]
2025-09-16 11:24:40,025 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:24:40,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 50 seconds)
2025-09-16 11:26:16,949 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:26:27,044 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5720.15332 ± 135.212
2025-09-16 11:26:27,045 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5759.1333, 5752.217, 5751.094, 5871.6226, 5540.196, 5740.9834, 5780.6616, 5779.46, 5829.238, 5396.924]
2025-09-16 11:26:27,045 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:26:27,071 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 7 seconds)
2025-09-16 11:28:03,300 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:28:12,078 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5714.36377 ± 94.842
2025-09-16 11:28:12,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5676.3184, 5725.717, 5780.044, 5804.8833, 5529.421, 5560.929, 5743.2363, 5834.7593, 5721.6274, 5766.6987]
2025-09-16 11:28:12,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:28:12,085 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 14 seconds)
2025-09-16 11:29:49,149 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:29:58,088 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5669.83789 ± 77.946
2025-09-16 11:29:58,088 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5733.007, 5688.409, 5587.1846, 5612.7974, 5613.2793, 5516.4263, 5715.7554, 5766.14, 5720.09, 5745.2896]
2025-09-16 11:29:58,088 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:29:58,095 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 28 seconds)
2025-09-16 11:31:34,020 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:31:42,894 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5696.71387 ± 124.615
2025-09-16 11:31:42,895 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5748.054, 5723.9355, 5503.79, 5745.869, 5760.035, 5699.7583, 5742.773, 5799.5576, 5416.38, 5826.9883]
2025-09-16 11:31:42,895 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:31:42,902 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 41 seconds)
2025-09-16 11:33:19,117 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:33:27,982 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5625.64160 ± 83.420
2025-09-16 11:33:27,982 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5682.115, 5578.617, 5474.2734, 5655.8994, 5751.232, 5646.3228, 5713.5366, 5578.762, 5663.254, 5512.4067]
2025-09-16 11:33:27,982 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:33:27,989 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 52 seconds)
2025-09-16 11:35:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:35:13,267 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5193.78271 ± 1293.570
2025-09-16 11:35:13,267 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5673.8516, 1373.5171, 5611.2964, 5791.256, 5651.4673, 5012.5913, 5469.762, 5862.316, 5714.127, 5777.6406]
2025-09-16 11:35:13,267 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:35:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 2 seconds)
2025-09-16 11:36:49,027 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:36:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5453.32080 ± 376.654
2025-09-16 11:36:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5703.201, 5635.2925, 5654.122, 5557.6206, 5322.0054, 5667.802, 4963.3384, 5637.1133, 4562.8125, 5829.904]
2025-09-16 11:36:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:36:57,905 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 16 seconds)
2025-09-16 11:38:33,932 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:38:42,762 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5632.63525 ± 233.531
2025-09-16 11:38:42,762 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5718.3496, 5786.7524, 4970.615, 5715.853, 5513.8926, 5776.292, 5696.2686, 5716.371, 5784.466, 5647.493]
2025-09-16 11:38:42,762 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:38:42,771 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 29 seconds)
2025-09-16 11:40:17,877 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:40:26,556 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5431.17676 ± 465.843
2025-09-16 11:40:26,556 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5754.319, 5684.4263, 5442.6953, 5677.2075, 5357.0537, 5763.6055, 4123.544, 5453.283, 5308.954, 5746.6846]
2025-09-16 11:40:26,556 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:40:26,564 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-09-16 11:42:03,765 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:42:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5772.61719 ± 65.780
2025-09-16 11:42:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5783.517, 5789.881, 5798.456, 5857.802, 5757.6606, 5771.7925, 5624.1113, 5820.6143, 5690.156, 5832.1807]
2025-09-16 11:42:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:42:12,548 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 59 seconds)
2025-09-16 11:43:49,262 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:43:57,963 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5408.88232 ± 656.558
2025-09-16 11:43:57,963 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5663.222, 5853.009, 5760.1743, 3565.4192, 5749.027, 5673.1406, 5742.935, 5182.7246, 5745.727, 5153.4453]
2025-09-16 11:43:57,963 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:43:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 14 seconds)
2025-09-16 11:45:35,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:45:45,633 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5706.20361 ± 112.428
2025-09-16 11:45:45,633 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5781.9966, 5687.7383, 5744.295, 5829.495, 5734.232, 5397.4106, 5713.4473, 5786.3096, 5674.3804, 5712.734]
2025-09-16 11:45:45,633 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:45:45,647 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 33 seconds)
2025-09-16 11:47:21,359 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:47:31,389 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5329.01807 ± 1352.300
2025-09-16 11:47:31,389 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5739.71, 5806.737, 5802.9404, 5833.3877, 1275.5206, 5776.5444, 5721.0347, 5659.2095, 5848.9907, 5826.1016]
2025-09-16 11:47:31,389 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:47:31,401 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 48 seconds)
2025-09-16 11:49:09,045 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:49:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5351.27881 ± 989.234
2025-09-16 11:49:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5627.782, 5761.6196, 2398.165, 5491.4546, 5692.9067, 5741.072, 5745.2983, 5786.399, 5517.139, 5750.952]
2025-09-16 11:49:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:49:17,891 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 5 seconds)
2025-09-16 11:50:56,112 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:51:04,855 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5627.99170 ± 184.159
2025-09-16 11:51:04,856 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5402.543, 5527.1987, 5578.9077, 5682.559, 5845.764, 5599.253, 5828.261, 5882.877, 5649.502, 5283.0513]
2025-09-16 11:51:04,856 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:51:04,864 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 19 seconds)
2025-09-16 11:52:41,987 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:52:52,003 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5758.31396 ± 59.808
2025-09-16 11:52:52,004 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5762.9414, 5780.3745, 5727.036, 5752.874, 5819.4727, 5742.104, 5744.651, 5614.419, 5789.3125, 5849.953]
2025-09-16 11:52:52,004 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:52:52,013 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 33 seconds)
2025-09-16 11:54:30,392 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:54:40,432 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5670.95654 ± 88.922
2025-09-16 11:54:40,432 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5675.817, 5809.925, 5534.7534, 5632.087, 5670.87, 5768.5044, 5708.013, 5521.6963, 5641.759, 5746.145]
2025-09-16 11:54:40,433 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:54:40,439 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 46 seconds)
2025-09-16 11:56:19,149 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:56:28,009 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5203.94434 ± 1148.383
2025-09-16 11:56:28,009 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5748.8945, 1805.8667, 5701.179, 5817.29, 5640.1133, 5631.7446, 5743.371, 5213.5576, 5399.269, 5338.1616]
2025-09-16 11:56:28,009 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:56:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1251 [DEBUG]: Training session finished
