2025-09-16 09:15:59,376 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.000-delay_24
2025-09-16 09:15:59,376 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.000-delay_24
2025-09-16 09:15:59,376 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x148ce795c950>}
2025-09-16 09:15:59,376 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-16 09:15:59,381 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-16 09:15:59,558 baseline-bpql-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-16 09:15:59,558 baseline-bpql-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 09:16:00,977 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-16 09:16:00,977 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-16 09:17:44,936 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:17:59,352 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -416.82138 ± 35.882
2025-09-16 09:17:59,353 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-439.85245, -407.48676, -447.3436, -398.0126, -366.15628, -439.0099, -344.60742, -465.1383, -429.84842, -430.7578]
2025-09-16 09:17:59,353 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:17:59,353 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-416.82) for latency 24
2025-09-16 09:17:59,366 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 15 minutes, 20 seconds)
2025-09-16 09:19:47,739 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:20:00,637 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -280.22229 ± 57.979
2025-09-16 09:20:00,637 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-200.69893, -250.24994, -307.52032, -335.3188, -236.95975, -259.41257, -368.5808, -366.49753, -266.54504, -210.43924]
2025-09-16 09:20:00,637 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:20:00,637 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-280.22) for latency 24
2025-09-16 09:20:00,646 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 15 minutes, 43 seconds)
2025-09-16 09:21:49,454 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:22:02,442 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -200.85297 ± 70.635
2025-09-16 09:22:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-253.58247, -312.38425, -261.55243, -214.22264, -132.86206, -225.25748, -241.95467, -152.37128, -151.887, -62.45541]
2025-09-16 09:22:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:22:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-200.85) for latency 24
2025-09-16 09:22:02,459 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 14 minutes, 47 seconds)
2025-09-16 09:23:50,835 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:24:03,802 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -108.84422 ± 91.973
2025-09-16 09:24:03,802 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-152.50471, -97.498215, -128.17307, 115.72642, -160.94507, -171.18398, -237.64845, -72.44414, -149.58502, -34.18604]
2025-09-16 09:24:03,802 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:24:03,802 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-108.84) for latency 24
2025-09-16 09:24:03,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 13 minutes, 8 seconds)
2025-09-16 09:25:52,691 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:26:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -1.20639 ± 73.254
2025-09-16 09:26:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [74.26414, 20.77285, -48.975887, -144.89775, 13.427544, -51.21463, 39.296837, 128.09508, 15.071679, -57.903778]
2025-09-16 09:26:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:26:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-1.21) for latency 24
2025-09-16 09:26:05,645 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 11 minutes, 28 seconds)
2025-09-16 09:27:54,552 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:28:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 148.18921 ± 196.138
2025-09-16 09:28:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [263.9873, 326.18323, 69.84022, 456.68863, 358.00522, -108.20888, 48.238567, 29.820713, -169.72345, 207.06061]
2025-09-16 09:28:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:28:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (148.19) for latency 24
2025-09-16 09:28:07,413 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 10 minutes, 31 seconds)
2025-09-16 09:29:56,559 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:30:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 479.86002 ± 305.989
2025-09-16 09:30:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [834.22296, 63.26917, 690.5315, 162.14496, 750.0137, 400.75012, 217.27461, 101.48931, 723.13135, 855.77264]
2025-09-16 09:30:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:30:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (479.86) for latency 24
2025-09-16 09:30:09,526 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 8 minutes, 45 seconds)
2025-09-16 09:31:58,050 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:32:12,499 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 666.61194 ± 189.467
2025-09-16 09:32:12,499 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [478.497, 308.1265, 840.4505, 484.12717, 871.95575, 776.5255, 818.0725, 692.1597, 537.27155, 858.93353]
2025-09-16 09:32:12,499 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:32:12,499 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (666.61) for latency 24
2025-09-16 09:32:12,523 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 7 minutes, 5 seconds)
2025-09-16 09:34:00,874 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:34:13,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 799.62048 ± 164.936
2025-09-16 09:34:13,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [781.7074, 437.82996, 937.82465, 683.4304, 1062.2036, 705.38824, 911.03644, 723.8118, 890.9721, 862.0001]
2025-09-16 09:34:13,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:34:13,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (799.62) for latency 24
2025-09-16 09:34:13,653 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 4 minutes, 58 seconds)
2025-09-16 09:36:03,028 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:36:15,964 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 782.98663 ± 178.192
2025-09-16 09:36:15,964 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [503.15283, 685.25903, 673.24756, 974.8564, 794.3669, 502.2524, 1012.8497, 937.1306, 940.1949, 806.5562]
2025-09-16 09:36:15,964 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:36:15,974 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 3 minutes, 5 seconds)
2025-09-16 09:38:05,877 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:38:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1008.93018 ± 133.580
2025-09-16 09:38:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [795.5366, 1115.1372, 953.0808, 743.64874, 989.85333, 1060.6589, 1051.5618, 1117.2837, 1103.046, 1159.4952]
2025-09-16 09:38:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:38:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1008.93) for latency 24
2025-09-16 09:38:20,276 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 1 minute, 48 seconds)
2025-09-16 09:40:07,098 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:40:20,052 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1124.39954 ± 119.104
2025-09-16 09:40:20,052 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1058.4246, 1060.171, 1157.8981, 1115.5155, 1245.7297, 1253.0233, 1032.7744, 1343.0603, 1051.4493, 925.9479]
2025-09-16 09:40:20,052 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:40:20,052 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1124.40) for latency 24
2025-09-16 09:40:20,060 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 59 minutes, 5 seconds)
2025-09-16 09:42:08,445 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:42:21,106 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1062.91797 ± 409.478
2025-09-16 09:42:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1364.8474, 1186.8438, 1263.9131, 1043.9103, 1159.5778, 1155.6355, -132.30264, 1145.9856, 1335.4136, 1105.3557]
2025-09-16 09:42:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:42:21,112 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 56 minutes, 29 seconds)
2025-09-16 09:44:09,473 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:44:22,085 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1292.75171 ± 149.478
2025-09-16 09:44:22,086 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1265.2747, 1338.9299, 1177.962, 1225.6797, 1325.6155, 1681.917, 1245.8876, 1204.0074, 1103.2999, 1358.9436]
2025-09-16 09:44:22,086 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:44:22,086 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1292.75) for latency 24
2025-09-16 09:44:22,093 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 54 minutes, 25 seconds)
2025-09-16 09:46:09,024 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:46:23,185 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1334.85754 ± 163.105
2025-09-16 09:46:23,185 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1213.5839, 1259.8021, 1097.1815, 1461.4131, 1273.9738, 1518.1268, 1113.7205, 1350.8379, 1602.7673, 1457.1676]
2025-09-16 09:46:23,185 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:46:23,185 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1334.86) for latency 24
2025-09-16 09:46:23,200 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 52 minutes, 2 seconds)
2025-09-16 09:48:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:48:24,116 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1335.64453 ± 166.540
2025-09-16 09:48:24,116 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1240.1694, 1707.9324, 1359.6554, 1259.5797, 1256.6156, 1218.4222, 1382.7732, 1561.8613, 1138.1416, 1231.2936]
2025-09-16 09:48:24,116 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:48:24,116 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1335.64) for latency 24
2025-09-16 09:48:24,119 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 4 seconds)
2025-09-16 09:50:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:50:25,768 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1366.97876 ± 118.703
2025-09-16 09:50:25,768 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1337.1925, 1549.9462, 1267.558, 1268.2373, 1384.7412, 1379.7412, 1434.9585, 1580.3873, 1251.3625, 1215.6643]
2025-09-16 09:50:25,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:50:25,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1366.98) for latency 24
2025-09-16 09:50:25,778 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 47 minutes, 34 seconds)
2025-09-16 09:52:13,186 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:52:25,901 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1358.49976 ± 211.680
2025-09-16 09:52:25,901 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1289.7217, 1180.9886, 1358.8942, 1963.9752, 1253.8643, 1273.5005, 1332.5582, 1416.4767, 1221.195, 1293.8234]
2025-09-16 09:52:25,901 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:52:25,907 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 45 minutes, 18 seconds)
2025-09-16 09:54:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:54:26,083 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1472.94617 ± 306.522
2025-09-16 09:54:26,084 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2151.1506, 1170.9414, 1619.8722, 1181.6697, 1176.3875, 1252.4719, 1640.0508, 1782.7538, 1337.5825, 1416.5813]
2025-09-16 09:54:26,084 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:54:26,084 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1472.95) for latency 24
2025-09-16 09:54:26,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 43 minutes, 4 seconds)
2025-09-16 09:56:12,169 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:56:26,235 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1428.16772 ± 190.306
2025-09-16 09:56:26,235 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1320.1816, 1578.5225, 1263.7726, 1316.3398, 1497.5793, 1871.4019, 1387.7157, 1186.6868, 1546.2241, 1313.2526]
2025-09-16 09:56:26,235 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:56:26,254 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 48 seconds)
2025-09-16 09:58:12,830 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 09:58:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1553.47485 ± 276.444
2025-09-16 09:58:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1556.1763, 1844.7369, 1297.1338, 1469.5842, 1524.0708, 1510.4718, 1235.8811, 1334.5597, 2227.8296, 1534.3048]
2025-09-16 09:58:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:58:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1553.47) for latency 24
2025-09-16 09:58:25,541 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 22 seconds)
2025-09-16 10:00:13,201 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:00:27,250 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1503.54810 ± 279.829
2025-09-16 10:00:27,250 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1224.5188, 1269.2885, 1369.0856, 1685.0874, 1232.2225, 1334.8169, 1303.9385, 2044.2565, 1734.8185, 1837.4479]
2025-09-16 10:00:27,250 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:00:27,292 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 23 seconds)
2025-09-16 10:02:15,053 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:02:27,653 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1790.14319 ± 629.805
2025-09-16 10:02:27,653 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1531.6161, 3250.658, 1318.2341, 1992.2687, 1474.0035, 856.26984, 2065.478, 1349.453, 2300.2783, 1763.1726]
2025-09-16 10:02:27,653 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:02:27,653 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1790.14) for latency 24
2025-09-16 10:02:27,661 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 27 seconds)
2025-09-16 10:04:12,817 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:04:26,904 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1760.76978 ± 450.267
2025-09-16 10:04:26,905 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1298.0748, 2658.4158, 1552.9012, 1750.9813, 2302.5876, 1418.7383, 2116.2373, 1893.0647, 1323.5709, 1293.125]
2025-09-16 10:04:26,905 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:04:26,911 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 12 seconds)
2025-09-16 10:06:13,530 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:06:26,118 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2312.53271 ± 748.814
2025-09-16 10:06:26,118 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3098.4944, 1743.3547, 1345.2571, 2257.9666, 3391.0022, 1662.8326, 2320.3652, 3160.0576, 1264.1067, 2881.8906]
2025-09-16 10:06:26,118 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:06:26,118 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2312.53) for latency 24
2025-09-16 10:06:26,121 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 58 seconds)
2025-09-16 10:08:11,933 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:08:24,483 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1900.86560 ± 482.383
2025-09-16 10:08:24,484 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2057.0078, 1424.4802, 1371.1869, 1573.4877, 1198.4055, 2625.1423, 1765.8282, 2251.9304, 2161.9382, 2579.2502]
2025-09-16 10:08:24,484 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:08:24,490 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 44 seconds)
2025-09-16 10:10:10,998 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:10:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1744.76733 ± 553.171
2025-09-16 10:10:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1424.2435, 3030.2454, 1528.1813, 2565.6304, 1497.394, 1806.6353, 1274.6903, 1333.2726, 1530.1779, 1457.2026]
2025-09-16 10:10:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:10:23,412 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 3 seconds)
2025-09-16 10:12:09,354 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:12:21,940 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1841.17053 ± 459.366
2025-09-16 10:12:21,940 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2315.5442, 1422.7451, 2128.4395, 1475.4459, 1551.3627, 1432.6605, 2917.5212, 1554.9338, 1801.4515, 1811.5994]
2025-09-16 10:12:21,940 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:12:21,945 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 37 seconds)
2025-09-16 10:14:07,460 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:14:20,086 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2166.98145 ± 1057.637
2025-09-16 10:14:20,086 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1884.6676, 3720.16, 1231.5997, 1315.0555, 1363.1653, 1238.36, 1851.2775, 1542.3436, 3693.3665, 3829.8188]
2025-09-16 10:14:20,086 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:14:20,105 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 23 seconds)
2025-09-16 10:16:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:16:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2134.71460 ± 827.535
2025-09-16 10:16:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1529.2843, 1924.0015, 2441.5513, 1326.9492, 1364.3378, 1379.8704, 3719.725, 3378.304, 2625.5142, 1657.609]
2025-09-16 10:16:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:16:17,968 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 18 minutes, 5 seconds)
2025-09-16 10:18:03,821 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:18:17,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1869.14917 ± 718.804
2025-09-16 10:18:17,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2171.2847, 630.19775, 2018.3894, 2586.1511, 1536.9747, 2537.077, 1499.9801, 1354.3413, 1215.5035, 3141.5913]
2025-09-16 10:18:17,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:18:17,717 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 26 seconds)
2025-09-16 10:20:04,337 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:20:17,005 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2045.77808 ± 557.353
2025-09-16 10:20:17,005 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1510.7186, 1986.0215, 1459.1952, 1448.1091, 2552.4148, 1826.2615, 2463.7026, 2584.4785, 3088.7678, 1538.112]
2025-09-16 10:20:17,006 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:20:17,012 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes, 32 seconds)
2025-09-16 10:22:02,673 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:22:15,269 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1503.70190 ± 223.929
2025-09-16 10:22:15,270 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1615.6871, 1285.2373, 1661.1108, 1250.5913, 1287.6155, 1947.9242, 1727.7712, 1397.0211, 1297.0421, 1567.0188]
2025-09-16 10:22:15,270 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:22:15,274 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 12 minutes, 30 seconds)
2025-09-16 10:24:02,216 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:24:14,850 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1983.73608 ± 591.465
2025-09-16 10:24:14,851 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1686.0758, 1493.0261, 1243.7944, 2053.7659, 2625.757, 1462.6904, 2061.5134, 3212.783, 2454.9055, 1543.0492]
2025-09-16 10:24:14,851 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:24:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 10 minutes, 50 seconds)
2025-09-16 10:25:59,555 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:26:12,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1950.67712 ± 490.533
2025-09-16 10:26:12,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2490.3013, 1874.2266, 1694.6617, 1461.077, 2811.5896, 2541.892, 1647.1675, 2147.002, 1514.5981, 1324.256]
2025-09-16 10:26:12,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:26:12,115 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 8 minutes, 43 seconds)
2025-09-16 10:27:58,870 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:28:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1788.47437 ± 407.545
2025-09-16 10:28:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2093.0168, 2066.6428, 1616.9331, 1984.1841, 2655.2002, 1636.7098, 1840.0726, 1223.8912, 1372.039, 1396.0546]
2025-09-16 10:28:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:28:11,375 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 6 minutes, 38 seconds)
2025-09-16 10:29:57,508 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:30:11,387 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2083.74170 ± 663.397
2025-09-16 10:30:11,388 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1837.8977, 1953.1077, 1532.5624, 1968.2781, 3881.2568, 1399.8505, 2228.9949, 2153.4858, 2288.008, 1593.976]
2025-09-16 10:30:11,388 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:30:11,396 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 4 minutes, 49 seconds)
2025-09-16 10:31:56,687 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:32:09,219 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2184.22021 ± 692.888
2025-09-16 10:32:09,219 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1513.6821, 2576.519, 2256.8413, 1423.441, 3352.929, 2395.0256, 1571.2128, 1495.7439, 3320.7407, 1936.0682]
2025-09-16 10:32:09,219 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:32:09,230 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 45 seconds)
2025-09-16 10:33:55,805 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:34:08,120 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1962.36877 ± 669.672
2025-09-16 10:34:08,120 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1596.8628, 1541.0375, 1523.1064, 2402.1223, 2580.8994, 1631.1761, 3633.4214, 1400.9762, 1677.7742, 1636.3127]
2025-09-16 10:34:08,120 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:34:08,130 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 37 seconds)
2025-09-16 10:35:53,963 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:36:06,434 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2046.14917 ± 621.899
2025-09-16 10:36:06,434 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1868.0203, 3261.5488, 3018.5288, 1501.5549, 1657.1654, 1948.2844, 1667.0492, 1463.3505, 2523.8757, 1552.1135]
2025-09-16 10:36:06,434 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:36:06,439 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 51 seconds)
2025-09-16 10:37:51,742 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:38:04,340 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1943.27600 ± 642.344
2025-09-16 10:38:04,340 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1422.0316, 1725.4641, 3714.2937, 1765.5676, 2106.294, 1636.1255, 1308.0673, 1645.0488, 2071.541, 2038.3254]
2025-09-16 10:38:04,340 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:38:04,345 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 56 minutes, 37 seconds)
2025-09-16 10:39:50,394 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:40:03,036 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2071.53589 ± 512.434
2025-09-16 10:40:03,036 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1459.1298, 2851.1743, 1404.1713, 2487.275, 1524.8944, 2336.4958, 2086.8552, 1640.7507, 2749.1475, 2175.4639]
2025-09-16 10:40:03,036 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:40:03,041 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 23 seconds)
2025-09-16 10:41:49,944 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:42:03,953 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1856.67053 ± 413.735
2025-09-16 10:42:03,953 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1445.3536, 2621.147, 2059.3474, 1591.9008, 1192.468, 2228.271, 2005.0289, 1525.3037, 1705.1361, 2192.7488]
2025-09-16 10:42:03,953 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:42:03,969 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes)
2025-09-16 10:43:50,393 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:44:02,949 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1593.33545 ± 909.252
2025-09-16 10:44:02,949 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2369.0413, 2399.5322, 1583.771, 161.54398, 1409.1396, 1842.0294, 3062.308, 1491.2097, 1663.5719, -48.79189]
2025-09-16 10:44:02,949 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:44:02,955 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 2 seconds)
2025-09-16 10:45:49,429 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:46:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1719.80444 ± 487.596
2025-09-16 10:46:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1698.8832, 1358.8492, 1295.4084, 2654.4624, 1339.0865, 1425.4506, 1565.3024, 2637.2788, 1802.0054, 1421.3186]
2025-09-16 10:46:01,918 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:46:01,924 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 10 seconds)
2025-09-16 10:47:48,152 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:48:00,740 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2234.13989 ± 736.203
2025-09-16 10:48:00,741 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1782.427, 1997.8577, 3302.3662, 1650.9797, 2395.987, 2602.4038, 1379.6324, 1767.0308, 3749.754, 1712.9614]
2025-09-16 10:48:00,741 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:48:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 21 seconds)
2025-09-16 10:49:46,851 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:49:59,336 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1681.26685 ± 355.007
2025-09-16 10:49:59,336 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [738.9029, 1702.61, 1918.0013, 1992.078, 1974.6617, 1716.7593, 1663.6855, 1662.3937, 1468.9448, 1974.6309]
2025-09-16 10:49:59,336 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:49:59,341 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 20 seconds)
2025-09-16 10:51:45,288 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:51:59,295 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2672.26416 ± 862.642
2025-09-16 10:51:59,295 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2801.6362, 1502.8073, 1592.1918, 3696.4636, 3746.8538, 2641.4712, 2926.2227, 2722.9795, 1408.6936, 3683.3213]
2025-09-16 10:51:59,295 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:51:59,295 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2672.26) for latency 24
2025-09-16 10:51:59,300 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 11 seconds)
2025-09-16 10:53:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:53:56,784 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1736.87280 ± 653.852
2025-09-16 10:53:56,784 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1406.2164, 1161.041, 2177.5261, 1611.6107, 3216.5007, 1660.3358, 1786.063, 694.4304, 1425.1603, 2229.8433]
2025-09-16 10:53:56,785 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:53:56,791 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 57 seconds)
2025-09-16 10:55:42,347 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:55:54,926 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1652.11682 ± 179.332
2025-09-16 10:55:54,926 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1495.22, 1769.9307, 1939.0901, 1848.6603, 1477.3433, 1501.1221, 1666.1677, 1859.4216, 1544.542, 1419.6707]
2025-09-16 10:55:54,926 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:55:54,931 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 50 seconds)
2025-09-16 10:57:40,829 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:57:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1736.60022 ± 452.533
2025-09-16 10:57:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1859.7981, 1398.5189, 907.07935, 1613.2017, 2167.9602, 1843.3746, 2417.031, 2309.0776, 1297.2104, 1552.7502]
2025-09-16 10:57:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:57:53,349 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 36 minutes, 47 seconds)
2025-09-16 10:59:39,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 10:59:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2192.44873 ± 782.483
2025-09-16 10:59:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1371.203, 1523.3068, 2328.402, 2612.4092, 1562.4696, 3041.2979, 2405.3696, 3573.2642, 2575.2083, 931.5569]
2025-09-16 10:59:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:59:52,358 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 52 seconds)
2025-09-16 11:01:38,586 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:01:51,121 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2027.66797 ± 525.089
2025-09-16 11:01:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2075.7405, 1298.1113, 2264.373, 2962.182, 2719.338, 2234.5396, 1513.7025, 1501.8545, 2145.5732, 1561.2656]
2025-09-16 11:01:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:01:51,142 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 32 minutes, 43 seconds)
2025-09-16 11:03:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:03:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1919.17480 ± 430.074
2025-09-16 11:03:48,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1276.2412, 2495.6003, 1604.1895, 1489.8773, 1920.7025, 1866.647, 1544.87, 2494.0813, 1996.2708, 2503.267]
2025-09-16 11:03:48,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:03:48,402 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 30 minutes, 42 seconds)
2025-09-16 11:05:35,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:05:47,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1898.22974 ± 629.375
2025-09-16 11:05:47,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2730.1382, 3416.4263, 1703.8147, 1613.6658, 1475.1047, 1468.6161, 1507.7739, 1452.216, 2045.8372, 1568.7025]
2025-09-16 11:05:47,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:05:48,001 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 57 seconds)
2025-09-16 11:07:33,323 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:07:45,790 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1729.82776 ± 725.659
2025-09-16 11:07:45,790 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1294.7112, 1379.43, 1344.2433, 1453.639, 1804.5448, 1404.7832, 3859.473, 1661.2296, 1623.5408, 1472.6821]
2025-09-16 11:07:45,790 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:07:45,798 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 26 minutes, 53 seconds)
2025-09-16 11:09:32,418 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:09:44,952 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2292.76636 ± 649.181
2025-09-16 11:09:44,952 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2536.24, 1372.3918, 2479.702, 2184.5308, 2020.6423, 3539.841, 2960.8499, 1318.03, 1916.2628, 2599.173]
2025-09-16 11:09:44,953 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:09:44,959 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 56 seconds)
2025-09-16 11:11:31,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:11:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2616.83594 ± 861.520
2025-09-16 11:11:45,409 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3711.7683, 2285.146, 1502.7559, 3573.8083, 3185.188, 2071.4473, 2747.81, 3684.4402, 1198.672, 2207.3245]
2025-09-16 11:11:45,409 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:11:45,415 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 11 seconds)
2025-09-16 11:13:31,239 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:13:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2717.93896 ± 636.352
2025-09-16 11:13:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2505.104, 3853.3982, 2467.1572, 1894.9369, 2146.2576, 2673.2405, 3772.6143, 2915.1128, 2041.4224, 2910.1487]
2025-09-16 11:13:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:13:43,847 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2717.94) for latency 24
2025-09-16 11:13:43,857 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 22 seconds)
2025-09-16 11:15:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:15:43,121 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2284.78955 ± 660.690
2025-09-16 11:15:43,121 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2980.191, 1556.856, 1798.7122, 3090.6897, 2373.4238, 1811.4579, 1322.8036, 3240.0552, 2778.7996, 1894.9084]
2025-09-16 11:15:43,121 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:15:43,127 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 21 seconds)
2025-09-16 11:17:29,392 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:17:43,368 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2210.21484 ± 527.048
2025-09-16 11:17:43,368 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2582.3965, 2894.9739, 1777.9991, 1429.5231, 1673.4742, 2005.8618, 1806.8729, 2594.1357, 3064.8118, 2272.0977]
2025-09-16 11:17:43,369 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:17:43,375 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 41 seconds)
2025-09-16 11:19:29,621 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:19:42,227 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2719.02173 ± 765.249
2025-09-16 11:19:42,228 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3099.1465, 3108.7102, 1617.01, 2509.668, 1759.7007, 1868.5082, 2637.38, 3943.865, 2864.3074, 3781.9224]
2025-09-16 11:19:42,228 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:19:42,228 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2719.02) for latency 24
2025-09-16 11:19:42,233 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 39 seconds)
2025-09-16 11:21:28,537 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:21:40,821 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2924.13940 ± 762.679
2025-09-16 11:21:40,821 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3135.4087, 1963.7765, 3663.9043, 3444.059, 2473.9365, 3814.273, 2543.4004, 2508.3171, 1683.6421, 4010.6763]
2025-09-16 11:21:40,821 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:21:40,821 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2924.14) for latency 24
2025-09-16 11:21:40,831 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 26 seconds)
2025-09-16 11:23:26,542 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:23:39,159 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2126.21216 ± 662.914
2025-09-16 11:23:39,159 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1542.9462, 1560.3365, 2639.0378, 2206.3408, 2113.4236, 2139.2612, 1963.7529, 1736.599, 3836.1445, 1524.2767]
2025-09-16 11:23:39,159 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:23:39,165 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 26 seconds)
2025-09-16 11:25:25,136 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:25:37,635 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2877.96240 ± 623.275
2025-09-16 11:25:37,635 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3649.0383, 2604.6616, 3804.8293, 2806.7405, 2820.7996, 2527.2705, 2255.2217, 2743.3674, 1834.791, 3732.905]
2025-09-16 11:25:37,635 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:25:37,670 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 21 seconds)
2025-09-16 11:27:23,094 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:27:35,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2645.12695 ± 886.278
2025-09-16 11:27:35,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1604.5123, 3553.685, 2045.5001, 1527.095, 2829.164, 4133.2505, 2427.1006, 2584.0425, 1890.3356, 3856.582]
2025-09-16 11:27:35,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:27:35,620 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 7 seconds)
2025-09-16 11:29:21,553 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:29:34,140 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2252.44067 ± 801.462
2025-09-16 11:29:34,140 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1507.2485, 3684.8655, 3231.0593, 1454.9086, 2507.8064, 1389.2383, 2586.735, 1467.4834, 2924.6692, 1770.3911]
2025-09-16 11:29:34,140 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:29:34,150 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 6 seconds)
2025-09-16 11:31:20,884 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:31:34,832 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2821.87915 ± 875.273
2025-09-16 11:31:34,833 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3814.0173, 2726.6392, 2201.598, 3150.2197, 1657.6293, 2404.982, 4054.5684, 4137.215, 2319.5288, 1752.3949]
2025-09-16 11:31:34,833 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:31:34,843 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 21 seconds)
2025-09-16 11:33:22,442 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:33:34,941 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2318.82300 ± 593.033
2025-09-16 11:33:34,941 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2634.637, 2381.824, 3119.5898, 1579.98, 3110.646, 2693.5894, 2588.6628, 1711.8826, 1359.1356, 2008.2832]
2025-09-16 11:33:34,941 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:33:34,951 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 33 seconds)
2025-09-16 11:35:21,052 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:35:33,536 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1950.02368 ± 434.432
2025-09-16 11:35:33,536 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2340.74, 1544.4877, 1330.4425, 2282.4497, 1991.228, 1468.528, 2206.8977, 2743.2307, 1572.8103, 2019.4214]
2025-09-16 11:35:33,536 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:35:33,546 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 35 seconds)
2025-09-16 11:37:19,604 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:37:32,151 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2825.71313 ± 1006.547
2025-09-16 11:37:32,151 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2373.0642, 3913.2505, 1586.447, 1574.1196, 3719.8647, 2364.8767, 3957.8037, 4098.4204, 1545.3735, 3123.9116]
2025-09-16 11:37:32,151 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:37:32,189 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 40 seconds)
2025-09-16 11:39:17,794 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:39:31,681 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1859.37964 ± 339.192
2025-09-16 11:39:31,682 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1471.8864, 1628.9163, 1765.2606, 2724.5974, 1767.8746, 2132.0647, 1738.1141, 1914.3434, 1888.7947, 1561.9453]
2025-09-16 11:39:31,682 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:39:31,689 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 46 seconds)
2025-09-16 11:41:17,559 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:41:30,053 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2163.60107 ± 601.669
2025-09-16 11:41:30,053 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1475.0166, 2106.124, 2569.9368, 1484.7471, 3222.346, 1407.0413, 2339.4775, 2883.6523, 2432.863, 1714.8055]
2025-09-16 11:41:30,053 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:41:30,071 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 34 seconds)
2025-09-16 11:43:17,031 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:43:29,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3074.84863 ± 828.453
2025-09-16 11:43:29,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4339.368, 1608.0948, 3395.3425, 1873.2234, 3156.5796, 3047.5767, 2484.0767, 3579.2212, 3224.1045, 4040.8962]
2025-09-16 11:43:29,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:43:29,613 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (3074.85) for latency 24
2025-09-16 11:43:29,622 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 32 seconds)
2025-09-16 11:45:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:45:27,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2776.88184 ± 871.830
2025-09-16 11:45:27,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2022.1732, 4000.5537, 3970.3696, 2495.184, 2467.5486, 3912.0242, 3127.9062, 2332.886, 1505.6996, 1934.473]
2025-09-16 11:45:27,636 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:45:27,644 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 30 seconds)
2025-09-16 11:47:13,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:47:27,511 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2053.37280 ± 542.206
2025-09-16 11:47:27,512 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1749.4698, 1817.0791, 3251.091, 1418.888, 1967.2037, 1419.3694, 2040.5128, 1942.7181, 2150.609, 2776.7869]
2025-09-16 11:47:27,512 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:47:27,524 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 37 seconds)
2025-09-16 11:49:13,378 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:49:25,729 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2288.10449 ± 582.481
2025-09-16 11:49:25,729 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1436.3284, 2754.7368, 2255.307, 1665.4672, 3153.4531, 1799.7579, 2840.5632, 1956.9388, 1979.7104, 3038.7817]
2025-09-16 11:49:25,729 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:49:25,736 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 32 seconds)
2025-09-16 11:51:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:51:23,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3002.85474 ± 995.092
2025-09-16 11:51:23,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3882.4575, 1651.3778, 4062.0234, 1667.1903, 3875.8567, 3759.8499, 3949.4756, 1559.2329, 2752.3838, 2868.6982]
2025-09-16 11:51:23,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:51:23,100 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 29 seconds)
2025-09-16 11:53:08,492 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:53:20,981 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2273.93164 ± 639.534
2025-09-16 11:53:20,981 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1846.3198, 2332.238, 1850.5889, 1954.772, 2258.6707, 3622.4553, 1910.678, 2256.6992, 3264.2334, 1442.6604]
2025-09-16 11:53:20,981 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:53:20,988 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 23 seconds)
2025-09-16 11:55:07,273 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:55:19,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3307.16919 ± 822.105
2025-09-16 11:55:19,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2898.4966, 3596.0637, 1604.0056, 3509.4956, 4106.26, 4016.1726, 4082.1804, 2783.07, 2360.4014, 4115.543]
2025-09-16 11:55:19,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:55:19,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (3307.17) for latency 24
2025-09-16 11:55:19,922 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 29 seconds)
2025-09-16 11:57:06,107 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:57:20,031 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3541.18115 ± 724.018
2025-09-16 11:57:20,031 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4079.6367, 3482.817, 3270.1523, 1683.4896, 3283.0405, 4255.7183, 4171.4004, 4207.728, 3495.0295, 3482.7983]
2025-09-16 11:57:20,031 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:57:20,031 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (3541.18) for latency 24
2025-09-16 11:57:20,044 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 31 seconds)
2025-09-16 11:59:05,510 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 11:59:18,023 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1949.87732 ± 568.183
2025-09-16 11:59:18,023 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1635.8191, 1272.862, 3211.3967, 2123.3245, 2656.4495, 1700.0563, 1662.7278, 1382.0837, 1724.1991, 2129.8564]
2025-09-16 11:59:18,023 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:59:18,033 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 32 seconds)
2025-09-16 12:01:03,259 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:01:15,852 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2733.39868 ± 1006.412
2025-09-16 12:01:15,853 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1873.7465, 1902.4528, 1348.1517, 1829.8584, 2783.2222, 4228.954, 4319.2383, 2441.7546, 3790.7412, 2815.8657]
2025-09-16 12:01:15,853 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:01:15,864 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 35 seconds)
2025-09-16 12:03:03,392 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:03:17,221 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3327.07080 ± 1099.450
2025-09-16 12:03:17,221 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4686.764, 1679.0458, 2254.8599, 4556.9697, 4397.2373, 1940.8501, 3941.6858, 3715.7305, 2303.6956, 3793.8687]
2025-09-16 12:03:17,221 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:03:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 47 seconds)
2025-09-16 12:05:02,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:05:15,160 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3010.72632 ± 976.924
2025-09-16 12:05:15,160 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1400.507, 3576.6016, 4242.6533, 4118.2925, 3630.1638, 2595.8623, 2846.686, 1848.8352, 1939.2943, 3908.367]
2025-09-16 12:05:15,160 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:05:15,169 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 45 seconds)
2025-09-16 12:07:02,453 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:07:14,857 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2945.94849 ± 739.172
2025-09-16 12:07:14,857 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2272.3157, 1627.0897, 4019.9072, 2908.6387, 3476.8838, 3230.1453, 2177.5298, 2875.9578, 4017.931, 2853.0835]
2025-09-16 12:07:14,857 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:07:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 45 seconds)
2025-09-16 12:09:02,705 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:09:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2781.13232 ± 892.355
2025-09-16 12:09:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1398.6383, 2195.0098, 3553.4036, 3110.2478, 4096.3823, 3687.945, 1894.6229, 2616.216, 3521.6462, 1737.212]
2025-09-16 12:09:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:09:15,132 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 52 seconds)
2025-09-16 12:11:01,325 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:11:13,748 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2582.78052 ± 966.259
2025-09-16 12:11:13,748 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1870.0813, 2428.2292, 2233.7456, 4269.933, 1876.9714, 1432.2842, 2268.5317, 3141.2366, 4368.467, 1938.3236]
2025-09-16 12:11:13,748 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:11:13,762 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 54 seconds)
2025-09-16 12:13:01,427 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:13:13,957 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3607.68750 ± 1021.109
2025-09-16 12:13:13,958 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4233.712, 4399.549, 3510.4338, 4195.6074, 4236.238, 1655.4751, 1610.2455, 4333.75, 3734.3735, 4167.4907]
2025-09-16 12:13:13,958 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:13:13,958 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (3607.69) for latency 24
2025-09-16 12:13:13,966 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 52 seconds)
2025-09-16 12:15:01,804 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:15:14,193 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3004.27905 ± 1008.290
2025-09-16 12:15:14,193 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2257.6504, 2735.6113, 1360.6256, 2489.4946, 4032.8774, 4166.9546, 1601.5753, 3817.822, 3405.391, 4174.7886]
2025-09-16 12:15:14,193 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:15:14,226 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 58 seconds)
2025-09-16 12:17:02,099 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:17:14,581 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2194.63770 ± 757.571
2025-09-16 12:17:14,581 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1986.4845, 2294.7217, 1386.8065, 1837.0278, 1613.8835, 2135.2668, 2674.0444, 1683.84, 4211.7085, 2122.593]
2025-09-16 12:17:14,581 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:17:14,602 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 59 seconds)
2025-09-16 12:19:01,342 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:19:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3176.43433 ± 1036.599
2025-09-16 12:19:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4532.0654, 2327.0562, 1961.2235, 2793.5823, 4524.0874, 3997.066, 4275.0957, 2047.2666, 3376.0078, 1930.8927]
2025-09-16 12:19:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:19:13,876 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 57 seconds)
2025-09-16 12:21:01,415 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:21:15,292 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2438.29541 ± 823.565
2025-09-16 12:21:15,292 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1954.2903, 3225.8865, 4422.117, 2138.609, 2046.5315, 1861.1069, 2693.4307, 2718.5996, 1481.4697, 1840.9167]
2025-09-16 12:21:15,292 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:21:15,301 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 2 seconds)
2025-09-16 12:23:03,150 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:23:15,669 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2922.57837 ± 956.339
2025-09-16 12:23:15,669 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2627.1514, 2345.1348, 1898.287, 3945.621, 2389.0999, 3246.3025, 2732.4531, 1420.2853, 4095.0679, 4526.379]
2025-09-16 12:23:15,669 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:23:15,676 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 2 seconds)
2025-09-16 12:25:03,866 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:25:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2788.80713 ± 922.642
2025-09-16 12:25:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3695.634, 1975.2573, 3369.0605, 2671.2815, 4468.283, 3527.971, 1630.0833, 1516.3988, 2834.5505, 2199.5525]
2025-09-16 12:25:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:25:16,350 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 2 seconds)
2025-09-16 12:27:03,788 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:27:16,295 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2232.21313 ± 576.861
2025-09-16 12:27:16,295 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2102.5137, 3118.023, 2271.2148, 2018.0771, 2313.235, 2846.3472, 1343.003, 1409.1163, 2965.9548, 1934.6462]
2025-09-16 12:27:16,295 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:27:16,305 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 1 second)
2025-09-16 12:29:04,310 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:29:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3607.83081 ± 947.019
2025-09-16 12:29:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4241.8076, 2435.9229, 1981.6686, 3176.7517, 4501.2, 4430.167, 3995.7087, 2443.3745, 4509.7896, 4361.917]
2025-09-16 12:29:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:29:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (3607.83) for latency 24
2025-09-16 12:29:16,720 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 1 second)
2025-09-16 12:31:03,930 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:31:16,434 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3454.99951 ± 949.366
2025-09-16 12:31:16,434 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4432.32, 3428.6555, 2412.438, 4406.561, 1749.0739, 4350.812, 3934.599, 2220.214, 4225.1455, 3390.1758]
2025-09-16 12:31:16,434 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:31:16,463 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes)
2025-09-16 12:33:05,078 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:33:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 3902.62036 ± 822.269
2025-09-16 12:33:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4283.3057, 4214.591, 4340.7227, 3279.7178, 4186.3735, 4146.2676, 1607.8884, 4300.3735, 4377.691, 4289.273]
2025-09-16 12:33:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:33:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (3902.62) for latency 24
2025-09-16 12:33:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes)
2025-09-16 12:35:05,874 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 12:35:18,379 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2394.77393 ± 869.847
2025-09-16 12:35:18,379 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2306.3198, 1517.9791, 2928.5205, 1524.0295, 2223.99, 1779.4807, 2130.698, 1972.343, 2994.5073, 4569.8716]
2025-09-16 12:35:18,379 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:35:18,393 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1251 [DEBUG]: Training session finished
