2025-08-07 04:07:32,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:07:32,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:07:32,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15134cd5edd0>}
2025-08-07 04:07:32,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1111 [DEBUG]: using device: cuda
2025-08-07 04:07:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1133 [INFO]: Creating new trainer
2025-08-07 04:07:32,981 baseline-bpql-noiseperc10-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=219, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:07:32,982 baseline-bpql-noiseperc10-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:07:34,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1194 [DEBUG]: Starting training session...
2025-08-07 04:07:34,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 1/100
2025-08-07 04:09:15,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:09:15,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -26.94188 ± 33.749
2025-08-07 04:09:15,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-39.77782, -54.93227, -103.93695, -8.15757, 6.0533476, -24.189062, -45.45522, 2.1614177, -16.186594, 15.001884]
2025-08-07 04:09:15,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 62.0, 85.0, 45.0, 31.0, 71.0, 63.0, 44.0, 43.0, 32.0]
2025-08-07 04:09:15,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (-26.94) for latency ExtremeClogL1U23
2025-08-07 04:09:15,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 46 minutes, 59 seconds)
2025-08-07 04:10:59,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:11:02,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -206.62955 ± 344.087
2025-08-07 04:11:02,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-62.72154, 11.077641, -65.186935, -16.182117, -144.026, -8.767432, -1026.4673, -8.556121, -19.207573, -726.2581]
2025-08-07 04:11:02,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 65.0, 76.0, 58.0, 131.0, 82.0, 1000.0, 56.0, 65.0, 708.0]
2025-08-07 04:11:02,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 50 minutes, 8 seconds)
2025-08-07 04:12:56,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:12:57,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -47.78907 ± 73.054
2025-08-07 04:12:57,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-115.857124, -7.645555, -78.735985, -46.232224, -27.509035, 21.946665, -11.2219095, -6.9539995, 23.300419, -228.98195]
2025-08-07 04:12:57,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [156.0, 62.0, 184.0, 104.0, 57.0, 39.0, 55.0, 41.0, 38.0, 294.0]
2025-08-07 04:12:57,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 54 minutes, 14 seconds)
2025-08-07 04:14:33,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:14:36,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -110.22343 ± 313.232
2025-08-07 04:14:36,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [17.52627, -1.0206299, 16.95977, 8.667446, 11.105987, 16.024605, -1043.059, 8.694624, -23.112741, -114.02071]
2025-08-07 04:14:36,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 58.0, 59.0, 52.0, 43.0, 44.0, 1000.0, 66.0, 74.0, 120.0]
2025-08-07 04:14:36,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 48 minutes, 38 seconds)
2025-08-07 04:16:21,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:16:23,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -15.14342 ± 37.452
2025-08-07 04:16:23,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-12.171835, -15.503182, 12.844559, 11.55171, -1.7854098, 22.732975, 25.090025, -36.53171, -97.80926, -59.852047]
2025-08-07 04:16:23,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 82.0, 52.0, 98.0, 169.0, 35.0, 30.0, 110.0, 168.0, 117.0]
2025-08-07 04:16:23,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (-15.14) for latency ExtremeClogL1U23
2025-08-07 04:16:23,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 47 minutes, 22 seconds)
2025-08-07 04:18:08,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:18:14,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -84.35976 ± 143.160
2025-08-07 04:18:14,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-3.0582926, -21.605114, -287.90253, 26.516153, -286.08688, 3.9088998, 31.677896, -329.4594, -2.6924968, 25.104212]
2025-08-07 04:18:14,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [208.0, 82.0, 1000.0, 102.0, 1000.0, 99.0, 100.0, 1000.0, 239.0, 178.0]
2025-08-07 04:18:14,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 48 minutes, 45 seconds)
2025-08-07 04:20:06,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:20:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 303.96234 ± 124.956
2025-08-07 04:20:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [330.5888, 231.83636, 228.1414, 277.1922, 268.55673, 373.07086, 264.7243, 645.6994, 239.2152, 180.59843]
2025-08-07 04:20:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 458.0, 386.0]
2025-08-07 04:20:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (303.96) for latency ExtremeClogL1U23
2025-08-07 04:20:20,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-08-07 04:22:08,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:22:23,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 524.26630 ± 103.982
2025-08-07 04:22:23,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [340.4162, 494.8557, 360.46896, 521.68976, 603.178, 521.306, 537.78766, 611.49927, 548.265, 703.19635]
2025-08-07 04:22:23,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [605.0, 1000.0, 1000.0, 768.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:22:23,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (524.27) for latency ExtremeClogL1U23
2025-08-07 04:22:23,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 53 minutes, 26 seconds)
2025-08-07 04:24:10,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:24:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 457.11029 ± 276.605
2025-08-07 04:24:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [705.78406, 265.19147, 120.22015, 737.71625, 569.0445, 647.0095, 80.17013, 53.440147, 751.8644, 640.6625]
2025-08-07 04:24:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 398.0, 119.0, 1000.0, 1000.0, 1000.0, 90.0, 94.0, 1000.0, 1000.0]
2025-08-07 04:24:20,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 20 seconds)
2025-08-07 04:26:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:26:12,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 279.30536 ± 244.659
2025-08-07 04:26:12,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [603.42413, 156.52965, 187.96071, 564.29926, 238.16881, -166.21988, 71.834335, 141.94182, 623.0963, 372.01828]
2025-08-07 04:26:12,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 145.0, 231.0, 1000.0, 294.0, 1000.0, 61.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:26:12,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 44 seconds)
2025-08-07 04:27:58,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:28:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 482.86652 ± 254.996
2025-08-07 04:28:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [718.4031, 626.99414, 668.7741, 322.8089, 80.97968, 92.70555, 706.0814, 595.569, 247.16655, 769.1828]
2025-08-07 04:28:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 390.0, 84.0, 85.0, 1000.0, 1000.0, 295.0, 1000.0]
2025-08-07 04:28:08,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 22 seconds)
2025-08-07 04:29:58,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:30:05,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 312.89746 ± 301.971
2025-08-07 04:30:05,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [642.4356, 55.51721, 136.66711, 51.533813, 599.30206, 115.32246, 43.67878, 859.8849, 43.134457, 581.49805]
2025-08-07 04:30:05,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 50.0, 134.0, 49.0, 1000.0, 147.0, 52.0, 1000.0, 47.0, 1000.0]
2025-08-07 04:30:05,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-08-07 04:31:50,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:32:01,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 499.54254 ± 239.653
2025-08-07 04:32:01,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [764.9646, 641.16266, 622.7185, 89.12405, 134.11484, 221.06886, 747.40625, 559.9139, 588.09155, 626.8599]
2025-08-07 04:32:01,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 96.0, 122.0, 237.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:32:01,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 47 minutes, 42 seconds)
2025-08-07 04:33:47,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:33:50,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 125.75004 ± 98.803
2025-08-07 04:33:50,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [17.602251, 120.85839, 306.2096, 257.07312, 245.95027, 60.207294, 89.248405, 59.478348, 62.180008, 38.692596]
2025-08-07 04:33:50,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 117.0, 383.0, 395.0, 411.0, 65.0, 59.0, 62.0, 58.0, 52.0]
2025-08-07 04:33:50,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 43 minutes, 14 seconds)
2025-08-07 04:35:32,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:36,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 183.59470 ± 146.935
2025-08-07 04:35:36,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [249.98134, 209.16016, 37.21638, 122.22596, 81.32687, 387.2057, 93.21806, 54.187695, 502.4309, 98.99383]
2025-08-07 04:35:36,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [378.0, 245.0, 43.0, 147.0, 71.0, 568.0, 109.0, 68.0, 649.0, 109.0]
2025-08-07 04:35:36,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 39 minutes, 50 seconds)
2025-08-07 04:37:20,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:37:25,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 215.95047 ± 238.076
2025-08-07 04:37:25,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [689.0016, 118.29233, 100.13163, 110.746185, 296.20044, 58.268345, 74.724205, 14.971127, 650.39624, 46.772346]
2025-08-07 04:37:25,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 171.0, 239.0, 74.0, 373.0, 130.0, 64.0, 45.0, 1000.0, 47.0]
2025-08-07 04:37:25,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 35 minutes, 52 seconds)
2025-08-07 04:39:11,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:39:17,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 289.58429 ± 206.151
2025-08-07 04:39:17,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [151.7288, 301.055, 206.46245, 670.59686, 118.340225, 119.22239, 49.208714, 329.15085, 291.6209, 658.45685]
2025-08-07 04:39:17,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [150.0, 381.0, 188.0, 1000.0, 157.0, 103.0, 62.0, 387.0, 263.0, 787.0]
2025-08-07 04:39:17,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 32 minutes, 40 seconds)
2025-08-07 04:41:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:41:10,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 236.01929 ± 263.987
2025-08-07 04:41:10,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [42.466972, 117.2018, 43.083015, 78.73532, 114.48776, 267.31952, 721.17645, 105.3246, 91.83778, 778.5596]
2025-08-07 04:41:10,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 157.0, 45.0, 125.0, 99.0, 292.0, 1000.0, 116.0, 98.0, 1000.0]
2025-08-07 04:41:10,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 29 minutes, 52 seconds)
2025-08-07 04:43:02,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:43:07,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 263.93915 ± 222.758
2025-08-07 04:43:07,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [64.35949, 184.96342, 198.73248, 91.32044, 209.1373, 198.79988, 101.84264, 724.0643, 194.72098, 671.4508]
2025-08-07 04:43:07,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 175.0, 203.0, 100.0, 283.0, 323.0, 120.0, 1000.0, 196.0, 794.0]
2025-08-07 04:43:07,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 30 minutes, 28 seconds)
2025-08-07 04:44:46,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:44:54,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 434.83340 ± 323.183
2025-08-07 04:44:54,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [31.47064, 431.2315, 394.08893, 164.48503, 721.9039, 771.08295, 43.008495, 991.95715, 130.16826, 668.93677]
2025-08-07 04:44:54,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 437.0, 488.0, 191.0, 730.0, 961.0, 41.0, 1000.0, 134.0, 1000.0]
2025-08-07 04:44:54,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 28 minutes, 46 seconds)
2025-08-07 04:46:45,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:46:54,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 539.44617 ± 254.842
2025-08-07 04:46:54,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [381.2894, 420.75473, 289.7786, 743.613, 833.79266, 372.53607, 651.21906, 713.83984, 909.0027, 78.635155]
2025-08-07 04:46:54,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [361.0, 422.0, 299.0, 1000.0, 771.0, 410.0, 1000.0, 1000.0, 1000.0, 69.0]
2025-08-07 04:46:54,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (539.45) for latency ExtremeClogL1U23
2025-08-07 04:46:54,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 29 minutes, 54 seconds)
2025-08-07 04:48:40,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:48:47,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 397.17874 ± 324.601
2025-08-07 04:48:47,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [475.7659, 27.97291, 78.82377, 692.7052, 891.0457, 160.7801, 103.27346, 657.59625, 805.5721, 78.25193]
2025-08-07 04:48:47,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [435.0, 36.0, 91.0, 1000.0, 1000.0, 139.0, 80.0, 1000.0, 795.0, 79.0]
2025-08-07 04:48:47,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 28 minutes, 21 seconds)
2025-08-07 04:50:26,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:50:32,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 282.56842 ± 285.293
2025-08-07 04:50:32,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [102.21443, 572.2356, 144.3094, 71.051056, 697.3004, 252.74611, 52.864883, 71.2584, 830.4446, 31.259483]
2025-08-07 04:50:32,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [174.0, 1000.0, 175.0, 76.0, 1000.0, 272.0, 85.0, 89.0, 1000.0, 83.0]
2025-08-07 04:50:32,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 24 minutes, 26 seconds)
2025-08-07 04:52:21,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:52:27,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 268.13678 ± 240.754
2025-08-07 04:52:27,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [84.46882, 649.5455, 591.99023, 148.1853, 87.67618, 599.26434, 30.215963, 73.19581, 344.78452, 72.04123]
2025-08-07 04:52:27,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 1000.0, 1000.0, 168.0, 126.0, 1000.0, 42.0, 97.0, 322.0, 100.0]
2025-08-07 04:52:27,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 21 minutes, 48 seconds)
2025-08-07 04:54:11,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:54:14,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 202.33458 ± 164.521
2025-08-07 04:54:14,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [56.459568, 116.76929, 197.80864, 469.4645, 284.9716, 232.84602, 77.226524, 509.1403, 41.262447, 37.396976]
2025-08-07 04:54:14,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 135.0, 246.0, 412.0, 373.0, 189.0, 82.0, 496.0, 45.0, 60.0]
2025-08-07 04:54:14,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 20 minutes)
2025-08-07 04:56:00,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:56:03,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 232.12283 ± 278.377
2025-08-07 04:56:03,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [27.75557, 62.07244, 362.06747, 286.9713, 142.43968, 1009.744, 175.44789, 124.95663, 87.10542, 42.667645]
2025-08-07 04:56:03,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 65.0, 354.0, 322.0, 95.0, 1000.0, 204.0, 142.0, 79.0, 57.0]
2025-08-07 04:56:03,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 15 minutes, 21 seconds)
2025-08-07 04:57:53,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:58:00,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 410.42148 ± 322.925
2025-08-07 04:58:00,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [50.574993, 35.52724, 182.6803, 573.11896, 320.40952, 710.67847, 234.03384, 1084.5267, 225.6493, 687.01575]
2025-08-07 04:58:00,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 34.0, 164.0, 1000.0, 281.0, 596.0, 242.0, 1000.0, 204.0, 1000.0]
2025-08-07 04:58:00,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 14 minutes, 37 seconds)
2025-08-07 04:59:43,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:59:47,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 245.18608 ± 206.106
2025-08-07 04:59:47,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [698.0958, 153.53539, 78.63242, 239.63861, 125.63879, 129.00974, 31.53713, 101.68463, 510.20544, 383.8829]
2025-08-07 04:59:47,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 195.0, 88.0, 213.0, 157.0, 159.0, 35.0, 88.0, 583.0, 352.0]
2025-08-07 04:59:47,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 13 minutes, 6 seconds)
2025-08-07 05:01:38,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:01:44,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 344.63687 ± 258.358
2025-08-07 05:01:44,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [261.59033, 31.138657, 705.6963, 45.554657, 193.7409, 712.9328, 306.91742, 695.5399, 388.61475, 104.642815]
2025-08-07 05:01:44,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [264.0, 39.0, 685.0, 64.0, 212.0, 629.0, 322.0, 1000.0, 351.0, 99.0]
2025-08-07 05:01:44,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-08-07 05:03:23,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:03:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 750.41992 ± 372.565
2025-08-07 05:03:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [218.71753, 403.03412, 1146.8167, 242.5057, 359.24304, 1049.1625, 1164.4136, 918.79517, 967.5149, 1033.9958]
2025-08-07 05:03:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [183.0, 356.0, 962.0, 254.0, 364.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:03:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (750.42) for latency ExtremeClogL1U23
2025-08-07 05:03:34,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 10 minutes, 37 seconds)
2025-08-07 05:05:20,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:05:28,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 540.83691 ± 453.105
2025-08-07 05:05:28,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [267.31158, 46.02644, 54.390182, 1129.1576, 93.32683, 1003.5934, 818.4495, 857.7548, 1096.9667, 41.392303]
2025-08-07 05:05:28,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [328.0, 42.0, 54.0, 1000.0, 113.0, 1000.0, 1000.0, 1000.0, 836.0, 36.0]
2025-08-07 05:05:28,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 9 minutes, 57 seconds)
2025-08-07 05:07:14,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:07:21,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 518.24084 ± 328.130
2025-08-07 05:07:21,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [370.32294, 744.41455, 91.37503, 199.72572, 507.705, 218.55212, 977.02545, 742.5877, 1069.0653, 261.6348]
2025-08-07 05:07:21,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [349.0, 1000.0, 86.0, 217.0, 391.0, 177.0, 816.0, 672.0, 1000.0, 197.0]
2025-08-07 05:07:21,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 7 minutes, 10 seconds)
2025-08-07 05:09:16,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:09:20,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 354.45792 ± 374.690
2025-08-07 05:09:20,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [105.715416, 749.13934, 362.39758, 21.892744, 139.27386, 10.323714, 334.2192, 551.94653, 37.235943, 1232.435]
2025-08-07 05:09:20,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 524.0, 283.0, 33.0, 130.0, 32.0, 250.0, 457.0, 43.0, 1000.0]
2025-08-07 05:09:20,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 7 minutes, 58 seconds)
2025-08-07 05:11:00,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:11:08,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 449.15390 ± 321.154
2025-08-07 05:11:08,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [264.77142, 228.46992, 214.9021, 210.30522, 142.49384, 881.79456, 562.4472, 148.07996, 847.3299, 990.9452]
2025-08-07 05:11:08,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [285.0, 223.0, 147.0, 203.0, 132.0, 1000.0, 548.0, 112.0, 1000.0, 1000.0]
2025-08-07 05:11:08,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 4 minutes)
2025-08-07 05:12:53,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:13:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 865.71094 ± 332.530
2025-08-07 05:13:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [473.5372, 832.53827, 1308.4913, 699.8909, 620.26794, 1182.0491, 1211.5122, 291.64352, 830.00916, 1207.1691]
2025-08-07 05:13:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [364.0, 1000.0, 1000.0, 1000.0, 603.0, 1000.0, 990.0, 260.0, 605.0, 1000.0]
2025-08-07 05:13:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (865.71) for latency ExtremeClogL1U23
2025-08-07 05:13:05,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 3 minutes, 44 seconds)
2025-08-07 05:14:53,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:15:01,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 577.93182 ± 477.181
2025-08-07 05:15:01,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [83.975525, 1338.1985, 148.83316, 1033.3988, 832.10944, 237.41206, 844.96, 107.25606, 33.428486, 1119.7462]
2025-08-07 05:15:01,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 1000.0, 140.0, 774.0, 1000.0, 190.0, 1000.0, 94.0, 36.0, 831.0]
2025-08-07 05:15:01,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 13 seconds)
2025-08-07 05:16:43,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:16:46,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 226.16402 ± 196.473
2025-08-07 05:16:46,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [53.744873, 481.35156, 32.30603, 425.0482, 60.818863, 138.76483, 31.788372, 506.27103, 438.5101, 93.03647]
2025-08-07 05:16:46,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 348.0, 34.0, 359.0, 57.0, 99.0, 33.0, 356.0, 332.0, 126.0]
2025-08-07 05:16:46,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 58 minutes, 34 seconds)
2025-08-07 05:18:36,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:18:43,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 496.55103 ± 419.556
2025-08-07 05:18:43,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1270.1196, 29.187181, 792.57196, 341.42203, 254.79646, 401.67484, 37.298283, 923.9839, 28.512846, 885.94293]
2025-08-07 05:18:43,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 36.0, 645.0, 1000.0, 187.0, 325.0, 40.0, 708.0, 33.0, 693.0]
2025-08-07 05:18:43,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 56 minutes, 25 seconds)
2025-08-07 05:20:26,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:20:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 498.31314 ± 389.151
2025-08-07 05:20:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1381.5437, 280.8988, 874.3846, 116.02773, 265.08746, 774.4183, 570.38513, 437.8251, 91.16007, 191.40059]
2025-08-07 05:20:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 195.0, 1000.0, 90.0, 199.0, 1000.0, 348.0, 337.0, 72.0, 152.0]
2025-08-07 05:20:33,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 54 minutes, 59 seconds)
2025-08-07 05:22:24,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:22:34,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 864.05286 ± 479.009
2025-08-07 05:22:34,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1516.8739, 1368.3424, 565.45624, 296.9004, 510.6827, 1119.2823, 1144.2427, 75.81587, 658.2849, 1384.6464]
2025-08-07 05:22:34,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [982.0, 1000.0, 500.0, 231.0, 463.0, 903.0, 754.0, 47.0, 468.0, 1000.0]
2025-08-07 05:22:34,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 53 minutes, 45 seconds)
2025-08-07 05:24:18,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 585.73004 ± 467.264
2025-08-07 05:24:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [101.23325, 674.83594, 816.01117, 381.27393, 122.190834, 1451.3615, 1350.1888, 557.0902, 176.35686, 226.7577]
2025-08-07 05:24:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [134.0, 491.0, 1000.0, 299.0, 111.0, 1000.0, 1000.0, 408.0, 207.0, 202.0]
2025-08-07 05:24:25,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 50 minutes, 54 seconds)
2025-08-07 05:26:07,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:12,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 432.50113 ± 365.004
2025-08-07 05:26:12,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [215.08878, 60.164608, 119.033295, 710.1447, 634.6317, 215.9533, 323.6219, 312.56546, 1356.6135, 377.1939]
2025-08-07 05:26:12,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [179.0, 57.0, 77.0, 519.0, 487.0, 172.0, 243.0, 251.0, 858.0, 313.0]
2025-08-07 05:26:12,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 49 minutes, 25 seconds)
2025-08-07 05:27:58,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:28:07,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 773.51117 ± 511.726
2025-08-07 05:28:07,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1377.8574, 200.37383, 1070.3718, 23.421595, 530.72034, 929.0856, 1397.6127, 1277.1493, 42.01828, 886.50085]
2025-08-07 05:28:07,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 133.0, 757.0, 33.0, 346.0, 610.0, 1000.0, 895.0, 39.0, 608.0]
2025-08-07 05:28:07,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 47 minutes)
2025-08-07 05:29:53,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:29:59,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 503.02753 ± 414.517
2025-08-07 05:29:59,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [100.89416, 325.3706, 1262.2784, 854.2106, 516.3363, 248.47173, 339.01205, 56.869705, 172.09183, 1154.7399]
2025-08-07 05:29:59,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 250.0, 1000.0, 1000.0, 380.0, 175.0, 255.0, 54.0, 111.0, 785.0]
2025-08-07 05:29:59,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 45 minutes, 40 seconds)
2025-08-07 05:31:47,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:31:53,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 506.49854 ± 324.795
2025-08-07 05:31:53,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [563.90247, 465.43198, 15.290982, 876.7432, 36.927265, 945.22314, 509.73132, 918.3499, 208.78755, 524.5974]
2025-08-07 05:31:53,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [376.0, 289.0, 33.0, 1000.0, 55.0, 615.0, 330.0, 620.0, 173.0, 336.0]
2025-08-07 05:31:53,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 42 minutes, 33 seconds)
2025-08-07 05:33:38,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:33:40,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 214.87212 ± 204.103
2025-08-07 05:33:40,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [53.810207, 171.64948, 335.37302, 125.43619, 246.55211, 56.779907, 768.4669, 219.78754, 96.73597, 74.12997]
2025-08-07 05:33:40,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 128.0, 250.0, 124.0, 193.0, 42.0, 543.0, 137.0, 89.0, 99.0]
2025-08-07 05:33:40,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 39 minutes, 54 seconds)
2025-08-07 05:35:33,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:35:39,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 563.94446 ± 231.816
2025-08-07 05:35:39,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [906.09674, 565.60095, 361.61456, 535.90826, 684.28705, 924.33954, 486.06772, 209.52756, 271.52298, 694.47894]
2025-08-07 05:35:39,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [626.0, 387.0, 244.0, 311.0, 407.0, 1000.0, 256.0, 148.0, 148.0, 442.0]
2025-08-07 05:35:39,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 40 minutes, 6 seconds)
2025-08-07 05:37:19,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:37:24,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 537.53320 ± 401.762
2025-08-07 05:37:24,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [82.68942, 231.84712, 1117.775, 1427.9235, 521.17224, 464.3354, 250.08554, 360.7654, 297.19135, 621.5467]
2025-08-07 05:37:24,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 190.0, 674.0, 1000.0, 396.0, 276.0, 204.0, 213.0, 201.0, 394.0]
2025-08-07 05:37:24,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 36 minutes, 42 seconds)
2025-08-07 05:39:11,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:39:15,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 335.39780 ± 297.849
2025-08-07 05:39:15,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [255.59442, 166.1857, 374.9612, 99.19734, 90.58001, 297.2179, 517.91034, 1113.393, 35.182426, 403.75583]
2025-08-07 05:39:15,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [205.0, 117.0, 281.0, 85.0, 78.0, 236.0, 324.0, 1000.0, 44.0, 245.0]
2025-08-07 05:39:15,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 34 minutes, 31 seconds)
2025-08-07 05:40:59,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:41:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 890.12195 ± 676.693
2025-08-07 05:41:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [52.80228, 1532.924, 1646.8704, 192.7218, 98.67709, 433.71707, 1463.4136, 1710.9015, 1417.847, 351.34607]
2025-08-07 05:41:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 1000.0, 1000.0, 131.0, 82.0, 272.0, 1000.0, 1000.0, 1000.0, 270.0]
2025-08-07 05:41:08,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (890.12) for latency ExtremeClogL1U23
2025-08-07 05:41:08,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 32 minutes, 33 seconds)
2025-08-07 05:43:00,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:43:08,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 671.58942 ± 365.793
2025-08-07 05:43:08,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [760.13135, 863.40314, 345.66544, 1103.6561, 702.968, 907.67633, 352.28067, 1290.3563, 147.72545, 242.03137]
2025-08-07 05:43:08,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [520.0, 536.0, 291.0, 806.0, 407.0, 1000.0, 196.0, 814.0, 113.0, 171.0]
2025-08-07 05:43:08,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 32 minutes, 43 seconds)
2025-08-07 05:44:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:45:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 775.29468 ± 588.088
2025-08-07 05:45:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [175.42513, 1546.5824, 1647.9275, 866.0341, 323.42395, 181.19377, 1022.83307, 39.850574, 477.2999, 1472.3763]
2025-08-07 05:45:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [113.0, 992.0, 1000.0, 1000.0, 206.0, 146.0, 678.0, 56.0, 347.0, 1000.0]
2025-08-07 05:45:01,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 29 minutes, 58 seconds)
2025-08-07 05:46:40,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:46:47,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 658.37860 ± 505.969
2025-08-07 05:46:48,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1450.6377, 1108.4558, 604.0354, 290.35214, 138.43037, 1534.366, 270.78305, 121.35746, 736.9899, 328.37848]
2025-08-07 05:46:48,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [891.0, 642.0, 309.0, 173.0, 97.0, 1000.0, 160.0, 80.0, 1000.0, 227.0]
2025-08-07 05:46:48,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 28 minutes, 12 seconds)
2025-08-07 05:48:34,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:48:40,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 697.77747 ± 376.418
2025-08-07 05:48:40,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [614.5371, 773.3243, 675.17163, 738.28564, 1289.6316, 424.59305, 216.59225, 888.5698, 1273.7301, 83.33992]
2025-08-07 05:48:40,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [344.0, 419.0, 455.0, 475.0, 777.0, 241.0, 128.0, 522.0, 1000.0, 67.0]
2025-08-07 05:48:40,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 26 minutes, 37 seconds)
2025-08-07 05:50:26,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:50:37,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 829.06427 ± 467.778
2025-08-07 05:50:37,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1556.6932, 1158.2223, 1468.6127, 859.8173, 605.33264, 889.7195, 150.94618, 329.161, 247.20752, 1024.9297]
2025-08-07 05:50:37,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [981.0, 1000.0, 1000.0, 1000.0, 403.0, 1000.0, 112.0, 232.0, 153.0, 1000.0]
2025-08-07 05:50:37,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 16 seconds)
2025-08-07 05:52:19,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:52:29,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 954.52063 ± 592.275
2025-08-07 05:52:29,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [152.76996, 1701.6597, 802.1534, 112.159065, 989.0269, 1454.8821, 187.77113, 1601.8838, 1510.8871, 1032.0128]
2025-08-07 05:52:29,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 1000.0, 517.0, 82.0, 1000.0, 1000.0, 110.0, 1000.0, 882.0, 726.0]
2025-08-07 05:52:29,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (954.52) for latency ExtremeClogL1U23
2025-08-07 05:52:29,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 22 minutes, 18 seconds)
2025-08-07 05:54:14,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:54:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1421.77698 ± 297.928
2025-08-07 05:54:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1628.4506, 1586.3865, 1721.9651, 690.1535, 1381.4149, 1109.682, 1385.4873, 1513.5948, 1700.0239, 1500.611]
2025-08-07 05:54:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 462.0, 823.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:54:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1421.78) for latency ExtremeClogL1U23
2025-08-07 05:54:28,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 21 minutes, 18 seconds)
2025-08-07 05:56:15,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:56:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 671.76111 ± 574.283
2025-08-07 05:56:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [573.82043, 281.7846, 1257.0302, 1436.6604, 86.20463, 210.49805, 66.8254, 149.29263, 1050.6829, 1604.8124]
2025-08-07 05:56:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [325.0, 173.0, 668.0, 955.0, 63.0, 142.0, 101.0, 123.0, 616.0, 883.0]
2025-08-07 05:56:21,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 19 seconds)
2025-08-07 05:58:07,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:58:13,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 546.12952 ± 392.875
2025-08-07 05:58:13,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [408.21082, 1180.1503, 103.63541, 375.78073, 250.15164, 947.0248, 1182.5876, 131.18636, 599.5953, 282.97205]
2025-08-07 05:58:13,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [281.0, 719.0, 66.0, 246.0, 151.0, 1000.0, 705.0, 106.0, 389.0, 189.0]
2025-08-07 05:58:13,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 15 seconds)
2025-08-07 05:59:59,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:00:08,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 714.55652 ± 539.039
2025-08-07 06:00:08,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [497.7482, 196.82184, 1458.836, 995.7297, 1596.2179, 170.00543, 289.53223, 1220.9517, 672.1145, 47.60854]
2025-08-07 06:00:08,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [305.0, 157.0, 1000.0, 628.0, 1000.0, 116.0, 175.0, 1000.0, 1000.0, 53.0]
2025-08-07 06:00:08,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 8 seconds)
2025-08-07 06:01:48,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:01:56,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 683.54486 ± 309.499
2025-08-07 06:01:56,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [851.1114, 277.91794, 214.68346, 710.7264, 501.13504, 961.67896, 319.6137, 965.261, 1029.6921, 1003.6283]
2025-08-07 06:01:56,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [464.0, 211.0, 116.0, 1000.0, 299.0, 499.0, 198.0, 543.0, 1000.0, 644.0]
2025-08-07 06:01:56,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 13 minutes, 39 seconds)
2025-08-07 06:03:44,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:03:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 536.47607 ± 314.946
2025-08-07 06:03:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [240.87076, 462.1349, 702.1978, 300.4267, 956.4852, 269.92032, 350.1974, 342.5332, 1239.841, 500.1539]
2025-08-07 06:03:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [166.0, 314.0, 1000.0, 190.0, 512.0, 165.0, 181.0, 211.0, 631.0, 236.0]
2025-08-07 06:03:50,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 11 minutes, 4 seconds)
2025-08-07 06:05:35,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:05:44,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 800.98621 ± 425.982
2025-08-07 06:05:44,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [974.5937, 611.15283, 1643.5807, 878.734, 1141.0775, 66.79236, 690.9811, 1034.338, 249.15796, 719.4534]
2025-08-07 06:05:44,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [606.0, 389.0, 907.0, 1000.0, 730.0, 61.0, 378.0, 1000.0, 164.0, 424.0]
2025-08-07 06:05:44,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 9 minutes, 24 seconds)
2025-08-07 06:07:29,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:07:40,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1092.76147 ± 701.302
2025-08-07 06:07:40,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [672.3352, 31.357227, 1612.8093, 1512.7288, 932.4588, 328.22394, 1740.0707, 1978.7053, 237.3463, 1881.5792]
2025-08-07 06:07:40,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 33.0, 1000.0, 1000.0, 495.0, 186.0, 915.0, 1000.0, 173.0, 1000.0]
2025-08-07 06:07:40,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 8 minutes, 3 seconds)
2025-08-07 06:09:27,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:09:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 951.03064 ± 683.038
2025-08-07 06:09:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [551.5685, 1696.5637, 130.978, 168.29381, 1326.3245, 727.7026, 1906.2965, 1116.6777, 1818.7472, 67.154495]
2025-08-07 06:09:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [340.0, 1000.0, 98.0, 135.0, 689.0, 1000.0, 1000.0, 1000.0, 1000.0, 62.0]
2025-08-07 06:09:37,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2025-08-07 06:11:24,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:11:32,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 776.79553 ± 533.084
2025-08-07 06:11:32,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1357.859, 191.06123, 691.47, 543.4559, 373.51077, 977.9205, 264.88916, 1873.1316, 1211.2865, 283.3702]
2025-08-07 06:11:32,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [737.0, 145.0, 1000.0, 301.0, 193.0, 601.0, 134.0, 1000.0, 1000.0, 163.0]
2025-08-07 06:11:32,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 19 seconds)
2025-08-07 06:13:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:13:25,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1068.17908 ± 426.866
2025-08-07 06:13:25,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1725.5554, 1033.8229, 1676.5063, 917.72144, 1032.7682, 1196.5961, 869.13837, 1133.9103, 104.58553, 991.18536]
2025-08-07 06:13:25,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 529.0, 618.0, 797.0, 500.0, 625.0, 89.0, 590.0]
2025-08-07 06:13:25,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 18 seconds)
2025-08-07 06:15:10,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:15:16,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 686.44696 ± 545.465
2025-08-07 06:15:16,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [538.3603, 1801.3688, 1525.243, 727.10895, 494.62714, 94.70301, 882.2494, 106.145454, 266.5107, 428.15274]
2025-08-07 06:15:16,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [318.0, 1000.0, 816.0, 390.0, 309.0, 86.0, 474.0, 85.0, 199.0, 250.0]
2025-08-07 06:15:16,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 59 seconds)
2025-08-07 06:17:01,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:17:10,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 991.31183 ± 549.133
2025-08-07 06:17:10,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [779.22284, 563.95905, 1716.7534, 1326.122, 185.50932, 761.5668, 253.36665, 1103.357, 1365.9799, 1857.2805]
2025-08-07 06:17:10,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 358.0, 1000.0, 689.0, 158.0, 426.0, 217.0, 564.0, 798.0, 1000.0]
2025-08-07 06:17:10,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 54 seconds)
2025-08-07 06:18:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:19:07,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1115.95386 ± 736.826
2025-08-07 06:19:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [681.1427, 1935.925, 1921.1063, 138.22203, 1686.2959, 296.36157, 1822.2075, 1812.4113, 590.3722, 275.49426]
2025-08-07 06:19:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [429.0, 1000.0, 1000.0, 86.0, 1000.0, 198.0, 1000.0, 1000.0, 328.0, 178.0]
2025-08-07 06:19:07,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 59 seconds)
2025-08-07 06:20:45,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:51,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 564.02588 ± 311.969
2025-08-07 06:20:51,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [629.9118, 338.2291, 701.82654, 384.32095, 524.252, 1036.408, 593.00055, 272.98273, 1106.8612, 52.465656]
2025-08-07 06:20:51,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 223.0, 425.0, 214.0, 296.0, 570.0, 345.0, 155.0, 707.0, 48.0]
2025-08-07 06:20:51,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 59 seconds)
2025-08-07 06:22:38,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:48,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1098.14136 ± 432.079
2025-08-07 06:22:48,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [308.10272, 832.18835, 1439.3718, 1078.8961, 1416.9635, 1320.1512, 871.7762, 1306.6075, 576.25793, 1831.098]
2025-08-07 06:22:48,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [183.0, 520.0, 1000.0, 644.0, 849.0, 732.0, 1000.0, 755.0, 334.0, 1000.0]
2025-08-07 06:22:48,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 35 seconds)
2025-08-07 06:24:35,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:40,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 598.65717 ± 702.540
2025-08-07 06:24:40,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1873.6469, 170.8726, 1301.7383, 1728.5979, 66.73773, 502.62912, 43.797752, 97.01411, 165.84827, 35.689487]
2025-08-07 06:24:40,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 87.0, 730.0, 1000.0, 57.0, 244.0, 44.0, 74.0, 93.0, 34.0]
2025-08-07 06:24:40,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 50 minutes, 45 seconds)
2025-08-07 06:26:20,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1082.49731 ± 679.585
2025-08-07 06:26:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [705.51624, 77.9597, 309.56403, 881.0892, 1869.2802, 339.8974, 1198.8762, 1907.459, 1654.4125, 1880.9194]
2025-08-07 06:26:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 65.0, 197.0, 478.0, 1000.0, 191.0, 667.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:26:30,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 48 minutes, 31 seconds)
2025-08-07 06:28:17,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:26,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 672.03339 ± 462.837
2025-08-07 06:28:26,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [220.38748, 1240.5736, 74.212616, 1172.8839, 775.0324, 988.2676, 738.51605, 182.93135, 92.65567, 1234.8732]
2025-08-07 06:28:26,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [166.0, 682.0, 56.0, 639.0, 1000.0, 1000.0, 1000.0, 106.0, 59.0, 736.0]
2025-08-07 06:28:26,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 46 minutes, 35 seconds)
2025-08-07 06:30:11,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1159.73694 ± 633.498
2025-08-07 06:30:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1210.8523, 839.1502, 1668.6447, 1764.2008, 405.9652, 1645.0399, 16.843544, 506.0377, 1752.8937, 1787.7408]
2025-08-07 06:30:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [794.0, 427.0, 1000.0, 984.0, 277.0, 937.0, 19.0, 248.0, 1000.0, 1000.0]
2025-08-07 06:30:21,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 39 seconds)
2025-08-07 06:32:02,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1070.54272 ± 603.375
2025-08-07 06:32:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [764.71045, 1675.3306, 230.58153, 1188.4082, 1732.7096, 1669.3982, 1296.6257, 443.248, 102.2628, 1602.1528]
2025-08-07 06:32:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [418.0, 1000.0, 132.0, 1000.0, 1000.0, 1000.0, 1000.0, 259.0, 66.0, 1000.0]
2025-08-07 06:32:13,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 15 seconds)
2025-08-07 06:34:00,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:13,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1281.51514 ± 459.763
2025-08-07 06:34:13,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1743.8549, 1244.3127, 1100.4005, 280.19467, 1494.98, 1005.7886, 1690.9531, 1642.6731, 1776.2976, 835.69604]
2025-08-07 06:34:13,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 665.0, 733.0, 173.0, 890.0, 1000.0, 1000.0, 1000.0, 1000.0, 492.0]
2025-08-07 06:34:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes)
2025-08-07 06:35:53,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:04,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1237.01477 ± 629.031
2025-08-07 06:36:04,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [39.44535, 1847.8066, 1684.844, 1634.5741, 1761.5747, 1676.8654, 1591.4705, 1165.3895, 654.03723, 314.14093]
2025-08-07 06:36:04,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 1000.0, 1000.0, 1000.0, 1000.0, 978.0, 1000.0, 698.0, 420.0, 214.0]
2025-08-07 06:36:04,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 11 seconds)
2025-08-07 06:37:48,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:37:58,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1156.93787 ± 656.243
2025-08-07 06:37:58,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1732.3351, 1083.1913, 1444.8588, 351.66785, 180.1479, 1800.0394, 1903.7205, 712.24744, 434.1211, 1927.0498]
2025-08-07 06:37:58,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 545.0, 780.0, 209.0, 128.0, 1000.0, 1000.0, 397.0, 255.0, 1000.0]
2025-08-07 06:37:58,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 7 seconds)
2025-08-07 06:39:43,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:52,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1095.67065 ± 676.951
2025-08-07 06:39:52,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1752.8911, 1796.2406, 1980.4583, 93.513954, 320.71014, 1250.644, 996.7139, 93.39421, 1579.1803, 1092.9601]
2025-08-07 06:39:52,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 70.0, 195.0, 700.0, 548.0, 57.0, 896.0, 636.0]
2025-08-07 06:39:52,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 9 seconds)
2025-08-07 06:41:41,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:41:49,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1005.63165 ± 696.102
2025-08-07 06:41:49,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1609.7241, 274.49454, 1776.046, 566.55634, 1798.398, 1829.2637, 1393.0829, 116.691574, 491.6899, 200.3704]
2025-08-07 06:41:49,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 198.0, 1000.0, 321.0, 1000.0, 1000.0, 781.0, 81.0, 277.0, 146.0]
2025-08-07 06:41:49,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 36 seconds)
2025-08-07 06:43:31,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:40,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1094.58521 ± 649.014
2025-08-07 06:43:40,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1614.8889, 1863.9014, 1770.4972, 1842.2709, 770.6028, 296.95587, 668.9798, 1461.0826, 101.58741, 555.084]
2025-08-07 06:43:40,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 426.0, 179.0, 400.0, 806.0, 75.0, 273.0]
2025-08-07 06:43:40,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 10 seconds)
2025-08-07 06:45:24,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:33,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1038.08020 ± 636.297
2025-08-07 06:45:33,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [909.4609, 1735.9014, 122.88769, 1139.3163, 1066.8186, 591.3618, 1845.168, 1978.709, 927.01526, 64.161644]
2025-08-07 06:45:33,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [511.0, 1000.0, 92.0, 650.0, 584.0, 380.0, 1000.0, 1000.0, 552.0, 44.0]
2025-08-07 06:45:33,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 21 seconds)
2025-08-07 06:47:21,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:33,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1208.15540 ± 570.089
2025-08-07 06:47:33,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1055.0303, 1738.6758, 1766.5287, 629.4558, 1909.648, 59.075157, 1040.8501, 785.5757, 1393.5488, 1703.1653]
2025-08-07 06:47:33,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [724.0, 1000.0, 1000.0, 383.0, 1000.0, 68.0, 549.0, 1000.0, 731.0, 1000.0]
2025-08-07 06:47:33,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 44 seconds)
2025-08-07 06:49:13,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:21,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1106.73511 ± 746.378
2025-08-07 06:49:21,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1844.0427, 821.7686, 2033.1384, 2016.7495, 1775.1987, 647.36664, 1318.9486, 106.57516, 469.17603, 34.38611]
2025-08-07 06:49:21,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 446.0, 1000.0, 1000.0, 950.0, 342.0, 590.0, 61.0, 270.0, 36.0]
2025-08-07 06:49:21,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 33 seconds)
2025-08-07 06:51:12,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:20,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 874.47107 ± 554.131
2025-08-07 06:51:20,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1999.1875, 231.60672, 1698.9121, 819.9664, 851.7095, 480.39215, 1191.171, 404.28845, 521.59845, 545.8791]
2025-08-07 06:51:20,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 134.0, 1000.0, 1000.0, 412.0, 279.0, 668.0, 218.0, 286.0, 301.0]
2025-08-07 06:51:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 42 seconds)
2025-08-07 06:53:03,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:13,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1182.85645 ± 663.724
2025-08-07 06:53:13,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1091.9147, 1975.4529, 1588.2902, 112.93139, 1653.9452, 986.3947, 69.60766, 1783.6017, 745.246, 1821.1793]
2025-08-07 06:53:13,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [810.0, 1000.0, 1000.0, 75.0, 1000.0, 515.0, 80.0, 1000.0, 418.0, 1000.0]
2025-08-07 06:53:13,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 55 seconds)
2025-08-07 06:54:57,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1008.51337 ± 497.622
2025-08-07 06:55:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1790.465, 849.89734, 763.3473, 1901.9785, 709.47296, 1236.0405, 1238.8782, 641.5781, 268.39468, 685.08124]
2025-08-07 06:55:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 520.0, 478.0, 1000.0, 1000.0, 649.0, 1000.0, 381.0, 164.0, 425.0]
2025-08-07 06:55:08,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 3 seconds)
2025-08-07 06:56:54,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:04,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1162.98804 ± 671.533
2025-08-07 06:57:04,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1370.6332, 710.6051, 313.79147, 1808.4672, 163.80083, 1989.7567, 922.32214, 582.72125, 1820.3829, 1947.3989]
2025-08-07 06:57:04,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [802.0, 420.0, 180.0, 1000.0, 82.0, 1000.0, 502.0, 304.0, 1000.0, 1000.0]
2025-08-07 06:57:04,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 2 seconds)
2025-08-07 06:58:41,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:49,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 879.98499 ± 614.197
2025-08-07 06:58:49,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [111.376205, 1735.5765, 134.19936, 1149.2002, 784.5476, 224.87914, 674.62537, 1809.3346, 646.36035, 1529.7495]
2025-08-07 06:58:49,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 981.0, 93.0, 1000.0, 458.0, 149.0, 395.0, 1000.0, 339.0, 842.0]
2025-08-07 06:58:49,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 2 seconds)
2025-08-07 07:00:29,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:36,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 859.17639 ± 440.481
2025-08-07 07:00:36,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [263.66364, 1414.6554, 1675.1354, 552.4327, 902.9879, 534.2359, 1326.543, 468.91324, 683.1942, 770.00244]
2025-08-07 07:00:36,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [156.0, 674.0, 1000.0, 337.0, 525.0, 311.0, 696.0, 257.0, 369.0, 364.0]
2025-08-07 07:00:36,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 49 seconds)
2025-08-07 07:02:27,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:37,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1180.60022 ± 786.370
2025-08-07 07:02:37,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [73.70028, 1814.0963, 1697.7098, 1807.6418, 533.2216, 72.81323, 240.969, 1843.7571, 1889.9255, 1832.1675]
2025-08-07 07:02:37,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 1000.0, 982.0, 1000.0, 341.0, 51.0, 156.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:02:37,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 9 seconds)
2025-08-07 07:04:12,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:19,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 776.86005 ± 616.025
2025-08-07 07:04:19,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [286.40314, 1740.7883, 543.75555, 328.77673, 800.2936, 689.2154, 1515.0576, 142.58244, 1697.6606, 24.067266]
2025-08-07 07:04:19,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [179.0, 1000.0, 354.0, 202.0, 522.0, 418.0, 1000.0, 106.0, 1000.0, 34.0]
2025-08-07 07:04:19,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 1 second)
2025-08-07 07:06:03,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:13,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 987.24512 ± 585.855
2025-08-07 07:06:13,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [935.8873, 1460.6207, 1165.7386, 953.71375, 59.230118, 1856.8934, 702.3421, 1789.9725, 84.96856, 863.08435]
2025-08-07 07:06:13,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [550.0, 1000.0, 632.0, 464.0, 56.0, 1000.0, 1000.0, 911.0, 58.0, 1000.0]
2025-08-07 07:06:13,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 8 seconds)
2025-08-07 07:07:56,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1280.25464 ± 376.812
2025-08-07 07:08:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1535.0486, 1113.3105, 944.8874, 1114.8302, 1076.558, 1459.4764, 1228.3726, 1809.3615, 616.0684, 1904.6327]
2025-08-07 07:08:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [756.0, 604.0, 504.0, 617.0, 1000.0, 723.0, 630.0, 1000.0, 328.0, 1000.0]
2025-08-07 07:08:07,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 25 seconds)
2025-08-07 07:09:54,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1349.18042 ± 500.908
2025-08-07 07:10:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1540.5759, 222.36247, 1430.3799, 1614.5682, 991.6477, 1746.1882, 1488.6029, 774.05566, 1792.2693, 1891.153]
2025-08-07 07:10:07,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 127.0, 779.0, 1000.0, 548.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:10:07,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 42 seconds)
2025-08-07 07:11:47,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:54,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 814.19080 ± 489.022
2025-08-07 07:11:54,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [668.27966, 1049.3436, 480.7211, 1757.086, 1250.2185, 193.52937, 99.699425, 661.60535, 723.18384, 1258.2406]
2025-08-07 07:11:54,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [373.0, 528.0, 256.0, 1000.0, 1000.0, 140.0, 57.0, 351.0, 350.0, 622.0]
2025-08-07 07:11:54,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 42 seconds)
2025-08-07 07:13:33,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:42,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1118.65491 ± 682.315
2025-08-07 07:13:42,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [402.71375, 1950.3881, 555.22394, 1262.8873, 1736.8802, 425.8843, 1910.2048, 175.23337, 1928.471, 838.6625]
2025-08-07 07:13:42,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [264.0, 1000.0, 333.0, 702.0, 916.0, 248.0, 1000.0, 93.0, 1000.0, 388.0]
2025-08-07 07:13:42,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 52 seconds)
2025-08-07 07:15:27,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:37,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1244.95276 ± 550.785
2025-08-07 07:15:37,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1901.2645, 350.53717, 1844.9264, 1459.2423, 956.38965, 387.6377, 837.8367, 1321.0396, 1667.3406, 1723.3125]
2025-08-07 07:15:37,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 210.0, 1000.0, 817.0, 526.0, 203.0, 448.0, 730.0, 890.0, 1000.0]
2025-08-07 07:15:37,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1251 [DEBUG]: Training session finished
