2025-08-07 04:09:26,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc15-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:09:26,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc15-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:09:26,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14a9eac87b90>}
2025-08-07 04:09:26,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1111 [DEBUG]: using device: cuda
2025-08-07 04:09:26,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1133 [INFO]: Creating new trainer
2025-08-07 04:09:26,273 baseline-bpql-noiseperc15-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=219, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:09:26,273 baseline-bpql-noiseperc15-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:09:28,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1194 [DEBUG]: Starting training session...
2025-08-07 04:09:28,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 1/100
2025-08-07 04:11:08,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:11:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -148.44598 ± 357.886
2025-08-07 04:11:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-1217.6705, 7.9231095, -91.51815, -54.328545, -17.40878, -48.892395, 16.763779, -44.708706, -44.732494, 10.11287]
2025-08-07 04:11:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 35.0, 99.0, 79.0, 55.0, 79.0, 30.0, 73.0, 104.0, 66.0]
2025-08-07 04:11:10,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (-148.45) for latency ExtremeClogL1U23
2025-08-07 04:11:10,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 48 minutes, 56 seconds)
2025-08-07 04:12:56,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:13:00,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -202.92551 ± 378.116
2025-08-07 04:13:00,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-959.96735, -14.475765, -35.076084, 8.002707, -957.74884, -5.227899, -15.012719, -10.308098, -15.053631, -24.38762]
2025-08-07 04:13:00,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 78.0, 69.0, 92.0, 1000.0, 40.0, 49.0, 55.0, 60.0, 59.0]
2025-08-07 04:13:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 52 minutes, 56 seconds)
2025-08-07 04:14:48,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:14:49,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -18.78502 ± 29.040
2025-08-07 04:14:49,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-41.941757, -22.519669, -14.033388, -10.757451, 1.6744648, 4.9529085, 5.257472, -83.87353, 17.835922, -44.44516]
2025-08-07 04:14:49,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 77.0, 43.0, 57.0, 80.0, 41.0, 49.0, 88.0, 44.0, 65.0]
2025-08-07 04:14:49,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (-18.79) for latency ExtremeClogL1U23
2025-08-07 04:14:49,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-08-07 04:16:35,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:16:36,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -79.52641 ± 82.524
2025-08-07 04:16:36,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-20.805672, -50.15291, -107.84588, -50.010746, 19.576488, -59.543766, -127.22405, -294.07248, -85.31719, -19.867916]
2025-08-07 04:16:36,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 99.0, 226.0, 70.0, 43.0, 114.0, 215.0, 214.0, 115.0, 80.0]
2025-08-07 04:16:36,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 51 minutes, 25 seconds)
2025-08-07 04:18:19,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:18:22,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -116.06106 ± 228.603
2025-08-07 04:18:22,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-789.7769, -34.481773, -0.09093308, -25.9716, -68.574394, -6.761034, -143.64015, 16.155094, -65.26513, -42.20377]
2025-08-07 04:18:22,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 88.0, 93.0, 126.0, 205.0, 88.0, 248.0, 46.0, 86.0, 101.0]
2025-08-07 04:18:22,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 49 minutes, 2 seconds)
2025-08-07 04:20:14,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:20:19,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -80.53378 ± 85.690
2025-08-07 04:20:19,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-92.1865, -0.47139055, -13.39318, -208.43456, -3.5775673, -29.142302, -32.34028, -23.89404, -239.17172, -162.72621]
2025-08-07 04:20:19,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [275.0, 232.0, 135.0, 610.0, 83.0, 139.0, 170.0, 191.0, 1000.0, 500.0]
2025-08-07 04:20:19,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 52 minutes, 1 second)
2025-08-07 04:22:02,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:22:13,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 136.21184 ± 63.564
2025-08-07 04:22:13,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [219.22542, 210.50967, 202.18214, 102.53491, 152.2788, 78.64539, 98.242836, 30.368484, 190.10204, 78.02875]
2025-08-07 04:22:13,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 503.0, 429.0, 387.0, 205.0, 205.0, 841.0, 1000.0, 1000.0]
2025-08-07 04:22:13,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (136.21) for latency ExtremeClogL1U23
2025-08-07 04:22:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 51 minutes, 24 seconds)
2025-08-07 04:24:04,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:24:12,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 220.61227 ± 150.611
2025-08-07 04:24:12,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [221.60156, 412.2652, 118.292336, 18.62043, 418.9059, 113.86131, 452.61807, 37.181602, 217.32722, 195.44904]
2025-08-07 04:24:12,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [995.0, 1000.0, 216.0, 102.0, 1000.0, 144.0, 1000.0, 204.0, 487.0, 427.0]
2025-08-07 04:24:12,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (220.61) for latency ExtremeClogL1U23
2025-08-07 04:24:12,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 52 minutes, 51 seconds)
2025-08-07 04:25:56,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:25:59,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 80.52615 ± 49.522
2025-08-07 04:25:59,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [204.99722, 58.287285, 118.48472, 99.855095, 26.716967, 65.09592, 41.725174, 50.505577, 91.61457, 47.979023]
2025-08-07 04:25:59,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 252.0, 464.0, 26.0, 92.0, 145.0, 132.0, 134.0, 156.0]
2025-08-07 04:25:59,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 50 minutes, 42 seconds)
2025-08-07 04:27:38,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:27:41,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 60.84863 ± 35.413
2025-08-07 04:27:41,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [68.619156, 52.194523, 20.322018, 137.72787, 39.136574, 50.5308, 100.25609, 17.98418, 81.886055, 39.82907]
2025-08-07 04:27:41,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 89.0, 61.0, 485.0, 155.0, 61.0, 354.0, 65.0, 144.0, 88.0]
2025-08-07 04:27:41,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 47 minutes, 41 seconds)
2025-08-07 04:29:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:29:28,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 48.53910 ± 26.434
2025-08-07 04:29:28,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [24.482943, 40.64174, 65.26786, 40.977024, 62.797302, 23.776924, 14.118488, 103.09298, 33.37299, 76.86273]
2025-08-07 04:29:28,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 70.0, 235.0, 48.0, 128.0, 62.0, 84.0, 250.0, 42.0, 114.0]
2025-08-07 04:29:28,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 42 minutes, 54 seconds)
2025-08-07 04:31:15,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:31:24,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 199.66742 ± 139.856
2025-08-07 04:31:24,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [342.32727, 41.077057, 70.52678, 85.37804, 375.97876, 360.798, 353.79276, 63.138817, 242.2032, 61.45362]
2025-08-07 04:31:24,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 170.0, 293.0, 143.0, 1000.0, 1000.0, 1000.0, 134.0, 1000.0, 63.0]
2025-08-07 04:31:24,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 41 minutes, 40 seconds)
2025-08-07 04:33:11,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:33:19,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 213.49080 ± 202.822
2025-08-07 04:33:19,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [441.13675, 125.07132, 398.29843, 83.78974, 527.7338, 27.556587, 15.414039, 56.08054, 456.71732, 3.1092935]
2025-08-07 04:33:19,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 211.0, 1000.0, 144.0, 1000.0, 65.0, 36.0, 121.0, 1000.0, 17.0]
2025-08-07 04:33:19,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 38 minutes, 27 seconds)
2025-08-07 04:35:02,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:06,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 93.92871 ± 113.229
2025-08-07 04:35:06,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-5.1089683, 62.136623, 32.467228, 40.59336, 105.26964, 72.48808, 13.639768, 89.38558, 113.01137, 415.40445]
2025-08-07 04:35:06,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 280.0, 113.0, 49.0, 284.0, 124.0, 161.0, 300.0, 283.0, 1000.0]
2025-08-07 04:35:06,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 36 minutes, 41 seconds)
2025-08-07 04:37:00,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:37:08,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 277.42484 ± 200.173
2025-08-07 04:37:08,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [74.2558, 110.060394, 602.29407, 475.01968, 85.79683, 69.83139, 121.105896, 467.8709, 275.425, 492.5884]
2025-08-07 04:37:08,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [188.0, 130.0, 896.0, 1000.0, 208.0, 336.0, 128.0, 1000.0, 508.0, 1000.0]
2025-08-07 04:37:08,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (277.42) for latency ExtremeClogL1U23
2025-08-07 04:37:08,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 40 minutes, 47 seconds)
2025-08-07 04:38:48,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:38:51,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 78.47098 ± 111.714
2025-08-07 04:38:51,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [31.960653, 12.269876, 38.48002, 81.830666, 30.078394, 372.63657, 3.4657733, -16.690731, 43.337093, 187.34143]
2025-08-07 04:38:51,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 177.0, 157.0, 94.0, 86.0, 1000.0, 46.0, 182.0, 76.0, 355.0]
2025-08-07 04:38:51,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 37 minutes, 30 seconds)
2025-08-07 04:40:35,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:40:39,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 122.04910 ± 172.600
2025-08-07 04:40:39,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [16.79274, 14.490194, 52.48266, 470.23944, 14.012747, 51.826668, 38.94832, 82.28122, 459.25992, 20.156967]
2025-08-07 04:40:39,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 69.0, 70.0, 1000.0, 152.0, 94.0, 70.0, 261.0, 1000.0, 119.0]
2025-08-07 04:40:39,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 33 minutes, 43 seconds)
2025-08-07 04:42:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:42:29,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 61.75401 ± 69.055
2025-08-07 04:42:29,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [26.113375, 56.481266, 91.2399, 85.78809, 71.35762, -74.877, 37.62261, 220.67792, 59.76985, 43.36639]
2025-08-07 04:42:29,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 129.0, 124.0, 96.0, 95.0, 719.0, 100.0, 1000.0, 136.0, 43.0]
2025-08-07 04:42:29,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 30 minutes, 27 seconds)
2025-08-07 04:44:18,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:44:23,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 137.53384 ± 153.532
2025-08-07 04:44:23,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [105.60175, 47.021507, 525.0966, 64.547134, 86.9134, 14.582869, 71.24481, 57.020676, 332.41626, 70.89356]
2025-08-07 04:44:23,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 49.0, 1000.0, 293.0, 125.0, 21.0, 251.0, 131.0, 1000.0, 277.0]
2025-08-07 04:44:23,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 30 minutes, 30 seconds)
2025-08-07 04:46:08,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:46:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 95.18001 ± 139.526
2025-08-07 04:46:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [20.01759, 52.398228, 17.381945, 263.90137, 450.84238, 77.89219, -16.554688, 32.020855, 33.881454, 20.018738]
2025-08-07 04:46:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 60.0, 67.0, 1000.0, 1000.0, 136.0, 255.0, 61.0, 160.0, 204.0]
2025-08-07 04:46:13,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 25 minutes, 17 seconds)
2025-08-07 04:47:56,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:48:01,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 116.14824 ± 140.679
2025-08-07 04:48:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [437.77115, 64.339355, 20.74149, 71.208565, 31.563688, 40.74738, 11.081045, 28.705833, 117.51953, 337.80435]
2025-08-07 04:48:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 99.0, 38.0, 194.0, 78.0, 134.0, 42.0, 45.0, 732.0, 1000.0]
2025-08-07 04:48:01,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 24 minutes, 52 seconds)
2025-08-07 04:49:53,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:49:55,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 62.77415 ± 17.708
2025-08-07 04:49:55,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [68.72237, 64.33784, 62.36295, 69.898254, 58.7537, 42.372334, 22.70694, 69.63827, 81.28765, 87.66122]
2025-08-07 04:49:55,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [171.0, 123.0, 180.0, 111.0, 172.0, 180.0, 42.0, 129.0, 95.0, 107.0]
2025-08-07 04:49:55,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 24 minutes, 33 seconds)
2025-08-07 04:51:44,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:51:50,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 180.64397 ± 167.290
2025-08-07 04:51:50,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [457.66214, 56.679703, 21.650827, 47.253162, 80.21307, 438.75314, 136.85138, 384.55228, 26.020452, 156.80342]
2025-08-07 04:51:50,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 70.0, 347.0, 158.0, 193.0, 1000.0, 293.0, 1000.0, 72.0, 206.0]
2025-08-07 04:51:50,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 24 minutes, 5 seconds)
2025-08-07 04:53:32,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:53:37,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 110.39004 ± 146.710
2025-08-07 04:53:37,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [385.49216, 399.49036, 61.377968, 100.66203, 10.196973, 72.78461, 82.719696, 16.317818, 20.402624, -45.54394]
2025-08-07 04:53:37,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 84.0, 542.0, 40.0, 201.0, 199.0, 197.0, 41.0, 183.0]
2025-08-07 04:53:37,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 20 minutes, 16 seconds)
2025-08-07 04:55:24,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:55:29,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 150.39697 ± 129.270
2025-08-07 04:55:29,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [39.25184, 106.38258, 398.7145, -37.306595, 35.63038, 125.65166, 185.38762, 130.81033, 168.54703, 350.9004]
2025-08-07 04:55:29,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [224.0, 132.0, 1000.0, 263.0, 44.0, 283.0, 267.0, 155.0, 309.0, 1000.0]
2025-08-07 04:55:29,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 19 minutes, 3 seconds)
2025-08-07 04:57:11,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:57:19,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 237.66862 ± 193.234
2025-08-07 04:57:19,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [500.98132, 581.3491, 338.51587, 66.385864, 74.77845, 44.576786, 98.75436, 95.31313, 150.20058, 425.8309]
2025-08-07 04:57:19,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 126.0, 111.0, 244.0, 291.0, 195.0, 328.0, 1000.0]
2025-08-07 04:57:19,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 17 minutes, 37 seconds)
2025-08-07 04:59:04,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:59:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 108.57800 ± 96.976
2025-08-07 04:59:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [36.915264, 49.37281, 88.55995, 114.25103, 211.33777, 60.494255, 38.73601, 343.84964, 138.44872, 3.8146224]
2025-08-07 04:59:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [98.0, 133.0, 164.0, 229.0, 339.0, 434.0, 46.0, 1000.0, 182.0, 25.0]
2025-08-07 04:59:08,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 14 minutes, 22 seconds)
2025-08-07 05:00:59,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:01:04,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 132.09230 ± 156.007
2025-08-07 05:01:04,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [75.83343, 354.17856, 509.49304, 57.723934, 27.81941, 96.877815, 20.509365, 69.88249, 88.91572, 19.689081]
2025-08-07 05:01:04,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [112.0, 1000.0, 1000.0, 65.0, 74.0, 141.0, 167.0, 83.0, 264.0, 43.0]
2025-08-07 05:01:04,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 12 minutes, 44 seconds)
2025-08-07 05:02:49,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:02:52,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 97.92383 ± 103.229
2025-08-07 05:02:52,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [73.52839, 87.60753, 82.91872, 391.69772, 51.024265, 22.073528, 140.87672, 25.006311, 57.027866, 47.477184]
2025-08-07 05:02:52,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 244.0, 146.0, 1000.0, 71.0, 59.0, 267.0, 107.0, 119.0, 39.0]
2025-08-07 05:02:52,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 11 minutes, 24 seconds)
2025-08-07 05:04:34,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:04:40,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 196.42694 ± 162.035
2025-08-07 05:04:40,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [164.86116, 500.22037, 485.5296, 30.112698, 82.75887, 86.69281, 130.40504, 256.68866, 38.26902, 188.7311]
2025-08-07 05:04:40,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [180.0, 1000.0, 1000.0, 44.0, 80.0, 119.0, 305.0, 678.0, 85.0, 302.0]
2025-08-07 05:04:40,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 8 minutes, 26 seconds)
2025-08-07 05:06:28,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:06:30,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 63.41670 ± 49.162
2025-08-07 05:06:30,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [83.17415, 14.972463, 42.12083, 38.841663, 66.33705, 17.81054, 73.86504, 187.01843, 92.150116, 17.876738]
2025-08-07 05:06:30,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 35.0, 68.0, 94.0, 130.0, 69.0, 116.0, 256.0, 333.0, 50.0]
2025-08-07 05:06:30,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 6 minutes, 37 seconds)
2025-08-07 05:08:20,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:08:24,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 157.50267 ± 191.966
2025-08-07 05:08:24,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [21.42376, 156.48718, 44.366837, 55.42647, 73.58778, 615.4612, 436.40683, 62.055847, 86.946266, 22.864645]
2025-08-07 05:08:24,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 113.0, 79.0, 154.0, 147.0, 941.0, 1000.0, 94.0, 154.0, 44.0]
2025-08-07 05:08:24,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 6 minutes, 6 seconds)
2025-08-07 05:10:05,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:10:12,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 202.42783 ± 152.041
2025-08-07 05:10:12,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [97.19464, 138.07735, 316.5598, 44.37651, 94.87562, 259.88986, 492.85217, 126.29798, 37.712215, 416.4421]
2025-08-07 05:10:12,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 254.0, 659.0, 230.0, 222.0, 446.0, 1000.0, 304.0, 33.0, 1000.0]
2025-08-07 05:10:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 2 minutes, 23 seconds)
2025-08-07 05:11:58,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:12:03,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 209.62166 ± 168.320
2025-08-07 05:12:03,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [149.14885, 60.465454, 125.20756, 149.23785, 53.313107, 513.83545, 74.42879, 170.62605, 263.24026, 536.7133]
2025-08-07 05:12:03,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [236.0, 82.0, 172.0, 144.0, 156.0, 1000.0, 94.0, 171.0, 473.0, 1000.0]
2025-08-07 05:12:03,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 1 minute, 8 seconds)
2025-08-07 05:13:47,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:13:53,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 171.06828 ± 166.866
2025-08-07 05:13:53,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [63.826153, 70.294075, 332.61932, 573.8789, 28.345606, 29.876263, 121.07872, 44.960243, 269.5411, 176.26245]
2025-08-07 05:13:53,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [137.0, 111.0, 645.0, 1000.0, 132.0, 68.0, 275.0, 124.0, 625.0, 219.0]
2025-08-07 05:13:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 59 minutes, 47 seconds)
2025-08-07 05:15:39,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:15:46,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 266.02792 ± 238.635
2025-08-07 05:15:46,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [70.215904, 475.30856, 305.74344, 567.85504, 61.292885, 430.8185, 17.65632, 17.586147, 661.57104, 52.2314]
2025-08-07 05:15:46,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 1000.0, 532.0, 771.0, 60.0, 1000.0, 58.0, 63.0, 1000.0, 45.0]
2025-08-07 05:15:46,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 58 minutes, 40 seconds)
2025-08-07 05:17:31,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:17:37,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 246.08456 ± 175.651
2025-08-07 05:17:37,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [183.54308, 553.63196, 127.30683, 95.846596, 163.37173, 166.5618, 358.55667, 561.78906, 221.57138, 28.666416]
2025-08-07 05:17:37,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [318.0, 1000.0, 273.0, 162.0, 294.0, 245.0, 487.0, 1000.0, 335.0, 34.0]
2025-08-07 05:17:37,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 56 minutes, 8 seconds)
2025-08-07 05:19:29,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:19:33,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 176.73660 ± 138.646
2025-08-07 05:19:33,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [119.20932, 386.62958, 55.627686, 96.556145, 143.64221, 175.62683, 114.0243, 130.8403, 494.24518, 50.964344]
2025-08-07 05:19:33,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [129.0, 510.0, 48.0, 132.0, 196.0, 190.0, 133.0, 200.0, 1000.0, 63.0]
2025-08-07 05:19:33,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2025-08-07 05:21:14,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:21:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 137.54541 ± 125.872
2025-08-07 05:21:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [61.398804, 105.971954, 72.51128, 105.41073, 46.692734, 480.61496, 142.06833, 29.942951, 105.09867, 225.74382]
2025-08-07 05:21:18,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 150.0, 90.0, 261.0, 110.0, 1000.0, 163.0, 35.0, 116.0, 380.0]
2025-08-07 05:21:18,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 52 minutes, 46 seconds)
2025-08-07 05:23:03,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:23:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 296.98584 ± 188.300
2025-08-07 05:23:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [107.03772, 519.2359, 498.87726, 535.9314, 186.95491, 72.02206, 519.0121, 74.02427, 221.27652, 235.48607]
2025-08-07 05:23:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 1000.0, 1000.0, 1000.0, 427.0, 99.0, 1000.0, 82.0, 207.0, 312.0]
2025-08-07 05:23:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (296.99) for latency ExtremeClogL1U23
2025-08-07 05:23:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 51 minutes, 40 seconds)
2025-08-07 05:24:57,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:25:05,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 264.05441 ± 177.811
2025-08-07 05:25:05,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [115.83716, 275.76834, 452.7235, 398.39777, 428.8931, 251.76338, 86.585434, 28.611004, 52.548584, 549.4158]
2025-08-07 05:25:05,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [184.0, 396.0, 1000.0, 1000.0, 537.0, 336.0, 118.0, 89.0, 60.0, 1000.0]
2025-08-07 05:25:05,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 49 minutes, 56 seconds)
2025-08-07 05:26:52,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:59,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 208.54424 ± 187.462
2025-08-07 05:26:59,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [59.961617, 492.69785, 171.46771, 34.42721, 234.07097, 498.9962, 14.409113, 442.70047, 62.370968, 74.340164]
2025-08-07 05:26:59,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 1000.0, 216.0, 45.0, 284.0, 1000.0, 45.0, 1000.0, 109.0, 100.0]
2025-08-07 05:26:59,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 48 minutes, 33 seconds)
2025-08-07 05:28:43,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:28:47,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 180.95897 ± 126.485
2025-08-07 05:28:47,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [48.247593, 56.829395, 55.099983, 157.79889, 372.6298, 220.97131, 203.93274, 256.9478, 396.2141, 40.91813]
2025-08-07 05:28:47,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 55.0, 75.0, 149.0, 624.0, 353.0, 251.0, 441.0, 516.0, 75.0]
2025-08-07 05:28:47,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 45 minutes, 17 seconds)
2025-08-07 05:30:34,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:30:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 253.69724 ± 186.436
2025-08-07 05:30:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [268.1733, 24.632591, 121.120804, 47.518044, 570.8753, 65.57049, 213.06311, 465.00827, 481.39786, 279.6126]
2025-08-07 05:30:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [248.0, 46.0, 282.0, 51.0, 1000.0, 69.0, 310.0, 1000.0, 1000.0, 401.0]
2025-08-07 05:30:41,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 45 minutes, 5 seconds)
2025-08-07 05:32:27,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:32:29,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 146.85962 ± 74.271
2025-08-07 05:32:29,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [32.551647, 50.243958, 113.91329, 208.83035, 238.96729, 192.82483, 151.63486, 58.205418, 235.22067, 186.20389]
2025-08-07 05:32:29,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 51.0, 149.0, 209.0, 306.0, 222.0, 257.0, 94.0, 316.0, 341.0]
2025-08-07 05:32:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 42 minutes, 23 seconds)
2025-08-07 05:34:17,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:34:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 469.09204 ± 198.908
2025-08-07 05:34:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [412.8488, 240.87498, 423.06296, 573.9749, 682.08453, 798.064, 586.9729, 389.45404, 504.60947, 78.97391]
2025-08-07 05:34:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [457.0, 250.0, 1000.0, 947.0, 1000.0, 1000.0, 1000.0, 559.0, 1000.0, 98.0]
2025-08-07 05:34:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (469.09) for latency ExtremeClogL1U23
2025-08-07 05:34:28,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 41 minutes, 22 seconds)
2025-08-07 05:36:10,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:36:14,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 195.39751 ± 180.344
2025-08-07 05:36:14,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [9.275463, 30.637169, 209.11601, 649.6868, 229.9499, 246.29536, 244.22963, 56.075825, 251.20949, 27.499378]
2025-08-07 05:36:14,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 48.0, 205.0, 939.0, 289.0, 357.0, 197.0, 48.0, 301.0, 54.0]
2025-08-07 05:36:14,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 38 minutes, 8 seconds)
2025-08-07 05:38:05,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:38:13,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 307.64185 ± 202.629
2025-08-07 05:38:13,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [354.26114, 192.21797, 527.248, 536.9785, 53.245316, 563.0008, 199.09592, 507.16052, 97.35346, 45.856693]
2025-08-07 05:38:13,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [349.0, 275.0, 1000.0, 645.0, 63.0, 1000.0, 248.0, 1000.0, 108.0, 44.0]
2025-08-07 05:38:13,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 38 minutes, 4 seconds)
2025-08-07 05:39:54,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:40:02,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 391.89270 ± 312.082
2025-08-07 05:40:02,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [64.02501, 160.30635, 29.42768, 501.4753, 843.6729, 843.21387, 239.13684, 746.1065, 58.515114, 433.04727]
2025-08-07 05:40:02,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 118.0, 48.0, 1000.0, 1000.0, 913.0, 296.0, 935.0, 65.0, 425.0]
2025-08-07 05:40:02,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 35 minutes, 21 seconds)
2025-08-07 05:41:49,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:41:53,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 227.89041 ± 167.169
2025-08-07 05:41:53,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [116.18041, 443.16467, 303.30014, 5.2399135, 27.802189, 192.19672, 327.17215, 88.686646, 535.6143, 239.54683]
2025-08-07 05:41:53,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 430.0, 378.0, 28.0, 35.0, 179.0, 340.0, 78.0, 590.0, 274.0]
2025-08-07 05:41:53,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 33 minutes, 55 seconds)
2025-08-07 05:43:40,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:43:45,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 240.81038 ± 203.890
2025-08-07 05:43:45,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [68.36707, 498.46024, 578.35187, 120.41912, 210.53712, 133.43831, 549.8246, 155.89946, 31.715244, 61.090572]
2025-08-07 05:43:45,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 582.0, 642.0, 134.0, 269.0, 151.0, 1000.0, 150.0, 57.0, 91.0]
2025-08-07 05:43:45,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 30 minutes, 58 seconds)
2025-08-07 05:45:31,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:45:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 295.83228 ± 208.299
2025-08-07 05:45:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [391.0062, 523.52966, 41.56702, 90.08682, 574.50696, 594.9943, 84.71151, 84.76863, 217.47026, 355.68164]
2025-08-07 05:45:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [466.0, 623.0, 85.0, 79.0, 719.0, 840.0, 75.0, 112.0, 174.0, 404.0]
2025-08-07 05:45:37,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2025-08-07 05:47:21,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:47:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 287.37799 ± 170.574
2025-08-07 05:47:27,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [397.7415, 212.05405, 305.31674, 631.80884, 76.25949, 183.79973, 502.56552, 300.61563, 87.295425, 176.32309]
2025-08-07 05:47:27,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [399.0, 243.0, 268.0, 713.0, 101.0, 200.0, 1000.0, 342.0, 163.0, 199.0]
2025-08-07 05:47:27,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 26 minutes, 46 seconds)
2025-08-07 05:49:21,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:26,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 257.88678 ± 237.337
2025-08-07 05:49:26,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [128.20303, 330.30206, 628.63, 69.26694, 71.604385, 588.4104, 137.807, 18.363916, 575.5045, 30.775602]
2025-08-07 05:49:26,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [147.0, 412.0, 775.0, 104.0, 60.0, 585.0, 157.0, 61.0, 601.0, 34.0]
2025-08-07 05:49:26,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 26 minutes, 31 seconds)
2025-08-07 05:51:05,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:12,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 367.87112 ± 327.040
2025-08-07 05:51:12,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [996.3907, 123.14184, 392.23157, 160.33475, 716.2701, 30.570477, 437.3137, 723.20746, 82.476074, 16.77476]
2025-08-07 05:51:12,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 153.0, 484.0, 292.0, 797.0, 45.0, 523.0, 1000.0, 111.0, 37.0]
2025-08-07 05:51:12,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 23 minutes, 52 seconds)
2025-08-07 05:53:00,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:07,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 365.46967 ± 262.943
2025-08-07 05:53:07,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [53.238346, 656.9879, 607.5839, 888.11755, 173.02795, 196.47417, 154.10143, 138.40845, 310.44604, 476.3108]
2025-08-07 05:53:07,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 870.0, 620.0, 1000.0, 195.0, 301.0, 186.0, 176.0, 386.0, 551.0]
2025-08-07 05:53:07,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 22 minutes, 22 seconds)
2025-08-07 05:54:50,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:54:58,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 431.87637 ± 265.737
2025-08-07 05:54:58,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [689.66327, 455.08124, 611.3202, 28.208647, 381.44315, 859.7132, 52.8341, 541.2383, 138.35815, 560.90356]
2025-08-07 05:54:58,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 400.0, 661.0, 34.0, 452.0, 1000.0, 53.0, 695.0, 203.0, 766.0]
2025-08-07 05:54:58,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 20 minutes, 26 seconds)
2025-08-07 05:56:48,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:56:52,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 226.35469 ± 217.538
2025-08-07 05:56:52,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [66.66198, 231.06432, 107.51111, 145.93481, 7.9333415, 210.45116, 705.29626, 92.487946, 122.05007, 574.15594]
2025-08-07 05:56:52,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 299.0, 120.0, 143.0, 35.0, 209.0, 694.0, 105.0, 109.0, 671.0]
2025-08-07 05:56:52,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 19 minutes, 8 seconds)
2025-08-07 05:58:34,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:58:40,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 369.97797 ± 216.879
2025-08-07 05:58:40,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [575.10834, 168.37315, 212.22911, 125.03584, 605.0753, 200.68045, 248.82379, 821.798, 372.83652, 369.81946]
2025-08-07 05:58:40,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [602.0, 160.0, 156.0, 168.0, 756.0, 226.0, 261.0, 1000.0, 412.0, 453.0]
2025-08-07 05:58:40,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 15 minutes, 44 seconds)
2025-08-07 06:00:25,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:00:32,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 355.55759 ± 258.676
2025-08-07 06:00:32,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [704.5748, 73.88396, 281.75464, 261.45044, 106.378845, 582.6789, 578.87537, 143.87032, 756.85767, 65.25087]
2025-08-07 06:00:32,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [739.0, 101.0, 296.0, 288.0, 175.0, 1000.0, 639.0, 201.0, 930.0, 109.0]
2025-08-07 06:00:32,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 14 minutes, 34 seconds)
2025-08-07 06:02:20,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:02:25,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 294.78958 ± 351.385
2025-08-07 06:02:25,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [114.26549, 796.6882, 386.90018, 46.28183, 19.143766, 112.10847, 51.698193, 195.58144, 1112.4471, 112.78092]
2025-08-07 06:02:25,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [187.0, 1000.0, 469.0, 49.0, 46.0, 123.0, 134.0, 214.0, 1000.0, 117.0]
2025-08-07 06:02:25,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 12 minutes, 33 seconds)
2025-08-07 06:04:13,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:04:21,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 357.60110 ± 295.977
2025-08-07 06:04:21,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [70.942924, 403.05652, 513.5972, 25.545618, 561.6137, 154.5951, 104.32354, 191.56111, 516.984, 1033.7913]
2025-08-07 06:04:21,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 426.0, 1000.0, 33.0, 684.0, 219.0, 98.0, 225.0, 1000.0, 1000.0]
2025-08-07 06:04:21,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 11 minutes, 17 seconds)
2025-08-07 06:06:03,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:06:09,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 260.55032 ± 160.594
2025-08-07 06:06:09,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [127.07167, 462.54456, 191.55843, 36.585716, 562.4864, 241.63066, 83.352005, 363.73035, 341.34552, 195.19765]
2025-08-07 06:06:09,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [143.0, 515.0, 203.0, 44.0, 1000.0, 295.0, 123.0, 485.0, 412.0, 260.0]
2025-08-07 06:06:09,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 8 minutes, 38 seconds)
2025-08-07 06:08:00,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:08:10,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 436.23331 ± 263.827
2025-08-07 06:08:10,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [55.81752, 128.28719, 1010.743, 582.50385, 571.41223, 550.82733, 333.8566, 492.11877, 445.3775, 191.38899]
2025-08-07 06:08:10,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 113.0, 1000.0, 616.0, 1000.0, 1000.0, 346.0, 1000.0, 470.0, 317.0]
2025-08-07 06:08:10,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 8 minutes, 21 seconds)
2025-08-07 06:09:55,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:10:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 468.09082 ± 296.490
2025-08-07 06:10:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [24.739126, 784.51917, 513.98, 544.8946, 529.3683, 75.86941, 42.499638, 552.91284, 807.80096, 804.3244]
2025-08-07 06:10:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [111.0, 762.0, 1000.0, 1000.0, 1000.0, 88.0, 57.0, 627.0, 718.0, 842.0]
2025-08-07 06:10:05,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 6 minutes, 52 seconds)
2025-08-07 06:11:50,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:11:54,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 244.65425 ± 164.492
2025-08-07 06:11:54,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [216.34747, 82.3391, 116.37226, 475.87308, 21.493723, 185.57341, 126.59868, 500.12106, 277.28754, 444.53607]
2025-08-07 06:11:54,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [211.0, 132.0, 119.0, 488.0, 34.0, 207.0, 142.0, 496.0, 335.0, 504.0]
2025-08-07 06:11:54,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 4 minutes, 27 seconds)
2025-08-07 06:13:38,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:13:43,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 289.14948 ± 204.444
2025-08-07 06:13:43,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [155.40314, 261.35504, 496.27353, 469.2463, 573.8955, 136.17276, 86.27467, 569.5392, 21.062357, 122.27219]
2025-08-07 06:13:43,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [210.0, 388.0, 476.0, 1000.0, 597.0, 172.0, 76.0, 515.0, 29.0, 145.0]
2025-08-07 06:13:43,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 1 minute, 51 seconds)
2025-08-07 06:15:35,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:15:41,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 253.06042 ± 227.683
2025-08-07 06:15:41,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [371.20908, 76.233864, 50.766853, 8.547872, 443.18158, 35.953033, 387.91388, 42.73067, 412.20062, 701.8667]
2025-08-07 06:15:41,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 72.0, 37.0, 1000.0, 51.0, 341.0, 72.0, 434.0, 1000.0]
2025-08-07 06:15:41,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 5 seconds)
2025-08-07 06:17:24,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:17:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 344.43750 ± 347.178
2025-08-07 06:17:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [365.8357, 343.04996, 61.561695, 960.8831, 196.54791, 68.05443, 1007.42017, 13.270474, 391.74057, 36.010983]
2025-08-07 06:17:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [409.0, 401.0, 79.0, 882.0, 248.0, 71.0, 1000.0, 33.0, 396.0, 52.0]
2025-08-07 06:17:30,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 57 minutes, 52 seconds)
2025-08-07 06:19:16,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:19:24,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 467.14371 ± 282.073
2025-08-07 06:19:24,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [357.96603, 393.5939, 543.95917, 452.07806, 385.22665, 965.7631, 164.25526, 315.6035, 105.1545, 987.83655]
2025-08-07 06:19:24,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [339.0, 344.0, 600.0, 1000.0, 373.0, 970.0, 179.0, 273.0, 124.0, 1000.0]
2025-08-07 06:19:24,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 55 minutes, 53 seconds)
2025-08-07 06:21:07,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:21:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 232.56123 ± 212.696
2025-08-07 06:21:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [149.14253, 62.9693, 816.3023, 207.49065, 238.29791, 216.52525, 350.78845, 146.12025, 64.334946, 73.6406]
2025-08-07 06:21:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 72.0, 852.0, 209.0, 342.0, 256.0, 305.0, 138.0, 55.0, 85.0]
2025-08-07 06:21:11,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 52 seconds)
2025-08-07 06:22:58,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:23:06,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 437.88141 ± 244.009
2025-08-07 06:23:06,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [367.79294, 531.77234, 1056.1215, 212.82573, 307.23602, 163.8152, 422.00946, 280.99246, 420.9598, 615.28864]
2025-08-07 06:23:06,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [343.0, 527.0, 1000.0, 240.0, 327.0, 132.0, 430.0, 291.0, 1000.0, 670.0]
2025-08-07 06:23:06,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 31 seconds)
2025-08-07 06:24:51,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 432.66595 ± 282.533
2025-08-07 06:24:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [526.63995, 177.54965, 1073.8213, 726.4417, 122.16987, 270.18912, 496.6381, 198.09335, 502.7434, 232.37325]
2025-08-07 06:24:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [648.0, 221.0, 1000.0, 1000.0, 143.0, 310.0, 490.0, 194.0, 445.0, 247.0]
2025-08-07 06:24:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 50 minutes, 10 seconds)
2025-08-07 06:26:51,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:59,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 411.36029 ± 180.583
2025-08-07 06:26:59,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [538.725, 392.13657, 355.4981, 573.78424, 159.26355, 633.27124, 229.56322, 641.98517, 130.75127, 458.62463]
2025-08-07 06:26:59,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 420.0, 441.0, 621.0, 167.0, 1000.0, 178.0, 592.0, 207.0, 505.0]
2025-08-07 06:26:59,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 20 seconds)
2025-08-07 06:28:42,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:51,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 578.89807 ± 391.122
2025-08-07 06:28:51,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [153.89833, 1077.8313, 29.785767, 813.42084, 144.07295, 898.004, 1000.506, 449.13232, 969.0097, 253.31998]
2025-08-07 06:28:51,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [194.0, 1000.0, 49.0, 761.0, 153.0, 1000.0, 967.0, 417.0, 868.0, 272.0]
2025-08-07 06:28:51,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (578.90) for latency ExtremeClogL1U23
2025-08-07 06:28:51,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 18 seconds)
2025-08-07 06:30:41,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:48,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 319.38693 ± 297.928
2025-08-07 06:30:48,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [152.41248, 111.771225, 11.279585, 55.172134, 865.06714, 681.4404, 158.2751, 431.90387, 663.7466, 62.80087]
2025-08-07 06:30:48,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 139.0, 39.0, 74.0, 685.0, 1000.0, 209.0, 1000.0, 611.0, 76.0]
2025-08-07 06:30:48,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 7 seconds)
2025-08-07 06:32:26,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:32,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 378.92865 ± 234.349
2025-08-07 06:32:32,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [376.56857, 168.218, 719.8212, 749.78284, 112.677704, 111.05958, 433.68265, 535.4102, 110.10588, 471.9599]
2025-08-07 06:32:32,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [408.0, 148.0, 766.0, 679.0, 195.0, 111.0, 425.0, 509.0, 163.0, 482.0]
2025-08-07 06:32:32,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 23 seconds)
2025-08-07 06:34:23,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:26,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 183.59973 ± 119.940
2025-08-07 06:34:26,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [237.31377, 28.661604, 347.43292, 284.28174, 264.143, 320.06033, 29.26395, 67.43883, 204.65248, 52.748634]
2025-08-07 06:34:26,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [234.0, 45.0, 391.0, 293.0, 254.0, 292.0, 35.0, 87.0, 234.0, 55.0]
2025-08-07 06:34:26,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 35 seconds)
2025-08-07 06:36:07,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:15,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 481.78702 ± 243.005
2025-08-07 06:36:15,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [663.2748, 572.89136, 238.7612, 778.09436, 72.67531, 584.45825, 747.7696, 100.91455, 575.0023, 484.0282]
2025-08-07 06:36:15,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [690.0, 623.0, 241.0, 1000.0, 86.0, 507.0, 1000.0, 92.0, 539.0, 470.0]
2025-08-07 06:36:15,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 54 seconds)
2025-08-07 06:38:03,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:38:11,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 390.03806 ± 353.357
2025-08-07 06:38:11,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [60.622005, 174.3496, 188.87811, 65.58258, 14.7674055, 255.36885, 1014.37885, 543.0379, 628.90155, 954.4937]
2025-08-07 06:38:11,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 203.0, 226.0, 66.0, 42.0, 223.0, 1000.0, 1000.0, 1000.0, 977.0]
2025-08-07 06:38:11,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 17 seconds)
2025-08-07 06:39:59,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:08,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 505.89105 ± 295.622
2025-08-07 06:40:08,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [151.56305, 601.81165, 239.56395, 500.45547, 935.8125, 422.2295, 55.076332, 1014.2655, 627.0385, 511.09433]
2025-08-07 06:40:08,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [137.0, 523.0, 254.0, 412.0, 1000.0, 407.0, 104.0, 1000.0, 1000.0, 517.0]
2025-08-07 06:40:08,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 28 seconds)
2025-08-07 06:41:52,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:41:59,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 392.88550 ± 231.404
2025-08-07 06:41:59,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [628.54346, 176.16014, 896.27124, 405.49884, 379.207, 337.8028, 523.4827, 284.31622, 30.075407, 267.49734]
2025-08-07 06:41:59,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 167.0, 1000.0, 458.0, 338.0, 302.0, 540.0, 365.0, 26.0, 243.0]
2025-08-07 06:41:59,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes)
2025-08-07 06:43:46,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:53,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 432.39584 ± 295.337
2025-08-07 06:43:53,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [162.33165, 449.72083, 453.3542, 532.1497, 320.4913, 1123.1553, 468.91956, 652.9547, 55.74784, 105.13315]
2025-08-07 06:43:53,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [134.0, 448.0, 392.0, 547.0, 367.0, 1000.0, 1000.0, 610.0, 73.0, 138.0]
2025-08-07 06:43:53,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 8 seconds)
2025-08-07 06:45:36,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:43,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 360.45340 ± 266.579
2025-08-07 06:45:43,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [54.564, 285.61737, 180.96574, 59.33237, 369.93716, 162.02037, 501.1884, 523.4867, 982.79767, 484.62418]
2025-08-07 06:45:43,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 303.0, 188.0, 115.0, 361.0, 157.0, 486.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:45:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 18 seconds)
2025-08-07 06:47:34,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:41,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 352.26498 ± 170.834
2025-08-07 06:47:41,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [211.18306, 260.85352, 320.52075, 677.62305, 220.29085, 429.63013, 550.46375, 95.28691, 263.0107, 493.787]
2025-08-07 06:47:41,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [224.0, 250.0, 310.0, 1000.0, 195.0, 440.0, 1000.0, 105.0, 228.0, 403.0]
2025-08-07 06:47:41,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 30 seconds)
2025-08-07 06:49:30,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:38,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 486.09073 ± 275.708
2025-08-07 06:49:38,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [247.50702, 890.1987, 646.6438, 259.79886, 346.6182, 411.69736, 1026.4395, 573.5124, 172.70467, 285.78653]
2025-08-07 06:49:38,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [256.0, 896.0, 653.0, 271.0, 301.0, 401.0, 1000.0, 1000.0, 225.0, 211.0]
2025-08-07 06:49:38,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 36 seconds)
2025-08-07 06:51:19,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:26,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 464.46820 ± 362.097
2025-08-07 06:51:26,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [199.90047, 182.91553, 33.547726, 818.09125, 229.37369, 362.2196, 59.70969, 915.3831, 920.9024, 922.6387]
2025-08-07 06:51:26,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [194.0, 188.0, 27.0, 828.0, 222.0, 360.0, 92.0, 859.0, 1000.0, 831.0]
2025-08-07 06:51:26,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 35 seconds)
2025-08-07 06:53:10,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:21,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 582.89227 ± 294.849
2025-08-07 06:53:21,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [15.26581, 1085.8911, 520.18317, 645.52075, 501.9208, 771.9534, 450.29205, 910.3605, 676.83887, 250.69621]
2025-08-07 06:53:21,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 1000.0, 1000.0, 589.0, 1000.0, 775.0, 1000.0, 799.0, 738.0, 280.0]
2025-08-07 06:53:21,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (582.89) for latency ExtremeClogL1U23
2025-08-07 06:53:21,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 44 seconds)
2025-08-07 06:55:06,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:11,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 315.07025 ± 280.576
2025-08-07 06:55:11,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [52.35203, 727.4992, 390.08533, 785.40405, 552.60754, 35.633446, 29.870684, 31.610561, 169.1312, 376.50864]
2025-08-07 06:55:11,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 811.0, 379.0, 704.0, 546.0, 54.0, 77.0, 56.0, 181.0, 524.0]
2025-08-07 06:55:11,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 49 seconds)
2025-08-07 06:57:06,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 391.50906 ± 328.099
2025-08-07 06:57:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [1094.4514, 320.1694, 546.7266, 309.78473, 818.5304, 41.599083, 170.44907, 95.22012, 446.14594, 72.01378]
2025-08-07 06:57:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 326.0, 535.0, 278.0, 757.0, 48.0, 272.0, 67.0, 418.0, 77.0]
2025-08-07 06:57:12,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 1 second)
2025-08-07 06:58:56,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:05,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 532.56433 ± 357.333
2025-08-07 06:59:05,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [357.70172, 275.51523, 356.5173, 747.0143, 208.96643, 527.4362, 478.4429, 1228.1666, 68.73742, 1077.1456]
2025-08-07 06:59:05,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [306.0, 351.0, 404.0, 645.0, 165.0, 531.0, 1000.0, 1000.0, 66.0, 1000.0]
2025-08-07 06:59:05,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes)
2025-08-07 07:00:46,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:54,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 487.98325 ± 301.255
2025-08-07 07:00:54,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [718.09344, 526.7774, 472.62378, 700.6899, 139.5917, 504.44415, 1152.057, 132.31244, 367.7762, 165.46619]
2025-08-07 07:00:54,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 462.0, 531.0, 1000.0, 152.0, 455.0, 1000.0, 189.0, 302.0, 137.0]
2025-08-07 07:00:54,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 7 seconds)
2025-08-07 07:02:37,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:46,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 555.25812 ± 317.464
2025-08-07 07:02:46,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [395.95468, 300.058, 568.1308, 394.30923, 1049.3109, 977.2254, 888.5313, 45.18633, 264.1194, 669.7551]
2025-08-07 07:02:46,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [359.0, 297.0, 616.0, 436.0, 1000.0, 858.0, 819.0, 50.0, 249.0, 1000.0]
2025-08-07 07:02:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 10 seconds)
2025-08-07 07:04:33,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:40,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 432.27750 ± 385.179
2025-08-07 07:04:40,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [220.38069, 322.06027, 100.25793, 494.29446, 1094.6328, 31.41282, 110.81614, 1018.4249, 828.5736, 101.92116]
2025-08-07 07:04:40,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [197.0, 353.0, 98.0, 1000.0, 985.0, 41.0, 109.0, 1000.0, 874.0, 86.0]
2025-08-07 07:04:40,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 22 seconds)
2025-08-07 07:06:22,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:30,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 563.97308 ± 274.871
2025-08-07 07:06:30,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [227.10497, 1043.56, 722.30914, 160.42787, 450.91983, 231.18579, 836.6867, 678.93036, 651.9319, 636.6745]
2025-08-07 07:06:30,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [198.0, 1000.0, 699.0, 157.0, 444.0, 201.0, 844.0, 576.0, 646.0, 548.0]
2025-08-07 07:06:30,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 18 seconds)
2025-08-07 07:08:18,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:24,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 480.92642 ± 321.854
2025-08-07 07:08:24,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [232.87526, 1319.3207, 232.73465, 193.15169, 676.3193, 382.3338, 650.6266, 306.5024, 339.84753, 475.55225]
2025-08-07 07:08:24,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [219.0, 1000.0, 245.0, 205.0, 579.0, 318.0, 511.0, 285.0, 323.0, 458.0]
2025-08-07 07:08:24,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 27 seconds)
2025-08-07 07:10:15,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:25,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 607.51215 ± 448.347
2025-08-07 07:10:25,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [887.45184, 1177.3853, 7.6492457, 76.98307, 165.86487, 1032.4, 71.797554, 677.4627, 929.4638, 1048.6632]
2025-08-07 07:10:25,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 11.0, 68.0, 251.0, 1000.0, 75.0, 746.0, 927.0, 1000.0]
2025-08-07 07:10:25,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (607.51) for latency ExtremeClogL1U23
2025-08-07 07:10:25,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 42 seconds)
2025-08-07 07:12:00,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:06,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 344.75177 ± 290.693
2025-08-07 07:12:06,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [19.510363, 354.99118, 392.2364, 591.52344, 58.837463, 200.96301, 1039.4178, 187.1415, 478.10016, 124.796646]
2025-08-07 07:12:06,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 366.0, 354.0, 530.0, 71.0, 232.0, 931.0, 209.0, 1000.0, 128.0]
2025-08-07 07:12:06,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 43 seconds)
2025-08-07 07:14:00,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:12,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 640.97504 ± 316.377
2025-08-07 07:14:12,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [602.7, 1087.0479, 739.32153, 44.20383, 1073.2991, 520.6734, 575.62897, 471.08896, 331.59552, 964.19073]
2025-08-07 07:14:12,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [562.0, 1000.0, 791.0, 47.0, 1000.0, 590.0, 1000.0, 1000.0, 424.0, 1000.0]
2025-08-07 07:14:12,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (640.98) for latency ExtremeClogL1U23
2025-08-07 07:14:12,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 54 seconds)
2025-08-07 07:15:47,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:55,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 574.87976 ± 361.831
2025-08-07 07:15:55,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [822.1697, 32.310764, 865.693, 299.26508, 1048.5049, 528.84906, 313.6276, 947.517, 41.647682, 849.2135]
2025-08-07 07:15:55,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 36.0, 1000.0, 268.0, 1000.0, 453.0, 278.0, 868.0, 79.0, 653.0]
2025-08-07 07:15:55,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1251 [DEBUG]: Training session finished
