2025-08-07 04:10:28,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:10:28,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:10:28,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14cd190a3150>}
2025-08-07 04:10:28,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1111 [DEBUG]: using device: cuda
2025-08-07 04:10:28,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1133 [INFO]: Creating new trainer
2025-08-07 04:10:28,816 baseline-bpql-noiseperc25-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=219, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:10:28,816 baseline-bpql-noiseperc25-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:10:30,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1194 [DEBUG]: Starting training session...
2025-08-07 04:10:30,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 1/100
2025-08-07 04:12:12,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:12:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -25.82920 ± 24.609
2025-08-07 04:12:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-12.599185, -48.038666, 2.5211663, 13.404143, -50.010582, -4.7394896, -16.21955, -63.881294, -46.728367, -32.000244]
2025-08-07 04:12:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 55.0, 36.0, 40.0, 46.0, 28.0, 36.0, 51.0, 50.0, 52.0]
2025-08-07 04:12:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-25.83) for latency ExtremeClogL1U23
2025-08-07 04:12:12,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 48 minutes, 21 seconds)
2025-08-07 04:13:56,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:13:57,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.64615 ± 30.946
2025-08-07 04:13:57,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-40.59888, -66.29266, -0.23521928, -43.64243, -10.078809, -16.446308, -90.64369, -11.849132, -26.619776, 19.945364]
2025-08-07 04:13:57,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 122.0, 17.0, 94.0, 72.0, 59.0, 82.0, 28.0, 76.0, 32.0]
2025-08-07 04:13:57,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 48 minutes, 56 seconds)
2025-08-07 04:15:52,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:15:53,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -37.77944 ± 48.863
2025-08-07 04:15:53,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-158.71034, -25.071953, 10.293278, 1.5771122, -53.152035, -2.4807408, -79.207436, -2.0250323, -15.373326, -53.64391]
2025-08-07 04:15:53,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [166.0, 51.0, 31.0, 46.0, 79.0, 28.0, 112.0, 34.0, 60.0, 80.0]
2025-08-07 04:15:53,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 54 minutes, 3 seconds)
2025-08-07 04:17:30,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:17:32,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -136.13638 ± 287.646
2025-08-07 04:17:32,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-79.795074, 7.0201087, -66.29754, -9.172655, -992.8213, -3.4532342, -106.93277, -21.41302, -31.613993, -56.884434]
2025-08-07 04:17:32,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [113.0, 38.0, 75.0, 40.0, 1000.0, 22.0, 80.0, 52.0, 81.0, 63.0]
2025-08-07 04:17:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 48 minutes, 42 seconds)
2025-08-07 04:19:25,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:19:30,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -347.93936 ± 450.515
2025-08-07 04:19:30,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-41.829544, -1011.42004, -1.8960335, -126.51686, -6.2358785, -1042.3752, -43.196827, 0.36950475, -165.2497, -1041.0428]
2025-08-07 04:19:30,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 1000.0, 64.0, 122.0, 57.0, 1000.0, 75.0, 47.0, 143.0, 1000.0]
2025-08-07 04:19:30,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 50 minutes, 59 seconds)
2025-08-07 04:21:07,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:21:10,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -127.93367 ± 279.737
2025-08-07 04:21:10,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-33.13673, -70.803894, -95.64326, -3.456944, -85.04807, -34.993702, -5.344088, -3.0521588, 12.493258, -960.3511]
2025-08-07 04:21:10,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 113.0, 103.0, 12.0, 114.0, 92.0, 31.0, 50.0, 46.0, 1000.0]
2025-08-07 04:21:10,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 48 minutes, 23 seconds)
2025-08-07 04:23:00,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:23:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -151.24545 ± 200.974
2025-08-07 04:23:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-529.7884, 1.8653252, -19.464201, -250.67444, -62.83322, -50.78361, -528.4912, -49.60388, -21.667973, -1.0128183]
2025-08-07 04:23:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 17.0, 62.0, 348.0, 76.0, 61.0, 1000.0, 81.0, 70.0, 35.0]
2025-08-07 04:23:04,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 49 minutes, 27 seconds)
2025-08-07 04:24:49,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:24:50,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.17289 ± 15.823
2025-08-07 04:24:50,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-33.053608, 3.0666504, -23.343616, -17.470152, 0.11402519, -47.576538, -8.135291, -4.1849785, -14.261833, 3.116397]
2025-08-07 04:24:50,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 62.0, 70.0, 55.0, 46.0, 38.0, 42.0, 42.0, 94.0, 18.0]
2025-08-07 04:24:50,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-14.17) for latency ExtremeClogL1U23
2025-08-07 04:24:50,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 44 minutes, 28 seconds)
2025-08-07 04:26:31,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:26:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.47273 ± 14.308
2025-08-07 04:26:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.8192285, 5.4922633, 8.246254, -20.881012, -25.585333, -16.939575, -9.805322, -2.7184563, 1.5033876, 24.141283]
2025-08-07 04:26:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 18.0, 41.0, 47.0, 63.0, 34.0, 55.0, 29.0, 23.0, 27.0]
2025-08-07 04:26:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-3.47) for latency ExtremeClogL1U23
2025-08-07 04:26:32,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 43 minutes, 47 seconds)
2025-08-07 04:28:16,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:28:17,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -46.00764 ± 44.954
2025-08-07 04:28:17,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-99.72164, -105.50431, 3.5623639, -120.730194, -11.920278, -37.202995, -62.614136, -8.978816, -12.966243, -4.000149]
2025-08-07 04:28:17,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [109.0, 96.0, 37.0, 118.0, 52.0, 124.0, 126.0, 54.0, 32.0, 40.0]
2025-08-07 04:28:17,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 38 minutes, 2 seconds)
2025-08-07 04:30:02,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:30:05,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -136.68684 ± 246.724
2025-08-07 04:30:05,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.15729, -79.09163, -19.012634, -47.034584, -124.059235, -73.79528, -21.910728, -79.8855, -870.3495, -35.57207]
2025-08-07 04:30:05,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 132.0, 87.0, 125.0, 206.0, 88.0, 32.0, 90.0, 1000.0, 78.0]
2025-08-07 04:30:05,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 38 minutes, 44 seconds)
2025-08-07 04:31:50,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:31:53,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -182.19341 ± 307.430
2025-08-07 04:31:53,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-100.97167, -3.824113, -37.3583, -22.625877, -825.56757, -18.607014, -762.3497, -41.075863, -13.166678, 3.6126153]
2025-08-07 04:31:53,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 36.0, 71.0, 47.0, 1000.0, 36.0, 1000.0, 53.0, 65.0, 42.0]
2025-08-07 04:31:53,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 35 minutes, 21 seconds)
2025-08-07 04:33:39,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:33:40,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -47.91301 ± 45.551
2025-08-07 04:33:40,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [2.3360307, -19.54849, -100.98699, -72.1012, -39.758965, -22.217672, -52.36001, -16.364658, -150.94992, -7.1782203]
2025-08-07 04:33:40,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 45.0, 95.0, 106.0, 64.0, 36.0, 84.0, 37.0, 166.0, 35.0]
2025-08-07 04:33:40,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 33 minutes, 47 seconds)
2025-08-07 04:35:24,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:27,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -100.04630 ± 233.305
2025-08-07 04:35:27,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-28.051455, -30.650587, -15.056694, 3.3470378, -7.063097, -65.78404, -3.5296211, -797.6934, -37.20289, -18.778282]
2025-08-07 04:35:27,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 64.0, 37.0, 45.0, 42.0, 101.0, 39.0, 1000.0, 49.0, 61.0]
2025-08-07 04:35:27,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 33 minutes, 13 seconds)
2025-08-07 04:37:12,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:37:13,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.13279 ± 18.928
2025-08-07 04:37:13,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [6.6808834, 4.981868, -17.932802, -36.800877, 4.714309, -3.2703972, -50.091297, -1.2067028, -7.164503, 8.761639]
2025-08-07 04:37:13,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 36.0, 81.0, 73.0, 39.0, 47.0, 75.0, 65.0, 37.0, 40.0]
2025-08-07 04:37:13,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 31 minutes, 46 seconds)
2025-08-07 04:39:05,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:39:06,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.00942 ± 23.469
2025-08-07 04:39:06,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.873173, -75.626755, -13.919294, -51.825443, -40.628384, -0.58111846, -21.960243, -3.937183, -35.99393, -27.494993]
2025-08-07 04:39:06,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 70.0, 37.0, 79.0, 71.0, 41.0, 32.0, 45.0, 47.0, 50.0]
2025-08-07 04:39:06,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 31 minutes, 27 seconds)
2025-08-07 04:40:52,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:40:54,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -58.07122 ± 167.096
2025-08-07 04:40:54,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-6.6986732, -36.70562, 38.532883, 4.1226773, 10.192967, -554.36487, -39.262012, 9.200757, 19.187988, -24.9183]
2025-08-07 04:40:54,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 73.0, 61.0, 50.0, 28.0, 1000.0, 64.0, 36.0, 41.0, 29.0]
2025-08-07 04:40:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 29 minutes, 39 seconds)
2025-08-07 04:42:36,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:42:38,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -45.89166 ± 110.101
2025-08-07 04:42:38,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.929881, -17.867622, -17.661627, 23.18417, 11.850498, 0.13472696, -6.7836056, -51.204918, -370.9892, -25.649183]
2025-08-07 04:42:38,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 76.0, 74.0, 19.0, 40.0, 29.0, 61.0, 63.0, 1000.0, 65.0]
2025-08-07 04:42:38,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 27 minutes, 11 seconds)
2025-08-07 04:44:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:44:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -56.61095 ± 132.861
2025-08-07 04:44:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.1323276, -5.5636473, -21.723959, -452.09167, 10.238141, -9.966316, -39.99611, -42.748493, -5.960877, -1.428983]
2025-08-07 04:44:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 46.0, 47.0, 1000.0, 19.0, 41.0, 51.0, 46.0, 48.0, 36.0]
2025-08-07 04:44:19,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 23 minutes, 37 seconds)
2025-08-07 04:46:10,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:46:11,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -21.64671 ± 28.709
2025-08-07 04:46:11,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [0.31487405, -79.50271, -6.0667653, -31.63423, -61.09003, -32.34614, -24.728437, 9.583028, -0.37191564, 9.375265]
2025-08-07 04:46:11,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 138.0, 116.0, 58.0, 106.0, 55.0, 48.0, 67.0, 36.0, 38.0]
2025-08-07 04:46:11,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 23 minutes, 30 seconds)
2025-08-07 04:47:56,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:47:57,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.77606 ± 18.538
2025-08-07 04:47:57,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.382626, -74.86317, -14.26656, -28.28277, -37.42133, -20.936728, -21.573166, -23.174519, -44.222744, -6.636984]
2025-08-07 04:47:57,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 60.0, 38.0, 33.0, 67.0, 41.0, 62.0, 40.0, 57.0, 48.0]
2025-08-07 04:47:57,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 19 minutes, 50 seconds)
2025-08-07 04:49:38,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:49:39,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.63574 ± 34.508
2025-08-07 04:49:39,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-38.352943, 19.755146, 11.381312, -21.216671, 7.467805, 26.58945, -4.0145526, -98.43861, 1.7105743, -11.238961]
2025-08-07 04:49:39,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 41.0, 31.0, 39.0, 33.0, 37.0, 37.0, 142.0, 72.0, 42.0]
2025-08-07 04:49:39,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 16 minutes, 23 seconds)
2025-08-07 04:51:24,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:51:24,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.57960 ± 23.550
2025-08-07 04:51:24,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.9078228, -6.656179, 12.862998, 5.109092, -67.53461, -18.034746, -15.689612, -21.336506, 3.5776334, -46.186253]
2025-08-07 04:51:24,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 44.0, 22.0, 15.0, 98.0, 70.0, 23.0, 130.0, 38.0, 59.0]
2025-08-07 04:51:24,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 15 minutes)
2025-08-07 04:53:17,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:53:18,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -44.33937 ± 29.696
2025-08-07 04:53:18,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-25.319132, -29.764492, -48.763138, -6.7968063, -38.111973, -46.782387, -21.302711, -28.768696, -103.656494, -94.12783]
2025-08-07 04:53:18,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 56.0, 53.0, 45.0, 34.0, 189.0, 45.0, 41.0, 96.0, 75.0]
2025-08-07 04:53:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 16 minutes, 43 seconds)
2025-08-07 04:54:52,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:54:53,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -33.72631 ± 25.686
2025-08-07 04:54:53,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.593703, 0.651751, -4.6618295, -43.5593, -47.608536, -41.90562, -5.720631, -80.09671, -42.245693, -60.522823]
2025-08-07 04:54:53,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 85.0, 61.0, 79.0, 40.0, 79.0, 60.0, 85.0, 65.0, 69.0]
2025-08-07 04:54:53,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 10 minutes, 32 seconds)
2025-08-07 04:56:46,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:56:49,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -16.45460 ± 31.557
2025-08-07 04:56:49,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-21.253117, -14.223086, -87.31626, 8.110187, -13.493185, 21.72563, -0.3831432, 4.704889, -2.0761018, -60.34176]
2025-08-07 04:56:49,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 35.0, 1000.0, 13.0, 35.0, 34.0, 35.0, 41.0, 43.0, 67.0]
2025-08-07 04:56:49,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 11 minutes, 9 seconds)
2025-08-07 04:58:23,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:58:24,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.48273 ± 30.666
2025-08-07 04:58:24,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-25.63688, 1.8816419, -104.90193, -12.111659, -0.93993014, -43.74808, -10.218235, -24.618633, 0.13662599, -4.6702437]
2025-08-07 04:58:24,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 83.0, 30.0, 43.0, 66.0, 41.0, 55.0, 73.0, 38.0]
2025-08-07 04:58:24,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 7 minutes, 46 seconds)
2025-08-07 05:00:08,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:00:09,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.72117 ± 12.068
2025-08-07 05:00:09,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-32.777943, -26.771559, -26.1664, -15.669826, -0.57021224, -13.662281, -8.942208, -14.833721, -6.9186273, 9.101084]
2025-08-07 05:00:09,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 44.0, 54.0, 37.0, 44.0, 36.0, 86.0, 36.0, 31.0, 26.0]
2025-08-07 05:00:09,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 5 minutes, 49 seconds)
2025-08-07 05:01:54,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:01:55,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.79575 ± 48.240
2025-08-07 05:01:55,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.643787, -119.44377, 13.809509, -6.3398705, -55.36591, -31.93588, 2.6896813, 17.538568, 32.202305, -91.755905]
2025-08-07 05:01:55,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 386.0, 34.0, 47.0, 42.0, 49.0, 39.0, 36.0, 52.0, 87.0]
2025-08-07 05:01:55,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 2 minutes, 22 seconds)
2025-08-07 05:03:43,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:03:43,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.96121 ± 19.316
2025-08-07 05:03:43,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [13.136193, -13.723837, 14.486266, 0.91170657, -17.106014, 17.887966, 1.5917503, -35.930855, -26.26962, -34.59567]
2025-08-07 05:03:43,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 40.0, 18.0, 33.0, 53.0, 31.0, 25.0, 53.0, 41.0, 45.0]
2025-08-07 05:03:43,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 3 minutes, 39 seconds)
2025-08-07 05:05:25,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:05:26,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.51461 ± 17.471
2025-08-07 05:05:26,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-18.984407, 7.5954027, 11.150484, 13.851828, -46.258217, -16.3863, -18.498728, -23.38557, -17.736961, -16.493639]
2025-08-07 05:05:26,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 41.0, 37.0, 33.0, 57.0, 40.0, 46.0, 59.0, 42.0, 42.0]
2025-08-07 05:05:26,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 59 minutes, 2 seconds)
2025-08-07 05:07:14,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:07:16,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.18079 ± 40.715
2025-08-07 05:07:16,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [19.774895, 9.859303, -2.5436752, -63.769497, -127.770485, -36.17332, -13.239322, -29.426548, -1.5641989, -26.955078]
2025-08-07 05:07:16,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 79.0, 48.0, 73.0, 1000.0, 79.0, 53.0, 44.0, 37.0, 45.0]
2025-08-07 05:07:16,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 35 seconds)
2025-08-07 05:09:00,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:09:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.37114 ± 16.390
2025-08-07 05:09:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-20.068718, -24.44121, -2.4239252, -13.084387, -40.952118, -7.463793, 12.525012, -15.1693125, -46.24232, -16.390598]
2025-08-07 05:09:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 50.0, 63.0, 42.0, 60.0, 40.0, 51.0, 35.0, 39.0, 31.0]
2025-08-07 05:09:01,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 58 minutes, 53 seconds)
2025-08-07 05:10:46,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:10:47,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.41330 ± 36.639
2025-08-07 05:10:47,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.9521549, -11.772526, -83.970146, -67.04615, 20.872252, -4.7209234, 22.818602, -33.623184, -76.9974, -25.741386]
2025-08-07 05:10:47,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 61.0, 95.0, 86.0, 65.0, 39.0, 70.0, 49.0, 97.0, 30.0]
2025-08-07 05:10:47,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 57 minutes, 4 seconds)
2025-08-07 05:12:33,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:12:33,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.25008 ± 26.476
2025-08-07 05:12:33,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-57.565533, 12.778446, 0.5951372, -21.713455, -3.9202728, 9.991891, -2.292004, 19.472845, 0.2505806, -60.098408]
2025-08-07 05:12:33,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [90.0, 37.0, 48.0, 37.0, 31.0, 32.0, 33.0, 37.0, 32.0, 70.0]
2025-08-07 05:12:33,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 54 minutes, 53 seconds)
2025-08-07 05:14:18,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:14:18,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.89310 ± 33.888
2025-08-07 05:14:18,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.848212, -3.303445, -7.396049, -5.662499, -2.3040345, -5.7484736, -69.592735, -5.660857, -2.7642148, -99.346886]
2025-08-07 05:14:18,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 50.0, 35.0, 37.0, 37.0, 36.0, 70.0, 32.0, 42.0, 109.0]
2025-08-07 05:14:18,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 53 minutes, 31 seconds)
2025-08-07 05:16:08,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:16:09,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.02392 ± 19.398
2025-08-07 05:16:09,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-46.053753, -24.77929, -3.0740304, -25.587822, -17.387846, -37.654194, -2.4462407, -46.039967, 1.1336335, 11.6503105]
2025-08-07 05:16:09,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 50.0, 35.0, 48.0, 35.0, 65.0, 43.0, 49.0, 35.0, 36.0]
2025-08-07 05:16:09,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 51 minutes, 55 seconds)
2025-08-07 05:17:52,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:17:53,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -21.46329 ± 33.935
2025-08-07 05:17:53,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.439956, -30.860498, -38.2822, -28.185652, -0.55650514, -96.45302, -43.71104, 33.937485, -19.409107, -3.5522711]
2025-08-07 05:17:53,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 100.0, 74.0, 49.0, 30.0, 80.0, 96.0, 34.0, 50.0, 31.0]
2025-08-07 05:17:53,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 49 minutes, 58 seconds)
2025-08-07 05:19:38,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:19:41,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -21.54387 ± 42.182
2025-08-07 05:19:41,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.87402, -26.585575, -3.3097658, -0.24967954, 6.3037148, -73.13985, 9.950991, 6.54283, -126.828026, 1.750685]
2025-08-07 05:19:41,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 63.0, 23.0, 31.0, 35.0, 55.0, 52.0, 41.0, 1000.0, 32.0]
2025-08-07 05:19:41,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 48 minutes, 23 seconds)
2025-08-07 05:21:26,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:21:28,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -20.97058 ± 29.719
2025-08-07 05:21:28,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-15.25178, -4.6674333, -34.09929, -23.877516, -1.7421975, -97.5146, 19.094234, -25.437504, -26.79939, 0.5897017]
2025-08-07 05:21:28,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 33.0, 49.0, 51.0, 33.0, 1000.0, 48.0, 47.0, 31.0, 37.0]
2025-08-07 05:21:28,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 46 minutes, 59 seconds)
2025-08-07 05:23:05,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:23:06,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.55779 ± 21.720
2025-08-07 05:23:06,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.3785192, -43.33617, -42.745476, -38.53565, -12.904553, 26.842794, -8.908269, 8.938486, -18.564653, -15.98591]
2025-08-07 05:23:06,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 58.0, 43.0, 48.0, 58.0, 55.0, 40.0, 27.0, 35.0, 65.0]
2025-08-07 05:23:06,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 43 minutes, 42 seconds)
2025-08-07 05:24:49,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:50,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.59399 ± 26.366
2025-08-07 05:24:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.8053801, -36.816628, -4.451225, -33.21347, -62.807556, -7.7809286, -7.330419, 3.114948, 2.5911798, 38.94876]
2025-08-07 05:24:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 51.0, 21.0, 61.0, 99.0, 14.0, 35.0, 35.0, 15.0, 43.0]
2025-08-07 05:24:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 39 seconds)
2025-08-07 05:26:34,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:37,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.91071 ± 30.346
2025-08-07 05:26:37,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.4428741, -17.373686, -6.9708605, -8.099198, 4.8169074, -93.66972, -3.3546677, -58.364143, -11.62491, 4.0902867]
2025-08-07 05:26:37,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 36.0, 37.0, 36.0, 45.0, 1000.0, 26.0, 86.0, 64.0, 28.0]
2025-08-07 05:26:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 39 minutes, 25 seconds)
2025-08-07 05:28:21,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:28:23,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.46507 ± 43.694
2025-08-07 05:28:23,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.7447205, -18.044874, 10.1234665, -132.29343, -2.5668092, -33.337875, 6.35027, -81.66086, -1.5365057, -17.428795]
2025-08-07 05:28:23,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 40.0, 52.0, 1000.0, 58.0, 51.0, 43.0, 102.0, 30.0, 37.0]
2025-08-07 05:28:23,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 37 minutes, 37 seconds)
2025-08-07 05:30:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:30:08,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.62317 ± 13.949
2025-08-07 05:30:08,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-4.7647943, -40.65772, 0.6931125, -1.9682115, -3.9987156, -14.334677, 10.76647, 11.297358, -3.7721117, 0.5075552]
2025-08-07 05:30:08,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 66.0, 32.0, 43.0, 46.0, 58.0, 40.0, 29.0, 42.0, 17.0]
2025-08-07 05:30:08,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 19 seconds)
2025-08-07 05:32:00,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:32:02,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.37109 ± 32.928
2025-08-07 05:32:02,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.0532837, -6.1049495, -103.26383, 4.110669, 2.9348264, -32.427834, -3.2158089, -4.9085526, -28.02992, -56.85873]
2025-08-07 05:32:02,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 32.0, 1000.0, 32.0, 33.0, 50.0, 43.0, 26.0, 43.0, 61.0]
2025-08-07 05:32:02,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 36 minutes, 38 seconds)
2025-08-07 05:33:42,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:33:45,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -35.55439 ± 61.670
2025-08-07 05:33:45,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.3336596, -31.41634, 5.420497, 8.10297, -146.22452, -51.958202, -23.691504, 24.734436, -155.60574, 9.760878]
2025-08-07 05:33:45,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 62.0, 24.0, 38.0, 1000.0, 48.0, 58.0, 78.0, 1000.0, 40.0]
2025-08-07 05:33:45,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 34 minutes, 36 seconds)
2025-08-07 05:35:31,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:35:32,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.99366 ± 17.832
2025-08-07 05:35:32,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-4.1234193, -22.704699, -0.9613887, -53.089157, -31.740274, -3.1205175, -15.447804, -26.312931, -3.263372, 10.826997]
2025-08-07 05:35:32,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 50.0, 47.0, 56.0, 43.0, 13.0, 50.0, 37.0, 44.0, 16.0]
2025-08-07 05:35:32,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 32 minutes, 47 seconds)
2025-08-07 05:37:17,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:37:17,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.91115 ± 18.441
2025-08-07 05:37:17,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.0535104, -40.49829, 0.15230362, -38.40809, -3.3220203, -9.225491, 6.243211, 21.676102, 4.064828, -16.740526]
2025-08-07 05:37:17,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 55.0, 22.0, 48.0, 41.0, 28.0, 38.0, 42.0, 36.0, 43.0]
2025-08-07 05:37:17,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 30 minutes, 43 seconds)
2025-08-07 05:39:03,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:39:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.06366 ± 21.118
2025-08-07 05:39:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.181414, -11.481838, 6.0463414, -2.6860049, -0.95697975, -7.7051315, 3.499765, -17.895773, -71.764015, -19.511578]
2025-08-07 05:39:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 36.0, 28.0, 66.0, 33.0, 59.0, 40.0, 42.0, 102.0, 33.0]
2025-08-07 05:39:03,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 29 minutes, 9 seconds)
2025-08-07 05:40:56,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:40:58,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -35.96783 ± 53.053
2025-08-07 05:40:58,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.037001, -5.2815614, -1.766541, 17.303741, -7.468128, -29.596487, -68.89348, -10.055878, -169.03043, -79.85261]
2025-08-07 05:40:58,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 42.0, 18.0, 61.0, 16.0, 78.0, 151.0, 46.0, 1000.0, 109.0]
2025-08-07 05:40:58,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 27 minutes, 30 seconds)
2025-08-07 05:42:38,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:42:38,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.52644 ± 25.772
2025-08-07 05:42:38,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.606663, -2.3444262, 12.690841, 13.991237, -20.148026, 12.501527, -41.839676, -70.22986, -10.09394, -4.398747]
2025-08-07 05:42:38,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 32.0, 55.0, 39.0, 33.0, 62.0, 41.0, 78.0, 42.0, 38.0]
2025-08-07 05:42:38,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 25 minutes, 18 seconds)
2025-08-07 05:44:21,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:44:22,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.75983 ± 10.781
2025-08-07 05:44:22,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.3952893, -20.066685, 1.0161399, -1.3323381, 2.7200403, -16.144428, 9.54546, 11.190915, -19.122625, -6.8001056]
2025-08-07 05:44:22,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 54.0, 42.0, 34.0, 30.0, 44.0, 39.0, 37.0, 62.0, 53.0]
2025-08-07 05:44:22,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 23 minutes)
2025-08-07 05:46:07,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:46:07,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.10038 ± 16.969
2025-08-07 05:46:07,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.815707, 4.603975, -20.996122, 16.277933, -37.900723, -21.927471, -4.7654023, 15.509418, 11.2028055, -3.1924732]
2025-08-07 05:46:07,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 46.0, 51.0, 61.0, 47.0, 34.0, 28.0, 63.0, 48.0, 32.0]
2025-08-07 05:46:07,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 21 minutes, 16 seconds)
2025-08-07 05:47:52,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:47:55,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -25.11874 ± 33.862
2025-08-07 05:47:55,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-89.42716, -10.35833, -3.5531693, -92.39184, -7.935883, -30.916553, -0.044037767, -3.7978475, -4.4183736, -8.344244]
2025-08-07 05:47:55,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 50.0, 38.0, 129.0, 32.0, 25.0, 21.0, 36.0, 43.0, 39.0]
2025-08-07 05:47:55,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 42 seconds)
2025-08-07 05:49:46,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:48,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.36271 ± 29.370
2025-08-07 05:49:48,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-46.20556, -23.02121, -58.1026, -4.0365496, -23.703045, -7.220271, -92.13887, 16.833015, -14.878731, -21.153234]
2025-08-07 05:49:48,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 35.0, 55.0, 37.0, 41.0, 40.0, 1000.0, 32.0, 45.0, 42.0]
2025-08-07 05:49:48,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 17 minutes, 41 seconds)
2025-08-07 05:51:35,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:36,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.70098 ± 38.203
2025-08-07 05:51:36,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.621005, -129.22568, -9.808279, 1.9991814, -15.499605, -4.0374894, -14.306104, -37.934147, -30.535645, 1.7169443]
2025-08-07 05:51:36,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 96.0, 24.0, 31.0, 56.0, 39.0, 34.0, 47.0, 46.0, 41.0]
2025-08-07 05:51:36,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 17 minutes, 4 seconds)
2025-08-07 05:53:12,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:14,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.01537 ± 53.686
2025-08-07 05:53:14,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [16.916416, 6.446587, -68.37877, -3.5309112, 2.6090744, -28.991499, 30.906042, -164.71805, -18.784534, -42.628075]
2025-08-07 05:53:14,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 27.0, 44.0, 63.0, 39.0, 53.0, 43.0, 1000.0, 41.0, 77.0]
2025-08-07 05:53:14,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 14 minutes, 31 seconds)
2025-08-07 05:54:58,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:54:59,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.63661 ± 19.181
2025-08-07 05:54:59,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-32.713627, 19.172115, 5.2004604, -10.90066, -15.946553, 7.077244, 13.084663, -33.355083, 15.924135, -23.90876]
2025-08-07 05:54:59,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 35.0, 22.0, 36.0, 52.0, 34.0, 33.0, 48.0, 79.0, 50.0]
2025-08-07 05:54:59,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 40 seconds)
2025-08-07 05:56:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:56:46,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -25.50874 ± 39.091
2025-08-07 05:56:46,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-41.34759, -0.67852855, 1.6387186, -69.77871, -18.071333, -121.27032, 3.6575816, -7.354671, 6.269968, -8.152513]
2025-08-07 05:56:46,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 33.0, 15.0, 95.0, 46.0, 1000.0, 20.0, 41.0, 33.0, 38.0]
2025-08-07 05:56:46,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 10 minutes, 54 seconds)
2025-08-07 05:58:31,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:58:32,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.10058 ± 23.863
2025-08-07 05:58:32,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-15.683463, 33.56122, -34.123436, -45.90969, -17.52329, 0.10265623, -34.804554, 21.83015, -16.86928, -1.5860862]
2025-08-07 05:58:32,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 42.0, 53.0, 46.0, 56.0, 33.0, 66.0, 55.0, 34.0, 23.0]
2025-08-07 05:58:32,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 8 minutes, 5 seconds)
2025-08-07 06:00:16,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:00:19,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -33.53392 ± 46.179
2025-08-07 06:00:19,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-64.88099, -20.09249, -11.520631, -54.728714, 46.888306, -130.46286, -70.86295, -9.631415, -22.165888, 2.118436]
2025-08-07 06:00:19,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 65.0, 32.0, 83.0, 47.0, 1000.0, 66.0, 38.0, 42.0, 32.0]
2025-08-07 06:00:19,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 6 minutes, 11 seconds)
2025-08-07 06:02:06,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:02:07,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.05973 ± 28.160
2025-08-07 06:02:07,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [11.148296, 15.073267, -2.344236, -15.761048, 13.705298, -2.1703327, -27.585356, 1.4010913, -85.4843, -8.58001]
2025-08-07 06:02:07,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 58.0, 35.0, 94.0, 38.0, 31.0, 44.0, 33.0, 94.0, 24.0]
2025-08-07 06:02:07,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 5 minutes, 43 seconds)
2025-08-07 06:03:56,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:03:57,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 1.31166 ± 18.920
2025-08-07 06:03:57,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.909647, -3.1062188, -11.036563, -2.2831938, 24.186314, -39.99217, 9.019756, 8.023531, 31.744621, -9.349113]
2025-08-07 06:03:57,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 40.0, 35.0, 31.0, 36.0, 75.0, 24.0, 37.0, 36.0, 32.0]
2025-08-07 06:03:57,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (1.31) for latency ExtremeClogL1U23
2025-08-07 06:03:57,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 4 minutes, 33 seconds)
2025-08-07 06:05:38,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:05:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.08065 ± 60.583
2025-08-07 06:05:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.684584, -19.429657, -5.7340555, -32.620743, -22.032637, 4.7484655, -205.94876, -4.827887, -8.071765, 9.426037]
2025-08-07 06:05:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 45.0, 34.0, 57.0, 50.0, 34.0, 1000.0, 49.0, 34.0, 45.0]
2025-08-07 06:05:40,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 2 minutes, 12 seconds)
2025-08-07 06:07:21,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:07:22,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.39357 ± 16.788
2025-08-07 06:07:22,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-14.306331, -9.977002, 24.974314, -13.355229, 2.581359, 17.631166, -3.7478135, -27.851736, 0.043607067, -29.92804]
2025-08-07 06:07:22,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 25.0, 38.0, 34.0, 18.0, 36.0, 24.0, 40.0, 40.0, 31.0]
2025-08-07 06:07:22,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 seconds)
2025-08-07 06:09:06,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:09:08,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.74715 ± 34.457
2025-08-07 06:09:08,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-31.719942, 17.361757, -2.5874796, 12.5282135, -10.796843, 17.23111, 17.062307, -15.913824, -100.55198, 9.915178]
2025-08-07 06:09:08,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 92.0, 25.0, 62.0, 56.0, 29.0, 30.0, 55.0, 1000.0, 33.0]
2025-08-07 06:09:08,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 58 minutes, 12 seconds)
2025-08-07 06:10:53,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:10:53,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.60992 ± 14.721
2025-08-07 06:10:53,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.1962216, -19.856882, -43.46215, -4.701772, 10.462967, -9.6661215, -0.549567, -7.5817447, 0.8772527, 5.18256]
2025-08-07 06:10:53,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 95.0, 71.0, 36.0, 18.0, 41.0, 44.0, 66.0, 34.0, 32.0]
2025-08-07 06:10:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 8 seconds)
2025-08-07 06:12:39,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:12:41,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -34.30314 ± 62.669
2025-08-07 06:12:41,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-172.12816, -9.236451, -1.9771782, 22.909468, 0.31497633, -27.398783, -12.111208, 0.98304003, -141.03008, -3.3570373]
2025-08-07 06:12:41,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 44.0, 35.0, 45.0, 34.0, 39.0, 41.0, 42.0, 1000.0, 33.0]
2025-08-07 06:12:41,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 9 seconds)
2025-08-07 06:14:25,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:14:26,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.06561 ± 33.038
2025-08-07 06:14:26,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.822124, -23.436518, -114.16367, -12.812667, -5.1833825, -6.8936305, -10.353443, -9.204784, -3.2912676, -6.138872]
2025-08-07 06:14:26,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 41.0, 120.0, 42.0, 33.0, 49.0, 31.0, 32.0, 36.0, 48.0]
2025-08-07 06:14:26,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 36 seconds)
2025-08-07 06:16:10,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:16:11,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -3.81670 ± 16.273
2025-08-07 06:16:11,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.935946, 13.17183, 7.1690564, 5.3251495, -9.658918, -1.0479182, -17.008562, -20.495628, 26.915241, -28.601318]
2025-08-07 06:16:11,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 39.0, 32.0, 32.0, 32.0, 33.0, 36.0, 44.0, 80.0, 46.0]
2025-08-07 06:16:11,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 9 seconds)
2025-08-07 06:17:56,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:17:56,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.50693 ± 22.206
2025-08-07 06:17:56,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.0831919, -1.109043, 7.651366, -25.827879, -5.081083, 0.33091673, 14.405941, -26.3819, -36.77101, -61.203403]
2025-08-07 06:17:56,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 31.0, 98.0, 31.0, 34.0, 39.0, 47.0, 104.0, 74.0]
2025-08-07 06:17:56,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 19 seconds)
2025-08-07 06:19:40,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:19:41,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -21.44603 ± 27.865
2025-08-07 06:19:41,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.009634, -3.3734581, 0.69061434, -34.341778, -40.29957, -77.85193, 8.044627, -31.754728, 0.06742212, -47.651123]
2025-08-07 06:19:41,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 27.0, 34.0, 92.0, 46.0, 111.0, 39.0, 35.0, 31.0, 46.0]
2025-08-07 06:19:41,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 29 seconds)
2025-08-07 06:21:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:21:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.84680 ± 78.643
2025-08-07 06:21:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.352387, -256.44553, 11.899076, -6.6170387, 7.342289, 11.374995, 11.92598, 13.553256, -0.47702628, -49.6716]
2025-08-07 06:21:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 1000.0, 31.0, 36.0, 14.0, 38.0, 33.0, 36.0, 58.0, 104.0]
2025-08-07 06:21:27,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 37 seconds)
2025-08-07 06:23:14,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:23:16,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -25.99237 ± 50.753
2025-08-07 06:23:16,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-19.316948, -6.7659254, 5.166187, -13.670302, 8.1025, -168.94128, -1.770016, 10.198044, -20.50157, -52.424366]
2025-08-07 06:23:16,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 34.0, 33.0, 45.0, 44.0, 1000.0, 50.0, 33.0, 38.0, 90.0]
2025-08-07 06:23:16,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 12 seconds)
2025-08-07 06:25:01,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:25:03,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.02184 ± 36.084
2025-08-07 06:25:03,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.4941473, -1.8278233, -1.8909208, 7.017008, -108.6644, 16.499147, 18.015394, -0.21758993, 1.4301226, -42.07347]
2025-08-07 06:25:03,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 17.0, 37.0, 50.0, 1000.0, 32.0, 48.0, 44.0, 41.0, 41.0]
2025-08-07 06:25:03,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 32 seconds)
2025-08-07 06:26:53,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:54,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.54621 ± 11.831
2025-08-07 06:26:54,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [22.075735, -9.90913, -5.5372505, -1.5085001, 0.24389571, -9.906139, -27.96114, -9.531829, 0.6411246, -4.0688987]
2025-08-07 06:26:54,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 64.0, 33.0, 32.0, 14.0, 36.0, 47.0, 34.0, 33.0, 31.0]
2025-08-07 06:26:54,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 12 seconds)
2025-08-07 06:28:36,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.03741 ± 55.220
2025-08-07 06:28:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-187.30705, -6.859355, 10.2153, -26.839788, 4.5156, -16.535585, -8.119817, -38.549583, 0.30951592, -1.2033352]
2025-08-07 06:28:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [194.0, 36.0, 35.0, 73.0, 49.0, 61.0, 77.0, 40.0, 40.0, 25.0]
2025-08-07 06:28:37,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 18 seconds)
2025-08-07 06:30:22,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:23,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.72751 ± 9.026
2025-08-07 06:30:23,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.5981487, -19.759298, -1.7783027, -23.82861, -1.1108406, -1.6671644, 0.78196454, -8.490493, -7.620932, 6.796741]
2025-08-07 06:30:23,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 17.0, 70.0, 15.0, 60.0, 19.0, 27.0, 62.0, 34.0]
2025-08-07 06:30:23,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 28 seconds)
2025-08-07 06:32:10,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.33726 ± 49.462
2025-08-07 06:32:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.299636, 22.783428, 4.19099, -5.064378, 19.557459, -64.940186, -15.994688, -147.23607, -8.039448, 17.070627]
2025-08-07 06:32:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 26.0, 33.0, 76.0, 36.0, 89.0, 34.0, 1000.0, 42.0, 35.0]
2025-08-07 06:32:13,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 45 seconds)
2025-08-07 06:33:50,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:33:50,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.77640 ± 15.149
2025-08-07 06:33:50,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-21.421621, -11.500237, -9.22483, -2.908804, -4.389405, -30.491667, -53.146374, 0.7997542, -12.293529, -13.187266]
2025-08-07 06:33:50,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 32.0, 65.0, 52.0, 14.0, 59.0, 74.0, 53.0, 73.0, 38.0]
2025-08-07 06:33:50,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 24 seconds)
2025-08-07 06:35:41,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:35:41,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.54819 ± 30.944
2025-08-07 06:35:41,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-56.85125, -12.157698, 0.6383096, 7.28762, -1.7839785, -93.02553, 10.846497, -17.317463, -12.065942, -1.0524449]
2025-08-07 06:35:41,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 31.0, 26.0, 35.0, 33.0, 99.0, 33.0, 62.0, 33.0, 16.0]
2025-08-07 06:35:41,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 38 seconds)
2025-08-07 06:37:21,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:37:24,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -25.48174 ± 54.061
2025-08-07 06:37:24,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [12.005264, 13.004698, -23.11098, -24.40837, -15.747818, -182.15335, -6.0450115, -16.710865, 9.419478, -21.070503]
2025-08-07 06:37:24,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 33.0, 30.0, 31.0, 48.0, 1000.0, 34.0, 43.0, 37.0, 32.0]
2025-08-07 06:37:24,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 50 seconds)
2025-08-07 06:39:09,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:11,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -38.21737 ± 46.801
2025-08-07 06:39:11,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-105.60292, 4.534529, 13.646707, -26.475883, -93.393486, -7.5273385, -120.58836, -15.214345, -29.903051, -1.6495496]
2025-08-07 06:39:11,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 35.0, 45.0, 86.0, 82.0, 42.0, 1000.0, 36.0, 32.0, 41.0]
2025-08-07 06:39:11,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 11 seconds)
2025-08-07 06:40:54,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:56,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -22.53182 ± 38.801
2025-08-07 06:40:56,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.3677664, -15.909094, -29.827206, -47.46833, -33.22871, -8.871525, 20.330765, 12.359372, -122.16264, -5.908547]
2025-08-07 06:40:56,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 50.0, 64.0, 127.0, 48.0, 32.0, 31.0, 29.0, 1000.0, 65.0]
2025-08-07 06:40:56,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 11 seconds)
2025-08-07 06:42:49,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -48.84861 ± 60.753
2025-08-07 06:42:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-36.243767, -29.882534, -10.817938, -20.91291, -179.32298, -151.91548, -6.25089, 14.467428, -44.791176, -22.8159]
2025-08-07 06:42:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 35.0, 40.0, 35.0, 1000.0, 1000.0, 50.0, 33.0, 45.0, 34.0]
2025-08-07 06:42:52,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 16 seconds)
2025-08-07 06:44:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:37,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.92498 ± 50.733
2025-08-07 06:44:37,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.2965477, -1.3169233, -24.384378, 17.701788, 7.660371, -158.93933, -27.113972, -2.83224, 28.970581, 12.300889]
2025-08-07 06:44:37,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 25.0, 41.0, 39.0, 32.0, 1000.0, 35.0, 41.0, 44.0, 56.0]
2025-08-07 06:44:37,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 13 seconds)
2025-08-07 06:46:16,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.15077 ± 22.363
2025-08-07 06:46:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [20.418114, 12.804112, -18.633757, 0.13868791, 0.82251346, -4.3367267, -57.738586, -37.09568, 2.7413101, -0.6276978]
2025-08-07 06:46:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 32.0, 33.0, 33.0, 55.0, 35.0, 43.0, 61.0, 49.0, 15.0]
2025-08-07 06:46:17,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 19 seconds)
2025-08-07 06:48:06,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:09,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.75879 ± 31.122
2025-08-07 06:48:09,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [7.873932, -92.44952, -21.317144, -25.955294, 9.765106, -45.913246, -31.943758, 12.159666, -6.8261886, 7.0185785]
2025-08-07 06:48:09,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 1000.0, 64.0, 42.0, 38.0, 40.0, 56.0, 58.0, 33.0, 14.0]
2025-08-07 06:48:09,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 41 seconds)
2025-08-07 06:49:45,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:46,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.39245 ± 27.074
2025-08-07 06:49:46,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-72.04886, 6.3351626, -6.967699, -55.355667, 8.7297535, -11.2688055, 5.3769727, 14.305425, -6.5510707, -6.479682]
2025-08-07 06:49:46,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [141.0, 30.0, 38.0, 59.0, 33.0, 31.0, 31.0, 43.0, 44.0, 39.0]
2025-08-07 06:49:46,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 39 seconds)
2025-08-07 06:51:30,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.51332 ± 22.327
2025-08-07 06:51:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-14.415382, -14.098551, -54.772923, -26.33223, 6.9775796, 19.097202, -4.1901193, -27.251255, 22.69355, -22.841063]
2025-08-07 06:51:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 24.0, 101.0, 46.0, 35.0, 36.0, 33.0, 93.0, 33.0, 50.0]
2025-08-07 06:51:31,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 33 seconds)
2025-08-07 06:53:14,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:15,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.49985 ± 19.406
2025-08-07 06:53:15,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.113579, -17.866713, -40.68669, -21.200924, -46.859364, -2.9179304, 3.7462742, 0.5390196, 12.239762, 11.121691]
2025-08-07 06:53:15,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 41.0, 45.0, 57.0, 83.0, 33.0, 35.0, 63.0, 39.0, 33.0]
2025-08-07 06:53:15,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 48 seconds)
2025-08-07 06:55:04,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:05,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.63898 ± 21.185
2025-08-07 06:55:05,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.7771162, -58.10236, 7.237827, -3.6572335, 16.24274, 25.636272, -5.167915, 1.8111246, 9.636014, -1.8034106]
2025-08-07 06:55:05,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 49.0, 37.0, 65.0, 35.0, 44.0, 38.0, 34.0, 33.0, 17.0]
2025-08-07 06:55:05,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 19 seconds)
2025-08-07 06:56:51,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:51,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.62556 ± 27.938
2025-08-07 06:56:51,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1.3222052, -15.129047, -15.269385, -39.56212, -9.995446, 3.415372, -31.102884, -96.84636, -28.461542, -51.982025]
2025-08-07 06:56:51,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 68.0, 34.0, 31.0, 34.0, 35.0, 32.0, 76.0, 31.0, 33.0]
2025-08-07 06:56:51,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 27 seconds)
2025-08-07 06:58:28,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:32,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -36.66261 ± 69.368
2025-08-07 06:58:32,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.8630123, -6.4874983, -58.69626, -196.32414, -0.88160294, 26.463717, 4.7936935, 6.302445, -136.94574, -1.9876802]
2025-08-07 06:58:32,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 33.0, 95.0, 1000.0, 22.0, 32.0, 31.0, 39.0, 1000.0, 32.0]
2025-08-07 06:58:32,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 46 seconds)
2025-08-07 07:00:16,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:17,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.12576 ± 16.169
2025-08-07 07:00:17,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.1975765, 1.0131105, 8.995194, -36.54822, -14.892465, -18.281208, 17.848003, 6.411491, 8.762895, -21.368809]
2025-08-07 07:00:17,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 48.0, 49.0, 53.0, 54.0, 34.0, 49.0, 35.0, 24.0, 70.0]
2025-08-07 07:00:17,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes)
2025-08-07 07:02:02,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:02,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.77490 ± 25.594
2025-08-07 07:02:02,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.7059395, 17.976204, -10.901385, -2.3236387, 12.11629, -4.02316, 8.779998, -79.0086, -3.266012, -14.392744]
2025-08-07 07:02:02,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 32.0, 31.0, 33.0, 33.0, 52.0, 30.0, 87.0, 33.0, 40.0]
2025-08-07 07:02:02,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 16 seconds)
2025-08-07 07:03:53,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -21.29404 ± 41.614
2025-08-07 07:03:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.3740132, 4.496527, -1.9849426, -25.227636, -2.4568396, -20.812592, -7.8176093, -15.010086, 2.24533, -142.99854]
2025-08-07 07:03:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 33.0, 34.0, 35.0, 32.0, 37.0, 28.0, 17.0, 11.0, 1000.0]
2025-08-07 07:03:55,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 32 seconds)
2025-08-07 07:05:30,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:31,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.70279 ± 12.403
2025-08-07 07:05:31,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-14.371271, -5.1850863, -39.483536, -8.152529, -34.347675, -30.496424, -26.797615, -17.661943, 0.0016143717, -20.533382]
2025-08-07 07:05:31,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 35.0, 55.0, 32.0, 42.0, 91.0, 34.0, 40.0, 47.0, 44.0]
2025-08-07 07:05:31,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 43 seconds)
2025-08-07 07:07:14,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:15,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.82425 ± 13.147
2025-08-07 07:07:15,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.9203684, -18.290714, -6.7423983, -5.8466935, 14.845527, -7.3583183, -10.138471, 2.2968729, 6.136162, -35.064877]
2025-08-07 07:07:15,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 65.0, 39.0, 35.0, 32.0, 36.0, 55.0, 33.0, 16.0, 66.0]
2025-08-07 07:07:15,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1251 [DEBUG]: Training session finished
