2025-08-07 04:05:14,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:05:14,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:05:14,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14d40dfc7d10>}
2025-08-07 04:05:14,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1111 [DEBUG]: using device: cuda
2025-08-07 04:05:14,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1133 [INFO]: Creating new trainer
2025-08-07 04:05:14,711 baseline-bpql-noiseperc5-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=219, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:05:14,711 baseline-bpql-noiseperc5-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:05:15,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1194 [DEBUG]: Starting training session...
2025-08-07 04:05:15,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 1/100
2025-08-07 04:06:57,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:06:58,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -42.67396 ± 55.401
2025-08-07 04:06:58,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-29.681164, -144.91528, 17.846836, -51.588264, -81.01451, -18.616186, 15.083714, -11.754947, 7.1867976, -129.2866]
2025-08-07 04:06:58,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 138.0, 34.0, 74.0, 82.0, 55.0, 37.0, 43.0, 31.0, 126.0]
2025-08-07 04:06:58,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (-42.67) for latency ExtremeClogL1U23
2025-08-07 04:06:58,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 50 minutes, 31 seconds)
2025-08-07 04:08:36,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:08:38,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -99.88160 ± 238.126
2025-08-07 04:08:38,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-33.84251, 3.720956, -72.96216, -22.328785, -5.045476, 19.062057, -804.4962, 7.709987, -108.77422, 18.140364]
2025-08-07 04:08:38,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [141.0, 66.0, 168.0, 57.0, 74.0, 71.0, 1000.0, 58.0, 109.0, 47.0]
2025-08-07 04:08:38,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 46 minutes, 2 seconds)
2025-08-07 04:10:21,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:10:23,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -61.93922 ± 64.022
2025-08-07 04:10:23,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-121.12543, -90.564354, 38.98175, -55.074623, -16.39525, 7.7095437, -6.3668737, -95.10387, -178.83731, -102.61576]
2025-08-07 04:10:23,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [140.0, 128.0, 51.0, 131.0, 73.0, 46.0, 82.0, 107.0, 251.0, 131.0]
2025-08-07 04:10:23,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 45 minutes, 59 seconds)
2025-08-07 04:12:06,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:12:10,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -168.23924 ± 330.841
2025-08-07 04:12:10,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [21.420704, -15.148861, 31.136326, -839.00635, -7.1917996, -813.936, 15.943619, 11.042957, 8.346356, -94.99949]
2025-08-07 04:12:10,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 71.0, 51.0, 1000.0, 173.0, 1000.0, 76.0, 58.0, 81.0, 163.0]
2025-08-07 04:12:10,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 46 minutes, 2 seconds)
2025-08-07 04:13:56,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:14:01,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -50.64778 ± 138.955
2025-08-07 04:14:01,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-270.99527, -27.841932, 16.98553, 2.6709225, 55.070473, -2.4690444, 52.033615, -370.96387, 26.816067, 12.215708]
2025-08-07 04:14:01,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 191.0, 76.0, 173.0, 73.0, 216.0, 75.0, 1000.0, 64.0, 78.0]
2025-08-07 04:14:01,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 46 minutes, 26 seconds)
2025-08-07 04:15:50,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:16:00,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 204.74857 ± 87.191
2025-08-07 04:16:00,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [384.78583, 240.79004, 136.13339, 176.39001, 73.29502, 226.10301, 163.59914, 321.71945, 169.01263, 155.65698]
2025-08-07 04:16:00,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 760.0, 548.0, 1000.0, 142.0, 625.0, 346.0, 921.0, 732.0, 457.0]
2025-08-07 04:16:00,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (204.75) for latency ExtremeClogL1U23
2025-08-07 04:16:00,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 49 minutes, 33 seconds)
2025-08-07 04:17:36,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:17:52,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 698.95221 ± 31.785
2025-08-07 04:17:52,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [717.6755, 660.5661, 722.7136, 652.3608, 653.67566, 718.844, 709.7174, 753.5268, 699.8415, 700.6011]
2025-08-07 04:17:52,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:17:52,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (698.95) for latency ExtremeClogL1U23
2025-08-07 04:17:52,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-08-07 04:19:41,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:19:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 710.02753 ± 110.511
2025-08-07 04:19:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [881.0212, 702.29144, 612.4347, 773.24023, 597.9882, 693.9085, 769.3627, 669.31903, 525.5252, 875.1842]
2025-08-07 04:19:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 590.0, 1000.0]
2025-08-07 04:19:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (710.03) for latency ExtremeClogL1U23
2025-08-07 04:19:56,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 55 minutes, 36 seconds)
2025-08-07 04:21:44,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:21:56,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 671.10767 ± 311.162
2025-08-07 04:21:56,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [846.29565, 831.89734, 865.22766, 822.0865, 827.4394, 851.166, 72.28777, 690.8733, 40.04149, 863.76135]
2025-08-07 04:21:56,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 81.0, 1000.0, 51.0, 1000.0]
2025-08-07 04:21:56,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 49 seconds)
2025-08-07 04:23:32,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:23:40,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 392.51682 ± 367.722
2025-08-07 04:23:40,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [823.49493, 33.07149, 864.5672, 289.40936, 56.83049, 833.744, 817.1298, 106.25358, 48.993126, 51.674084]
2025-08-07 04:23:40,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 49.0, 1000.0, 344.0, 50.0, 1000.0, 1000.0, 156.0, 68.0, 49.0]
2025-08-07 04:23:40,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 53 minutes, 45 seconds)
2025-08-07 04:25:30,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:25:41,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 593.84192 ± 354.054
2025-08-07 04:25:41,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [866.71967, 45.70858, 865.0986, 71.36178, 797.6599, 803.9592, 841.2643, 845.9527, 751.56903, 49.12602]
2025-08-07 04:25:41,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 38.0, 1000.0, 59.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 50.0]
2025-08-07 04:25:41,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 34 seconds)
2025-08-07 04:27:28,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:27:38,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 534.31775 ± 380.243
2025-08-07 04:27:38,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [836.07526, 783.3072, 875.5223, 872.7057, 858.32227, 79.16071, 44.002163, 838.3595, 68.74254, 86.97947]
2025-08-07 04:27:38,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 65.0, 41.0, 1000.0, 104.0, 80.0]
2025-08-07 04:27:38,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 52 seconds)
2025-08-07 04:29:15,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:29:24,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 472.54517 ± 383.501
2025-08-07 04:29:24,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [90.69012, 58.359512, 825.1441, 825.6497, 862.15546, 53.682217, 190.43042, 857.9427, 898.3465, 63.05088]
2025-08-07 04:29:24,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [90.0, 49.0, 1000.0, 1000.0, 1000.0, 50.0, 228.0, 1000.0, 1000.0, 50.0]
2025-08-07 04:29:24,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 44 minutes, 41 seconds)
2025-08-07 04:31:08,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:31:17,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 466.05215 ± 382.608
2025-08-07 04:31:17,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [789.03107, 866.0533, 873.1282, 856.2099, 42.47403, 129.82857, 847.59033, 162.04399, 54.315693, 39.84629]
2025-08-07 04:31:17,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 51.0, 123.0, 1000.0, 191.0, 58.0, 47.0]
2025-08-07 04:31:17,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 40 minutes, 38 seconds)
2025-08-07 04:33:00,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:33:05,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 299.07928 ± 315.486
2025-08-07 04:33:05,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [937.375, 43.885063, 876.57605, 189.89435, 73.14287, 76.00216, 58.295334, 308.06674, 258.07397, 169.48146]
2025-08-07 04:33:05,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 46.0, 1000.0, 263.0, 54.0, 57.0, 44.0, 377.0, 316.0, 158.0]
2025-08-07 04:33:05,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 40 minutes, 8 seconds)
2025-08-07 04:34:53,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 537.23889 ± 347.837
2025-08-07 04:35:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [406.688, 32.800594, 726.84595, 665.98096, 858.7392, 80.919464, 40.369953, 894.7791, 944.57526, 720.6908]
2025-08-07 04:35:03,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [493.0, 38.0, 832.0, 1000.0, 1000.0, 57.0, 47.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:35:03,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 37 minutes, 19 seconds)
2025-08-07 04:36:44,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:36:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 885.98181 ± 161.234
2025-08-07 04:36:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [468.0056, 845.4681, 976.53265, 948.4748, 998.8888, 1058.3068, 1011.2058, 891.8652, 770.5172, 890.5537]
2025-08-07 04:36:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [505.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:36:59,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (885.98) for latency ExtremeClogL1U23
2025-08-07 04:36:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 35 minutes, 10 seconds)
2025-08-07 04:38:50,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:39:00,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 556.55029 ± 316.827
2025-08-07 04:39:00,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [764.62213, 777.616, 883.7188, 46.007263, 329.71597, 164.49696, 690.5586, 891.68365, 189.2505, 827.833]
2025-08-07 04:39:00,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [826.0, 1000.0, 1000.0, 50.0, 296.0, 126.0, 1000.0, 1000.0, 163.0, 1000.0]
2025-08-07 04:39:00,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 37 minutes, 28 seconds)
2025-08-07 04:40:38,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:40:48,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 557.87061 ± 343.456
2025-08-07 04:40:48,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [236.66641, 647.20386, 817.09753, 68.96333, 73.258446, 752.9378, 1012.98535, 282.27103, 697.19824, 990.12396]
2025-08-07 04:40:48,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [191.0, 561.0, 1000.0, 53.0, 49.0, 1000.0, 1000.0, 232.0, 1000.0, 860.0]
2025-08-07 04:40:48,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 34 minutes, 8 seconds)
2025-08-07 04:42:31,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:42:39,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 472.44620 ± 308.872
2025-08-07 04:42:39,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [151.7834, 988.99713, 478.3144, 812.6997, 748.168, 223.71713, 303.0397, 86.01076, 189.68245, 742.0492]
2025-08-07 04:42:39,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [141.0, 873.0, 423.0, 1000.0, 1000.0, 253.0, 287.0, 80.0, 160.0, 1000.0]
2025-08-07 04:42:39,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 33 minutes, 7 seconds)
2025-08-07 04:44:25,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:44:32,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 498.13907 ± 348.572
2025-08-07 04:44:32,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [177.10829, 484.06302, 1229.723, 363.38586, 63.43471, 738.6631, 611.3368, 871.6384, 274.24014, 167.7969]
2025-08-07 04:44:32,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [165.0, 333.0, 1000.0, 312.0, 57.0, 612.0, 492.0, 1000.0, 184.0, 121.0]
2025-08-07 04:44:32,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 29 minutes, 44 seconds)
2025-08-07 04:46:20,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:46:30,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 597.24915 ± 380.815
2025-08-07 04:46:30,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [52.466057, 100.132126, 818.37085, 1216.2612, 580.1634, 823.11444, 951.5577, 359.9338, 193.65695, 876.83496]
2025-08-07 04:46:30,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 112.0, 1000.0, 1000.0, 463.0, 1000.0, 1000.0, 289.0, 144.0, 1000.0]
2025-08-07 04:46:30,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 28 minutes, 26 seconds)
2025-08-07 04:48:11,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:48:24,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 950.37341 ± 280.238
2025-08-07 04:48:24,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [932.56995, 1034.2443, 582.5237, 1067.593, 603.6033, 880.8066, 1381.0972, 1238.2089, 553.5015, 1229.5859]
2025-08-07 04:48:24,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 599.0, 1000.0, 514.0, 1000.0, 1000.0, 1000.0, 443.0, 1000.0]
2025-08-07 04:48:24,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (950.37) for latency ExtremeClogL1U23
2025-08-07 04:48:24,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 24 minutes, 55 seconds)
2025-08-07 04:50:12,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:50:22,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 882.16876 ± 448.320
2025-08-07 04:50:22,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [234.40256, 1063.1482, 464.12802, 1359.2295, 1341.8007, 253.75764, 1146.7758, 1359.0397, 1144.6854, 454.71973]
2025-08-07 04:50:22,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 778.0, 378.0, 1000.0, 1000.0, 174.0, 922.0, 1000.0, 1000.0, 339.0]
2025-08-07 04:50:22,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 28 seconds)
2025-08-07 04:51:59,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:52:10,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 732.09363 ± 399.919
2025-08-07 04:52:10,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [401.7339, 221.15886, 268.3603, 1356.1814, 695.8909, 243.40134, 930.01746, 1011.2528, 1066.0587, 1126.8802]
2025-08-07 04:52:10,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [295.0, 152.0, 228.0, 1000.0, 1000.0, 184.0, 1000.0, 1000.0, 702.0, 1000.0]
2025-08-07 04:52:10,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 22 minutes, 34 seconds)
2025-08-07 04:53:56,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:54:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 566.86163 ± 365.040
2025-08-07 04:54:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [317.52966, 461.57272, 748.4901, 923.79486, 916.3121, 963.85583, 998.01447, 211.53653, 83.725876, 43.783497]
2025-08-07 04:54:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [214.0, 296.0, 1000.0, 1000.0, 1000.0, 674.0, 711.0, 141.0, 64.0, 41.0]
2025-08-07 04:54:04,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 21 minutes, 6 seconds)
2025-08-07 04:55:53,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:56:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 650.93152 ± 428.476
2025-08-07 04:56:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [93.49371, 685.5973, 868.3416, 149.99342, 1361.7832, 314.65753, 1261.1246, 888.7336, 228.48984, 657.10077]
2025-08-07 04:56:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 527.0, 565.0, 107.0, 1000.0, 223.0, 927.0, 1000.0, 135.0, 462.0]
2025-08-07 04:56:01,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 2 seconds)
2025-08-07 04:57:46,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:57:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 504.83804 ± 367.198
2025-08-07 04:57:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [991.8208, 445.42447, 145.1423, 141.00557, 1002.155, 1109.323, 339.6626, 187.70354, 505.83298, 180.31017]
2025-08-07 04:57:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [613.0, 350.0, 97.0, 117.0, 1000.0, 691.0, 248.0, 122.0, 319.0, 148.0]
2025-08-07 04:57:51,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 16 minutes, 6 seconds)
2025-08-07 04:59:28,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:59:37,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 704.12061 ± 364.923
2025-08-07 04:59:37,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [882.0525, 1180.4484, 1360.4993, 211.0371, 507.15723, 935.4677, 232.75789, 477.65802, 523.34406, 730.7835]
2025-08-07 04:59:37,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 933.0, 161.0, 283.0, 596.0, 126.0, 337.0, 387.0, 1000.0]
2025-08-07 04:59:37,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 11 minutes, 16 seconds)
2025-08-07 05:01:29,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:01:40,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 721.09998 ± 380.079
2025-08-07 05:01:40,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [333.093, 283.101, 909.2566, 1071.4174, 1204.6492, 881.4655, 832.39575, 235.80232, 1197.0767, 262.74222]
2025-08-07 05:01:40,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [253.0, 234.0, 1000.0, 760.0, 1000.0, 1000.0, 1000.0, 154.0, 1000.0, 193.0]
2025-08-07 05:01:40,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 13 minutes, 2 seconds)
2025-08-07 05:03:17,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:03:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1004.79504 ± 450.439
2025-08-07 05:03:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [226.03238, 1249.8492, 998.5711, 1158.537, 633.96515, 1635.9065, 1572.8203, 531.4964, 1378.0089, 662.7623]
2025-08-07 05:03:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [162.0, 785.0, 1000.0, 1000.0, 391.0, 1000.0, 1000.0, 299.0, 877.0, 393.0]
2025-08-07 05:03:28,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1004.80) for latency ExtremeClogL1U23
2025-08-07 05:03:28,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 9 minutes, 46 seconds)
2025-08-07 05:05:13,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:05:25,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1070.79517 ± 450.021
2025-08-07 05:05:25,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [478.18845, 832.32227, 627.7298, 1651.8267, 1324.4243, 1524.5665, 1189.31, 875.13605, 1718.1349, 486.31378]
2025-08-07 05:05:25,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [311.0, 554.0, 406.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 273.0]
2025-08-07 05:05:25,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1070.80) for latency ExtremeClogL1U23
2025-08-07 05:05:25,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 7 minutes, 50 seconds)
2025-08-07 05:07:07,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:07:17,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 977.40320 ± 678.456
2025-08-07 05:07:17,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1670.612, 91.90863, 1471.2554, 226.0737, 664.30566, 147.70955, 1798.0513, 1629.7169, 472.642, 1601.7577]
2025-08-07 05:07:17,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 74.0, 1000.0, 140.0, 404.0, 101.0, 1000.0, 1000.0, 266.0, 1000.0]
2025-08-07 05:07:17,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 6 minutes, 14 seconds)
2025-08-07 05:09:02,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:09:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1158.87927 ± 578.283
2025-08-07 05:09:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1624.941, 164.24567, 633.1374, 1487.0249, 529.613, 1704.7249, 1599.936, 1498.7719, 561.1856, 1785.2131]
2025-08-07 05:09:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 104.0, 428.0, 900.0, 336.0, 1000.0, 1000.0, 1000.0, 319.0, 1000.0]
2025-08-07 05:09:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1158.88) for latency ExtremeClogL1U23
2025-08-07 05:09:13,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 6 minutes, 49 seconds)
2025-08-07 05:10:55,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:11:05,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1027.19849 ± 610.589
2025-08-07 05:11:05,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1938.5895, 1048.6324, 1890.6837, 856.9356, 1766.6487, 349.36682, 356.79617, 254.18217, 764.6436, 1045.5054]
2025-08-07 05:11:05,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 199.0, 197.0, 175.0, 421.0, 566.0]
2025-08-07 05:11:05,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 31 seconds)
2025-08-07 05:12:50,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:13:00,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1004.00061 ± 639.095
2025-08-07 05:13:00,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [914.1147, 1716.9645, 1071.6766, 136.95686, 279.5876, 506.78323, 317.14496, 1494.173, 1633.5719, 1969.0323]
2025-08-07 05:13:00,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 89.0, 192.0, 266.0, 177.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:13:00,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes)
2025-08-07 05:14:43,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:14:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 874.61829 ± 621.553
2025-08-07 05:14:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1123.5261, 1587.486, 890.2872, 89.34007, 295.0334, 195.32687, 939.2483, 195.4948, 1703.5269, 1726.9132]
2025-08-07 05:14:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [666.0, 1000.0, 464.0, 57.0, 201.0, 122.0, 1000.0, 122.0, 1000.0, 1000.0]
2025-08-07 05:14:52,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 59 minutes, 5 seconds)
2025-08-07 05:16:42,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:16:53,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1085.06287 ± 543.733
2025-08-07 05:16:53,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1751.1711, 1342.8169, 365.44904, 1471.8705, 418.39545, 686.40466, 1428.5885, 1760.6056, 336.89667, 1288.4307]
2025-08-07 05:16:53,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 822.0, 238.0, 1000.0, 210.0, 467.0, 862.0, 1000.0, 254.0, 1000.0]
2025-08-07 05:16:53,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 59 minutes)
2025-08-07 05:18:33,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:18:43,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 902.90039 ± 464.127
2025-08-07 05:18:43,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1014.7467, 350.81165, 979.8792, 1848.5616, 903.9483, 503.28058, 1139.7539, 1100.4698, 98.460236, 1089.0918]
2025-08-07 05:18:43,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 203.0, 1000.0, 1000.0, 482.0, 270.0, 635.0, 625.0, 77.0, 595.0]
2025-08-07 05:18:43,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 55 minutes, 48 seconds)
2025-08-07 05:20:24,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:20:32,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 870.81348 ± 583.655
2025-08-07 05:20:32,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [354.21246, 901.23035, 344.03723, 722.6094, 551.91705, 37.750977, 805.8121, 1853.3502, 1344.835, 1792.3801]
2025-08-07 05:20:32,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [216.0, 438.0, 178.0, 380.0, 295.0, 40.0, 1000.0, 1000.0, 747.0, 1000.0]
2025-08-07 05:20:32,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 53 minutes, 26 seconds)
2025-08-07 05:22:16,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:22:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1295.42346 ± 738.924
2025-08-07 05:22:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [316.9403, 1851.12, 1872.1681, 1281.2959, 796.9813, 2168.1116, 2238.8992, 537.6734, 189.93987, 1701.1058]
2025-08-07 05:22:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [175.0, 1000.0, 916.0, 642.0, 481.0, 1000.0, 1000.0, 239.0, 118.0, 1000.0]
2025-08-07 05:22:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1295.42) for latency ExtremeClogL1U23
2025-08-07 05:22:26,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 51 minutes, 15 seconds)
2025-08-07 05:24:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:31,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1738.36462 ± 359.287
2025-08-07 05:24:31,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1935.9089, 1763.4572, 1748.0424, 1942.8097, 1910.595, 1814.7114, 1621.3934, 1819.382, 2100.2378, 727.1081]
2025-08-07 05:24:31,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 870.0, 1000.0, 1000.0, 427.0]
2025-08-07 05:24:31,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1738.36) for latency ExtremeClogL1U23
2025-08-07 05:24:31,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes)
2025-08-07 05:26:09,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:18,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1168.32837 ± 880.538
2025-08-07 05:26:18,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2427.3884, 544.10767, 705.1716, 60.75893, 1850.2961, 2247.4275, 200.47343, 588.12067, 2295.6567, 763.8839]
2025-08-07 05:26:18,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 242.0, 340.0, 48.0, 942.0, 1000.0, 110.0, 328.0, 1000.0, 335.0]
2025-08-07 05:26:18,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 47 minutes, 20 seconds)
2025-08-07 05:28:07,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:28:21,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1841.10327 ± 640.286
2025-08-07 05:28:21,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1906.8712, 2122.1147, 2202.5618, 862.7511, 2146.6882, 336.3376, 2225.455, 2284.164, 2264.0225, 2060.068]
2025-08-07 05:28:21,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 184.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:28:21,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1841.10) for latency ExtremeClogL1U23
2025-08-07 05:28:21,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 47 minutes, 57 seconds)
2025-08-07 05:30:02,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:30:10,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1185.26440 ± 655.908
2025-08-07 05:30:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [653.09534, 1185.3457, 1187.8926, 1197.0508, 147.27243, 1599.1217, 934.01855, 2308.207, 2159.8108, 480.83057]
2025-08-07 05:30:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [289.0, 530.0, 481.0, 625.0, 79.0, 733.0, 425.0, 962.0, 1000.0, 188.0]
2025-08-07 05:30:10,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 45 minutes, 55 seconds)
2025-08-07 05:31:52,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:32:02,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1393.01135 ± 597.108
2025-08-07 05:32:02,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2323.3384, 2031.3507, 444.42154, 1781.64, 1299.8282, 790.42816, 896.0588, 2088.202, 1203.9176, 1070.9282]
2025-08-07 05:32:02,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 279.0, 869.0, 542.0, 348.0, 415.0, 1000.0, 600.0, 513.0]
2025-08-07 05:32:02,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 43 minutes, 44 seconds)
2025-08-07 05:33:47,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:34:01,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2049.00024 ± 682.250
2025-08-07 05:34:01,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2408.2385, 1885.7482, 50.211113, 2313.6838, 2254.3855, 2194.9434, 2341.383, 2450.7336, 2327.0383, 2263.6375]
2025-08-07 05:34:01,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 41.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:34:01,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2049.00) for latency ExtremeClogL1U23
2025-08-07 05:34:01,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 40 minutes, 41 seconds)
2025-08-07 05:35:45,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:36:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2208.92310 ± 166.075
2025-08-07 05:36:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2236.253, 2157.09, 2446.7598, 1940.4758, 2253.3928, 2195.4097, 2165.8608, 2378.4639, 1927.8156, 2387.7083]
2025-08-07 05:36:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 941.0, 1000.0]
2025-08-07 05:36:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2208.92) for latency ExtremeClogL1U23
2025-08-07 05:36:00,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes)
2025-08-07 05:37:51,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:38:05,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1889.75745 ± 702.691
2025-08-07 05:38:05,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2350.6453, 2398.836, 2228.8384, 2322.2446, 31.348122, 2161.3699, 2058.5989, 2015.806, 1169.3568, 2160.5298]
2025-08-07 05:38:05,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 46.0, 1000.0, 1000.0, 1000.0, 486.0, 1000.0]
2025-08-07 05:38:05,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 12 seconds)
2025-08-07 05:39:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:40:00,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1846.01782 ± 779.447
2025-08-07 05:40:00,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2438.8484, 640.56464, 2462.1975, 1605.2533, 2392.2769, 2316.4094, 419.8378, 2168.2441, 1258.9326, 2757.6128]
2025-08-07 05:40:00,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 315.0, 1000.0, 700.0, 1000.0, 1000.0, 203.0, 1000.0, 483.0, 1000.0]
2025-08-07 05:40:00,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 15 seconds)
2025-08-07 05:41:38,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:41:49,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1278.18286 ± 851.075
2025-08-07 05:41:49,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [891.64966, 2362.0137, 439.21954, 2461.1333, 43.071938, 1351.5714, 1538.5106, 2251.038, 150.95758, 1292.6621]
2025-08-07 05:41:49,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 172.0, 1000.0, 35.0, 701.0, 593.0, 1000.0, 1000.0, 596.0]
2025-08-07 05:41:49,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 52 seconds)
2025-08-07 05:43:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:43:45,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1958.17505 ± 865.861
2025-08-07 05:43:45,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2411.3481, 1924.3668, 2364.8616, 2414.5496, 2413.2732, 489.90866, 59.80827, 2571.165, 2636.8877, 2295.582]
2025-08-07 05:43:45,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 191.0, 44.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:43:45,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 19 seconds)
2025-08-07 05:45:34,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:45:47,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2007.62341 ± 785.733
2025-08-07 05:45:47,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2608.749, 176.45677, 2355.9377, 2580.585, 2697.0896, 1575.0789, 1153.7198, 2662.1152, 1852.0887, 2414.4146]
2025-08-07 05:45:47,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [979.0, 94.0, 1000.0, 1000.0, 1000.0, 641.0, 522.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:45:47,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 31 minutes, 55 seconds)
2025-08-07 05:47:26,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:47:34,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1277.34778 ± 856.988
2025-08-07 05:47:34,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2584.8555, 47.382, 249.72069, 967.28394, 956.0892, 1674.0835, 1902.5109, 355.70547, 2449.1816, 1586.6649]
2025-08-07 05:47:34,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 45.0, 99.0, 372.0, 358.0, 637.0, 742.0, 150.0, 1000.0, 636.0]
2025-08-07 05:47:34,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 27 minutes, 18 seconds)
2025-08-07 05:49:23,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:32,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1430.72021 ± 861.935
2025-08-07 05:49:32,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [472.40433, 1590.2021, 292.87613, 2413.4944, 2135.5764, 386.0517, 510.65143, 2428.6963, 2204.8406, 1872.4082]
2025-08-07 05:49:32,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [190.0, 670.0, 151.0, 1000.0, 981.0, 153.0, 259.0, 1000.0, 1000.0, 788.0]
2025-08-07 05:49:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 53 seconds)
2025-08-07 05:51:15,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:29,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2297.18652 ± 702.002
2025-08-07 05:51:29,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2673.2083, 2350.6846, 2586.7637, 234.50972, 2343.6304, 2723.575, 2570.198, 2285.1448, 2582.518, 2621.632]
2025-08-07 05:51:29,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 131.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:51:29,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2297.19) for latency ExtremeClogL1U23
2025-08-07 05:51:29,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 4 seconds)
2025-08-07 05:53:09,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:23,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2341.09521 ± 349.342
2025-08-07 05:53:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2480.8633, 2327.3079, 2475.3223, 1317.6826, 2358.744, 2431.1504, 2444.5645, 2452.5896, 2503.5208, 2619.2063]
2025-08-07 05:53:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 587.0, 1000.0, 1000.0, 1000.0, 910.0, 1000.0, 1000.0]
2025-08-07 05:53:23,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2341.10) for latency ExtremeClogL1U23
2025-08-07 05:53:23,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes, 57 seconds)
2025-08-07 05:55:10,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:55:24,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2226.30444 ± 776.247
2025-08-07 05:55:24,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [39.256767, 2500.8267, 2534.303, 2621.3438, 2596.2024, 1707.7852, 2555.9434, 2438.787, 2743.1843, 2525.4138]
2025-08-07 05:55:24,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 1000.0, 1000.0, 1000.0, 1000.0, 688.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:55:24,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 45 seconds)
2025-08-07 05:57:13,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:57:23,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1722.67615 ± 933.480
2025-08-07 05:57:23,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2857.0596, 2438.3972, 2407.9373, 1534.5918, 630.7289, 644.91364, 504.1845, 857.1607, 2724.8762, 2626.912]
2025-08-07 05:57:23,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 911.0, 641.0, 234.0, 242.0, 215.0, 349.0, 1000.0, 1000.0]
2025-08-07 05:57:23,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 31 seconds)
2025-08-07 05:59:01,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:59:15,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2372.81543 ± 454.918
2025-08-07 05:59:15,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2588.281, 1609.5868, 2815.3843, 2380.8108, 1783.7635, 2604.2507, 1802.0159, 3043.5884, 2498.3496, 2602.1226]
2025-08-07 05:59:15,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 589.0, 1000.0, 1000.0, 715.0, 1000.0, 692.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:59:15,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2372.82) for latency ExtremeClogL1U23
2025-08-07 05:59:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 17 minutes, 40 seconds)
2025-08-07 06:00:59,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:01:12,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2294.44727 ± 879.744
2025-08-07 06:01:12,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2635.8042, 2565.3672, 1241.7557, 2224.9832, 2649.0403, 2986.378, 84.42228, 2892.5554, 2909.6367, 2754.5315]
2025-08-07 06:01:12,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 487.0, 962.0, 1000.0, 1000.0, 48.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:01:12,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 15 minutes, 49 seconds)
2025-08-07 06:02:57,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:03:08,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1713.87671 ± 979.650
2025-08-07 06:03:08,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1405.0472, 1248.6943, 2789.5122, 724.4317, 2310.9736, 628.83405, 38.973053, 2905.4626, 2459.5044, 2627.3335]
2025-08-07 06:03:08,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 522.0, 1000.0, 331.0, 1000.0, 248.0, 36.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:03:08,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 1 second)
2025-08-07 06:04:55,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:05:11,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2617.52075 ± 135.187
2025-08-07 06:05:11,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2691.1482, 2562.952, 2590.0698, 2594.6208, 2411.6548, 2950.6975, 2627.6907, 2625.0232, 2479.8376, 2641.5112]
2025-08-07 06:05:11,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:05:11,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2617.52) for latency ExtremeClogL1U23
2025-08-07 06:05:11,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 20 seconds)
2025-08-07 06:06:54,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:07:09,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2567.47900 ± 196.339
2025-08-07 06:07:09,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2273.595, 2440.863, 2770.5186, 2775.749, 2354.6833, 2318.5854, 2604.8245, 2692.6511, 2837.101, 2606.218]
2025-08-07 06:07:09,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:07:09,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 20 seconds)
2025-08-07 06:08:57,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:09:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1387.24927 ± 915.271
2025-08-07 06:09:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [511.68524, 2793.7961, 495.3197, 949.24646, 1888.5682, 1392.0803, 378.7728, 2398.0854, 2607.763, 457.17618]
2025-08-07 06:09:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 1000.0, 202.0, 418.0, 1000.0, 637.0, 183.0, 1000.0, 1000.0, 192.0]
2025-08-07 06:09:05,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 53 seconds)
2025-08-07 06:10:50,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:11:06,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2533.43140 ± 239.787
2025-08-07 06:11:06,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2666.198, 2573.6384, 2561.2175, 2468.3362, 2651.0813, 2704.7722, 1871.1344, 2542.7249, 2793.7764, 2501.4348]
2025-08-07 06:11:06,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:11:06,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 14 seconds)
2025-08-07 06:12:47,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:13:02,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2523.22485 ± 178.567
2025-08-07 06:13:02,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2566.2896, 2341.7761, 2595.4043, 2768.0312, 2526.3765, 2574.9697, 2135.7607, 2411.1465, 2560.9167, 2751.5784]
2025-08-07 06:13:02,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:13:02,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 23 seconds)
2025-08-07 06:14:44,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:14:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1831.09412 ± 715.527
2025-08-07 06:14:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1618.381, 2470.3904, 2039.455, 2462.7695, 2529.2344, 2294.7668, 666.6469, 1490.0786, 468.8171, 2270.4006]
2025-08-07 06:14:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [816.0, 1000.0, 855.0, 1000.0, 1000.0, 1000.0, 327.0, 625.0, 202.0, 937.0]
2025-08-07 06:14:56,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 26 seconds)
2025-08-07 06:16:43,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:16:57,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2380.22388 ± 532.946
2025-08-07 06:16:57,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1135.0175, 2759.2634, 2658.7773, 1644.118, 2615.0244, 2260.694, 2810.402, 2841.5972, 2471.1558, 2606.1877]
2025-08-07 06:16:57,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [426.0, 1000.0, 1000.0, 701.0, 1000.0, 809.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:16:57,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 41 seconds)
2025-08-07 06:18:40,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:18:53,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2195.56763 ± 941.407
2025-08-07 06:18:53,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2839.0781, 2628.7124, 471.0358, 2733.2048, 2329.0107, 224.9306, 2925.1545, 2798.237, 2579.7505, 2426.5627]
2025-08-07 06:18:53,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 178.0, 1000.0, 1000.0, 161.0, 1000.0, 1000.0, 1000.0, 943.0]
2025-08-07 06:18:53,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 45 seconds)
2025-08-07 06:20:31,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:45,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2323.90283 ± 494.843
2025-08-07 06:20:45,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1767.1831, 2739.7874, 2759.603, 2717.2417, 2558.7915, 2563.5576, 1354.7625, 2643.3535, 2477.0945, 1657.6519]
2025-08-07 06:20:45,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 449.0, 1000.0, 1000.0, 672.0]
2025-08-07 06:20:45,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 1 second)
2025-08-07 06:22:28,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:42,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2419.13574 ± 638.376
2025-08-07 06:22:42,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2429.453, 2880.3926, 2384.9744, 1297.6003, 2768.5688, 2732.916, 2878.8384, 2881.7568, 1087.7524, 2849.106]
2025-08-07 06:22:42,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 521.0, 1000.0, 1000.0, 1000.0, 1000.0, 466.0, 1000.0]
2025-08-07 06:22:42,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 5 seconds)
2025-08-07 06:24:25,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:40,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2618.63281 ± 147.310
2025-08-07 06:24:40,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2546.5466, 2755.0144, 2812.4216, 2270.4019, 2619.9963, 2695.8818, 2551.493, 2659.826, 2533.666, 2741.081]
2025-08-07 06:24:40,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 881.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:24:40,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2618.63) for latency ExtremeClogL1U23
2025-08-07 06:24:40,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 33 seconds)
2025-08-07 06:26:28,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:39,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1876.07715 ± 1039.583
2025-08-07 06:26:39,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [918.4457, 38.893353, 2700.4155, 2941.599, 2188.7256, 2479.365, 2541.7632, 2628.1108, 2212.7183, 110.7337]
2025-08-07 06:26:39,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [375.0, 36.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 890.0, 60.0]
2025-08-07 06:26:39,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 28 seconds)
2025-08-07 06:28:14,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:26,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2096.94214 ± 873.541
2025-08-07 06:28:26,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2921.1926, 777.96246, 2333.2705, 2711.9011, 2583.4692, 2686.7405, 2368.1624, 2244.368, 79.467064, 2262.8877]
2025-08-07 06:28:26,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 301.0, 757.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 60.0, 1000.0]
2025-08-07 06:28:26,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 46 seconds)
2025-08-07 06:30:10,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:25,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2675.03101 ± 297.447
2025-08-07 06:30:25,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2674.4268, 2663.5444, 1878.0096, 2901.1218, 2708.9888, 2610.4272, 3040.782, 2624.1528, 2747.202, 2901.6562]
2025-08-07 06:30:25,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 701.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:30:25,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2675.03) for latency ExtremeClogL1U23
2025-08-07 06:30:25,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 20 seconds)
2025-08-07 06:32:07,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:23,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2507.80347 ± 589.476
2025-08-07 06:32:23,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2785.1125, 2696.88, 2532.9058, 2652.0928, 2804.0312, 774.8889, 2757.818, 2451.5623, 2812.312, 2810.4312]
2025-08-07 06:32:23,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:32:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 31 seconds)
2025-08-07 06:34:06,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:17,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1890.46875 ± 986.062
2025-08-07 06:34:17,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [657.5119, 2450.7964, 2781.2827, 430.73315, 435.90988, 2800.5823, 1388.9369, 2781.5662, 2588.7822, 2588.5852]
2025-08-07 06:34:17,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [277.0, 1000.0, 1000.0, 206.0, 178.0, 1000.0, 551.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:34:17,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 18 seconds)
2025-08-07 06:35:56,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:10,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2381.56982 ± 803.962
2025-08-07 06:36:10,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2486.1382, 2700.2798, 2849.5342, 2797.936, 864.2259, 2962.6558, 2942.995, 729.50055, 2829.1477, 2653.2842]
2025-08-07 06:36:10,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 299.0, 1000.0, 1000.0]
2025-08-07 06:36:10,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 58 seconds)
2025-08-07 06:38:02,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:38:18,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2835.56641 ± 111.844
2025-08-07 06:38:18,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2819.76, 2917.886, 2890.708, 2831.9211, 2962.2188, 2772.3396, 2574.3372, 2766.3755, 2836.4138, 2983.7021]
2025-08-07 06:38:18,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:38:18,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2835.57) for latency ExtremeClogL1U23
2025-08-07 06:38:18,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 25 seconds)
2025-08-07 06:39:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:14,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2591.82861 ± 227.183
2025-08-07 06:40:14,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2640.0322, 2849.2803, 2487.668, 2722.4993, 2744.1506, 2660.5652, 1987.6334, 2509.969, 2583.7722, 2732.7158]
2025-08-07 06:40:14,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 737.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:40:14,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 17 seconds)
2025-08-07 06:41:49,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:03,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2493.55981 ± 735.969
2025-08-07 06:42:03,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2998.864, 2717.8987, 2813.354, 1806.0917, 505.02338, 2848.5547, 2622.3235, 2825.68, 2889.6802, 2908.1262]
2025-08-07 06:42:03,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 675.0, 230.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:42:03,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 49 seconds)
2025-08-07 06:43:45,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:57,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2115.87646 ± 734.158
2025-08-07 06:43:57,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2238.8945, 972.61755, 2520.3762, 2839.914, 904.1039, 2801.4324, 1838.2466, 2782.187, 1452.262, 2808.7314]
2025-08-07 06:43:57,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [723.0, 385.0, 1000.0, 1000.0, 365.0, 1000.0, 671.0, 1000.0, 495.0, 1000.0]
2025-08-07 06:43:57,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 51 seconds)
2025-08-07 06:45:46,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:00,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2463.46924 ± 690.863
2025-08-07 06:46:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [517.78, 2972.9912, 2318.2778, 2925.668, 2702.1372, 2560.8228, 2223.381, 2830.96, 2694.486, 2888.1877]
2025-08-07 06:46:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [213.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:46:00,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 25 seconds)
2025-08-07 06:47:36,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:51,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2363.98706 ± 688.299
2025-08-07 06:47:51,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2664.077, 2204.0789, 2788.3877, 787.4732, 2594.9685, 2698.4668, 2583.6091, 2946.7021, 1367.4946, 3004.6143]
2025-08-07 06:47:51,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:47:51,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 40 seconds)
2025-08-07 06:49:37,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:48,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1984.14905 ± 952.033
2025-08-07 06:49:48,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1793.7133, 1258.9263, 2981.6357, 2840.6846, 825.3693, 96.56554, 2799.6294, 1804.623, 2512.3474, 2927.9956]
2025-08-07 06:49:48,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [742.0, 488.0, 1000.0, 1000.0, 305.0, 70.0, 1000.0, 694.0, 1000.0, 1000.0]
2025-08-07 06:49:48,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 48 seconds)
2025-08-07 06:51:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2839.89233 ± 128.387
2025-08-07 06:51:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2871.257, 2711.0635, 2840.7292, 2575.0002, 2807.2168, 3032.556, 2899.307, 2993.248, 2757.5107, 2911.0374]
2025-08-07 06:51:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:51:43,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2839.89) for latency ExtremeClogL1U23
2025-08-07 06:51:43,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 8 seconds)
2025-08-07 06:53:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:43,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2664.61914 ± 81.856
2025-08-07 06:53:43,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2712.5647, 2707.8655, 2600.0051, 2774.818, 2619.5903, 2640.8164, 2607.0774, 2813.74, 2638.0037, 2531.708]
2025-08-07 06:53:43,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:53:43,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 27 seconds)
2025-08-07 06:55:19,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:34,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2595.54980 ± 322.672
2025-08-07 06:55:34,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2730.8882, 2870.2917, 2878.6326, 2580.6077, 2855.2712, 2827.5818, 1987.4061, 1988.8378, 2559.794, 2676.189]
2025-08-07 06:55:34,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 700.0, 739.0, 1000.0, 1000.0]
2025-08-07 06:55:34,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 2 seconds)
2025-08-07 06:57:16,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:29,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2357.15430 ± 946.939
2025-08-07 06:57:29,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2696.1572, 2835.17, 2819.4443, 2813.9653, 2598.4517, 3039.2139, 62.264935, 984.867, 2961.696, 2760.3147]
2025-08-07 06:57:29,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 55.0, 392.0, 1000.0, 1000.0]
2025-08-07 06:57:29,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 14 seconds)
2025-08-07 06:59:10,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:24,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2738.10376 ± 298.007
2025-08-07 06:59:24,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2760.2312, 2557.599, 2830.212, 2976.4873, 2945.9023, 1990.7893, 2556.5764, 3107.2827, 2864.3127, 2791.6443]
2025-08-07 06:59:24,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 673.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:59:24,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 17 seconds)
2025-08-07 07:01:07,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2851.77783 ± 94.887
2025-08-07 07:01:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2727.4812, 2878.5718, 2982.8428, 2942.7073, 2877.107, 2866.8242, 2924.767, 2896.5554, 2722.3215, 2698.6006]
2025-08-07 07:01:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:01:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2851.78) for latency ExtremeClogL1U23
2025-08-07 07:01:22,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 25 seconds)
2025-08-07 07:03:04,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:19,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2713.37891 ± 333.378
2025-08-07 07:03:19,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2877.1873, 2025.0864, 2923.1301, 2788.4016, 2972.0889, 2142.547, 3023.4097, 2680.4, 2980.1282, 2721.41]
2025-08-07 07:03:19,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 743.0, 1000.0, 906.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:03:19,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 26 seconds)
2025-08-07 07:05:01,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:16,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2782.92432 ± 280.415
2025-08-07 07:05:16,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2150.0054, 2916.2188, 2871.444, 3026.2883, 2821.2876, 2944.6792, 2901.8762, 2323.0688, 2931.5188, 2942.8552]
2025-08-07 07:05:16,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 735.0, 1000.0, 1000.0]
2025-08-07 07:05:16,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 38 seconds)
2025-08-07 07:06:58,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2713.80347 ± 188.159
2025-08-07 07:07:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2431.9983, 2777.0288, 2608.3418, 2335.4, 2702.2432, 2891.0183, 2899.8142, 2853.4385, 2884.137, 2754.6123]
2025-08-07 07:07:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:07:14,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 44 seconds)
2025-08-07 07:08:58,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:12,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2310.99219 ± 877.485
2025-08-07 07:09:12,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2308.6858, 2901.2393, 590.96155, 2225.9004, 2762.7493, 662.5443, 2895.1006, 2868.1208, 2984.7837, 2909.8357]
2025-08-07 07:09:12,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [841.0, 1000.0, 211.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:09:12,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 49 seconds)
2025-08-07 07:11:00,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:11,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2128.88354 ± 1046.998
2025-08-07 07:11:11,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2385.146, 2782.985, 3002.143, 2481.2588, 601.0984, 698.9321, 3029.5256, 393.04874, 3011.6345, 2903.0627]
2025-08-07 07:11:11,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 238.0, 296.0, 1000.0, 146.0, 1000.0, 1000.0]
2025-08-07 07:11:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 53 seconds)
2025-08-07 07:12:45,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:59,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2537.61719 ± 716.429
2025-08-07 07:12:59,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2823.244, 2621.509, 2811.9192, 2714.9495, 2708.6423, 2699.2085, 2933.3962, 409.05728, 2943.2988, 2710.9473]
2025-08-07 07:12:59,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 147.0, 1000.0, 1000.0]
2025-08-07 07:12:59,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 52 seconds)
2025-08-07 07:14:43,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2838.46362 ± 101.980
2025-08-07 07:14:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2895.7053, 2786.823, 2918.3203, 2863.1763, 2576.3174, 2798.7375, 2943.5984, 2806.3943, 2926.509, 2869.054]
2025-08-07 07:14:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:14:58,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 56 seconds)
2025-08-07 07:16:40,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:55,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2693.72705 ± 404.121
2025-08-07 07:16:55,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1842.9395, 3016.1763, 2016.1439, 2898.8901, 2980.3147, 2910.833, 2933.5176, 2601.8562, 2713.4521, 3023.1462]
2025-08-07 07:16:55,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [742.0, 1000.0, 687.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:16:55,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1251 [DEBUG]: Training session finished
