2025-08-07 04:04:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:04:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:04:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1496a7f78550>}
2025-08-07 04:04:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1111 [DEBUG]: using device: cuda
2025-08-07 04:04:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1133 [INFO]: Creating new trainer
2025-08-07 04:04:22,292 baseline-bpql-noiseperc0-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=219, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:04:22,292 baseline-bpql-noiseperc0-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:04:23,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1194 [DEBUG]: Starting training session...
2025-08-07 04:04:23,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 1/100
2025-08-07 04:06:03,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:06:07,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -315.58673 ± 579.956
2025-08-07 04:06:07,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [-102.49538, 16.867552, 22.683527, 29.847483, 21.384045, -86.666, -34.195034, -1449.9553, -1493.0804, -80.257774]
2025-08-07 04:06:07,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 29.0, 29.0, 29.0, 29.0, 97.0, 62.0, 1000.0, 1000.0, 112.0]
2025-08-07 04:06:07,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (-315.59) for latency ExtremeClogL1U23
2025-08-07 04:06:07,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 52 minutes, 22 seconds)
2025-08-07 04:07:58,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:08:01,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -110.87550 ± 213.518
2025-08-07 04:08:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [13.89496, -200.17618, -156.42038, 25.353088, -703.96594, -91.1802, -81.83566, 24.735214, 44.87641, 15.963633]
2025-08-07 04:08:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 216.0, 148.0, 70.0, 1000.0, 142.0, 154.0, 60.0, 117.0, 80.0]
2025-08-07 04:08:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (-110.88) for latency ExtremeClogL1U23
2025-08-07 04:08:01,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 58 minutes, 27 seconds)
2025-08-07 04:09:39,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:09:40,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -17.04506 ± 21.866
2025-08-07 04:09:40,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [-28.864046, -31.630035, -0.8549282, -9.68345, 14.207161, -16.619827, -10.15359, -65.00514, 7.955921, -29.802706]
2025-08-07 04:09:40,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 93.0, 114.0, 81.0, 56.0, 111.0, 66.0, 131.0, 76.0, 101.0]
2025-08-07 04:09:40,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (-17.05) for latency ExtremeClogL1U23
2025-08-07 04:09:40,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 51 minutes, 10 seconds)
2025-08-07 04:11:27,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:11:29,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -41.19909 ± 173.806
2025-08-07 04:11:29,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [40.534252, -35.574005, 15.74359, -557.31805, 54.18566, 23.899576, 19.382801, 33.02435, -15.381846, 9.512826]
2025-08-07 04:11:29,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 125.0, 55.0, 1000.0, 66.0, 49.0, 53.0, 56.0, 123.0, 167.0]
2025-08-07 04:11:29,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 50 minutes, 41 seconds)
2025-08-07 04:13:22,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:13:29,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 57.84515 ± 50.072
2025-08-07 04:13:29,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [-12.4353285, 21.053608, 90.66775, 70.624, 31.431477, 40.412613, 2.8387957, 80.06277, 85.18016, 168.61559]
2025-08-07 04:13:29,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 146.0, 181.0, 92.0, 1000.0, 43.0, 1000.0, 265.0, 91.0, 1000.0]
2025-08-07 04:13:29,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (57.85) for latency ExtremeClogL1U23
2025-08-07 04:13:29,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 53 minutes, 3 seconds)
2025-08-07 04:15:10,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:15:19,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 360.24106 ± 178.217
2025-08-07 04:15:19,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [332.64108, 361.01608, 345.13495, 526.18005, 390.06177, 384.87015, 395.71237, 49.4246, 104.52169, 712.84796]
2025-08-07 04:15:19,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [458.0, 473.0, 1000.0, 1000.0, 494.0, 583.0, 1000.0, 49.0, 112.0, 1000.0]
2025-08-07 04:15:19,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (360.24) for latency ExtremeClogL1U23
2025-08-07 04:15:19,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 53 minutes, 4 seconds)
2025-08-07 04:17:04,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:17:14,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 471.20874 ± 256.614
2025-08-07 04:17:14,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [827.84906, 451.43347, 690.76044, 679.68524, 597.8362, 145.0038, 29.618002, 545.3209, 152.34885, 592.232]
2025-08-07 04:17:14,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 491.0, 1000.0, 1000.0, 1000.0, 162.0, 32.0, 706.0, 213.0, 1000.0]
2025-08-07 04:17:14,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (471.21) for latency ExtremeClogL1U23
2025-08-07 04:17:14,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 51 minutes, 20 seconds)
2025-08-07 04:18:55,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:18:58,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 168.83664 ± 180.010
2025-08-07 04:18:58,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [264.03363, 595.63904, 338.1664, 31.811815, 42.982796, 32.581203, 76.17549, 33.016315, 32.340927, 241.61874]
2025-08-07 04:18:58,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [332.0, 737.0, 339.0, 32.0, 60.0, 32.0, 73.0, 32.0, 30.0, 378.0]
2025-08-07 04:18:58,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 50 minutes, 58 seconds)
2025-08-07 04:20:43,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:20:46,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 162.68286 ± 86.603
2025-08-07 04:20:46,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [199.21404, 163.45465, 46.801674, 177.20045, 264.98703, 292.03903, 33.989887, 106.226105, 91.0737, 251.84204]
2025-08-07 04:20:46,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [321.0, 180.0, 38.0, 204.0, 266.0, 426.0, 30.0, 91.0, 114.0, 445.0]
2025-08-07 04:20:46,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 48 minutes, 58 seconds)
2025-08-07 04:22:38,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:22:43,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 317.85532 ± 330.298
2025-08-07 04:22:43,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [840.7026, 91.470856, 177.96136, 219.80595, 63.760036, 868.9097, 65.562874, 60.438435, 731.96594, 57.97561]
2025-08-07 04:22:43,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 85.0, 155.0, 242.0, 57.0, 1000.0, 64.0, 60.0, 760.0, 60.0]
2025-08-07 04:22:43,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 46 minutes, 13 seconds)
2025-08-07 04:24:23,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:24:30,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 425.88126 ± 228.811
2025-08-07 04:24:30,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [625.9044, 231.19576, 470.69104, 413.04398, 81.8292, 219.35905, 288.50085, 749.1954, 359.9177, 819.17523]
2025-08-07 04:24:30,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [560.0, 248.0, 417.0, 482.0, 81.0, 220.0, 232.0, 1000.0, 323.0, 852.0]
2025-08-07 04:24:30,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 43 minutes, 15 seconds)
2025-08-07 04:26:18,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:26:22,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 305.97568 ± 257.819
2025-08-07 04:26:22,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [245.2672, 328.18283, 115.81214, 192.27412, 622.17975, 938.3041, 132.78621, 274.39505, 99.70117, 110.854294]
2025-08-07 04:26:22,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [219.0, 320.0, 92.0, 158.0, 566.0, 1000.0, 94.0, 273.0, 82.0, 94.0]
2025-08-07 04:26:22,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 40 minutes, 52 seconds)
2025-08-07 04:28:07,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:28:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 891.26837 ± 337.566
2025-08-07 04:28:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [776.6273, 1148.1562, 534.871, 120.046974, 1272.9191, 833.37555, 1002.14246, 1263.3357, 873.6619, 1087.5477]
2025-08-07 04:28:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [623.0, 1000.0, 393.0, 128.0, 1000.0, 660.0, 825.0, 1000.0, 713.0, 849.0]
2025-08-07 04:28:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (891.27) for latency ExtremeClogL1U23
2025-08-07 04:28:18,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 42 minutes, 27 seconds)
2025-08-07 04:30:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:30:10,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 574.26990 ± 347.501
2025-08-07 04:30:10,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [335.0651, 403.50534, 68.29092, 638.3876, 1047.634, 934.179, 452.77484, 463.88907, 1173.3823, 225.59053]
2025-08-07 04:30:10,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [248.0, 265.0, 57.0, 422.0, 858.0, 1000.0, 309.0, 388.0, 1000.0, 164.0]
2025-08-07 04:30:10,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 41 minutes, 38 seconds)
2025-08-07 04:31:54,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:32:00,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 487.25006 ± 409.466
2025-08-07 04:32:00,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [98.779785, 867.4962, 1418.3035, 498.50836, 170.62494, 250.42468, 275.94995, 879.90717, 229.24095, 183.26532]
2025-08-07 04:32:00,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 1000.0, 1000.0, 394.0, 132.0, 226.0, 190.0, 676.0, 173.0, 117.0]
2025-08-07 04:32:00,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 37 minutes, 44 seconds)
2025-08-07 04:33:53,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:34:02,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 767.20264 ± 543.359
2025-08-07 04:34:02,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1348.1093, 1503.7897, 1147.7108, 256.27905, 546.9844, 899.0308, 1448.1198, 91.94057, 26.265738, 403.79675]
2025-08-07 04:34:02,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 180.0, 351.0, 1000.0, 1000.0, 64.0, 35.0, 376.0]
2025-08-07 04:34:02,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 40 minutes, 16 seconds)
2025-08-07 04:35:45,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:50,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 518.86255 ± 330.029
2025-08-07 04:35:50,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [613.7039, 269.61594, 467.60498, 395.95172, 74.38915, 630.09314, 495.56528, 1326.5352, 202.6145, 712.5522]
2025-08-07 04:35:50,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [427.0, 162.0, 376.0, 280.0, 54.0, 440.0, 305.0, 1000.0, 143.0, 442.0]
2025-08-07 04:35:50,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 37 minutes, 2 seconds)
2025-08-07 04:37:29,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:37:34,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 495.02655 ± 450.661
2025-08-07 04:37:34,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [145.9724, 63.987198, 104.50348, 562.96674, 862.00977, 1553.8607, 166.8759, 129.71417, 661.6699, 698.70496]
2025-08-07 04:37:34,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [138.0, 45.0, 89.0, 372.0, 508.0, 1000.0, 113.0, 127.0, 491.0, 420.0]
2025-08-07 04:37:34,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 32 minutes, 1 second)
2025-08-07 04:39:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:39:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 645.71259 ± 517.655
2025-08-07 04:39:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1347.9623, 802.5852, 1740.7747, 633.5782, 301.15814, 63.649025, 555.24225, 712.68036, 134.75804, 164.7383]
2025-08-07 04:39:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [776.0, 439.0, 1000.0, 1000.0, 226.0, 50.0, 299.0, 408.0, 86.0, 103.0]
2025-08-07 04:39:30,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 31 minutes, 3 seconds)
2025-08-07 04:41:12,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:41:26,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1768.56677 ± 516.029
2025-08-07 04:41:26,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2182.0723, 2090.2632, 400.03955, 1339.5724, 2000.6803, 2036.9097, 1939.4398, 2119.7678, 1626.8112, 1950.112]
2025-08-07 04:41:26,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 237.0, 674.0, 1000.0, 1000.0, 962.0, 1000.0, 851.0, 1000.0]
2025-08-07 04:41:26,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1768.57) for latency ExtremeClogL1U23
2025-08-07 04:41:26,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 30 minutes, 54 seconds)
2025-08-07 04:43:12,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:43:24,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1428.56506 ± 461.586
2025-08-07 04:43:24,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1402.2468, 236.80971, 1893.4827, 1785.9631, 1225.3156, 1434.2308, 1224.8712, 1847.1914, 1752.0845, 1483.4556]
2025-08-07 04:43:24,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [699.0, 131.0, 1000.0, 1000.0, 1000.0, 762.0, 667.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:43:24,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 28 minutes, 2 seconds)
2025-08-07 04:45:18,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:45:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1187.27661 ± 609.090
2025-08-07 04:45:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1765.6866, 1344.3274, 2021.9797, 987.3138, 351.22372, 1671.2817, 193.84424, 1853.1495, 919.35657, 764.6033]
2025-08-07 04:45:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [809.0, 680.0, 1000.0, 464.0, 164.0, 790.0, 137.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:45:29,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 32 seconds)
2025-08-07 04:47:08,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:47:20,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1417.00842 ± 642.083
2025-08-07 04:47:20,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1875.5013, 1990.7031, 1210.9595, 1306.4866, 335.49448, 289.8142, 2121.9722, 1915.1664, 1173.8738, 1950.1119]
2025-08-07 04:47:20,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 634.0, 680.0, 190.0, 139.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:47:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 16 seconds)
2025-08-07 04:49:03,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:49:14,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1323.46118 ± 645.361
2025-08-07 04:49:14,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1466.7917, 816.2335, 1657.3894, 879.2978, 2224.4526, 2190.6006, 459.049, 630.7673, 841.446, 2068.584]
2025-08-07 04:49:14,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [625.0, 370.0, 790.0, 1000.0, 1000.0, 1000.0, 212.0, 297.0, 357.0, 1000.0]
2025-08-07 04:49:14,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 27 minutes, 53 seconds)
2025-08-07 04:50:59,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:51:07,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 967.78662 ± 780.400
2025-08-07 04:51:07,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [366.25702, 124.910416, 857.5453, 1906.1794, 1664.4756, 455.46896, 2019.2898, 1956.7004, 169.90428, 157.13571]
2025-08-07 04:51:07,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [193.0, 82.0, 464.0, 1000.0, 891.0, 271.0, 1000.0, 1000.0, 89.0, 87.0]
2025-08-07 04:51:07,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 25 minutes, 10 seconds)
2025-08-07 04:52:56,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:53:08,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1730.74878 ± 704.923
2025-08-07 04:53:08,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2220.9275, 2114.6062, 2390.5708, 2062.4834, 1218.3892, 2115.856, 1931.8502, 253.72643, 694.4449, 2304.633]
2025-08-07 04:53:08,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 516.0, 1000.0, 1000.0, 143.0, 339.0, 1000.0]
2025-08-07 04:53:08,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 23 minutes, 57 seconds)
2025-08-07 04:54:53,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:55:08,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2104.30322 ± 206.153
2025-08-07 04:55:08,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2228.4993, 2046.8741, 2088.7107, 1704.9338, 2204.095, 2125.356, 2213.1233, 2157.374, 2468.1863, 1805.8787]
2025-08-07 04:55:08,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 688.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:55:08,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2104.30) for latency ExtremeClogL1U23
2025-08-07 04:55:08,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 20 minutes, 57 seconds)
2025-08-07 04:56:54,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:57:03,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1487.30493 ± 937.131
2025-08-07 04:57:03,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [592.1661, 2297.6077, 222.67503, 1847.9915, 482.27924, 2479.7825, 269.57495, 2507.698, 2509.3354, 1663.9382]
2025-08-07 04:57:03,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [227.0, 804.0, 94.0, 701.0, 202.0, 1000.0, 105.0, 1000.0, 1000.0, 618.0]
2025-08-07 04:57:03,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 59 seconds)
2025-08-07 04:58:52,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:59:08,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2329.04077 ± 463.880
2025-08-07 04:59:08,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1395.1742, 1504.9824, 2560.233, 2714.4783, 2427.545, 2221.2207, 2565.2886, 2760.663, 2473.7312, 2667.093]
2025-08-07 04:59:08,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 895.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:59:08,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2329.04) for latency ExtremeClogL1U23
2025-08-07 04:59:08,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 36 seconds)
2025-08-07 05:00:52,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:01:03,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1711.19897 ± 968.677
2025-08-07 05:01:03,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [374.98663, 2070.203, 595.6336, 2576.3823, 2707.6567, 2737.3013, 2346.5027, 2469.7637, 783.53827, 450.02182]
2025-08-07 05:01:03,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [185.0, 1000.0, 265.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 331.0, 172.0]
2025-08-07 05:01:03,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 12 seconds)
2025-08-07 05:02:49,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:02:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1306.39392 ± 775.201
2025-08-07 05:02:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [674.6229, 983.90155, 265.3287, 868.31555, 937.9042, 2411.3015, 1795.232, 2278.1697, 512.81085, 2336.3523]
2025-08-07 05:02:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 449.0, 137.0, 1000.0, 406.0, 1000.0, 785.0, 1000.0, 250.0, 1000.0]
2025-08-07 05:02:59,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 1 second)
2025-08-07 05:04:41,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:04:53,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1516.66040 ± 625.655
2025-08-07 05:04:53,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1536.5049, 1307.0444, 251.20296, 1437.8528, 1137.4707, 1144.9751, 2006.6687, 2344.3135, 1458.2163, 2542.3542]
2025-08-07 05:04:53,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [530.0, 555.0, 107.0, 1000.0, 1000.0, 1000.0, 885.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:04:53,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 37 seconds)
2025-08-07 05:06:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:06:59,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2636.33398 ± 186.134
2025-08-07 05:06:59,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2870.635, 2534.6882, 2461.6409, 2610.4885, 2788.4976, 2337.2598, 2807.5159, 2588.942, 2456.2751, 2907.3965]
2025-08-07 05:06:59,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:06:59,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2636.33) for latency ExtremeClogL1U23
2025-08-07 05:06:59,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 6 seconds)
2025-08-07 05:08:40,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:08:52,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1844.45923 ± 965.662
2025-08-07 05:08:52,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2423.1375, 411.51465, 102.08678, 651.95966, 2595.4204, 2521.2344, 2590.0752, 2411.604, 2247.2727, 2490.2874]
2025-08-07 05:08:52,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 186.0, 55.0, 310.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:08:52,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 27 seconds)
2025-08-07 05:10:38,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:10:49,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1970.51562 ± 841.932
2025-08-07 05:10:49,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2607.687, 1047.5524, 2426.4395, 1768.318, 276.80545, 2697.0054, 2695.563, 2650.4832, 1020.96027, 2514.3413]
2025-08-07 05:10:49,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 464.0, 936.0, 693.0, 127.0, 1000.0, 1000.0, 945.0, 366.0, 1000.0]
2025-08-07 05:10:49,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 59 seconds)
2025-08-07 05:12:34,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:12:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2686.23584 ± 167.515
2025-08-07 05:12:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2879.3823, 2801.2961, 2424.946, 2836.556, 2693.397, 2661.603, 2352.2395, 2785.1548, 2625.362, 2802.4214]
2025-08-07 05:12:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:12:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2686.24) for latency ExtremeClogL1U23
2025-08-07 05:12:49,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 47 seconds)
2025-08-07 05:14:43,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:14:55,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1967.11951 ± 981.057
2025-08-07 05:14:55,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2531.1943, 2778.4639, 1812.8406, 715.7401, 2621.2505, 431.14658, 2970.062, 2514.631, 472.77603, 2823.0925]
2025-08-07 05:14:55,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [960.0, 1000.0, 1000.0, 341.0, 1000.0, 185.0, 1000.0, 1000.0, 243.0, 1000.0]
2025-08-07 05:14:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 14 seconds)
2025-08-07 05:16:41,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:16:57,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2874.07422 ± 204.842
2025-08-07 05:16:57,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2819.781, 2975.8462, 2873.807, 2712.4507, 2635.8613, 3004.2847, 3159.838, 3098.9895, 2472.3784, 2987.5056]
2025-08-07 05:16:57,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:16:57,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2874.07) for latency ExtremeClogL1U23
2025-08-07 05:16:57,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 3 minutes, 31 seconds)
2025-08-07 05:18:35,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:18:50,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2651.32227 ± 180.005
2025-08-07 05:18:50,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2605.026, 2782.76, 2767.4421, 2268.3875, 2576.0574, 2520.712, 2860.4822, 2519.781, 2885.892, 2726.6812]
2025-08-07 05:18:50,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:18:50,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 41 seconds)
2025-08-07 05:20:40,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:20:56,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2897.15430 ± 143.610
2025-08-07 05:20:56,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2760.6838, 2676.379, 2948.375, 2952.3672, 2947.351, 2721.9849, 2996.028, 3077.3545, 2782.6138, 3108.4065]
2025-08-07 05:20:56,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:20:56,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2897.15) for latency ExtremeClogL1U23
2025-08-07 05:20:56,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 22 seconds)
2025-08-07 05:22:36,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:22:51,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2613.60034 ± 400.346
2025-08-07 05:22:51,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2612.3276, 1450.726, 2641.912, 2846.9106, 2616.7505, 2639.504, 2884.5642, 2823.8003, 2812.8062, 2806.7002]
2025-08-07 05:22:51,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [998.0, 502.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:22:51,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 22 seconds)
2025-08-07 05:24:46,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:25:01,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2674.97363 ± 289.604
2025-08-07 05:25:01,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2670.8037, 2862.2307, 1978.1451, 2820.4531, 2943.0474, 2321.4854, 2957.1755, 2793.4646, 2751.414, 2651.5183]
2025-08-07 05:25:01,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 714.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:25:01,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes, 12 seconds)
2025-08-07 05:26:44,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:57,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2526.71240 ± 664.238
2025-08-07 05:26:57,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2957.0393, 2656.99, 2834.233, 746.0094, 3026.6936, 1941.9463, 2569.2942, 2979.6824, 2876.7808, 2678.4578]
2025-08-07 05:26:57,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 297.0, 1000.0, 681.0, 900.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:26:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 54 minutes, 7 seconds)
2025-08-07 05:28:47,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:29:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3083.00244 ± 309.674
2025-08-07 05:29:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3154.5942, 3059.974, 2984.755, 3195.1982, 3225.4146, 2193.7866, 3254.247, 3211.3447, 3263.073, 3287.6392]
2025-08-07 05:29:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 662.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:29:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3083.00) for latency ExtremeClogL1U23
2025-08-07 05:29:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 54 minutes, 11 seconds)
2025-08-07 05:30:40,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:30:55,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2829.37158 ± 183.053
2025-08-07 05:30:55,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2687.4766, 3148.5842, 2893.2163, 2662.2944, 2866.751, 2604.5994, 2859.4175, 3077.6982, 2584.1272, 2909.55]
2025-08-07 05:30:55,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:30:55,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 48 seconds)
2025-08-07 05:32:41,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:32:56,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2999.08936 ± 182.939
2025-08-07 05:32:56,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3174.7744, 2736.4434, 2718.3115, 3134.904, 3176.229, 3136.8699, 3040.4727, 2894.5073, 3181.9702, 2796.4102]
2025-08-07 05:32:56,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:32:56,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 58 seconds)
2025-08-07 05:34:51,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:35:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2816.04907 ± 124.753
2025-08-07 05:35:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2836.4473, 2939.0547, 3092.6384, 2730.9036, 2786.6697, 2754.0457, 2670.6228, 2873.5598, 2651.4587, 2825.089]
2025-08-07 05:35:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:35:06,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 57 seconds)
2025-08-07 05:36:43,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:36:57,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2319.65942 ± 941.951
2025-08-07 05:36:57,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [450.9256, 3223.5723, 2611.0933, 2860.0525, 2934.8535, 3141.9001, 1307.4437, 1045.6304, 2844.3447, 2776.778]
2025-08-07 05:36:57,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 424.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:36:57,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 54 seconds)
2025-08-07 05:38:42,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:38:54,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1817.07458 ± 1091.616
2025-08-07 05:38:54,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2605.3325, 552.68115, 839.31903, 2930.5728, 329.9274, 2539.6465, 2923.423, 2586.708, 279.82147, 2583.315]
2025-08-07 05:38:54,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 269.0, 1000.0, 1000.0, 132.0, 1000.0, 1000.0, 1000.0, 126.0, 1000.0]
2025-08-07 05:38:54,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 38 seconds)
2025-08-07 05:40:45,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:41:00,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3137.12891 ± 131.428
2025-08-07 05:41:00,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3140.8225, 3033.3828, 3166.192, 3150.0884, 3120.653, 3277.8665, 2807.629, 3219.9407, 3159.7014, 3295.013]
2025-08-07 05:41:00,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:41:00,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3137.13) for latency ExtremeClogL1U23
2025-08-07 05:41:00,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 53 seconds)
2025-08-07 05:42:43,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:42:57,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2824.88916 ± 657.689
2025-08-07 05:42:57,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3286.985, 965.9371, 3130.7998, 3239.7227, 2480.5796, 2956.06, 3148.7288, 2858.0525, 3030.707, 3151.3162]
2025-08-07 05:42:57,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 311.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:42:57,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 10 seconds)
2025-08-07 05:44:42,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:44:56,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2836.89014 ± 863.584
2025-08-07 05:44:56,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [996.1884, 2959.6418, 3243.9263, 3564.987, 3364.4612, 3542.147, 3230.0146, 1315.6659, 3080.6663, 3071.2058]
2025-08-07 05:44:56,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [327.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 414.0, 1000.0, 1000.0]
2025-08-07 05:44:56,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 21 seconds)
2025-08-07 05:46:40,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:46:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3275.31714 ± 213.889
2025-08-07 05:46:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3376.4624, 3433.8198, 3615.9067, 3098.6614, 2997.8, 3397.815, 3322.9136, 2864.7612, 3260.7786, 3384.253]
2025-08-07 05:46:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:46:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3275.32) for latency ExtremeClogL1U23
2025-08-07 05:46:56,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 48 seconds)
2025-08-07 05:48:37,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:48:51,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2871.57373 ± 499.575
2025-08-07 05:48:51,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2976.5627, 3066.1372, 2786.2173, 2908.4734, 3365.8174, 2847.1355, 3179.6758, 2875.9524, 1470.4425, 3239.3228]
2025-08-07 05:48:51,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 617.0, 1000.0]
2025-08-07 05:48:51,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 37 seconds)
2025-08-07 05:50:36,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:50:49,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2779.96582 ± 699.587
2025-08-07 05:50:49,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3397.073, 1318.9714, 3192.2786, 1853.9224, 2215.1484, 2717.9294, 3137.9133, 3212.1392, 3364.1948, 3390.0862]
2025-08-07 05:50:49,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 425.0, 1000.0, 524.0, 784.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:50:49,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 19 seconds)
2025-08-07 05:52:33,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:52:47,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2864.00854 ± 955.192
2025-08-07 05:52:47,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2793.219, 2997.5703, 3590.2085, 90.7178, 3126.5908, 3398.308, 3297.3494, 2816.482, 3356.4216, 3173.2188]
2025-08-07 05:52:47,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [798.0, 1000.0, 1000.0, 74.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:52:47,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 26 minutes, 27 seconds)
2025-08-07 05:54:37,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:54:48,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2403.07178 ± 1233.495
2025-08-07 05:54:48,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2860.692, 659.5228, 182.51247, 3209.3022, 3044.1128, 793.08984, 3308.372, 3230.2625, 3335.274, 3407.5786]
2025-08-07 05:54:48,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 184.0, 82.0, 1000.0, 1000.0, 246.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:54:48,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 54 seconds)
2025-08-07 05:56:34,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:56:49,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3248.92798 ± 202.027
2025-08-07 05:56:49,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3009.2588, 3484.474, 2995.694, 3062.2212, 3032.4575, 3283.5742, 3532.5137, 3363.568, 3242.9673, 3482.55]
2025-08-07 05:56:49,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:56:49,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 4 seconds)
2025-08-07 05:58:34,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:58:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2939.97900 ± 924.210
2025-08-07 05:58:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3391.9417, 2225.6746, 3177.0403, 3231.1228, 378.8482, 3197.7666, 3485.8372, 3481.4841, 3320.764, 3509.31]
2025-08-07 05:58:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 721.0, 1000.0, 1000.0, 143.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:58:48,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 28 seconds)
2025-08-07 06:00:22,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:00:37,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3027.08081 ± 792.137
2025-08-07 06:00:37,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2827.2063, 3212.8345, 3226.3892, 3349.2908, 3144.4941, 3215.8198, 3504.6897, 3468.8237, 731.87445, 3589.384]
2025-08-07 06:00:37,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 256.0, 1000.0]
2025-08-07 06:00:37,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 18 seconds)
2025-08-07 06:02:21,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:02:36,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3082.01099 ± 694.819
2025-08-07 06:02:36,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3292.7922, 3532.1797, 2311.535, 3251.8523, 3445.0774, 3411.6301, 3275.0598, 1267.751, 3593.605, 3438.626]
2025-08-07 06:02:36,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 419.0, 1000.0, 1000.0]
2025-08-07 06:02:36,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 32 seconds)
2025-08-07 06:04:20,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:04:35,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3293.28857 ± 228.792
2025-08-07 06:04:35,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3612.9895, 3556.2117, 3156.359, 3284.147, 2942.9778, 3392.3403, 3440.4504, 3334.8042, 2874.5159, 3338.0916]
2025-08-07 06:04:35,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:04:35,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3293.29) for latency ExtremeClogL1U23
2025-08-07 06:04:35,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 22 seconds)
2025-08-07 06:06:20,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:06:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3432.41675 ± 188.802
2025-08-07 06:06:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3580.5686, 3559.3496, 3104.2139, 3413.1194, 3603.803, 3268.4158, 3707.0833, 3307.624, 3562.3738, 3217.6194]
2025-08-07 06:06:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:06:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3432.42) for latency ExtremeClogL1U23
2025-08-07 06:06:35,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 16 seconds)
2025-08-07 06:08:19,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:08:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3245.19238 ± 475.243
2025-08-07 06:08:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3213.0012, 3312.1523, 1888.3127, 3639.2573, 3450.4504, 3283.8838, 3484.7283, 3632.4402, 3318.672, 3229.0261]
2025-08-07 06:08:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 749.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:08:34,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 21 seconds)
2025-08-07 06:10:24,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:10:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3436.11084 ± 289.804
2025-08-07 06:10:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3360.407, 3690.009, 3451.0347, 3559.9858, 2702.6306, 3183.7285, 3742.3662, 3476.0818, 3647.2761, 3547.59]
2025-08-07 06:10:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:10:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3436.11) for latency ExtremeClogL1U23
2025-08-07 06:10:39,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 16 seconds)
2025-08-07 06:12:23,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:12:38,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3354.13672 ± 289.587
2025-08-07 06:12:38,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3487.1511, 2934.653, 3400.663, 3512.2825, 3663.8997, 3371.5845, 3016.3435, 3846.9983, 2948.5845, 3359.2085]
2025-08-07 06:12:38,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:12:38,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 14 seconds)
2025-08-07 06:14:16,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:14:31,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3628.76562 ± 197.862
2025-08-07 06:14:31,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3536.3718, 3740.2532, 3715.618, 3845.6587, 3549.1616, 3754.534, 3095.5027, 3690.3945, 3666.8274, 3693.3347]
2025-08-07 06:14:31,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:14:31,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3628.77) for latency ExtremeClogL1U23
2025-08-07 06:14:31,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 30 seconds)
2025-08-07 06:16:20,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:16:34,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2943.54370 ± 848.997
2025-08-07 06:16:34,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1086.7275, 3623.856, 3405.6802, 3064.7234, 3254.731, 3125.3115, 1492.0593, 3326.0823, 3564.1865, 3492.0818]
2025-08-07 06:16:34,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [374.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 509.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:16:34,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 52 seconds)
2025-08-07 06:18:18,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:18:33,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3529.43604 ± 186.921
2025-08-07 06:18:33,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3542.0872, 3667.9048, 3736.6182, 3336.2522, 3896.257, 3423.7559, 3468.5933, 3276.8806, 3595.963, 3350.0486]
2025-08-07 06:18:33,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:18:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 53 seconds)
2025-08-07 06:20:15,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:29,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3157.00952 ± 1043.412
2025-08-07 06:20:29,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3462.177, 67.02248, 3515.2952, 3588.2556, 3403.1633, 3729.837, 3441.5654, 3336.7075, 3216.2925, 3809.7786]
2025-08-07 06:20:29,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 49.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:20:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes)
2025-08-07 06:22:13,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:28,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3461.59326 ± 157.788
2025-08-07 06:22:28,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3425.2458, 3505.5146, 3276.2915, 3718.3574, 3688.4246, 3422.6484, 3454.8022, 3537.5, 3170.6533, 3416.4963]
2025-08-07 06:22:28,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:22:28,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 3 seconds)
2025-08-07 06:24:13,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:28,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3464.54150 ± 142.705
2025-08-07 06:24:28,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3662.5598, 3505.2363, 3497.227, 3249.8254, 3326.6758, 3514.7515, 3675.8362, 3561.4436, 3354.3638, 3297.4946]
2025-08-07 06:24:28,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:24:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 45 seconds)
2025-08-07 06:26:10,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:25,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3589.55200 ± 167.930
2025-08-07 06:26:25,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3623.7153, 3555.8596, 3526.473, 3596.4849, 3722.0852, 3400.8196, 3226.9766, 3685.6243, 3707.282, 3850.2007]
2025-08-07 06:26:25,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:26:25,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 14 seconds)
2025-08-07 06:28:07,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:23,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3364.36914 ± 169.162
2025-08-07 06:28:23,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3497.0454, 3275.3892, 3458.502, 3336.8877, 3486.936, 3371.7634, 3580.6711, 3195.9973, 3464.4524, 2976.0486]
2025-08-07 06:28:23,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:28:23,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 7 seconds)
2025-08-07 06:30:06,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:21,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3362.58325 ± 571.302
2025-08-07 06:30:21,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3429.3506, 3634.8857, 3767.775, 3429.7898, 3302.7407, 3625.904, 3465.143, 3577.2673, 3696.4048, 1696.5708]
2025-08-07 06:30:21,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 538.0]
2025-08-07 06:30:21,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 19 seconds)
2025-08-07 06:32:05,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:20,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3558.30713 ± 151.385
2025-08-07 06:32:20,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3435.16, 3516.3413, 3835.9941, 3457.53, 3587.4448, 3387.2402, 3851.153, 3516.9666, 3493.3835, 3501.8567]
2025-08-07 06:32:20,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:32:20,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 21 seconds)
2025-08-07 06:34:04,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3447.59448 ± 176.726
2025-08-07 06:34:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3559.1567, 3444.7695, 3761.401, 3290.0872, 3553.2852, 3537.7097, 3182.4092, 3582.2517, 3366.9443, 3197.9312]
2025-08-07 06:34:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:34:19,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 16 seconds)
2025-08-07 06:36:03,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:18,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3391.01221 ± 204.583
2025-08-07 06:36:18,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3105.4558, 3581.5088, 2941.0745, 3448.0034, 3482.2944, 3560.091, 3579.4956, 3327.4053, 3361.017, 3523.7722]
2025-08-07 06:36:18,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:36:18,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 28 seconds)
2025-08-07 06:38:00,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:38:16,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3576.39209 ± 218.224
2025-08-07 06:38:16,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3517.5645, 3345.25, 4009.2102, 3445.5083, 3579.6555, 3279.553, 3807.4075, 3777.8835, 3604.3652, 3397.5225]
2025-08-07 06:38:16,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:38:16,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 30 seconds)
2025-08-07 06:39:59,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:15,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3579.19385 ± 132.057
2025-08-07 06:40:15,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3506.3882, 3347.8933, 3678.68, 3385.052, 3540.2034, 3579.3125, 3742.172, 3588.2678, 3759.7812, 3664.1877]
2025-08-07 06:40:15,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:40:15,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 34 seconds)
2025-08-07 06:41:56,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:11,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3736.69922 ± 108.617
2025-08-07 06:42:11,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3817.1536, 3875.5107, 3839.3674, 3775.4011, 3492.4985, 3731.977, 3655.5828, 3676.6228, 3821.8564, 3681.025]
2025-08-07 06:42:11,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:42:11,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3736.70) for latency ExtremeClogL1U23
2025-08-07 06:42:11,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 24 seconds)
2025-08-07 06:44:02,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:17,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3433.57373 ± 241.166
2025-08-07 06:44:17,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3387.8667, 3535.0195, 3446.6462, 3178.8916, 3878.105, 3580.5034, 3510.8123, 3427.5671, 3482.6704, 2907.6575]
2025-08-07 06:44:17,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:44:17,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 51 seconds)
2025-08-07 06:46:00,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:15,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3456.48950 ± 176.957
2025-08-07 06:46:15,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3298.7073, 3610.463, 3196.9922, 3312.5227, 3470.4531, 3369.5598, 3570.9185, 3832.3596, 3366.9136, 3536.0027]
2025-08-07 06:46:15,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:46:15,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 50 seconds)
2025-08-07 06:47:59,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:14,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3303.39771 ± 486.652
2025-08-07 06:48:14,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1907.6915, 3492.8486, 3426.8538, 3192.8523, 3394.5024, 3735.905, 3659.9187, 3394.0027, 3387.3228, 3442.0789]
2025-08-07 06:48:14,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:48:14,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 54 seconds)
2025-08-07 06:49:57,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:13,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3535.11084 ± 86.556
2025-08-07 06:50:13,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3499.265, 3348.2102, 3583.9272, 3551.2976, 3424.654, 3503.4253, 3594.2173, 3612.057, 3620.0325, 3614.0227]
2025-08-07 06:50:13,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:50:13,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 54 seconds)
2025-08-07 06:51:58,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:13,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3656.69727 ± 141.492
2025-08-07 06:52:13,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3502.7583, 3690.455, 3730.9163, 3488.3445, 3626.3054, 3577.4268, 3520.9365, 3980.0347, 3693.0793, 3756.716]
2025-08-07 06:52:13,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:52:13,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 6 seconds)
2025-08-07 06:53:49,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3403.75537 ± 198.464
2025-08-07 06:54:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3545.8943, 3657.905, 2984.888, 3395.2556, 3380.571, 3303.1604, 3478.3523, 3546.9658, 3590.0527, 3154.511]
2025-08-07 06:54:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:54:04,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 27 seconds)
2025-08-07 06:55:55,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:10,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3707.28442 ± 184.715
2025-08-07 06:56:10,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3859.0635, 3683.9507, 3761.2407, 3828.3728, 3799.9712, 3325.017, 3867.51, 3392.409, 3850.4531, 3704.8562]
2025-08-07 06:56:10,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:56:10,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 47 seconds)
2025-08-07 06:57:45,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:00,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3702.24365 ± 162.190
2025-08-07 06:58:00,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3864.2036, 3450.8643, 3538.1318, 3854.3403, 3527.7996, 3816.7976, 3544.6855, 3918.3357, 3793.7144, 3713.561]
2025-08-07 06:58:00,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:58:00,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 29 seconds)
2025-08-07 06:59:51,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3630.65698 ± 163.773
2025-08-07 07:00:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3422.8596, 3601.8105, 3758.6892, 3572.0752, 3615.8806, 3970.2788, 3785.2046, 3650.008, 3534.1482, 3395.616]
2025-08-07 07:00:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:00:07,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 47 seconds)
2025-08-07 07:01:50,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3567.47803 ± 136.927
2025-08-07 07:02:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3436.5928, 3588.5623, 3273.7217, 3728.2014, 3612.776, 3748.7957, 3551.5115, 3680.2834, 3469.5786, 3584.7576]
2025-08-07 07:02:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:02:06,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 46 seconds)
2025-08-07 07:03:43,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:58,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3645.97852 ± 224.876
2025-08-07 07:03:58,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3585.1328, 3529.0107, 4007.5793, 3595.4924, 3614.181, 3555.1802, 3367.1672, 3600.5164, 4131.9927, 3473.5312]
2025-08-07 07:03:58,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:03:58,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 49 seconds)
2025-08-07 07:05:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:56,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3317.92651 ± 834.225
2025-08-07 07:05:56,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3814.6042, 3935.984, 3467.704, 3246.4739, 3767.8142, 3286.474, 3638.2717, 3796.4895, 3309.1853, 916.26447]
2025-08-07 07:05:56,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 287.0]
2025-08-07 07:05:56,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 40 seconds)
2025-08-07 07:07:45,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:00,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3599.45239 ± 156.780
2025-08-07 07:08:00,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3542.529, 3820.9568, 3435.508, 3636.1377, 3772.4832, 3731.8289, 3693.918, 3615.443, 3318.357, 3427.3599]
2025-08-07 07:08:00,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:08:00,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes)
2025-08-07 07:09:44,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:59,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3639.69849 ± 146.958
2025-08-07 07:09:59,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3721.22, 3566.0193, 3579.7925, 3715.7554, 3327.8135, 3536.562, 3846.0977, 3671.7126, 3590.8313, 3841.1792]
2025-08-07 07:09:59,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:09:59,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 52 seconds)
2025-08-07 07:11:41,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:55,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3172.68921 ± 1024.547
2025-08-07 07:11:55,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3481.8525, 3415.5115, 3535.6677, 3489.377, 3752.993, 3534.283, 3336.8416, 3339.7415, 3716.2976, 124.32555]
2025-08-07 07:11:55,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 72.0]
2025-08-07 07:11:55,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 51 seconds)
2025-08-07 07:13:39,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:54,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3595.17725 ± 110.298
2025-08-07 07:13:54,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3538.5618, 3649.9722, 3593.2158, 3648.3203, 3761.504, 3671.128, 3637.1714, 3412.2524, 3647.5103, 3392.1392]
2025-08-07 07:13:54,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:13:54,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 57 seconds)
2025-08-07 07:15:38,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3512.94458 ± 218.530
2025-08-07 07:15:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2997.5325, 3764.2449, 3806.5647, 3619.8704, 3571.0437, 3438.1917, 3525.6409, 3313.6123, 3548.7307, 3544.0151]
2025-08-07 07:15:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:15:53,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 59 seconds)
2025-08-07 07:17:37,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:53,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3728.46948 ± 136.657
2025-08-07 07:17:53,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3850.3037, 3780.714, 3809.463, 3754.6943, 3904.2966, 3697.27, 3705.016, 3370.986, 3724.5051, 3687.448]
2025-08-07 07:17:53,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:17:53,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 58 seconds)
2025-08-07 07:19:40,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:54,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3109.27490 ± 941.122
2025-08-07 07:19:54,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3131.6785, 3215.1377, 2893.3438, 3671.5361, 3778.6357, 3291.8489, 398.59476, 3609.2717, 3630.778, 3471.923]
2025-08-07 07:19:54,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 272.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:19:54,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1251 [DEBUG]: Training session finished
