2025-08-07 00:48:06,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:06,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:06,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14b566aabb90>}
2025-08-07 00:48:06,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:06,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1133 [INFO]: Creating new trainer
2025-08-07 00:48:06,434 baseline-bpql-noiseperc5-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=283, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:06,434 baseline-bpql-noiseperc5-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:07,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:07,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:47,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 7.18905 ± 18.011
2025-08-07 00:49:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [26.97441, -9.05197, -2.8813157, -27.04957, 10.676809, 1.6437281, 25.531063, 23.53947, -6.5668874, 29.074759]
2025-08-07 00:49:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 62.0, 89.0, 59.0, 49.0, 53.0, 44.0, 50.0, 72.0, 58.0]
2025-08-07 00:49:48,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (7.19) for latency ExtremeSparseL4U32
2025-08-07 00:49:48,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 45 minutes, 34 seconds)
2025-08-07 00:51:32,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:33,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -4.67935 ± 22.288
2025-08-07 00:51:33,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [19.548119, -32.603943, 20.614157, -5.9047728, 7.0773573, -18.401333, 13.566572, -28.925102, -39.477364, 17.712801]
2025-08-07 00:51:33,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 100.0, 63.0, 63.0, 70.0, 79.0, 54.0, 120.0, 79.0, 53.0]
2025-08-07 00:51:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 48 minutes, 14 seconds)
2025-08-07 00:53:19,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -47.43706 ± 42.351
2025-08-07 00:53:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-73.22993, 22.36962, 1.1580632, -60.273792, -49.1143, -124.54624, 5.1351132, -54.286137, -74.04472, -67.53824]
2025-08-07 00:53:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 55.0, 80.0, 109.0, 126.0, 176.0, 76.0, 105.0, 114.0, 106.0]
2025-08-07 00:53:21,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 49 minutes, 5 seconds)
2025-08-07 00:55:03,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:06,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -104.99162 ± 245.339
2025-08-07 00:55:06,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-37.72419, 17.441505, -38.653652, 16.015745, -64.44425, -833.2281, 21.72131, -66.030945, -73.10929, 8.095721]
2025-08-07 00:55:06,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [140.0, 79.0, 91.0, 68.0, 137.0, 1000.0, 54.0, 100.0, 111.0, 76.0]
2025-08-07 00:55:06,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 47 minutes, 37 seconds)
2025-08-07 00:56:51,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:54,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -104.71230 ± 234.897
2025-08-07 00:56:54,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [24.053194, -798.86145, -83.22969, -41.688953, -47.214535, -69.894264, -23.839128, -66.562485, 24.26665, 35.847652]
2025-08-07 00:56:54,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 1000.0, 158.0, 109.0, 102.0, 164.0, 83.0, 200.0, 69.0, 50.0]
2025-08-07 00:56:54,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 46 minutes, 51 seconds)
2025-08-07 00:58:42,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:48,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 0.19173 ± 41.758
2025-08-07 00:58:48,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-21.475739, 44.517715, 28.153011, -91.69189, 40.699215, -48.353962, 10.290164, -12.559527, 35.311634, 17.026655]
2025-08-07 00:58:48,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 124.0, 114.0, 311.0, 116.0, 190.0, 150.0, 1000.0, 145.0, 158.0]
2025-08-07 00:58:48,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 49 minutes, 25 seconds)
2025-08-07 01:00:39,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:58,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 514.91290 ± 44.796
2025-08-07 01:00:58,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [534.82495, 517.46844, 403.95654, 553.6075, 565.8449, 510.12036, 487.83176, 493.05078, 559.147, 523.2762]
2025-08-07 01:00:58,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:00:58,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (514.91) for latency ExtremeSparseL4U32
2025-08-07 01:00:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 54 minutes, 56 seconds)
2025-08-07 01:02:44,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:02,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 696.29327 ± 23.436
2025-08-07 01:03:02,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [721.62885, 688.9044, 710.01117, 701.1576, 673.9769, 668.84656, 728.1322, 654.5451, 720.28516, 695.4448]
2025-08-07 01:03:02,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:03:02,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (696.29) for latency ExtremeSparseL4U32
2025-08-07 01:03:02,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 58 minutes, 16 seconds)
2025-08-07 01:04:49,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:07,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 812.43781 ± 18.807
2025-08-07 01:05:07,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [788.05756, 800.4889, 859.8804, 803.8693, 818.22845, 819.9458, 805.0849, 814.8959, 795.43506, 818.4919]
2025-08-07 01:05:07,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:05:07,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (812.44) for latency ExtremeSparseL4U32
2025-08-07 01:05:07,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 2 minutes, 14 seconds)
2025-08-07 01:06:53,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:07:12,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 794.90869 ± 13.815
2025-08-07 01:07:12,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [796.0568, 810.8898, 789.68774, 785.4788, 828.90643, 785.897, 783.2462, 782.7558, 795.107, 791.06165]
2025-08-07 01:07:12,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:07:12,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 5 minutes, 15 seconds)
2025-08-07 01:08:51,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:09:07,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 686.85022 ± 192.062
2025-08-07 01:09:07,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [812.5153, 814.4195, 327.54745, 789.4844, 333.38196, 794.77606, 810.09937, 829.7145, 566.8218, 789.7416]
2025-08-07 01:09:07,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 366.0, 1000.0, 421.0, 1000.0, 1000.0, 1000.0, 812.0, 1000.0]
2025-08-07 01:09:07,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 3 minutes, 27 seconds)
2025-08-07 01:10:51,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:01,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 458.77753 ± 278.447
2025-08-07 01:11:01,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [636.9842, 223.90776, 881.9689, 201.68434, 388.01627, 153.82715, 852.9875, 161.3111, 335.02994, 752.0581]
2025-08-07 01:11:01,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [618.0, 244.0, 1000.0, 182.0, 404.0, 139.0, 1000.0, 160.0, 340.0, 1000.0]
2025-08-07 01:11:01,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 56 minutes, 52 seconds)
2025-08-07 01:12:49,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:04,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 646.99048 ± 247.681
2025-08-07 01:13:04,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [355.07858, 842.5783, 840.174, 832.1243, 172.44273, 431.6009, 469.07086, 821.1111, 846.9493, 858.77484]
2025-08-07 01:13:04,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [376.0, 1000.0, 1000.0, 1000.0, 131.0, 1000.0, 480.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:13:04,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 54 minutes, 21 seconds)
2025-08-07 01:14:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:04,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 581.89197 ± 208.622
2025-08-07 01:15:04,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [579.9756, 679.1933, 515.79956, 809.2265, 154.50273, 281.99573, 563.9702, 805.0584, 791.29297, 637.9048]
2025-08-07 01:15:04,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 567.0, 1000.0, 151.0, 302.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:15:04,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 5 seconds)
2025-08-07 01:16:48,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:00,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 535.76581 ± 290.340
2025-08-07 01:17:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [749.217, 144.34464, 853.94543, 750.1671, 747.0264, 760.5819, 757.8307, 160.50189, 282.16656, 151.87657]
2025-08-07 01:17:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 136.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 184.0, 267.0, 145.0]
2025-08-07 01:17:00,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 44 seconds)
2025-08-07 01:18:45,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:01,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 779.74359 ± 220.668
2025-08-07 01:19:01,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [978.87604, 918.83417, 945.9209, 1053.3865, 875.61884, 485.0263, 779.8908, 655.8138, 786.9132, 317.15598]
2025-08-07 01:19:01,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 710.0, 1000.0, 301.0]
2025-08-07 01:19:01,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 24 seconds)
2025-08-07 01:20:49,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:04,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 802.54236 ± 397.298
2025-08-07 01:21:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1051.7584, 1150.5142, 701.1738, 1219.4028, 1046.2893, 91.71498, 1133.4266, 819.5537, 72.67299, 738.9163]
2025-08-07 01:21:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 73.0, 1000.0, 910.0, 61.0, 1000.0]
2025-08-07 01:21:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 46 minutes, 55 seconds)
2025-08-07 01:22:42,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:22:51,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 519.67236 ± 326.164
2025-08-07 01:22:51,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [310.10065, 185.82294, 495.31073, 89.565216, 507.53238, 264.0041, 620.9331, 1261.5488, 669.9606, 791.9449]
2025-08-07 01:22:51,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [253.0, 151.0, 348.0, 75.0, 499.0, 220.0, 487.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:22:51,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2025-08-07 01:24:45,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:24:54,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 472.48599 ± 344.895
2025-08-07 01:24:54,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [808.78033, 981.125, 257.85916, 90.6543, 888.0897, 87.74268, 446.54416, 51.45015, 794.1198, 318.49466]
2025-08-07 01:24:54,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 265.0, 72.0, 938.0, 73.0, 384.0, 46.0, 1000.0, 223.0]
2025-08-07 01:24:54,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 26 seconds)
2025-08-07 01:26:34,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 836.66931 ± 356.957
2025-08-07 01:26:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [843.3771, 1307.0891, 416.60156, 479.3803, 796.9664, 1286.3295, 1055.0409, 1281.9158, 450.56958, 449.42322]
2025-08-07 01:26:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 345.0, 390.0, 1000.0, 1000.0, 1000.0, 1000.0, 407.0, 363.0]
2025-08-07 01:26:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (836.67) for latency ExtremeSparseL4U32
2025-08-07 01:26:47,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 34 seconds)
2025-08-07 01:28:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:39,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 637.77216 ± 517.054
2025-08-07 01:28:39,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [561.6023, 1311.4999, 475.659, 1526.0417, 119.49158, 1341.9187, 384.30176, 343.83362, 55.526035, 257.847]
2025-08-07 01:28:39,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [391.0, 1000.0, 357.0, 1000.0, 78.0, 1000.0, 265.0, 213.0, 51.0, 202.0]
2025-08-07 01:28:39,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 32 minutes, 6 seconds)
2025-08-07 01:30:25,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:42,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1309.09668 ± 326.771
2025-08-07 01:30:42,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1086.2291, 1348.1431, 1299.939, 1402.7568, 1464.361, 1573.5212, 1572.5823, 1521.935, 420.02097, 1401.48]
2025-08-07 01:30:42,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 865.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 313.0, 1000.0]
2025-08-07 01:30:42,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1309.10) for latency ExtremeSparseL4U32
2025-08-07 01:30:42,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 17 seconds)
2025-08-07 01:32:23,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1214.45898 ± 474.964
2025-08-07 01:32:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1555.9028, 1540.0698, 109.68176, 1653.9738, 1152.3999, 1567.6017, 1550.9646, 726.2202, 1398.3132, 889.46234]
2025-08-07 01:32:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 89.0, 1000.0, 1000.0, 1000.0, 1000.0, 543.0, 1000.0, 1000.0]
2025-08-07 01:32:39,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 54 seconds)
2025-08-07 01:34:25,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 680.80402 ± 446.094
2025-08-07 01:34:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [815.47766, 375.41025, 1476.6459, 1332.485, 360.6157, 348.8718, 939.0432, 304.80978, 59.03329, 795.6474]
2025-08-07 01:34:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [585.0, 236.0, 1000.0, 872.0, 215.0, 232.0, 537.0, 208.0, 52.0, 1000.0]
2025-08-07 01:34:34,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 49 seconds)
2025-08-07 01:36:19,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:27,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 621.70789 ± 523.864
2025-08-07 01:36:27,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1699.7626, 233.24881, 305.1817, 505.56244, 47.44807, 467.971, 573.22687, 206.89197, 1509.1301, 668.6558]
2025-08-07 01:36:27,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [998.0, 149.0, 258.0, 463.0, 44.0, 244.0, 348.0, 147.0, 1000.0, 469.0]
2025-08-07 01:36:27,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 49 seconds)
2025-08-07 01:38:16,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 963.27502 ± 464.597
2025-08-07 01:38:28,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [876.819, 1440.6039, 295.75812, 947.2344, 146.92444, 746.6391, 958.20026, 1644.1176, 1499.45, 1077.0032]
2025-08-07 01:38:28,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 219.0, 489.0, 114.0, 461.0, 1000.0, 1000.0, 918.0, 630.0]
2025-08-07 01:38:28,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes, 21 seconds)
2025-08-07 01:40:03,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:14,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 906.64716 ± 490.349
2025-08-07 01:40:14,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1188.7454, 1567.094, 1166.0723, 1578.0559, 611.4052, 1083.755, 875.78235, 156.4794, 92.60142, 746.4807]
2025-08-07 01:40:14,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 961.0, 722.0, 1000.0, 402.0, 1000.0, 531.0, 95.0, 64.0, 485.0]
2025-08-07 01:40:14,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 9 seconds)
2025-08-07 01:42:01,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:17,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1430.79456 ± 610.335
2025-08-07 01:42:17,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1738.3259, 944.65955, 121.170975, 592.8237, 1654.5028, 1921.1848, 1717.5663, 1782.8407, 1974.5817, 1860.2891]
2025-08-07 01:42:17,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 106.0, 358.0, 1000.0, 1000.0, 1000.0, 1000.0, 984.0, 1000.0]
2025-08-07 01:42:17,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1430.79) for latency ExtremeSparseL4U32
2025-08-07 01:42:17,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 18 minutes, 39 seconds)
2025-08-07 01:44:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:15,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1011.05615 ± 499.270
2025-08-07 01:44:15,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [284.5024, 1182.772, 1800.1687, 1128.1765, 425.7029, 876.54803, 449.48273, 948.5021, 1758.6387, 1256.0673]
2025-08-07 01:44:15,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [158.0, 711.0, 1000.0, 665.0, 248.0, 510.0, 274.0, 526.0, 1000.0, 632.0]
2025-08-07 01:44:15,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 23 seconds)
2025-08-07 01:45:53,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:03,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 880.44043 ± 516.827
2025-08-07 01:46:03,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1500.8108, 1684.6246, 817.5307, 187.68802, 1492.3757, 879.96185, 115.474335, 882.3619, 445.40808, 798.16846]
2025-08-07 01:46:03,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [790.0, 907.0, 600.0, 150.0, 1000.0, 1000.0, 92.0, 460.0, 240.0, 405.0]
2025-08-07 01:46:03,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 29 seconds)
2025-08-07 01:47:53,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:04,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1115.66650 ± 701.201
2025-08-07 01:48:04,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1883.4316, 283.83026, 1791.487, 1523.0945, 539.3631, 1829.0469, 162.66882, 995.3109, 1863.3666, 285.06506]
2025-08-07 01:48:04,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 279.0, 1000.0, 1000.0, 269.0, 1000.0, 138.0, 714.0, 1000.0, 168.0]
2025-08-07 01:48:04,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 34 seconds)
2025-08-07 01:49:44,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:58,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1313.44800 ± 614.458
2025-08-07 01:49:58,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [203.44498, 1754.841, 1170.6454, 1207.683, 1935.9026, 1448.8474, 1829.4851, 1696.466, 1726.107, 161.05803]
2025-08-07 01:49:58,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 1000.0, 1000.0, 578.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 121.0]
2025-08-07 01:49:58,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 22 seconds)
2025-08-07 01:51:51,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:05,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1415.11096 ± 426.810
2025-08-07 01:52:05,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1634.9866, 2008.4442, 1637.8945, 1179.9374, 630.6021, 1734.023, 704.5528, 1698.2471, 1531.4742, 1390.9478]
2025-08-07 01:52:05,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 970.0, 1000.0, 615.0, 324.0, 1000.0, 322.0, 816.0, 1000.0, 757.0]
2025-08-07 01:52:05,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 27 seconds)
2025-08-07 01:53:42,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1241.92871 ± 469.574
2025-08-07 01:53:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1572.9325, 626.3544, 805.896, 1456.3706, 1679.5605, 1627.5844, 1433.5366, 365.933, 1059.0796, 1792.0394]
2025-08-07 01:53:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 332.0, 484.0, 1000.0, 1000.0, 1000.0, 1000.0, 176.0, 598.0, 1000.0]
2025-08-07 01:53:56,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 56 seconds)
2025-08-07 01:55:41,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:55,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1482.80908 ± 670.873
2025-08-07 01:55:55,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1948.4324, 1813.4045, 455.85056, 1842.2064, 1860.9279, 1984.0887, 476.36487, 1846.3708, 2132.3154, 468.12897]
2025-08-07 01:55:55,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 276.0, 1000.0, 880.0, 1000.0, 237.0, 1000.0, 1000.0, 248.0]
2025-08-07 01:55:55,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1482.81) for latency ExtremeSparseL4U32
2025-08-07 01:55:55,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 8 minutes, 18 seconds)
2025-08-07 01:57:46,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:02,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1822.23376 ± 423.466
2025-08-07 01:58:02,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [627.45825, 2195.3503, 2055.244, 2120.7317, 1988.6274, 1975.2203, 1720.9894, 1958.3896, 1814.672, 1765.6548]
2025-08-07 01:58:02,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [336.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 836.0, 1000.0, 900.0, 865.0]
2025-08-07 01:58:02,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1822.23) for latency ExtremeSparseL4U32
2025-08-07 01:58:02,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 28 seconds)
2025-08-07 01:59:45,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:59,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1401.63196 ± 675.685
2025-08-07 01:59:59,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1815.3727, 1697.7186, 1951.0321, 985.50366, 98.6584, 1917.8691, 1260.292, 1962.6333, 2006.0592, 321.1801]
2025-08-07 01:59:59,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 530.0, 102.0, 1000.0, 1000.0, 1000.0, 1000.0, 209.0]
2025-08-07 01:59:59,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 18 seconds)
2025-08-07 02:01:38,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:50,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1312.33264 ± 838.759
2025-08-07 02:01:50,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [311.84857, 340.2165, 2283.5361, 129.74706, 1021.02783, 2086.68, 2267.2583, 781.2254, 2114.4504, 1787.3365]
2025-08-07 02:01:50,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [184.0, 162.0, 1000.0, 121.0, 1000.0, 1000.0, 1000.0, 466.0, 1000.0, 820.0]
2025-08-07 02:01:50,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 49 seconds)
2025-08-07 02:03:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:54,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2005.61597 ± 212.598
2025-08-07 02:03:54,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2108.628, 1963.0359, 1788.9443, 2137.729, 2039.2769, 2145.3691, 2254.3044, 2161.376, 1968.6161, 1488.8806]
2025-08-07 02:03:54,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 902.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 689.0]
2025-08-07 02:03:54,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2005.62) for latency ExtremeSparseL4U32
2025-08-07 02:03:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 32 seconds)
2025-08-07 02:05:40,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:55,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1697.49573 ± 611.018
2025-08-07 02:05:55,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1879.925, 2099.122, 1301.4939, 2337.0344, 254.15045, 1997.902, 2075.0142, 2141.8032, 1037.2295, 1851.2828]
2025-08-07 02:05:55,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [931.0, 1000.0, 586.0, 1000.0, 159.0, 1000.0, 1000.0, 1000.0, 517.0, 1000.0]
2025-08-07 02:05:55,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 57 seconds)
2025-08-07 02:07:44,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:59,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1803.21069 ± 623.176
2025-08-07 02:07:59,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2190.598, 2385.0073, 2421.1929, 808.76886, 2047.4893, 2083.5403, 2121.4954, 748.72284, 1064.9333, 2160.3594]
2025-08-07 02:07:59,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 382.0, 1000.0, 897.0, 1000.0, 405.0, 519.0, 1000.0]
2025-08-07 02:07:59,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 21 seconds)
2025-08-07 02:09:45,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1083.83496 ± 708.908
2025-08-07 02:09:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [160.67545, 256.34402, 1136.6561, 2173.5747, 2047.5076, 595.1543, 1527.672, 434.1321, 1729.8594, 776.7733]
2025-08-07 02:09:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [93.0, 152.0, 536.0, 1000.0, 1000.0, 298.0, 683.0, 233.0, 1000.0, 447.0]
2025-08-07 02:09:55,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 4 seconds)
2025-08-07 02:11:37,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:45,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1137.33862 ± 682.915
2025-08-07 02:11:45,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [751.64874, 264.90503, 331.95044, 1123.3845, 1330.9337, 1093.7418, 591.85443, 1264.8368, 2285.305, 2334.8252]
2025-08-07 02:11:45,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [326.0, 127.0, 168.0, 464.0, 633.0, 457.0, 255.0, 502.0, 950.0, 1000.0]
2025-08-07 02:11:45,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 4 seconds)
2025-08-07 02:13:30,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:45,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1905.73499 ± 654.599
2025-08-07 02:13:45,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2235.0825, 2200.654, 2375.8696, 622.02783, 2189.573, 2201.0312, 2303.554, 583.89996, 2124.4917, 2221.1655]
2025-08-07 02:13:45,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 349.0, 1000.0, 1000.0, 1000.0, 290.0, 1000.0, 1000.0]
2025-08-07 02:13:45,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 20 seconds)
2025-08-07 02:15:31,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:41,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1008.53143 ± 788.773
2025-08-07 02:15:41,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [113.482765, 2317.778, 786.8126, 2163.43, 135.72694, 360.06308, 249.6044, 1688.748, 1322.6732, 946.9953]
2025-08-07 02:15:41,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 1000.0, 434.0, 1000.0, 69.0, 157.0, 118.0, 1000.0, 1000.0, 487.0]
2025-08-07 02:15:41,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 22 seconds)
2025-08-07 02:17:28,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:44,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2032.47009 ± 469.934
2025-08-07 02:17:44,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2432.5027, 2343.2068, 2364.3054, 1712.0961, 2126.575, 918.43414, 2480.131, 1550.6133, 2171.5278, 2225.3088]
2025-08-07 02:17:44,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 784.0, 809.0, 454.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:17:44,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2032.47) for latency ExtremeSparseL4U32
2025-08-07 02:17:44,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 26 seconds)
2025-08-07 02:19:21,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:34,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1324.38635 ± 642.018
2025-08-07 02:19:34,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1866.4785, 1800.4976, 1275.5255, 2130.9668, 2262.6887, 1008.39343, 394.06076, 330.9792, 1126.9437, 1047.3298]
2025-08-07 02:19:34,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [968.0, 1000.0, 529.0, 1000.0, 1000.0, 1000.0, 205.0, 193.0, 635.0, 486.0]
2025-08-07 02:19:34,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 19 seconds)
2025-08-07 02:21:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:42,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1965.33521 ± 532.552
2025-08-07 02:21:42,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2366.8767, 2130.014, 2212.837, 1917.6123, 549.8603, 2234.1738, 2183.7256, 2263.9639, 2322.7097, 1471.5773]
2025-08-07 02:21:42,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 252.0, 1000.0, 1000.0, 1000.0, 1000.0, 665.0]
2025-08-07 02:21:42,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2025-08-07 02:23:20,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:32,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1682.81506 ± 777.996
2025-08-07 02:23:32,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1133.3501, 2355.0254, 2307.5637, 2305.3242, 1715.5885, 606.2363, 2548.4756, 1238.2524, 2342.3286, 276.00583]
2025-08-07 02:23:32,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [493.0, 1000.0, 1000.0, 1000.0, 745.0, 366.0, 1000.0, 540.0, 1000.0, 122.0]
2025-08-07 02:23:33,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 52 seconds)
2025-08-07 02:25:16,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:28,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1557.71570 ± 822.363
2025-08-07 02:25:28,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [477.93076, 2429.7544, 1404.7152, 2468.3857, 2341.883, 2342.8567, 1092.4756, 2018.9308, 292.62, 707.6052]
2025-08-07 02:25:28,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [228.0, 1000.0, 583.0, 1000.0, 1000.0, 1000.0, 457.0, 867.0, 151.0, 344.0]
2025-08-07 02:25:28,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 52 seconds)
2025-08-07 02:27:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:31,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2331.50903 ± 156.606
2025-08-07 02:27:31,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2323.28, 2376.5034, 2340.1833, 2486.7478, 2309.5876, 2368.563, 1954.7972, 2176.0347, 2512.1035, 2467.289]
2025-08-07 02:27:31,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:27:31,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2331.51) for latency ExtremeSparseL4U32
2025-08-07 02:27:31,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 48 seconds)
2025-08-07 02:29:21,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:36,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1965.91077 ± 544.324
2025-08-07 02:29:36,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1422.5037, 2570.9329, 1112.7372, 2399.86, 2380.6816, 2544.0825, 1051.3809, 1981.4465, 2035.9407, 2159.541]
2025-08-07 02:29:36,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [628.0, 1000.0, 423.0, 1000.0, 1000.0, 1000.0, 467.0, 1000.0, 1000.0, 900.0]
2025-08-07 02:29:36,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 22 seconds)
2025-08-07 02:31:17,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2156.53857 ± 628.511
2025-08-07 02:31:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1752.9773, 2554.6077, 2615.6418, 2385.844, 2277.429, 2328.0898, 2324.9421, 401.16357, 2573.7048, 2350.9849]
2025-08-07 02:31:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [749.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 175.0, 1000.0, 1000.0]
2025-08-07 02:31:33,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 32 minutes, 30 seconds)
2025-08-07 02:33:16,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:33:30,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1798.49707 ± 750.458
2025-08-07 02:33:30,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1778.5521, 2249.2886, 2399.3965, 2457.7534, 2116.1104, 2306.4712, 1821.1991, 213.03871, 513.94403, 2129.216]
2025-08-07 02:33:30,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [802.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 798.0, 113.0, 209.0, 1000.0]
2025-08-07 02:33:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 40 seconds)
2025-08-07 02:35:15,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:27,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1549.36475 ± 824.276
2025-08-07 02:35:27,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [247.006, 210.26144, 701.233, 2408.4502, 2106.7944, 2315.5872, 1327.5948, 2121.5686, 2268.2192, 1786.9329]
2025-08-07 02:35:27,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [134.0, 148.0, 315.0, 1000.0, 910.0, 1000.0, 600.0, 1000.0, 1000.0, 726.0]
2025-08-07 02:35:27,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 52 seconds)
2025-08-07 02:37:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:27,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2224.42822 ± 630.106
2025-08-07 02:37:27,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2447.57, 2564.5317, 2178.447, 2260.2278, 2355.476, 2590.7444, 376.54056, 2612.039, 2404.8447, 2453.8596]
2025-08-07 02:37:27,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 972.0, 1000.0, 886.0, 1000.0, 1000.0, 207.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:37:27,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 21 seconds)
2025-08-07 02:39:15,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2248.12622 ± 358.341
2025-08-07 02:39:32,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2324.9941, 2404.6934, 2350.8936, 2390.1328, 2530.3452, 2185.4104, 1200.139, 2370.3125, 2370.3352, 2354.005]
2025-08-07 02:39:32,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 997.0, 1000.0, 542.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:39:32,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 27 seconds)
2025-08-07 02:41:16,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:31,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1968.39551 ± 507.550
2025-08-07 02:41:31,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2540.6584, 2044.0538, 2466.048, 1633.5007, 2661.7322, 1449.3136, 1740.3103, 2477.0195, 1232.0936, 1439.2246]
2025-08-07 02:41:31,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 806.0, 1000.0, 656.0, 1000.0, 592.0, 740.0, 1000.0, 1000.0, 536.0]
2025-08-07 02:41:31,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 47 seconds)
2025-08-07 02:43:23,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:36,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1785.66724 ± 746.143
2025-08-07 02:43:36,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1319.2089, 2577.9583, 924.2693, 2457.9607, 2503.1558, 749.58307, 2266.5642, 1484.0281, 2686.2754, 887.6672]
2025-08-07 02:43:36,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [693.0, 1000.0, 381.0, 1000.0, 1000.0, 309.0, 1000.0, 659.0, 1000.0, 413.0]
2025-08-07 02:43:36,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 45 seconds)
2025-08-07 02:45:20,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:45:30,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1337.04761 ± 482.156
2025-08-07 02:45:30,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [978.5488, 2158.1982, 2164.4937, 801.5811, 1246.6127, 1636.8126, 983.6904, 766.48865, 1342.5918, 1291.458]
2025-08-07 02:45:30,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [375.0, 1000.0, 1000.0, 342.0, 661.0, 636.0, 429.0, 315.0, 524.0, 552.0]
2025-08-07 02:45:30,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 23 seconds)
2025-08-07 02:47:07,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1884.04077 ± 916.983
2025-08-07 02:47:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2426.6956, 2481.1936, 2327.2197, 2324.772, 2587.5413, 2174.9875, 1884.0581, 2458.5725, 46.370525, 128.9969]
2025-08-07 02:47:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [950.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 865.0, 1000.0, 43.0, 80.0]
2025-08-07 02:47:22,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 21 seconds)
2025-08-07 02:49:13,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2273.80371 ± 467.165
2025-08-07 02:49:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2281.7388, 1980.3228, 2473.466, 2538.9954, 2584.737, 2408.654, 2424.336, 967.2692, 2555.9417, 2522.5757]
2025-08-07 02:49:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 391.0, 1000.0, 1000.0]
2025-08-07 02:49:30,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 42 seconds)
2025-08-07 02:51:14,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:27,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1629.32886 ± 733.857
2025-08-07 02:51:27,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [73.637794, 2330.3547, 1624.3134, 2267.1943, 1031.996, 858.16254, 2294.205, 2201.4534, 1429.3967, 2182.5767]
2025-08-07 02:51:27,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 1000.0, 634.0, 1000.0, 475.0, 425.0, 1000.0, 1000.0, 628.0, 1000.0]
2025-08-07 02:51:27,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 30 seconds)
2025-08-07 02:53:05,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:19,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1940.44336 ± 964.327
2025-08-07 02:53:19,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2687.713, 2602.5254, 2622.5757, 127.38584, 2453.6167, 810.2528, 540.2564, 2364.8872, 2547.986, 2647.2349]
2025-08-07 02:53:19,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 76.0, 1000.0, 397.0, 267.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:53:19,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes, 59 seconds)
2025-08-07 02:55:07,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:25,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2434.30615 ± 113.267
2025-08-07 02:55:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2410.7559, 2312.3733, 2397.9202, 2513.8975, 2286.2656, 2430.4763, 2398.7378, 2356.8142, 2683.4487, 2552.3708]
2025-08-07 02:55:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:55:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2434.31) for latency ExtremeSparseL4U32
2025-08-07 02:55:25,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 26 seconds)
2025-08-07 02:57:13,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:30,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2303.30908 ± 687.951
2025-08-07 02:57:30,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2542.8843, 2386.0674, 2346.9958, 2514.933, 2650.6265, 264.2886, 2567.6377, 2551.6074, 2728.9485, 2479.1023]
2025-08-07 02:57:30,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 103.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:57:30,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 55 seconds)
2025-08-07 02:59:22,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:37,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2002.76660 ± 774.678
2025-08-07 02:59:37,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2247.254, 2635.2473, 2599.6406, 108.6551, 2436.3123, 2278.4065, 2582.87, 2245.0728, 1847.6356, 1046.5724]
2025-08-07 02:59:37,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 80.0, 1000.0, 1000.0, 1000.0, 1000.0, 874.0, 444.0]
2025-08-07 02:59:37,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 46 seconds)
2025-08-07 03:01:21,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:36,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2027.71606 ± 679.451
2025-08-07 03:01:36,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2255.449, 2303.7236, 2478.5066, 2258.3372, 859.1532, 2454.9365, 2403.405, 2498.3943, 2245.494, 519.7601]
2025-08-07 03:01:36,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 369.0, 1000.0, 1000.0, 1000.0, 1000.0, 270.0]
2025-08-07 03:01:36,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 58 seconds)
2025-08-07 03:03:28,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:40,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1467.47668 ± 806.689
2025-08-07 03:03:40,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [285.2817, 2382.5894, 2169.9153, 733.25867, 2404.953, 2489.9426, 509.03812, 865.1655, 1296.395, 1538.2279]
2025-08-07 03:03:40,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [138.0, 1000.0, 1000.0, 359.0, 1000.0, 1000.0, 291.0, 337.0, 581.0, 1000.0]
2025-08-07 03:03:40,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 4 minutes, 9 seconds)
2025-08-07 03:05:24,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:38,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1879.39490 ± 869.243
2025-08-07 03:05:38,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2523.8018, 1816.9551, 2599.617, 499.4347, 1176.2881, 2286.7256, 240.65411, 2498.1733, 2492.1829, 2660.1165]
2025-08-07 03:05:38,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 742.0, 1000.0, 237.0, 540.0, 1000.0, 119.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:05:38,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 1 minute, 15 seconds)
2025-08-07 03:07:17,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:31,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1753.13708 ± 849.264
2025-08-07 03:07:31,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [506.86163, 549.53735, 749.0731, 1256.3743, 2590.2617, 2534.57, 1879.0803, 2514.8486, 2425.53, 2525.233]
2025-08-07 03:07:31,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [221.0, 222.0, 1000.0, 579.0, 1000.0, 1000.0, 765.0, 955.0, 1000.0, 1000.0]
2025-08-07 03:07:31,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 7 seconds)
2025-08-07 03:09:20,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:36,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2095.91943 ± 615.170
2025-08-07 03:09:36,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2499.8384, 2350.3572, 2177.1243, 2281.8018, 2203.6584, 2433.3699, 2500.8862, 2536.6455, 451.3161, 1524.1975]
2025-08-07 03:09:36,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 225.0, 622.0]
2025-08-07 03:09:36,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 51 seconds)
2025-08-07 03:11:24,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2198.79468 ± 521.540
2025-08-07 03:11:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2663.63, 1994.9409, 820.98834, 2462.6619, 2408.2747, 2617.8403, 2492.5002, 1893.8008, 2500.899, 2132.4116]
2025-08-07 03:11:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 743.0, 304.0, 1000.0, 1000.0, 1000.0, 1000.0, 750.0, 1000.0, 993.0]
2025-08-07 03:11:39,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 17 seconds)
2025-08-07 03:13:24,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:41,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2081.94360 ± 670.180
2025-08-07 03:13:41,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2303.2512, 356.9184, 1436.2819, 2125.8628, 2621.8408, 2435.5408, 1963.1553, 2512.57, 2647.6025, 2416.4124]
2025-08-07 03:13:41,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 203.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:13:41,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 5 seconds)
2025-08-07 03:15:32,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:49,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2355.55859 ± 402.181
2025-08-07 03:15:49,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2326.653, 2432.513, 2746.2383, 2491.091, 1269.4539, 2628.5115, 2538.5894, 2056.8833, 2537.993, 2527.66]
2025-08-07 03:15:49,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 937.0, 1000.0, 1000.0, 563.0, 1000.0, 1000.0, 808.0, 1000.0, 1000.0]
2025-08-07 03:15:49,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 53 seconds)
2025-08-07 03:17:30,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2152.37891 ± 703.536
2025-08-07 03:17:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2456.1619, 2624.497, 2535.4465, 2424.055, 2278.4917, 2612.02, 876.86835, 655.78217, 2661.4636, 2399.0037]
2025-08-07 03:17:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 360.0, 261.0, 1000.0, 1000.0]
2025-08-07 03:17:46,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 49 minutes, 10 seconds)
2025-08-07 03:19:36,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2374.60693 ± 711.070
2025-08-07 03:19:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2505.0881, 2615.441, 2773.755, 2572.0579, 2762.3638, 2478.4536, 278.6753, 2775.6338, 2634.2217, 2350.3777]
2025-08-07 03:19:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 127.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:19:52,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 47 minutes, 17 seconds)
2025-08-07 03:21:43,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2334.38428 ± 567.219
2025-08-07 03:22:00,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2519.6177, 2369.312, 2895.2908, 2724.484, 2123.5527, 2342.5388, 2639.1528, 2604.952, 2372.6384, 752.303]
2025-08-07 03:22:00,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 874.0, 1000.0, 1000.0, 1000.0, 1000.0, 353.0]
2025-08-07 03:22:00,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 45 minutes, 28 seconds)
2025-08-07 03:23:42,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:58,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2229.63525 ± 594.817
2025-08-07 03:23:58,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2450.6477, 2313.508, 2311.8271, 2538.5867, 2332.8284, 480.02997, 2302.4626, 2487.638, 2688.4568, 2390.3662]
2025-08-07 03:23:58,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 227.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:23:58,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 43 minutes, 12 seconds)
2025-08-07 03:25:43,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:58,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2021.51624 ± 830.743
2025-08-07 03:25:58,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [693.3845, 202.6643, 1817.9287, 2376.5276, 2231.9175, 2720.665, 2453.1045, 2458.2224, 2558.8516, 2701.8965]
2025-08-07 03:25:58,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [325.0, 101.0, 823.0, 1000.0, 789.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:25:58,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 36 seconds)
2025-08-07 03:27:50,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:07,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2333.90283 ± 658.250
2025-08-07 03:28:07,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2497.832, 1801.087, 2608.032, 2599.3018, 2810.111, 2458.2334, 529.8332, 2654.2192, 2797.8906, 2582.4866]
2025-08-07 03:28:07,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 730.0, 1000.0, 1000.0, 1000.0, 1000.0, 255.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:28:07,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 19 seconds)
2025-08-07 03:29:53,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:08,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2119.73511 ± 650.588
2025-08-07 03:30:08,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2164.1724, 2837.6184, 2710.6865, 1601.1758, 2274.0935, 1613.4174, 2388.402, 573.4598, 2379.3955, 2654.9297]
2025-08-07 03:30:08,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [850.0, 1000.0, 1000.0, 634.0, 1000.0, 604.0, 1000.0, 270.0, 1000.0, 1000.0]
2025-08-07 03:30:08,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 56 seconds)
2025-08-07 03:31:54,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1934.66187 ± 822.513
2025-08-07 03:32:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [755.6507, 2638.7632, 2520.724, 2438.5632, 2817.1975, 2485.646, 982.55566, 2485.3958, 548.4066, 1673.7172]
2025-08-07 03:32:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [450.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 405.0, 1000.0, 245.0, 675.0]
2025-08-07 03:32:08,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 28 seconds)
2025-08-07 03:33:59,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:14,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2085.47290 ± 802.322
2025-08-07 03:34:14,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2532.8225, 2426.2695, 1298.5105, 130.56645, 1499.8458, 2707.0796, 2497.044, 2514.778, 2716.3977, 2531.414]
2025-08-07 03:34:14,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 491.0, 81.0, 625.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:34:14,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 50 seconds)
2025-08-07 03:35:55,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:10,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1952.84448 ± 733.114
2025-08-07 03:36:10,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2451.2078, 2459.1687, 2149.9548, 103.52524, 2521.1228, 2353.6245, 1100.8419, 1880.9762, 2167.5347, 2340.4893]
2025-08-07 03:36:10,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 68.0, 1000.0, 1000.0, 448.0, 741.0, 1000.0, 1000.0]
2025-08-07 03:36:10,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 35 seconds)
2025-08-07 03:37:55,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1545.44373 ± 927.646
2025-08-07 03:38:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1140.6326, 192.80977, 1441.3369, 2962.1194, 1199.2249, 2479.4705, 1703.4417, 1612.7112, 2685.059, 37.63288]
2025-08-07 03:38:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 93.0, 599.0, 1000.0, 477.0, 1000.0, 717.0, 1000.0, 1000.0, 45.0]
2025-08-07 03:38:07,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 1 second)
2025-08-07 03:40:00,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2534.45044 ± 186.933
2025-08-07 03:40:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2261.888, 2213.281, 2648.1372, 2573.692, 2712.182, 2709.189, 2780.1091, 2349.2073, 2561.79, 2535.0276]
2025-08-07 03:40:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:40:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2534.45) for latency ExtremeSparseL4U32
2025-08-07 03:40:19,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 26 seconds)
2025-08-07 03:41:58,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:16,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2384.50098 ± 178.851
2025-08-07 03:42:16,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2625.9326, 2420.279, 2027.3701, 2445.2183, 2162.2334, 2438.6426, 2214.7134, 2562.9663, 2481.2268, 2466.4285]
2025-08-07 03:42:16,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:42:16,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 19 seconds)
2025-08-07 03:44:02,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:18,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2169.70679 ± 756.374
2025-08-07 03:44:18,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2357.4434, 2616.622, 2766.8123, 2619.7898, 2583.3062, 2443.1677, 2249.442, 2403.767, 113.68724, 1543.0284]
2025-08-07 03:44:18,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 70.0, 595.0]
2025-08-07 03:44:18,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 8 seconds)
2025-08-07 03:46:06,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:22,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2343.18140 ± 759.892
2025-08-07 03:46:22,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2553.341, 2538.7168, 2816.4143, 2595.28, 2172.6921, 2640.5925, 129.03175, 2625.0305, 2877.4094, 2483.306]
2025-08-07 03:46:22,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 77.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:46:22,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 24 seconds)
2025-08-07 03:48:16,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:34,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2392.36035 ± 326.248
2025-08-07 03:48:34,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1525.5667, 2715.0461, 2672.717, 2455.1782, 2405.9744, 2512.5017, 2559.7253, 2578.5864, 2185.729, 2312.5808]
2025-08-07 03:48:34,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [644.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:48:34,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 47 seconds)
2025-08-07 03:50:13,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:29,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2168.85303 ± 676.324
2025-08-07 03:50:29,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [919.32544, 2623.951, 2595.4888, 2391.1794, 2622.1118, 2592.5645, 746.1342, 2457.9316, 2350.7725, 2389.068]
2025-08-07 03:50:29,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [325.0, 1000.0, 1000.0, 994.0, 1000.0, 1000.0, 349.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:50:29,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 16 seconds)
2025-08-07 03:52:18,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2115.12402 ± 708.478
2025-08-07 03:52:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2652.6978, 2397.0989, 2513.8289, 1516.0199, 260.2182, 2757.2954, 2289.4917, 1938.5653, 2269.9768, 2556.0469]
2025-08-07 03:52:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 615.0, 120.0, 1000.0, 1000.0, 790.0, 1000.0, 1000.0]
2025-08-07 03:52:34,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 24 seconds)
2025-08-07 03:54:20,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:36,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2331.86304 ± 541.172
2025-08-07 03:54:36,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [887.95386, 2512.0864, 2656.4458, 2589.3828, 2525.2703, 2700.0403, 2565.4568, 2631.0176, 1780.9254, 2470.05]
2025-08-07 03:54:36,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [375.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 680.0, 1000.0]
2025-08-07 03:54:36,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 22 seconds)
2025-08-07 03:56:21,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:38,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2409.16309 ± 535.443
2025-08-07 03:56:38,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1039.0469, 2667.915, 2673.0276, 1754.1158, 2482.2456, 2745.2886, 2715.6118, 2724.7366, 2669.4785, 2620.1643]
2025-08-07 03:56:38,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 714.0, 1000.0, 1000.0, 1000.0, 925.0, 1000.0, 1000.0]
2025-08-07 03:56:38,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 16 seconds)
2025-08-07 03:58:31,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:45,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1808.88147 ± 963.171
2025-08-07 03:58:45,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [666.3901, 2284.1777, 737.54584, 2543.9583, 887.59564, 2619.0437, 2527.2039, 313.7306, 2833.302, 2675.867]
2025-08-07 03:58:45,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [312.0, 852.0, 330.0, 1000.0, 1000.0, 1000.0, 1000.0, 132.0, 1000.0, 1000.0]
2025-08-07 03:58:45,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 8 seconds)
2025-08-07 04:00:28,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:46,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2582.85107 ± 102.221
2025-08-07 04:00:46,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2700.0366, 2654.0022, 2640.013, 2605.0986, 2479.3027, 2521.5356, 2634.4146, 2353.2568, 2552.3967, 2688.4539]
2025-08-07 04:00:46,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:00:46,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2582.85) for latency ExtremeSparseL4U32
2025-08-07 04:00:46,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 10 seconds)
2025-08-07 04:02:36,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2409.01221 ± 238.540
2025-08-07 04:02:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2637.705, 2191.3774, 2485.084, 2329.228, 2422.0002, 2279.6904, 1864.4033, 2656.2363, 2609.3105, 2615.0876]
2025-08-07 04:02:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 724.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:02:53,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 7 seconds)
2025-08-07 04:04:36,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:52,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2444.06982 ± 592.739
2025-08-07 04:04:52,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2763.144, 2950.1924, 2540.2437, 2539.559, 2723.63, 2678.5007, 992.94086, 1654.4899, 2905.7349, 2692.2651]
2025-08-07 04:04:52,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 392.0, 592.0, 1000.0, 1000.0]
2025-08-07 04:04:52,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 3 seconds)
2025-08-07 04:06:43,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:59,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2399.42432 ± 527.860
2025-08-07 04:06:59,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2738.9119, 1849.5475, 2816.995, 1043.424, 2458.7402, 2582.4143, 2857.4731, 2465.42, 2465.4915, 2715.8235]
2025-08-07 04:06:59,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 695.0, 1000.0, 388.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:06:59,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1251 [DEBUG]: Training session finished
