2025-08-07 00:48:05,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:05,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:05,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1454ba087890>}
2025-08-07 00:48:05,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:05,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1133 [INFO]: Creating new trainer
2025-08-07 00:48:05,297 baseline-bpql-noiseperc15-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=283, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:05,297 baseline-bpql-noiseperc15-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:06,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:06,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:50,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -83.19493 ± 113.794
2025-08-07 00:49:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-127.252495, -302.91476, 7.0926332, -257.3366, 15.267131, -12.934014, -27.611576, 17.092514, 8.172714, -151.5248]
2025-08-07 00:49:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [146.0, 323.0, 67.0, 334.0, 38.0, 63.0, 82.0, 36.0, 39.0, 135.0]
2025-08-07 00:49:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (-83.19) for latency ExtremeSparseL4U32
2025-08-07 00:49:53,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 56 minutes, 21 seconds)
2025-08-07 00:51:51,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -35.72791 ± 41.203
2025-08-07 00:51:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [3.6924648, -62.85089, 14.046058, -30.659342, -7.1661553, -8.595205, -61.46849, -114.94198, -86.70769, -2.6278596]
2025-08-07 00:51:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 114.0, 52.0, 78.0, 84.0, 63.0, 88.0, 159.0, 96.0, 51.0]
2025-08-07 00:51:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (-35.73) for latency ExtremeSparseL4U32
2025-08-07 00:51:52,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2025-08-07 00:53:38,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -12.89030 ± 23.651
2025-08-07 00:53:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-45.317844, -15.261437, 19.602255, 18.299046, -31.619112, -48.87167, 8.480995, 3.5079305, -24.594791, -13.128376]
2025-08-07 00:53:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 58.0, 43.0, 60.0, 86.0, 102.0, 65.0, 85.0, 100.0, 64.0]
2025-08-07 00:53:39,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (-12.89) for latency ExtremeSparseL4U32
2025-08-07 00:53:39,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 59 minutes, 34 seconds)
2025-08-07 00:55:23,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -60.47746 ± 71.324
2025-08-07 00:55:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-40.158543, 6.200403, -53.74856, -19.101278, -193.28917, -174.0073, -10.853487, -9.014804, -120.83842, 10.036614]
2025-08-07 00:55:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [98.0, 91.0, 124.0, 140.0, 166.0, 286.0, 64.0, 63.0, 134.0, 59.0]
2025-08-07 00:55:25,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 55 minutes, 36 seconds)
2025-08-07 00:57:12,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:57:14,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -50.06136 ± 83.812
2025-08-07 00:57:14,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-43.77247, -52.75158, 23.413942, -68.47287, 26.097832, -41.731106, -155.88461, 34.26916, 17.326057, -239.108]
2025-08-07 00:57:14,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [140.0, 106.0, 73.0, 163.0, 75.0, 121.0, 234.0, 55.0, 63.0, 288.0]
2025-08-07 00:57:14,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 53 minutes, 40 seconds)
2025-08-07 00:59:11,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:59:17,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -107.36845 ± 185.125
2025-08-07 00:59:17,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-37.217285, -515.1865, -92.36241, 58.788628, 0.87483793, -29.72745, -414.07108, 25.304949, -78.45697, 8.368621]
2025-08-07 00:59:17,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [225.0, 1000.0, 186.0, 104.0, 56.0, 128.0, 1000.0, 147.0, 228.0, 122.0]
2025-08-07 00:59:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 56 minutes, 41 seconds)
2025-08-07 01:01:00,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 37.85198 ± 15.922
2025-08-07 01:01:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [12.406679, 36.552124, 18.424603, 55.671997, 46.41032, 27.734415, 64.16904, 24.797283, 43.5658, 48.7876]
2025-08-07 01:01:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 1000.0, 306.0, 135.0, 453.0, 165.0, 309.0, 224.0, 106.0, 60.0]
2025-08-07 01:01:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (37.85) for latency ExtremeSparseL4U32
2025-08-07 01:01:05,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 51 minutes, 10 seconds)
2025-08-07 01:03:03,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:07,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 40.31895 ± 12.551
2025-08-07 01:03:07,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [37.07545, 36.70579, 43.903458, 36.42052, 52.58209, 24.062172, 16.217815, 51.399483, 44.89992, 59.922756]
2025-08-07 01:03:07,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [484.0, 128.0, 110.0, 349.0, 171.0, 55.0, 192.0, 304.0, 163.0, 138.0]
2025-08-07 01:03:07,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (40.32) for latency ExtremeSparseL4U32
2025-08-07 01:03:07,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 54 minutes, 10 seconds)
2025-08-07 01:04:49,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:54,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 48.00048 ± 56.264
2025-08-07 01:04:54,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [78.495155, 194.12259, 49.594326, -30.135023, 15.193442, 57.631916, 28.753267, 24.512274, 48.124775, 13.71215]
2025-08-07 01:04:54,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [137.0, 1000.0, 145.0, 688.0, 51.0, 185.0, 55.0, 34.0, 95.0, 244.0]
2025-08-07 01:04:54,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (48.00) for latency ExtremeSparseL4U32
2025-08-07 01:04:54,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 52 minutes, 42 seconds)
2025-08-07 01:06:39,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:47,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 44.94357 ± 55.760
2025-08-07 01:06:47,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [195.50769, -3.8289983, 54.532925, 14.751263, 44.17866, 45.488583, 16.15052, 29.05077, -13.814412, 67.4187]
2025-08-07 01:06:47,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 335.0, 95.0, 191.0, 143.0, 286.0, 164.0, 207.0, 791.0, 1000.0]
2025-08-07 01:06:47,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 51 minutes, 46 seconds)
2025-08-07 01:08:45,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:48,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 44.41763 ± 41.735
2025-08-07 01:08:48,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [63.25987, -3.7765682, 43.429405, 40.169624, 33.774055, 70.1701, 0.69184893, 149.86275, 26.626373, 19.968803]
2025-08-07 01:08:48,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [168.0, 144.0, 92.0, 160.0, 143.0, 107.0, 223.0, 1000.0, 63.0, 25.0]
2025-08-07 01:08:48,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 49 minutes, 39 seconds)
2025-08-07 01:10:30,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:36,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 95.94270 ± 133.836
2025-08-07 01:10:36,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [79.598114, -11.788144, 403.06427, 297.7033, 9.504043, 32.08911, 2.2465057, 8.929971, 106.91295, 31.166803]
2025-08-07 01:10:36,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [100.0, 221.0, 1000.0, 1000.0, 53.0, 184.0, 93.0, 98.0, 259.0, 60.0]
2025-08-07 01:10:36,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (95.94) for latency ExtremeSparseL4U32
2025-08-07 01:10:36,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 47 minutes, 29 seconds)
2025-08-07 01:12:23,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:26,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 69.84071 ± 122.725
2025-08-07 01:12:26,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [53.57997, 433.98798, 5.871881, 43.2608, -4.4709063, 36.561985, 26.522715, 14.664009, 37.74309, 50.68554]
2025-08-07 01:12:26,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 1000.0, 113.0, 47.0, 107.0, 44.0, 129.0, 102.0, 65.0, 82.0]
2025-08-07 01:12:26,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 42 minutes, 8 seconds)
2025-08-07 01:14:21,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:24,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 78.23584 ± 119.977
2025-08-07 01:14:24,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [43.12912, 40.557083, 28.301716, 432.63174, 79.85096, -1.2832294, 15.343053, 54.836525, 51.43922, 37.55217]
2025-08-07 01:14:24,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 139.0, 46.0, 1000.0, 213.0, 94.0, 216.0, 87.0, 133.0, 68.0]
2025-08-07 01:14:24,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 43 minutes, 27 seconds)
2025-08-07 01:16:07,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:11,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 76.43028 ± 169.041
2025-08-07 01:16:11,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [37.33594, 21.886868, 49.976334, -13.294386, 10.313945, 8.752779, 24.41016, 580.2672, -1.4709756, 46.124992]
2025-08-07 01:16:11,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [172.0, 87.0, 91.0, 133.0, 158.0, 194.0, 98.0, 1000.0, 43.0, 62.0]
2025-08-07 01:16:11,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 39 minutes, 46 seconds)
2025-08-07 01:18:02,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:06,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 53.11533 ± 148.962
2025-08-07 01:18:06,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [34.697086, 55.34618, 492.52173, -39.91087, -28.203642, 16.920406, -15.414815, 9.432441, -8.204093, 13.968958]
2025-08-07 01:18:06,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 124.0, 1000.0, 240.0, 295.0, 72.0, 218.0, 56.0, 88.0, 63.0]
2025-08-07 01:18:06,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 36 minutes, 10 seconds)
2025-08-07 01:19:53,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:54,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 23.01365 ± 14.391
2025-08-07 01:19:54,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [8.267364, 24.395195, 17.187069, 43.480923, 26.498056, 14.840976, 2.1041908, 51.686855, 25.888754, 15.787173]
2025-08-07 01:19:54,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 85.0, 101.0, 81.0, 84.0, 65.0, 76.0, 85.0, 51.0, 88.0]
2025-08-07 01:19:54,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 34 minutes, 35 seconds)
2025-08-07 01:21:43,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:46,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 18.62181 ± 44.625
2025-08-07 01:21:46,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [42.84154, 27.408415, 86.56254, -2.170539, 72.19044, -42.44169, -63.93724, -0.032283887, 36.878426, 28.918455]
2025-08-07 01:21:46,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 67.0, 309.0, 149.0, 78.0, 201.0, 229.0, 83.0, 94.0, 125.0]
2025-08-07 01:21:46,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 32 minutes, 58 seconds)
2025-08-07 01:23:35,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:39,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 73.65762 ± 147.660
2025-08-07 01:23:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [27.880322, -41.852207, 12.462015, 57.2312, 46.293037, 8.6443205, 44.162746, 509.45038, 38.21523, 34.089146]
2025-08-07 01:23:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [264.0, 419.0, 31.0, 100.0, 148.0, 18.0, 127.0, 1000.0, 57.0, 51.0]
2025-08-07 01:23:39,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 29 minutes, 38 seconds)
2025-08-07 01:25:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:36,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 65.02785 ± 120.823
2025-08-07 01:25:36,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [42.72031, 39.113365, 35.475548, -28.37466, 421.6954, 32.386894, 50.54271, 30.949951, 20.903486, 4.865511]
2025-08-07 01:25:36,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [298.0, 57.0, 228.0, 190.0, 1000.0, 45.0, 79.0, 63.0, 57.0, 52.0]
2025-08-07 01:25:36,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 30 minutes, 37 seconds)
2025-08-07 01:27:21,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:28,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 114.44829 ± 148.905
2025-08-07 01:27:28,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [358.50015, 13.046632, 31.0053, 22.763432, -9.883264, 335.79584, 53.29001, 318.60593, 61.20472, -39.845848]
2025-08-07 01:27:28,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 62.0, 105.0, 46.0, 136.0, 1000.0, 145.0, 1000.0, 65.0, 299.0]
2025-08-07 01:27:28,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (114.45) for latency ExtremeSparseL4U32
2025-08-07 01:27:28,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 27 minutes, 54 seconds)
2025-08-07 01:29:27,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:30,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 64.34081 ± 135.018
2025-08-07 01:29:30,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [17.869808, -87.1843, 12.0436535, 96.97951, 447.22958, 45.568764, 32.138157, 11.8882885, 14.410855, 52.46372]
2025-08-07 01:29:30,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 126.0, 79.0, 465.0, 1000.0, 82.0, 50.0, 66.0, 59.0, 151.0]
2025-08-07 01:29:30,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 29 minutes, 46 seconds)
2025-08-07 01:31:13,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:20,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 109.44645 ± 175.072
2025-08-07 01:31:20,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [499.3189, -39.37557, 401.96442, 31.938805, 93.62994, 1.2000337, 43.69456, -3.1277204, 34.690853, 30.530237]
2025-08-07 01:31:20,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 237.0, 1000.0, 170.0, 554.0, 98.0, 47.0, 136.0, 174.0, 428.0]
2025-08-07 01:31:20,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 27 minutes, 30 seconds)
2025-08-07 01:33:10,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:11,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 16.62213 ± 21.642
2025-08-07 01:33:11,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [15.87315, -16.727123, 46.52944, 41.376144, 23.75399, -22.78879, 22.78814, 12.120195, 8.401185, 34.894985]
2025-08-07 01:33:11,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [112.0, 74.0, 168.0, 44.0, 48.0, 101.0, 77.0, 68.0, 76.0, 86.0]
2025-08-07 01:33:11,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 5 seconds)
2025-08-07 01:34:57,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:35:01,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 54.39608 ± 126.374
2025-08-07 01:35:01,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-26.639978, 425.33017, 18.041065, 19.888832, 18.821337, -44.8358, 30.770117, 25.934074, 42.456116, 34.194912]
2025-08-07 01:35:01,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [243.0, 1000.0, 91.0, 48.0, 49.0, 121.0, 75.0, 150.0, 86.0, 112.0]
2025-08-07 01:35:01,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 21 minutes, 18 seconds)
2025-08-07 01:36:55,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:59,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 62.40871 ± 156.538
2025-08-07 01:36:59,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [34.264347, 4.5948706, 33.80647, 1.7857091, 19.04584, -18.887793, -51.207214, 34.19221, 524.7843, 41.70839]
2025-08-07 01:36:59,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 134.0, 81.0, 134.0, 84.0, 191.0, 147.0, 81.0, 1000.0, 72.0]
2025-08-07 01:36:59,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 20 minutes, 51 seconds)
2025-08-07 01:38:45,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:50,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 123.07037 ± 170.293
2025-08-07 01:38:50,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [64.81943, 36.19189, 31.262041, 479.38962, 37.635765, 24.942642, 32.391613, 445.704, 26.009129, 52.35772]
2025-08-07 01:38:50,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 43.0, 131.0, 1000.0, 40.0, 43.0, 84.0, 1000.0, 221.0, 57.0]
2025-08-07 01:38:50,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (123.07) for latency ExtremeSparseL4U32
2025-08-07 01:38:50,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 16 minutes, 8 seconds)
2025-08-07 01:40:36,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:38,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 14.16493 ± 26.050
2025-08-07 01:40:38,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [28.728148, 6.8037286, 27.600277, -36.10369, 15.502572, 30.787165, 0.270516, -12.172057, 65.618645, 14.613996]
2025-08-07 01:40:38,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [205.0, 69.0, 100.0, 107.0, 110.0, 56.0, 92.0, 120.0, 58.0, 78.0]
2025-08-07 01:40:38,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 13 minutes, 51 seconds)
2025-08-07 01:42:29,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:35,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 147.69746 ± 186.120
2025-08-07 01:42:35,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [38.140358, 68.78181, 499.95374, 16.550982, 54.788418, 334.63974, 436.166, 2.8375745, 48.540768, -23.424788]
2025-08-07 01:42:35,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 119.0, 1000.0, 62.0, 74.0, 1000.0, 1000.0, 106.0, 98.0, 141.0]
2025-08-07 01:42:35,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (147.70) for latency ExtremeSparseL4U32
2025-08-07 01:42:35,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 13 minutes, 31 seconds)
2025-08-07 01:44:34,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:39,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 132.24098 ± 149.939
2025-08-07 01:44:39,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [49.50294, 26.227877, 41.057655, 398.36426, 37.512245, 275.02084, 45.0654, 394.30045, 34.55695, 20.801203]
2025-08-07 01:44:39,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [96.0, 78.0, 55.0, 1000.0, 90.0, 577.0, 89.0, 1000.0, 54.0, 42.0]
2025-08-07 01:44:39,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 56 seconds)
2025-08-07 01:46:25,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:30,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 83.35560 ± 130.145
2025-08-07 01:46:30,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [98.31275, 33.213856, 28.590324, 44.364326, 10.61536, 149.97842, 448.99216, 26.462622, 10.288505, -17.262402]
2025-08-07 01:46:30,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [254.0, 63.0, 105.0, 151.0, 47.0, 1000.0, 1000.0, 105.0, 39.0, 107.0]
2025-08-07 01:46:30,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 11 minutes, 23 seconds)
2025-08-07 01:48:14,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:19,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 110.33531 ± 161.320
2025-08-07 01:48:19,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [18.318787, 332.31973, 510.33438, 32.058517, 68.74717, 36.943718, 1.4583758, 39.941734, 26.541264, 36.68955]
2025-08-07 01:48:19,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 1000.0, 1000.0, 49.0, 130.0, 55.0, 65.0, 85.0, 98.0, 82.0]
2025-08-07 01:48:19,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 9 minutes, 1 second)
2025-08-07 01:50:10,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:13,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 71.21190 ± 89.301
2025-08-07 01:50:13,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [48.923775, 28.531118, 0.6217246, 333.80093, 69.73418, 49.759148, 37.50033, 39.94974, 60.32783, 42.970295]
2025-08-07 01:50:13,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 205.0, 102.0, 1000.0, 115.0, 68.0, 83.0, 41.0, 96.0, 94.0]
2025-08-07 01:50:13,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 8 minutes, 24 seconds)
2025-08-07 01:52:08,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:15,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 144.36848 ± 186.549
2025-08-07 01:52:15,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [23.939165, 305.16324, 49.507305, 1.8271682, 473.7289, 18.993368, 33.66373, 27.556293, 26.718485, 482.58713]
2025-08-07 01:52:15,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 1000.0, 60.0, 40.0, 1000.0, 81.0, 113.0, 263.0, 50.0, 1000.0]
2025-08-07 01:52:15,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 25 seconds)
2025-08-07 01:53:59,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:05,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 108.12575 ± 170.188
2025-08-07 01:54:05,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-20.106998, 40.61552, 471.45224, 56.008488, 418.0938, -8.486008, 42.285034, 15.976027, 36.95259, 28.466705]
2025-08-07 01:54:05,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [347.0, 135.0, 1000.0, 236.0, 1000.0, 289.0, 82.0, 72.0, 42.0, 237.0]
2025-08-07 01:54:05,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 39 seconds)
2025-08-07 01:56:01,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:56:04,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 33.00043 ± 33.806
2025-08-07 01:56:04,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [12.302065, 91.13152, 88.32737, -5.0825763, -9.4145775, 25.304565, 9.6087265, 28.449318, 59.01288, 30.365007]
2025-08-07 01:56:04,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [297.0, 153.0, 252.0, 67.0, 137.0, 217.0, 193.0, 107.0, 143.0, 57.0]
2025-08-07 01:56:04,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 24 seconds)
2025-08-07 01:57:51,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:00,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 212.09273 ± 207.048
2025-08-07 01:58:00,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [33.816093, 502.2631, 92.53356, 42.302273, -22.084778, 84.24125, 450.31894, 480.22018, 414.67618, 42.64048]
2025-08-07 01:58:00,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 1000.0, 87.0, 188.0, 256.0, 144.0, 1000.0, 1000.0, 1000.0, 55.0]
2025-08-07 01:58:00,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (212.09) for latency ExtremeSparseL4U32
2025-08-07 01:58:00,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 1 minute, 55 seconds)
2025-08-07 01:59:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:47,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 46.13533 ± 18.476
2025-08-07 01:59:47,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [63.010536, 14.337511, 68.92144, 70.806435, 27.539936, 35.551517, 45.382233, 44.182148, 62.050186, 29.571344]
2025-08-07 01:59:47,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [104.0, 43.0, 387.0, 79.0, 41.0, 82.0, 149.0, 224.0, 173.0, 324.0]
2025-08-07 01:59:47,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 58 minutes, 39 seconds)
2025-08-07 02:01:38,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:41,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 54.87280 ± 43.247
2025-08-07 02:01:41,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [32.47472, 25.102713, 62.312393, 9.598921, 25.98898, 16.806128, 156.28775, 52.12982, 62.378845, 105.64778]
2025-08-07 02:01:41,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [164.0, 153.0, 213.0, 40.0, 99.0, 138.0, 410.0, 64.0, 99.0, 309.0]
2025-08-07 02:01:41,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 55 minutes, 9 seconds)
2025-08-07 02:03:29,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:34,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 118.51353 ± 149.747
2025-08-07 02:03:34,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [46.98131, 70.629, 359.51352, 26.725458, 42.750553, 46.238182, 35.447063, 466.7269, 59.557617, 30.565676]
2025-08-07 02:03:34,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 157.0, 1000.0, 110.0, 70.0, 113.0, 201.0, 1000.0, 56.0, 39.0]
2025-08-07 02:03:34,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 53 minutes, 48 seconds)
2025-08-07 02:05:22,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:32,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 236.44357 ± 182.011
2025-08-07 02:05:32,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [70.883446, 108.40235, 97.56617, 108.18512, 498.2742, 107.91733, 449.20724, 418.50516, 462.17142, 43.32317]
2025-08-07 02:05:32,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [204.0, 433.0, 359.0, 89.0, 1000.0, 150.0, 1000.0, 1000.0, 1000.0, 51.0]
2025-08-07 02:05:32,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (236.44) for latency ExtremeSparseL4U32
2025-08-07 02:05:32,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 51 minutes, 41 seconds)
2025-08-07 02:07:20,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 124.22872 ± 129.174
2025-08-07 02:07:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [46.44313, 63.793785, 486.0675, 91.44411, 25.78245, 73.166725, 158.28906, 54.234394, 63.307438, 179.75858]
2025-08-07 02:07:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 180.0, 1000.0, 183.0, 92.0, 187.0, 599.0, 128.0, 156.0, 571.0]
2025-08-07 02:07:26,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 49 minutes, 22 seconds)
2025-08-07 02:09:17,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:21,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 69.21123 ± 72.617
2025-08-07 02:09:21,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [33.765823, -1.9963435, 58.761105, 4.557819, 262.38037, 49.071037, 69.04075, 125.90541, 49.764755, 40.86151]
2025-08-07 02:09:21,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [237.0, 299.0, 66.0, 190.0, 701.0, 197.0, 103.0, 178.0, 88.0, 114.0]
2025-08-07 02:09:21,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 48 minutes, 57 seconds)
2025-08-07 02:11:08,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:14,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 119.73225 ± 121.299
2025-08-07 02:11:14,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [100.37288, 28.421612, 90.15207, 56.490818, 148.53853, 128.30902, 179.15201, 444.19476, 2.9718513, 18.718956]
2025-08-07 02:11:14,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [277.0, 50.0, 203.0, 96.0, 278.0, 484.0, 493.0, 1000.0, 158.0, 61.0]
2025-08-07 02:11:14,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 46 minutes, 59 seconds)
2025-08-07 02:13:10,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:16,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 138.07208 ± 176.961
2025-08-07 02:13:16,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [56.916496, 56.713932, 43.659683, 455.39972, 50.498116, 43.030827, 20.005745, 94.864746, 522.35004, 37.28167]
2025-08-07 02:13:16,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 154.0, 79.0, 1000.0, 192.0, 75.0, 136.0, 336.0, 1000.0, 116.0]
2025-08-07 02:13:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 37 seconds)
2025-08-07 02:15:00,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 119.85225 ± 133.916
2025-08-07 02:15:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [138.24564, 49.175808, 497.43576, 166.56516, 7.199531, 44.24831, 64.12447, 45.052063, 74.17389, 112.30171]
2025-08-07 02:15:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [260.0, 47.0, 1000.0, 316.0, 113.0, 72.0, 379.0, 85.0, 168.0, 187.0]
2025-08-07 02:15:05,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 43 minutes, 4 seconds)
2025-08-07 02:16:55,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:04,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 226.03613 ± 174.221
2025-08-07 02:17:04,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [42.601467, 175.67627, 110.39505, 179.74637, 506.08105, 575.1954, 105.1237, 335.89154, 120.688545, 108.96185]
2025-08-07 02:17:04,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 478.0, 206.0, 523.0, 1000.0, 1000.0, 315.0, 758.0, 460.0, 208.0]
2025-08-07 02:17:04,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 8 seconds)
2025-08-07 02:18:56,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:02,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 140.24335 ± 160.537
2025-08-07 02:19:02,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [190.53683, 89.33484, 15.485662, 470.8865, 10.740946, 50.672134, 61.676434, 420.88693, 34.733253, 57.479923]
2025-08-07 02:19:02,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [434.0, 134.0, 38.0, 1000.0, 156.0, 73.0, 88.0, 1000.0, 42.0, 294.0]
2025-08-07 02:19:02,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 44 seconds)
2025-08-07 02:20:49,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:55,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 132.55847 ± 128.750
2025-08-07 02:20:55,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [119.64449, 193.12753, 23.525444, 48.997635, 130.95464, 159.56302, 59.22881, 484.55887, 39.1197, 66.86459]
2025-08-07 02:20:55,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [328.0, 340.0, 42.0, 136.0, 196.0, 470.0, 82.0, 1000.0, 171.0, 175.0]
2025-08-07 02:20:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 40 seconds)
2025-08-07 02:22:54,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:00,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 153.46829 ± 168.479
2025-08-07 02:23:00,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [38.66507, 204.54555, 9.857352, 53.59928, 127.49967, 31.186638, 487.53223, 457.929, 81.777756, 42.09052]
2025-08-07 02:23:00,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [339.0, 314.0, 91.0, 88.0, 189.0, 127.0, 842.0, 1000.0, 161.0, 42.0]
2025-08-07 02:23:00,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 23 seconds)
2025-08-07 02:24:43,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:51,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 220.34384 ± 191.580
2025-08-07 02:24:51,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [19.54194, 56.153957, 133.63707, 441.47498, 101.1531, 507.04245, 235.65263, 32.987797, 542.78314, 133.01117]
2025-08-07 02:24:51,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 166.0, 214.0, 1000.0, 243.0, 1000.0, 439.0, 56.0, 1000.0, 261.0]
2025-08-07 02:24:51,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 48 seconds)
2025-08-07 02:26:37,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:42,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 136.56841 ± 128.014
2025-08-07 02:26:42,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [15.697461, 263.7499, 44.523537, 117.35999, 77.360664, 117.84682, 122.16342, 86.06003, 51.972614, 468.94968]
2025-08-07 02:26:42,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 566.0, 49.0, 176.0, 119.0, 197.0, 223.0, 138.0, 143.0, 1000.0]
2025-08-07 02:26:42,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 29 seconds)
2025-08-07 02:28:32,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:40,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 233.74692 ± 202.807
2025-08-07 02:28:40,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [328.41168, 27.453188, 64.60095, 95.787964, 144.27234, 516.3373, 110.46665, 15.062867, 527.17554, 507.90067]
2025-08-07 02:28:40,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [788.0, 120.0, 182.0, 148.0, 249.0, 1000.0, 319.0, 78.0, 1000.0, 1000.0]
2025-08-07 02:28:41,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 39 seconds)
2025-08-07 02:30:30,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 202.89154 ± 180.192
2025-08-07 02:30:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [424.3758, 173.99129, 89.02267, 60.775894, 539.1753, 446.54474, 93.57004, 102.75098, 44.051323, 54.657528]
2025-08-07 02:30:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 303.0, 164.0, 239.0, 1000.0, 1000.0, 178.0, 125.0, 106.0, 160.0]
2025-08-07 02:30:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 29 minutes, 24 seconds)
2025-08-07 02:32:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:40,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 204.87842 ± 186.552
2025-08-07 02:32:40,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [77.524185, 33.62602, 415.32864, 88.38724, 20.664213, 118.255936, 173.92372, 504.5326, 521.6253, 94.91626]
2025-08-07 02:32:40,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [325.0, 174.0, 957.0, 228.0, 81.0, 190.0, 379.0, 1000.0, 1000.0, 241.0]
2025-08-07 02:32:40,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 54 seconds)
2025-08-07 02:34:26,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:31,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 149.54959 ± 128.551
2025-08-07 02:34:31,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [145.78378, 215.4181, 63.881653, -8.760897, 83.45865, 252.99852, 46.92141, 70.34258, 165.46655, 459.98553]
2025-08-07 02:34:31,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [290.0, 451.0, 155.0, 61.0, 123.0, 306.0, 111.0, 164.0, 213.0, 1000.0]
2025-08-07 02:34:31,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 6 seconds)
2025-08-07 02:36:27,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:34,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 200.10828 ± 139.356
2025-08-07 02:36:34,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [413.86792, 9.623736, 115.61239, 454.6457, 82.73416, 260.32178, 272.73474, 95.716774, 173.11194, 122.71357]
2025-08-07 02:36:34,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [618.0, 19.0, 145.0, 1000.0, 155.0, 453.0, 777.0, 239.0, 341.0, 356.0]
2025-08-07 02:36:34,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 54 seconds)
2025-08-07 02:38:22,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:29,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 195.64508 ± 191.979
2025-08-07 02:38:29,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [52.69176, 77.524864, 24.646402, 222.47583, 85.44667, 20.839378, 114.40497, 469.75308, 276.87793, 611.7899]
2025-08-07 02:38:29,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [100.0, 157.0, 40.0, 297.0, 157.0, 184.0, 235.0, 1000.0, 459.0, 954.0]
2025-08-07 02:38:29,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 19 seconds)
2025-08-07 02:40:17,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:22,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 152.21594 ± 157.614
2025-08-07 02:40:22,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [29.966526, 56.791286, 146.94162, 146.89528, 179.24055, 233.45517, 44.775806, 96.92354, 578.55396, 8.615721]
2025-08-07 02:40:22,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [231.0, 179.0, 242.0, 174.0, 278.0, 296.0, 101.0, 201.0, 1000.0, 41.0]
2025-08-07 02:40:22,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 47 seconds)
2025-08-07 02:42:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:14,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 149.46187 ± 150.693
2025-08-07 02:42:14,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [575.8598, 93.89684, 106.25152, 89.827034, 6.2981834, 128.67253, 80.4335, 61.515186, 206.99176, 144.87231]
2025-08-07 02:42:14,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 172.0, 195.0, 184.0, 116.0, 251.0, 130.0, 67.0, 303.0, 172.0]
2025-08-07 02:42:14,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 32 seconds)
2025-08-07 02:44:10,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:18,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 211.97412 ± 162.557
2025-08-07 02:44:18,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [123.991745, 86.68085, 46.798325, 165.846, 196.79341, 21.463411, 311.63873, 171.24286, 491.8541, 503.4318]
2025-08-07 02:44:18,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [179.0, 248.0, 177.0, 287.0, 300.0, 39.0, 1000.0, 266.0, 1000.0, 1000.0]
2025-08-07 02:44:18,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 15 seconds)
2025-08-07 02:46:02,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:07,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 165.08171 ± 153.315
2025-08-07 02:46:07,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [268.24518, 266.64017, 26.98092, 61.654266, 84.11711, 485.31125, 23.76874, 50.927353, 334.451, 48.721176]
2025-08-07 02:46:07,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [338.0, 438.0, 43.0, 49.0, 86.0, 1000.0, 75.0, 62.0, 502.0, 50.0]
2025-08-07 02:46:07,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 31 seconds)
2025-08-07 02:48:06,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:12,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 231.72307 ± 180.851
2025-08-07 02:48:12,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [216.33339, 638.81305, 39.50294, 66.0059, 218.90466, 337.53162, 127.665344, 107.43363, 118.35099, 446.6892]
2025-08-07 02:48:12,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [432.0, 1000.0, 47.0, 88.0, 385.0, 632.0, 164.0, 89.0, 167.0, 790.0]
2025-08-07 02:48:12,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes)
2025-08-07 02:50:01,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:11,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 304.16281 ± 202.299
2025-08-07 02:50:11,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [180.97505, 615.88055, 120.75481, 51.11334, 87.96875, 484.93607, 540.9848, 263.4188, 518.474, 177.12189]
2025-08-07 02:50:11,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [245.0, 1000.0, 376.0, 161.0, 107.0, 1000.0, 1000.0, 446.0, 1000.0, 353.0]
2025-08-07 02:50:11,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (304.16) for latency ExtremeSparseL4U32
2025-08-07 02:50:11,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 46 seconds)
2025-08-07 02:51:57,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:04,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 198.26622 ± 161.974
2025-08-07 02:52:04,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [196.19246, 92.78755, 42.159187, 45.99724, 18.434, 154.36072, 137.1233, 404.0468, 425.88943, 465.67154]
2025-08-07 02:52:04,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [323.0, 164.0, 62.0, 91.0, 41.0, 239.0, 149.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:52:04,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 53 seconds)
2025-08-07 02:53:49,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:53,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 151.54874 ± 136.354
2025-08-07 02:53:53,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [82.71017, 522.45654, 231.24916, 89.27195, 123.93148, 47.51844, 82.84373, 21.353004, 162.70935, 151.44362]
2025-08-07 02:53:53,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [161.0, 742.0, 399.0, 108.0, 172.0, 52.0, 210.0, 51.0, 352.0, 173.0]
2025-08-07 02:53:53,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 12 seconds)
2025-08-07 02:55:46,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:58,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 423.26215 ± 216.737
2025-08-07 02:55:58,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [73.156296, 339.47018, 379.68698, 598.833, 50.105705, 440.37848, 660.06934, 555.13715, 730.0458, 405.7388]
2025-08-07 02:55:58,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [185.0, 534.0, 537.0, 844.0, 110.0, 540.0, 1000.0, 1000.0, 976.0, 551.0]
2025-08-07 02:55:58,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (423.26) for latency ExtremeSparseL4U32
2025-08-07 02:55:58,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 1 second)
2025-08-07 02:57:45,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 162.95102 ± 213.478
2025-08-07 02:57:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [56.109386, 504.1568, 76.690216, 34.322502, 255.80684, 12.717025, 24.007833, 624.377, 20.33064, 20.992031]
2025-08-07 02:57:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 620.0, 88.0, 67.0, 340.0, 42.0, 32.0, 911.0, 29.0, 41.0]
2025-08-07 02:57:49,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 29 seconds)
2025-08-07 02:59:39,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:49,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 335.30826 ± 276.803
2025-08-07 02:59:49,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [432.23456, 72.4534, 20.14219, 620.2639, 683.05664, 126.45741, 596.4833, 45.26807, 72.91924, 683.8039]
2025-08-07 02:59:49,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [527.0, 122.0, 42.0, 1000.0, 1000.0, 229.0, 1000.0, 54.0, 125.0, 951.0]
2025-08-07 02:59:49,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 38 seconds)
2025-08-07 03:01:42,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:48,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 224.34290 ± 194.143
2025-08-07 03:01:48,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [469.0548, 113.104324, 509.92902, 513.4122, 152.89227, 4.40118, 91.784904, 19.692352, 292.8403, 76.31753]
2025-08-07 03:01:48,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [726.0, 170.0, 514.0, 1000.0, 223.0, 38.0, 147.0, 42.0, 513.0, 122.0]
2025-08-07 03:01:49,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 26 seconds)
2025-08-07 03:03:42,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:54,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 419.19785 ± 247.824
2025-08-07 03:03:54,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [608.2514, 22.059252, 557.15924, 787.9649, 710.146, 515.0871, 90.76725, 198.20285, 296.96524, 405.37512]
2025-08-07 03:03:54,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 49.0, 1000.0, 1000.0, 1000.0, 1000.0, 169.0, 316.0, 498.0, 685.0]
2025-08-07 03:03:54,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 3 seconds)
2025-08-07 03:05:39,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:50,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 441.63287 ± 234.036
2025-08-07 03:05:50,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [808.3844, 139.4033, 608.50854, 842.58936, 343.08887, 169.6463, 238.53752, 409.4871, 487.26636, 369.41675]
2025-08-07 03:05:50,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 169.0, 1000.0, 1000.0, 502.0, 204.0, 395.0, 486.0, 667.0, 425.0]
2025-08-07 03:05:50,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (441.63) for latency ExtremeSparseL4U32
2025-08-07 03:05:50,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 15 seconds)
2025-08-07 03:07:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:47,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 327.55609 ± 189.432
2025-08-07 03:07:47,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [501.68158, 479.92725, 353.28543, 269.97028, 120.11777, 572.8816, 46.741608, 271.24142, 570.76886, 88.94502]
2025-08-07 03:07:47,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [806.0, 621.0, 557.0, 311.0, 168.0, 1000.0, 91.0, 376.0, 1000.0, 150.0]
2025-08-07 03:07:47,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 51 seconds)
2025-08-07 03:09:35,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:45,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 374.04053 ± 273.435
2025-08-07 03:09:45,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [144.79698, 139.49242, 652.523, 36.637123, 533.90265, 287.72595, 380.5485, 747.6998, 778.4449, 38.63385]
2025-08-07 03:09:45,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [219.0, 223.0, 1000.0, 55.0, 706.0, 431.0, 501.0, 1000.0, 1000.0, 51.0]
2025-08-07 03:09:45,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 40 seconds)
2025-08-07 03:11:38,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:49,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 435.19379 ± 246.251
2025-08-07 03:11:49,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [520.03937, 80.58434, 613.99536, 697.68475, 330.00577, 692.2674, 565.04645, 144.10767, 654.6709, 53.535637]
2025-08-07 03:11:49,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 90.0, 1000.0, 890.0, 416.0, 1000.0, 674.0, 171.0, 749.0, 67.0]
2025-08-07 03:11:49,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 3 seconds)
2025-08-07 03:13:36,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:46,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 356.96048 ± 236.172
2025-08-07 03:13:46,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [565.04474, 521.2818, 19.204914, 375.83838, 198.96886, 677.3845, 302.51117, 28.334797, 685.9563, 195.07913]
2025-08-07 03:13:46,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [863.0, 1000.0, 51.0, 1000.0, 311.0, 864.0, 376.0, 42.0, 1000.0, 250.0]
2025-08-07 03:13:46,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 23 seconds)
2025-08-07 03:15:42,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:55,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 491.75439 ± 267.511
2025-08-07 03:15:55,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [91.380936, 521.42883, 770.5518, 773.00037, 791.6025, 730.17285, 443.16327, 343.29932, 19.45603, 433.48788]
2025-08-07 03:15:55,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [126.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 529.0, 506.0, 53.0, 481.0]
2025-08-07 03:15:55,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (491.75) for latency ExtremeSparseL4U32
2025-08-07 03:15:55,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 22 seconds)
2025-08-07 03:17:39,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:49,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 363.20242 ± 227.774
2025-08-07 03:17:49,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [691.60736, 652.42957, 87.40834, 538.41656, 116.520256, 365.9767, 570.81757, 296.1188, 41.21761, 271.51154]
2025-08-07 03:17:49,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 878.0, 92.0, 1000.0, 147.0, 450.0, 798.0, 351.0, 53.0, 351.0]
2025-08-07 03:17:49,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 5 seconds)
2025-08-07 03:19:38,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:45,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 308.69846 ± 301.697
2025-08-07 03:19:45,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [896.4321, 18.822601, 534.59753, 73.88038, 24.534184, 653.0179, 44.24099, 238.89471, 77.277885, 525.2864]
2025-08-07 03:19:45,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 43.0, 687.0, 80.0, 41.0, 1000.0, 96.0, 335.0, 114.0, 617.0]
2025-08-07 03:19:46,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 2 seconds)
2025-08-07 03:21:39,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:51,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 410.48697 ± 201.044
2025-08-07 03:21:51,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [131.42831, 321.8546, 504.1266, 388.31393, 567.08136, 646.5686, 88.367256, 470.92105, 725.3493, 260.85858]
2025-08-07 03:21:51,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [238.0, 1000.0, 1000.0, 590.0, 714.0, 675.0, 107.0, 615.0, 1000.0, 299.0]
2025-08-07 03:21:51,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 6 seconds)
2025-08-07 03:23:39,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:52,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 505.58569 ± 192.116
2025-08-07 03:23:52,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [389.39767, 765.90186, 716.1427, 616.23663, 650.754, 255.04529, 532.0248, 450.9717, 129.12753, 550.25494]
2025-08-07 03:23:52,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [555.0, 1000.0, 1000.0, 1000.0, 1000.0, 292.0, 1000.0, 688.0, 216.0, 776.0]
2025-08-07 03:23:52,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (505.59) for latency ExtremeSparseL4U32
2025-08-07 03:23:52,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 23 seconds)
2025-08-07 03:25:35,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:42,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 275.60013 ± 208.884
2025-08-07 03:25:42,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [404.09445, 708.3851, 40.434586, 80.03597, 115.6939, 159.61783, 481.1824, 149.85838, 449.27643, 167.42213]
2025-08-07 03:25:42,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [463.0, 1000.0, 47.0, 114.0, 150.0, 194.0, 1000.0, 268.0, 599.0, 174.0]
2025-08-07 03:25:42,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 14 seconds)
2025-08-07 03:27:33,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 266.53561 ± 212.253
2025-08-07 03:27:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [94.50838, 460.4537, 86.398735, 557.93774, 6.0632305, 77.34077, 519.19073, 535.45184, 214.6685, 113.3426]
2025-08-07 03:27:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [139.0, 1000.0, 74.0, 1000.0, 40.0, 130.0, 1000.0, 1000.0, 170.0, 109.0]
2025-08-07 03:27:41,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 34 seconds)
2025-08-07 03:29:36,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:44,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 301.40582 ± 207.453
2025-08-07 03:29:44,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [123.22005, 303.43375, 14.147546, 336.77274, 427.18964, 452.52097, 562.39124, 633.42993, 90.84315, 70.10891]
2025-08-07 03:29:44,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [156.0, 448.0, 43.0, 555.0, 450.0, 491.0, 1000.0, 1000.0, 125.0, 132.0]
2025-08-07 03:29:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 55 seconds)
2025-08-07 03:31:28,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:34,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 280.85361 ± 178.852
2025-08-07 03:31:34,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [214.92522, 493.19205, 425.98572, 362.97424, 346.16287, 573.1535, 90.3775, 210.41867, 22.50447, 68.84196]
2025-08-07 03:31:34,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [353.0, 629.0, 568.0, 406.0, 399.0, 688.0, 84.0, 288.0, 41.0, 87.0]
2025-08-07 03:31:34,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 10 seconds)
2025-08-07 03:33:18,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:33:26,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 341.92975 ± 253.897
2025-08-07 03:33:26,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [212.1955, 830.89044, 145.5783, 79.57055, 116.94395, 518.2591, 34.23489, 640.36536, 378.81537, 462.4438]
2025-08-07 03:33:26,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [190.0, 1000.0, 189.0, 100.0, 247.0, 657.0, 129.0, 1000.0, 440.0, 514.0]
2025-08-07 03:33:26,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 45 seconds)
2025-08-07 03:35:14,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:23,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 332.95270 ± 256.953
2025-08-07 03:35:23,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [560.00885, 537.63293, 114.83755, 59.619267, 38.504505, 669.54297, 751.04224, 99.10865, 225.63565, 273.59445]
2025-08-07 03:35:23,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 111.0, 68.0, 56.0, 1000.0, 871.0, 176.0, 290.0, 322.0]
2025-08-07 03:35:23,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 11 seconds)
2025-08-07 03:37:10,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:20,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 403.07529 ± 271.608
2025-08-07 03:37:20,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [77.09018, 75.70947, 236.41696, 146.03647, 790.2576, 505.03156, 516.70654, 744.6236, 726.10126, 212.7795]
2025-08-07 03:37:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 216.0, 263.0, 138.0, 1000.0, 525.0, 1000.0, 975.0, 1000.0, 249.0]
2025-08-07 03:37:20,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 9 seconds)
2025-08-07 03:39:08,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:18,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 446.16180 ± 342.699
2025-08-07 03:39:18,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [127.24687, 676.8265, 884.2696, 764.40686, 941.74274, 53.226967, 49.34242, 366.73898, 542.99274, 54.82441]
2025-08-07 03:39:18,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [188.0, 1000.0, 904.0, 1000.0, 1000.0, 129.0, 52.0, 419.0, 692.0, 59.0]
2025-08-07 03:39:18,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 1 second)
2025-08-07 03:41:06,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:14,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 301.55682 ± 174.976
2025-08-07 03:41:14,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [183.07193, 113.35979, 274.20306, 485.01218, 65.80743, 484.51706, 596.2263, 159.25708, 215.53738, 438.57617]
2025-08-07 03:41:14,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [205.0, 220.0, 422.0, 1000.0, 121.0, 750.0, 719.0, 180.0, 266.0, 439.0]
2025-08-07 03:41:14,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 19 seconds)
2025-08-07 03:43:08,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:21,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 443.10220 ± 198.693
2025-08-07 03:43:21,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [653.25574, 262.32806, 626.22534, 507.04865, 174.93942, 48.804928, 596.1676, 556.25604, 572.0315, 433.9647]
2025-08-07 03:43:21,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 234.0, 719.0, 1000.0, 188.0, 80.0, 1000.0, 1000.0, 1000.0, 532.0]
2025-08-07 03:43:21,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 51 seconds)
2025-08-07 03:45:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:21,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 437.54352 ± 260.755
2025-08-07 03:45:21,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [547.65967, 454.8244, 884.68884, 401.24814, 115.72995, 13.552034, 680.3579, 146.62735, 507.71472, 623.03186]
2025-08-07 03:45:21,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [583.0, 480.0, 1000.0, 402.0, 141.0, 38.0, 1000.0, 290.0, 616.0, 1000.0]
2025-08-07 03:45:21,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 56 seconds)
2025-08-07 03:47:00,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:13,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 547.33850 ± 237.212
2025-08-07 03:47:13,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [644.82855, 758.52997, 525.2274, 540.1108, 109.486496, 599.36096, 718.1333, 861.7997, 119.37927, 596.52783]
2025-08-07 03:47:13,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [785.0, 1000.0, 1000.0, 1000.0, 108.0, 546.0, 1000.0, 1000.0, 158.0, 746.0]
2025-08-07 03:47:13,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (547.34) for latency ExtremeSparseL4U32
2025-08-07 03:47:13,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 50 seconds)
2025-08-07 03:49:04,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:15,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 403.99872 ± 267.748
2025-08-07 03:49:15,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [102.124374, 548.4169, 135.57442, 133.82697, 595.97833, 99.976074, 900.6541, 556.49023, 638.4258, 328.52014]
2025-08-07 03:49:15,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [174.0, 1000.0, 236.0, 207.0, 1000.0, 179.0, 1000.0, 655.0, 1000.0, 426.0]
2025-08-07 03:49:15,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 56 seconds)
2025-08-07 03:50:59,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:04,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 233.67868 ± 163.069
2025-08-07 03:51:04,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [214.18484, 49.002323, 366.5347, 366.73865, 14.326632, 36.716522, 518.32574, 389.48068, 181.29018, 200.1865]
2025-08-07 03:51:04,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [232.0, 56.0, 467.0, 407.0, 43.0, 51.0, 1000.0, 509.0, 207.0, 234.0]
2025-08-07 03:51:04,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 50 seconds)
2025-08-07 03:52:58,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:05,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 318.85370 ± 269.982
2025-08-07 03:53:05,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [5.7652326, 413.58572, 376.1037, 323.9417, 46.186676, 819.0995, 162.73643, 749.64996, 255.53094, 35.937286]
2025-08-07 03:53:05,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 516.0, 435.0, 307.0, 85.0, 1000.0, 177.0, 1000.0, 259.0, 60.0]
2025-08-07 03:53:05,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 47 seconds)
2025-08-07 03:54:51,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:58,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 356.68436 ± 298.722
2025-08-07 03:54:58,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [114.75564, 10.359274, 94.80463, 86.1345, 451.44177, 436.53885, 806.9025, 363.58, 943.5533, 258.7731]
2025-08-07 03:54:58,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [130.0, 40.0, 116.0, 104.0, 458.0, 575.0, 1000.0, 324.0, 1000.0, 298.0]
2025-08-07 03:54:58,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 46 seconds)
2025-08-07 03:56:45,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:54,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 382.37778 ± 223.127
2025-08-07 03:56:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [309.8693, 351.04248, 676.87665, 287.25046, 411.0643, 122.620255, 578.962, 766.2855, 14.094508, 305.71243]
2025-08-07 03:56:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [374.0, 436.0, 1000.0, 292.0, 536.0, 136.0, 643.0, 1000.0, 24.0, 391.0]
2025-08-07 03:56:54,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 52 seconds)
2025-08-07 03:58:45,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:55,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 473.84082 ± 281.625
2025-08-07 03:58:55,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [672.7212, 101.98167, 259.31143, 775.91846, 940.0343, 370.10654, 129.04198, 672.6867, 611.90356, 204.70229]
2025-08-07 03:58:55,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [964.0, 95.0, 283.0, 1000.0, 1000.0, 424.0, 161.0, 861.0, 755.0, 249.0]
2025-08-07 03:58:55,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 56 seconds)
2025-08-07 04:00:37,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:43,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 281.75339 ± 118.353
2025-08-07 04:00:43,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [325.53555, 327.96054, 186.39177, 316.2625, 324.01047, 97.7826, 306.89432, 227.93442, 548.0857, 156.67598]
2025-08-07 04:00:43,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [422.0, 329.0, 213.0, 351.0, 452.0, 117.0, 370.0, 308.0, 1000.0, 240.0]
2025-08-07 04:00:43,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1251 [DEBUG]: Training session finished
