2025-08-07 00:47:48,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:47:48,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:47:48,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14cf9475c550>}
2025-08-07 00:47:48,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1111 [DEBUG]: using device: cuda
2025-08-07 00:47:48,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1133 [INFO]: Creating new trainer
2025-08-07 00:47:48,849 baseline-bpql-noiseperc25-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=283, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:47:48,849 baseline-bpql-noiseperc25-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:47:49,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1194 [DEBUG]: Starting training session...
2025-08-07 00:47:49,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:30,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:36,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -406.50049 ± 550.697
2025-08-07 00:49:36,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-1240.1847, -63.76754, -1260.4298, -114.80229, -3.6007552, 4.3772497, -94.52557, -9.601838, -1236.9207, -45.54911]
2025-08-07 00:49:36,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 92.0, 1000.0, 85.0, 53.0, 12.0, 90.0, 41.0, 1000.0, 49.0]
2025-08-07 00:49:36,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-406.50) for latency ExtremeSparseL4U32
2025-08-07 00:49:36,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 56 minutes, 44 seconds)
2025-08-07 00:51:22,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -31.22313 ± 55.390
2025-08-07 00:51:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-56.58776, -0.5530488, 13.444678, 23.704447, -42.242092, -21.778294, 10.40755, -142.56848, -115.77172, 19.713478]
2025-08-07 00:51:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 15.0, 43.0, 48.0, 125.0, 81.0, 23.0, 118.0, 135.0, 30.0]
2025-08-07 00:51:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-31.22) for latency ExtremeSparseL4U32
2025-08-07 00:51:23,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 54 minutes, 20 seconds)
2025-08-07 00:53:09,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:12,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -152.24510 ± 310.474
2025-08-07 00:53:12,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.357412, -161.35515, 12.236214, -100.10521, -1.362454, -24.115944, -40.566357, -75.63952, -1072.1495, -42.03579]
2025-08-07 00:53:12,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 174.0, 26.0, 145.0, 20.0, 87.0, 70.0, 152.0, 1000.0, 60.0]
2025-08-07 00:53:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 53 minutes, 58 seconds)
2025-08-07 00:54:55,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:00,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -254.04187 ± 436.051
2025-08-07 00:55:00,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-29.131048, -1177.0759, -55.18253, -210.28958, -10.640338, -1054.935, 8.821407, -6.007098, 0.45454502, -6.432846]
2025-08-07 00:55:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 1000.0, 87.0, 178.0, 41.0, 1000.0, 27.0, 28.0, 35.0, 57.0]
2025-08-07 00:55:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 52 minutes, 19 seconds)
2025-08-07 00:56:46,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:49,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -141.50316 ± 311.069
2025-08-07 00:56:49,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.1018806, -91.86055, -45.57228, -53.43339, 37.13095, -156.45888, -1061.4297, -44.704426, -0.9758147, -0.829466]
2025-08-07 00:56:49,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 98.0, 64.0, 76.0, 34.0, 161.0, 1000.0, 72.0, 17.0, 60.0]
2025-08-07 00:56:49,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 50 minutes, 46 seconds)
2025-08-07 00:58:36,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:41,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -214.43851 ± 366.424
2025-08-07 00:58:41,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-37.58533, -47.126537, -12.923214, -138.61267, 16.756441, -919.44684, -22.594212, -965.85156, 3.732018, -20.73288]
2025-08-07 00:58:41,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 73.0, 80.0, 125.0, 52.0, 1000.0, 65.0, 1000.0, 13.0, 67.0]
2025-08-07 00:58:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 50 minutes, 32 seconds)
2025-08-07 01:00:25,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:30,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -153.44606 ± 270.555
2025-08-07 01:00:30,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-26.260935, -679.6305, 13.066756, -10.449099, -122.77092, 3.9405174, -699.47107, -9.506728, 6.8230076, -10.201693]
2025-08-07 01:00:30,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 1000.0, 24.0, 18.0, 141.0, 72.0, 1000.0, 101.0, 42.0, 47.0]
2025-08-07 01:00:30,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 49 minutes, 36 seconds)
2025-08-07 01:02:14,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:02:17,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -86.46620 ± 180.952
2025-08-07 01:02:17,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.188655, -30.398468, 1.023859, -29.479902, -37.438255, -93.157364, 8.712052, -620.79895, 1.1950425, -69.50872]
2025-08-07 01:02:17,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 149.0, 40.0, 116.0, 65.0, 187.0, 41.0, 1000.0, 95.0, 174.0]
2025-08-07 01:02:17,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 47 minutes, 10 seconds)
2025-08-07 01:04:12,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:15,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -113.66347 ± 254.115
2025-08-07 01:04:15,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-45.058517, -864.758, -19.42752, -120.03904, 6.219405, 25.174215, -6.8219094, -96.06046, -18.957167, 3.0943003]
2025-08-07 01:04:15,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 1000.0, 37.0, 93.0, 14.0, 56.0, 48.0, 115.0, 52.0, 83.0]
2025-08-07 01:04:15,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 48 minutes, 18 seconds)
2025-08-07 01:05:52,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:53,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.77351 ± 24.357
2025-08-07 01:05:53,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-32.9861, -4.9800477, -5.9740148, -55.4081, -7.823613, -5.7152667, -13.502087, -14.391825, 13.467735, -70.42181]
2025-08-07 01:05:53,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 41.0, 71.0, 66.0, 60.0, 36.0, 70.0, 48.0, 71.0, 76.0]
2025-08-07 01:05:53,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-19.77) for latency ExtremeSparseL4U32
2025-08-07 01:05:53,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 43 minutes, 18 seconds)
2025-08-07 01:07:38,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:07:39,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -24.34591 ± 61.961
2025-08-07 01:07:39,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.7662673, -203.65717, 15.233164, 23.375784, -16.073, -14.719997, 9.028961, -25.420712, -28.6079, 3.1480715]
2025-08-07 01:07:39,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 129.0, 33.0, 50.0, 38.0, 56.0, 38.0, 103.0, 43.0, 23.0]
2025-08-07 01:07:39,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 39 minutes, 38 seconds)
2025-08-07 01:09:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:09:24,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -30.08877 ± 52.119
2025-08-07 01:09:24,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [22.188005, -36.387474, -35.1058, 1.6851064, -25.167465, -33.238148, -4.676328, 2.8306584, -16.601921, -176.4143]
2025-08-07 01:09:24,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 99.0, 50.0, 86.0, 97.0, 75.0, 46.0, 21.0, 69.0, 146.0]
2025-08-07 01:09:24,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 36 minutes, 46 seconds)
2025-08-07 01:11:10,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:15,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -197.54610 ± 356.237
2025-08-07 01:11:15,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.493444, 31.724262, -857.21075, -953.554, -25.5121, -67.535706, 20.172235, -89.27991, -2.764414, -8.007126]
2025-08-07 01:11:15,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 46.0, 1000.0, 1000.0, 85.0, 96.0, 89.0, 141.0, 43.0, 28.0]
2025-08-07 01:11:15,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 35 minutes, 51 seconds)
2025-08-07 01:13:00,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:05,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -205.65202 ± 348.142
2025-08-07 01:13:05,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-893.95795, -107.78581, -6.7133017, -78.07251, -50.44211, 12.991945, -902.4112, 10.758832, -24.280544, -16.607624]
2025-08-07 01:13:05,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 135.0, 83.0, 128.0, 53.0, 20.0, 1000.0, 39.0, 46.0, 74.0]
2025-08-07 01:13:05,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 31 minutes, 48 seconds)
2025-08-07 01:14:49,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:52,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -127.89128 ± 231.875
2025-08-07 01:14:52,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.6623397, -72.73945, -100.63681, -34.325382, -39.33327, -103.2855, -55.523083, -40.848583, -12.554309, -817.004]
2025-08-07 01:14:52,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 91.0, 130.0, 62.0, 79.0, 143.0, 54.0, 80.0, 56.0, 1000.0]
2025-08-07 01:14:52,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 32 minutes, 44 seconds)
2025-08-07 01:16:38,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.39458 ± 22.209
2025-08-07 01:16:39,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [0.97070044, -23.794641, -39.4782, -0.21225737, -58.79745, -11.864013, -17.862114, -70.19741, -18.454044, -24.25642]
2025-08-07 01:16:39,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 76.0, 77.0, 77.0, 96.0, 46.0, 52.0, 73.0, 68.0, 69.0]
2025-08-07 01:16:39,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 31 minutes, 15 seconds)
2025-08-07 01:18:22,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -139.66942 ± 234.166
2025-08-07 01:18:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-46.158, 12.712926, -798.48755, -279.76382, 9.729124, -41.067276, -135.06668, -38.988506, -56.50807, -23.096388]
2025-08-07 01:18:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 32.0, 1000.0, 318.0, 78.0, 93.0, 120.0, 67.0, 137.0, 86.0]
2025-08-07 01:18:25,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 29 minutes, 36 seconds)
2025-08-07 01:20:10,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:20:11,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -20.41266 ± 17.006
2025-08-07 01:20:11,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [0.7883757, -31.908657, -37.62814, -7.8930697, -25.685106, 3.287382, -17.231863, -52.920845, -7.8881, -27.046614]
2025-08-07 01:20:11,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 72.0, 47.0, 23.0, 53.0, 47.0, 50.0, 106.0, 41.0, 66.0]
2025-08-07 01:20:11,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 26 minutes, 31 seconds)
2025-08-07 01:21:57,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:58,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -31.65155 ± 34.306
2025-08-07 01:21:58,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-26.2176, -37.400005, 12.01589, -97.30695, 10.552549, -69.445946, -51.415466, -49.73922, -1.7784923, -5.780278]
2025-08-07 01:21:58,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 76.0, 36.0, 159.0, 45.0, 68.0, 116.0, 247.0, 58.0, 27.0]
2025-08-07 01:21:58,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 24 minutes, 9 seconds)
2025-08-07 01:23:49,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.68823 ± 29.685
2025-08-07 01:23:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-39.457207, -17.44577, 0.8171078, -44.342846, -10.131295, -13.976823, -9.195656, 5.366891, 13.607132, 67.87613]
2025-08-07 01:23:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 45.0, 42.0, 61.0, 38.0, 51.0, 41.0, 15.0, 52.0, 73.0]
2025-08-07 01:23:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-4.69) for latency ExtremeSparseL4U32
2025-08-07 01:23:49,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 23 minutes, 17 seconds)
2025-08-07 01:25:27,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:34,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -201.31595 ± 257.516
2025-08-07 01:25:34,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-42.19025, 8.4114895, -25.928972, -541.4407, -90.127106, -644.31934, -38.880672, -6.421004, -44.059837, -588.2031]
2025-08-07 01:25:34,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 14.0, 60.0, 1000.0, 105.0, 1000.0, 60.0, 23.0, 88.0, 1000.0]
2025-08-07 01:25:34,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 20 minutes, 47 seconds)
2025-08-07 01:27:18,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:19,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.23925 ± 27.445
2025-08-07 01:27:19,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.550104, -0.21911025, -28.48621, -3.7029014, -7.5728726, -88.05883, -16.808744, -21.43592, -68.06388, -16.493967]
2025-08-07 01:27:19,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 44.0, 54.0, 41.0, 53.0, 86.0, 45.0, 45.0, 124.0, 51.0]
2025-08-07 01:27:19,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 18 minutes, 45 seconds)
2025-08-07 01:29:03,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:06,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -51.44027 ± 141.555
2025-08-07 01:29:06,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [10.648798, -12.787369, 13.259976, 17.081043, -27.368858, 20.696491, -10.507242, -59.077232, 4.0783463, -470.4267]
2025-08-07 01:29:06,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 52.0, 49.0, 78.0, 67.0, 41.0, 25.0, 82.0, 33.0, 1000.0]
2025-08-07 01:29:06,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 17 minutes, 20 seconds)
2025-08-07 01:31:01,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:02,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.79713 ± 46.026
2025-08-07 01:31:02,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-5.1870956, -41.932613, 13.029972, 1.3100746, 37.9615, -36.859814, -34.982548, -116.825806, -96.32644, 1.8414314]
2025-08-07 01:31:02,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 60.0, 34.0, 84.0, 59.0, 72.0, 47.0, 198.0, 93.0, 41.0]
2025-08-07 01:31:02,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 17 minutes, 43 seconds)
2025-08-07 01:32:43,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:45,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -48.08316 ± 61.025
2025-08-07 01:32:45,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [2.8495712, -2.0472105, -32.962326, -15.408526, -112.21905, -3.3565674, -173.13632, 4.6053667, -126.048225, -23.108335]
2025-08-07 01:32:45,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 46.0, 60.0, 24.0, 84.0, 42.0, 1000.0, 27.0, 84.0, 121.0]
2025-08-07 01:32:45,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 14 minutes)
2025-08-07 01:34:26,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:28,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -23.10504 ± 40.639
2025-08-07 01:34:28,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.011151, 8.711297, -36.73856, -18.801638, -20.973358, 21.467224, -11.509296, -15.775581, -136.21939, -8.199954]
2025-08-07 01:34:28,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 26.0, 73.0, 46.0, 78.0, 49.0, 42.0, 48.0, 207.0, 56.0]
2025-08-07 01:34:28,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 11 minutes, 42 seconds)
2025-08-07 01:36:15,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:17,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.46818 ± 28.224
2025-08-07 01:36:17,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.752629, 40.92286, -2.697054, -24.205494, -10.02527, -4.1498075, -78.75605, -17.17861, -14.495466, -19.849516]
2025-08-07 01:36:17,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 40.0, 53.0, 101.0, 33.0, 37.0, 1000.0, 47.0, 54.0, 55.0]
2025-08-07 01:36:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 11 minutes, 2 seconds)
2025-08-07 01:38:00,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:03,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.93309 ± 20.940
2025-08-07 01:38:03,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.4274, -68.45372, 2.218432, -22.447226, -42.06498, -24.230452, -53.527016, -19.989027, -0.6332753, -22.776222]
2025-08-07 01:38:03,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 72.0, 43.0, 50.0, 1000.0, 51.0, 61.0, 61.0, 45.0, 51.0]
2025-08-07 01:38:03,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 8 minutes, 51 seconds)
2025-08-07 01:39:48,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:39:49,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -51.63842 ± 34.013
2025-08-07 01:39:49,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-49.807167, -45.361877, -67.706566, 0.44610345, -63.08678, -22.996403, -32.135323, -50.427307, -136.57204, -48.736794]
2025-08-07 01:39:49,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 52.0, 85.0, 14.0, 126.0, 42.0, 50.0, 87.0, 121.0, 66.0]
2025-08-07 01:39:49,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 4 minutes, 42 seconds)
2025-08-07 01:41:38,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:39,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.08997 ± 42.921
2025-08-07 01:41:39,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.0440955, 14.1922455, -16.797647, -14.143363, 0.22467014, -4.400242, -101.04305, -115.03387, -37.970764, 11.116434]
2025-08-07 01:41:39,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 41.0, 69.0, 52.0, 48.0, 44.0, 99.0, 129.0, 56.0, 33.0]
2025-08-07 01:41:39,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 4 minutes, 34 seconds)
2025-08-07 01:43:20,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:21,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -43.04436 ± 32.078
2025-08-07 01:43:21,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-9.821898, -17.455772, -108.921, -52.388363, -84.71129, -31.470007, -31.650873, -48.67345, -47.60444, 2.2534533]
2025-08-07 01:43:21,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 38.0, 87.0, 115.0, 124.0, 73.0, 49.0, 85.0, 57.0, 38.0]
2025-08-07 01:43:21,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 2 minutes, 45 seconds)
2025-08-07 01:45:15,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:18,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -33.42086 ± 51.418
2025-08-07 01:45:18,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.311244, -26.991661, 9.063611, -10.768682, -5.3573494, -153.05641, 0.5146382, -7.362916, -112.73777, -20.200775]
2025-08-07 01:45:18,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 74.0, 51.0, 54.0, 73.0, 188.0, 47.0, 1000.0, 132.0, 52.0]
2025-08-07 01:45:18,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 2 minutes, 28 seconds)
2025-08-07 01:46:55,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:56,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -38.23554 ± 58.768
2025-08-07 01:46:56,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-207.64604, -2.0424387, -6.765133, -8.758646, -15.760298, -38.304234, -48.602512, -11.629788, -1.6477213, -41.198547]
2025-08-07 01:46:56,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [154.0, 44.0, 34.0, 45.0, 86.0, 53.0, 57.0, 39.0, 47.0, 47.0]
2025-08-07 01:46:56,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 59 minutes, 7 seconds)
2025-08-07 01:48:42,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:43,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -36.01788 ± 31.404
2025-08-07 01:48:43,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-18.629555, 3.119886, -54.482395, -37.051064, -5.1719418, -47.890553, -70.82599, -93.52903, 8.738555, -44.456726]
2025-08-07 01:48:43,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 20.0, 70.0, 76.0, 50.0, 111.0, 84.0, 98.0, 22.0, 49.0]
2025-08-07 01:48:43,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 57 minutes, 24 seconds)
2025-08-07 01:50:27,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:28,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.12546 ± 19.279
2025-08-07 01:50:28,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-12.923083, -37.089684, 14.363795, 9.481573, 7.488729, -24.393576, 5.800446, 3.0681853, -34.03056, 16.979559]
2025-08-07 01:50:28,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 83.0, 61.0, 51.0, 82.0, 52.0, 38.0, 40.0, 51.0, 45.0]
2025-08-07 01:50:28,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 54 minutes, 36 seconds)
2025-08-07 01:52:14,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:16,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.84576 ± 27.917
2025-08-07 01:52:16,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [21.030106, 23.246933, -53.605225, 2.6692693, -0.4545661, -8.822136, 25.238613, -34.456352, -49.111076, -24.193127]
2025-08-07 01:52:16,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 36.0, 67.0, 51.0, 48.0, 43.0, 1000.0, 40.0, 62.0, 54.0]
2025-08-07 01:52:16,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 54 minutes, 10 seconds)
2025-08-07 01:54:01,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:02,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -14.94134 ± 18.686
2025-08-07 01:54:02,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.863293, 3.011581, -27.25588, -26.026354, -4.3438067, -5.103306, -21.227623, -55.700466, -0.18121819, 11.276954]
2025-08-07 01:54:02,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 40.0, 55.0, 76.0, 29.0, 47.0, 73.0, 119.0, 52.0, 62.0]
2025-08-07 01:54:02,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 50 minutes, 1 second)
2025-08-07 01:55:48,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.51022 ± 16.812
2025-08-07 01:55:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.111647, -33.665543, -3.3859246, -35.237064, -5.0035124, -24.880917, -51.85026, -15.154713, 6.2562046, -24.068844]
2025-08-07 01:55:49,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 56.0, 44.0, 53.0, 59.0, 71.0, 79.0, 39.0, 46.0, 56.0]
2025-08-07 01:55:49,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 50 minutes, 5 seconds)
2025-08-07 01:57:31,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.08510 ± 20.229
2025-08-07 01:57:34,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [8.153039, -29.176079, -13.453314, 18.842617, -9.863885, -51.544308, -16.5135, 12.740041, -12.658899, -27.376694]
2025-08-07 01:57:34,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 54.0, 59.0, 43.0, 43.0, 58.0, 1000.0, 72.0, 68.0, 49.0]
2025-08-07 01:57:34,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 48 minutes, 3 seconds)
2025-08-07 01:59:19,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:22,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -8.73361 ± 13.357
2025-08-07 01:59:22,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-11.022546, -11.638913, 14.614493, -5.0898266, -14.107269, -11.492377, -37.347176, -16.026674, 7.695044, -2.920852]
2025-08-07 01:59:22,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 21.0, 44.0, 35.0, 43.0, 64.0, 46.0, 23.0, 47.0]
2025-08-07 01:59:22,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 46 minutes, 42 seconds)
2025-08-07 02:01:08,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:09,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.99787 ± 19.178
2025-08-07 02:01:09,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.9065464, -26.702753, 5.7329164, 6.9440017, -36.12894, -19.431475, 3.5493777, 8.22195, 6.266699, -45.523884]
2025-08-07 02:01:09,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 61.0, 49.0, 31.0, 42.0, 70.0, 40.0, 42.0, 19.0, 118.0]
2025-08-07 02:01:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 44 minutes, 44 seconds)
2025-08-07 02:02:52,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -16.57270 ± 24.097
2025-08-07 02:02:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-55.990612, -11.439738, 4.7321906, -36.67775, 3.1444361, -21.876343, -10.705087, -55.03914, 16.905922, 1.2191675]
2025-08-07 02:02:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 43.0, 39.0, 53.0, 47.0, 47.0, 18.0, 102.0, 43.0, 85.0]
2025-08-07 02:02:53,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 42 minutes, 41 seconds)
2025-08-07 02:04:38,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:39,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.38086 ± 15.891
2025-08-07 02:04:39,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.10005719, -30.74709, -3.2260108, -5.0195594, 7.8379893, -36.69012, -13.727187, 3.2756023, 5.8966246, -31.308815]
2025-08-07 02:04:39,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 136.0, 61.0, 17.0, 43.0, 77.0, 57.0, 33.0, 33.0, 47.0]
2025-08-07 02:04:39,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 40 minutes, 41 seconds)
2025-08-07 02:06:24,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.12712 ± 25.027
2025-08-07 02:06:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-26.725325, -14.689238, -13.865716, -23.40433, -48.176163, -6.129846, 14.0728655, -13.522825, -78.23066, -50.599915]
2025-08-07 02:06:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 70.0, 1000.0, 1000.0, 63.0, 41.0, 34.0, 55.0, 115.0, 59.0]
2025-08-07 02:06:29,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 39 minutes, 44 seconds)
2025-08-07 02:08:14,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:15,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -24.41762 ± 37.426
2025-08-07 02:08:15,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.720423, -14.914383, -93.61762, -86.62325, -14.374162, -6.7338142, 17.338324, 10.466101, 10.839418, -48.836395]
2025-08-07 02:08:15,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 41.0, 92.0, 116.0, 48.0, 35.0, 33.0, 27.0, 31.0, 85.0]
2025-08-07 02:08:15,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 37 minutes, 44 seconds)
2025-08-07 02:10:07,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:10,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.92348 ± 22.924
2025-08-07 02:10:10,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.23677497, -41.41252, -31.440863, -12.142167, -29.763897, -77.40766, -15.438181, -14.75833, -57.969875, -8.664553]
2025-08-07 02:10:10,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 78.0, 75.0, 47.0, 56.0, 101.0, 43.0, 31.0, 68.0, 1000.0]
2025-08-07 02:10:10,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 37 minutes, 25 seconds)
2025-08-07 02:11:47,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:50,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -20.06598 ± 31.605
2025-08-07 02:11:50,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-34.956387, -21.936394, 19.955322, 1.8743521, 16.563494, -14.1930275, -62.34125, -85.54632, -13.80178, -6.27781]
2025-08-07 02:11:50,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 69.0, 41.0, 1000.0, 56.0, 60.0, 146.0, 189.0, 65.0, 51.0]
2025-08-07 02:11:50,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 34 minutes, 57 seconds)
2025-08-07 02:13:34,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:35,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -27.74889 ± 40.175
2025-08-07 02:13:35,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.013462, -141.45203, -2.7741795, -28.333902, 4.9041944, -26.82269, -32.60135, -30.856146, -2.1285508, -1.4107766]
2025-08-07 02:13:35,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 120.0, 27.0, 48.0, 26.0, 89.0, 51.0, 70.0, 50.0, 53.0]
2025-08-07 02:13:35,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 32 minutes, 59 seconds)
2025-08-07 02:15:20,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:23,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.89436 ± 17.950
2025-08-07 02:15:23,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-33.308624, 32.579193, 3.2339432, 2.1689138, -11.736273, -11.472449, -15.86351, -16.173576, -29.780539, -18.59067]
2025-08-07 02:15:23,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 68.0, 32.0, 40.0, 85.0, 46.0, 18.0, 50.0, 95.0, 41.0]
2025-08-07 02:15:23,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 30 minutes, 52 seconds)
2025-08-07 02:17:18,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -4.04162 ± 18.436
2025-08-07 02:17:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.0649734, -16.446836, -34.771698, -5.999596, -4.596149, 35.65742, 4.236167, -13.553462, 10.435088, -20.442093]
2025-08-07 02:17:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 82.0, 66.0, 39.0, 42.0, 1000.0, 15.0, 52.0, 97.0, 59.0]
2025-08-07 02:17:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-4.04) for latency ExtremeSparseL4U32
2025-08-07 02:17:21,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 31 minutes, 2 seconds)
2025-08-07 02:18:57,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:58,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -9.94917 ± 23.413
2025-08-07 02:18:58,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-28.992086, 9.2646055, -65.1491, 3.678426, 16.309822, 3.2011638, -5.3201427, 9.652731, -20.184294, -21.952772]
2025-08-07 02:18:58,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 58.0, 71.0, 26.0, 68.0, 48.0, 44.0, 64.0, 51.0, 70.0]
2025-08-07 02:18:58,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 26 minutes, 7 seconds)
2025-08-07 02:20:46,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:47,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.41101 ± 25.019
2025-08-07 02:20:47,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.929237, 1.3091884, -15.004984, -59.97997, -20.874554, 15.56709, -10.027089, -61.603695, 13.889519, -19.456343]
2025-08-07 02:20:47,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 52.0, 48.0, 83.0, 39.0, 32.0, 38.0, 131.0, 25.0, 41.0]
2025-08-07 02:20:47,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 25 minutes, 49 seconds)
2025-08-07 02:22:34,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:22:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.74751 ± 33.051
2025-08-07 02:22:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.33633, 0.8557848, -17.999119, -71.752335, -32.665695, -21.501965, -18.79743, 68.48301, -2.053074, -8.707968]
2025-08-07 02:22:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 41.0, 49.0, 1000.0, 51.0, 1000.0, 49.0, 1000.0, 41.0, 48.0]
2025-08-07 02:22:40,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 25 minutes, 19 seconds)
2025-08-07 02:24:30,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:34,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -26.40179 ± 32.807
2025-08-07 02:24:34,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-70.25596, -0.14853616, 38.97187, -28.791355, -44.755398, -8.181906, -71.24828, -49.597404, -29.032959, -0.9779594]
2025-08-07 02:24:34,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 14.0, 1000.0, 70.0, 69.0, 53.0, 63.0, 62.0, 36.0, 32.0]
2025-08-07 02:24:34,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 24 minutes, 31 seconds)
2025-08-07 02:26:17,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:20,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.86959 ± 18.672
2025-08-07 02:26:20,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-0.5699139, -8.79424, -33.876167, -5.7488666, 8.099952, 18.442596, -20.2123, -48.120045, -23.176933, -14.740015]
2025-08-07 02:26:20,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 59.0, 44.0, 1000.0, 45.0, 61.0, 61.0, 114.0, 92.0, 42.0]
2025-08-07 02:26:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 20 minutes, 51 seconds)
2025-08-07 02:27:56,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:57,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.58009 ± 50.643
2025-08-07 02:27:57,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-42.207993, -41.867706, 17.649681, 8.809631, -29.033598, 29.314308, -150.97537, -5.260637, 25.428823, 2.3419495]
2025-08-07 02:27:57,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 61.0, 42.0, 42.0, 73.0, 49.0, 136.0, 54.0, 50.0, 42.0]
2025-08-07 02:27:57,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 19 minutes, 10 seconds)
2025-08-07 02:29:46,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:47,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -25.29134 ± 32.840
2025-08-07 02:29:47,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-16.955973, 0.3852349, -48.94647, -56.29133, -12.561287, -104.720566, -9.747105, -5.01145, 6.9680157, -6.0324526]
2025-08-07 02:29:47,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 42.0, 79.0, 53.0, 42.0, 105.0, 40.0, 40.0, 63.0, 61.0]
2025-08-07 02:29:47,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 17 minutes, 27 seconds)
2025-08-07 02:31:30,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:33,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.28137 ± 41.426
2025-08-07 02:31:33,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-117.61445, 5.480596, -16.428425, -45.106808, -22.515905, -6.82235, -13.834103, 47.590378, -1.3630106, 17.80034]
2025-08-07 02:31:33,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 42.0, 64.0, 108.0, 53.0, 42.0, 50.0, 1000.0, 48.0, 40.0]
2025-08-07 02:31:33,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 14 minutes, 33 seconds)
2025-08-07 02:33:16,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:33:19,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.96090 ± 29.540
2025-08-07 02:33:19,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-58.983635, -38.25122, 4.216271, -18.449335, 1.3470827, -30.83433, -18.732426, -90.46651, 8.084036, -47.538998]
2025-08-07 02:33:19,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 89.0, 48.0, 50.0, 42.0, 1000.0, 37.0, 80.0, 34.0, 78.0]
2025-08-07 02:33:19,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 11 minutes, 39 seconds)
2025-08-07 02:35:13,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:16,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.98771 ± 15.710
2025-08-07 02:35:16,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-7.7149835, -35.253853, -15.036741, -15.672505, 8.870028, 10.727701, 6.4876513, -28.316338, -28.957407, -5.010698]
2025-08-07 02:35:16,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 60.0, 45.0, 32.0, 43.0, 40.0, 63.0, 1000.0, 72.0, 54.0]
2025-08-07 02:35:16,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 27 seconds)
2025-08-07 02:37:01,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:01,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -19.84830 ± 19.764
2025-08-07 02:37:01,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-24.536072, -0.22062084, 26.367191, -47.6851, -35.853325, -18.19157, -25.143517, -36.8627, -19.933662, -16.423594]
2025-08-07 02:37:01,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 60.0, 41.0, 76.0, 58.0, 41.0, 47.0, 78.0, 56.0, 38.0]
2025-08-07 02:37:02,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 10 minutes, 43 seconds)
2025-08-07 02:38:37,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:38,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -16.62897 ± 24.221
2025-08-07 02:38:38,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [25.16063, -19.383638, -31.319479, -16.788237, -62.108585, 6.200781, -8.405447, -47.184483, -12.92247, 0.4612805]
2025-08-07 02:38:38,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 49.0, 46.0, 48.0, 82.0, 23.0, 26.0, 56.0, 41.0, 36.0]
2025-08-07 02:38:38,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 15 seconds)
2025-08-07 02:40:32,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:32,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.11949 ± 20.786
2025-08-07 02:40:32,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [5.4437222, -31.26524, -47.88457, -2.1163828, -36.190807, -10.5694, -21.67669, -3.2534378, 24.034391, 2.2835584]
2025-08-07 02:40:32,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 45.0, 84.0, 28.0, 63.0, 55.0, 48.0, 28.0, 55.0, 41.0]
2025-08-07 02:40:32,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 minutes, 34 seconds)
2025-08-07 02:42:10,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:11,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.19763 ± 14.439
2025-08-07 02:42:11,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.221145, -8.726247, -11.345694, 17.510387, -28.29225, 3.0000408, 6.4235954, 13.501179, -18.789154, 12.962998]
2025-08-07 02:42:11,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 60.0, 73.0, 45.0, 66.0, 17.0, 39.0, 44.0, 45.0, 30.0]
2025-08-07 02:42:11,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (-2.20) for latency ExtremeSparseL4U32
2025-08-07 02:42:11,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 51 seconds)
2025-08-07 02:43:51,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:52,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.01393 ± 23.218
2025-08-07 02:43:52,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-27.827398, -55.2543, -35.67481, -25.47301, -6.305727, -6.7806077, -69.1572, 13.573272, -43.015316, -24.22414]
2025-08-07 02:43:52,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 80.0, 90.0, 41.0, 43.0, 40.0, 156.0, 45.0, 71.0, 70.0]
2025-08-07 02:43:52,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 seconds)
2025-08-07 02:45:38,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:45:40,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.88122 ± 17.509
2025-08-07 02:45:40,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [11.8446, -32.866173, -21.588535, 7.43925, -27.175882, -2.3724377, 24.960634, -14.420552, -14.902296, 0.2691708]
2025-08-07 02:45:40,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 65.0, 40.0, 46.0, 1000.0, 62.0, 41.0, 42.0, 44.0, 59.0]
2025-08-07 02:45:40,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 48 seconds)
2025-08-07 02:47:25,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:26,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.71743 ± 31.029
2025-08-07 02:47:26,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-87.191536, -25.979052, -0.2188361, -49.51348, 16.132057, -3.2474418, -4.632042, 12.552275, -10.160209, 15.083915]
2025-08-07 02:47:26,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 51.0, 45.0, 61.0, 36.0, 40.0, 41.0, 59.0, 43.0, 44.0]
2025-08-07 02:47:26,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 58 minutes, 2 seconds)
2025-08-07 02:49:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:09,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -36.39062 ± 37.479
2025-08-07 02:49:09,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-27.241995, -34.41759, 6.1206493, -36.03132, 1.9988592, -116.9651, -52.58347, -81.209076, 6.973665, -30.550797]
2025-08-07 02:49:09,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 61.0, 40.0, 44.0, 40.0, 1000.0, 46.0, 82.0, 34.0, 53.0]
2025-08-07 02:49:09,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 3 seconds)
2025-08-07 02:50:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -11.63201 ± 17.995
2025-08-07 02:50:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-27.232025, 9.0660715, -20.509296, -44.1478, -19.230158, -16.515173, 6.387887, -22.374374, 13.99469, 4.2400594]
2025-08-07 02:50:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 22.0, 72.0, 1000.0, 62.0, 41.0, 19.0, 47.0, 66.0, 41.0]
2025-08-07 02:50:55,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 11 seconds)
2025-08-07 02:52:37,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:38,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.54144 ± 15.530
2025-08-07 02:52:38,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [4.7185683, 15.579824, -18.400581, -27.68297, 7.352778, -5.837792, -13.476458, -28.48088, 15.641869, -4.8287873]
2025-08-07 02:52:38,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 38.0, 57.0, 41.0, 20.0, 41.0, 50.0, 88.0, 39.0, 35.0]
2025-08-07 02:52:38,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 36 seconds)
2025-08-07 02:54:21,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:54:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.49352 ± 23.479
2025-08-07 02:54:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-33.403206, 17.877855, -15.369587, -43.205017, 8.79153, 5.649417, -3.2013412, -59.9469, -4.0168934, -8.11105]
2025-08-07 02:54:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 38.0, 52.0, 43.0, 41.0, 58.0, 41.0, 80.0, 39.0, 21.0]
2025-08-07 02:54:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 23 seconds)
2025-08-07 02:56:04,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:05,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -18.12417 ± 31.389
2025-08-07 02:56:05,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-21.634205, -5.360365, -46.923985, -12.588898, -98.18414, -20.57176, 11.43394, 8.700232, 3.1893375, 0.6981247]
2025-08-07 02:56:05,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 17.0, 60.0, 74.0, 131.0, 40.0, 18.0, 40.0, 27.0, 65.0]
2025-08-07 02:56:05,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 25 seconds)
2025-08-07 02:57:48,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:49,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -21.69347 ± 48.643
2025-08-07 02:57:49,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [3.3776257, 1.910096, -5.8080134, 4.6734004, -157.38664, 14.246723, -54.53582, -8.611747, 1.6892028, -16.489532]
2025-08-07 02:57:49,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 42.0, 45.0, 71.0, 143.0, 56.0, 85.0, 55.0, 76.0, 57.0]
2025-08-07 02:57:49,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 49 seconds)
2025-08-07 02:59:31,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.47535 ± 17.264
2025-08-07 02:59:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-21.627058, -8.279589, -17.928585, 0.88742495, -42.480072, -3.93999, 6.1653585, -9.561042, 26.643929, -4.633884]
2025-08-07 02:59:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 1000.0, 57.0, 41.0, 61.0, 51.0, 22.0, 47.0, 50.0, 64.0]
2025-08-07 02:59:34,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 56 seconds)
2025-08-07 03:01:17,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:18,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -28.48649 ± 50.024
2025-08-07 03:01:18,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.510222, -171.27176, -23.207113, -34.77013, -10.213333, -36.190918, -0.30681422, 2.7097368, -18.265078, 15.160775]
2025-08-07 03:01:18,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 130.0, 52.0, 54.0, 44.0, 45.0, 40.0, 41.0, 39.0, 25.0]
2025-08-07 03:01:18,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 19 seconds)
2025-08-07 03:03:03,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -29.22047 ± 31.337
2025-08-07 03:03:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-23.625675, -20.754408, -3.2060392, -95.69837, -6.6585217, -30.601778, -9.385041, -34.711002, -76.47125, 8.907355]
2025-08-07 03:03:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 43.0, 59.0, 65.0, 40.0, 44.0, 57.0, 63.0, 72.0, 42.0]
2025-08-07 03:03:04,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 44 seconds)
2025-08-07 03:04:53,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.22383 ± 14.126
2025-08-07 03:04:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-14.033103, -24.445904, -8.315178, -36.30164, 8.824025, -20.303135, -12.356015, -10.246347, 7.468998, 7.470044]
2025-08-07 03:04:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 34.0, 51.0, 74.0, 49.0, 52.0, 40.0, 56.0, 72.0, 41.0]
2025-08-07 03:04:54,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 34 seconds)
2025-08-07 03:06:27,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:06:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -40.79615 ± 53.119
2025-08-07 03:06:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-8.62871, -4.892818, -10.198404, 8.697661, -166.86653, -4.268619, -81.68406, -20.143421, -25.03476, -94.941864]
2025-08-07 03:06:30,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 45.0, 39.0, 65.0, 1000.0, 53.0, 115.0, 51.0, 58.0, 93.0]
2025-08-07 03:06:30,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 11 seconds)
2025-08-07 03:08:13,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:14,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.16766 ± 9.965
2025-08-07 03:08:14,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [7.06133, -21.397202, 0.15199472, 5.674819, 8.881372, -5.7203817, -18.012802, -11.639037, -6.152665, -10.524068]
2025-08-07 03:08:14,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 39.0, 47.0, 60.0, 41.0, 43.0, 67.0, 35.0, 50.0, 45.0]
2025-08-07 03:08:14,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 23 seconds)
2025-08-07 03:09:58,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:01,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.96266 ± 13.075
2025-08-07 03:10:01,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.9954302, -1.1924137, -19.350428, 3.6465154, -15.595014, -23.847143, 11.09501, 5.938832, -6.867731, 19.541191]
2025-08-07 03:10:01,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 75.0, 45.0, 24.0, 41.0, 40.0, 40.0, 1000.0, 52.0, 44.0]
2025-08-07 03:10:01,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 51 seconds)
2025-08-07 03:11:42,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: 0.28332 ± 8.699
2025-08-07 03:11:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-6.7643876, 5.2785096, 3.5507212, -8.442413, 9.223646, 10.86512, 7.0932093, -12.765969, -11.825898, 6.620666]
2025-08-07 03:11:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 40.0, 47.0, 47.0, 52.0, 42.0, 49.0, 40.0, 39.0, 60.0]
2025-08-07 03:11:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1226 [INFO]: New best (0.28) for latency ExtremeSparseL4U32
2025-08-07 03:11:43,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 53 seconds)
2025-08-07 03:13:26,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:27,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.69867 ± 21.976
2025-08-07 03:13:27,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-55.13713, 12.631003, 1.0859703, -1.8324006, -1.6567973, -1.3977685, -41.27777, -11.885402, 18.837992, 3.645642]
2025-08-07 03:13:27,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 70.0, 39.0, 59.0, 40.0, 83.0, 74.0, 72.0, 42.0, 40.0]
2025-08-07 03:13:27,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 48 seconds)
2025-08-07 03:15:16,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:17,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -31.96675 ± 40.681
2025-08-07 03:15:17,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [8.896427, 17.196775, -39.680653, -18.437555, -125.80859, -70.49991, -17.707964, -30.909895, 6.0208116, -48.73692]
2025-08-07 03:15:17,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 40.0, 74.0, 39.0, 130.0, 107.0, 43.0, 64.0, 58.0, 51.0]
2025-08-07 03:15:17,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 52 seconds)
2025-08-07 03:16:53,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:54,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.98270 ± 27.559
2025-08-07 03:16:54,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.8115058, -64.63309, -65.547005, -7.650517, 1.0280753, -2.683365, -14.665148, 19.18484, 11.615716, -2.6650076]
2025-08-07 03:16:54,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 66.0, 60.0, 39.0, 16.0, 48.0, 55.0, 40.0, 16.0, 41.0]
2025-08-07 03:16:54,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 45 seconds)
2025-08-07 03:18:36,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:18:39,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -15.15575 ± 26.442
2025-08-07 03:18:39,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.2183502, -7.6710052, -22.044786, -76.30125, 11.812791, -12.58261, -9.919636, -48.404396, 4.405997, 12.3657255]
2025-08-07 03:18:39,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 44.0, 40.0, 1000.0, 40.0, 57.0, 48.0, 53.0, 42.0, 40.0]
2025-08-07 03:18:39,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 54 seconds)
2025-08-07 03:20:25,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:28,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.75239 ± 14.635
2025-08-07 03:20:28,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [11.261934, -6.2010913, 1.7751616, -12.8054085, 20.167463, -31.290789, -22.841684, -14.161835, -7.9736533, -15.45403]
2025-08-07 03:20:28,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 40.0, 41.0, 52.0, 38.0, 1000.0, 40.0, 84.0, 41.0, 41.0]
2025-08-07 03:20:28,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 30 seconds)
2025-08-07 03:22:11,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:12,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -0.11834 ± 17.614
2025-08-07 03:22:12,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [7.917839, -34.62254, -7.1278887, 14.962127, -12.500481, 20.061424, 6.6799307, 25.130424, -17.78935, -3.8948746]
2025-08-07 03:22:12,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 56.0, 115.0, 63.0, 41.0, 28.0, 41.0, 51.0, 40.0, 41.0]
2025-08-07 03:22:12,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 43 seconds)
2025-08-07 03:23:55,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:55,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.14685 ± 15.059
2025-08-07 03:23:55,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-19.086176, 4.1314282, 1.9267266, 2.938993, -1.5027475, -36.38355, -7.7326946, -20.85621, -5.046268, 20.141968]
2025-08-07 03:23:55,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 48.0, 52.0, 26.0, 51.0, 45.0, 51.0, 47.0, 23.0, 56.0]
2025-08-07 03:23:55,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 45 seconds)
2025-08-07 03:25:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.91849 ± 26.959
2025-08-07 03:25:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.356259, -8.882982, 4.1733665, 28.92982, 12.174911, -11.915836, -63.64713, -45.160973, 20.002445, -1.5022187]
2025-08-07 03:25:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 16.0, 39.0, 41.0, 25.0, 41.0, 114.0, 65.0, 55.0, 67.0]
2025-08-07 03:25:33,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 1 second)
2025-08-07 03:27:22,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.69782 ± 30.710
2025-08-07 03:27:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-94.64995, 4.5899625, 13.782398, 6.946851, -7.7877884, -0.79342747, -12.027727, -32.196445, 7.87044, 7.287453]
2025-08-07 03:27:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 1000.0, 87.0, 24.0, 39.0, 42.0, 27.0, 41.0, 40.0, 39.0]
2025-08-07 03:27:25,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 32 seconds)
2025-08-07 03:29:00,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -23.66017 ± 35.890
2025-08-07 03:29:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-3.9775746, -95.49392, -46.314697, -11.149143, -81.95114, 13.95281, 1.7888935, -2.000568, -12.489709, 1.0333979]
2025-08-07 03:29:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 109.0, 56.0, 43.0, 88.0, 39.0, 29.0, 46.0, 43.0, 57.0]
2025-08-07 03:29:01,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 22 seconds)
2025-08-07 03:30:49,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:50,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -7.62507 ± 17.826
2025-08-07 03:30:50,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [17.439753, -0.55565035, 5.323897, -12.1953945, -5.298816, -14.942013, -47.54717, 6.579182, -26.812798, 1.7583534]
2025-08-07 03:30:50,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 21.0, 40.0, 39.0, 14.0, 73.0, 64.0, 43.0, 94.0, 42.0]
2025-08-07 03:30:50,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 48 seconds)
2025-08-07 03:32:27,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -2.37714 ± 9.054
2025-08-07 03:32:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [1.8676554, -5.0737143, -0.25267115, 0.26882932, -22.21723, 9.474353, 8.850607, -13.344959, -2.517813, -0.8264584]
2025-08-07 03:32:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 65.0, 30.0, 30.0, 51.0, 51.0, 57.0, 58.0, 40.0, 41.0]
2025-08-07 03:32:28,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 56 seconds)
2025-08-07 03:34:10,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:13,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -5.08690 ± 22.508
2025-08-07 03:34:13,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-24.753675, -16.805933, -4.1887894, -35.753757, 32.021793, -23.886595, 33.00609, -14.532873, -8.696081, 12.720791]
2025-08-07 03:34:13,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 47.0, 35.0, 40.0, 61.0, 43.0, 1000.0, 95.0, 41.0, 63.0]
2025-08-07 03:34:13,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 24 seconds)
2025-08-07 03:35:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:55,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -13.76926 ± 17.828
2025-08-07 03:35:55,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-2.3414273, -15.120055, -12.71382, 0.58168113, -0.03805888, -9.576506, -12.98643, -27.889273, -60.275116, 2.666433]
2025-08-07 03:35:55,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 35.0, 65.0, 40.0, 48.0, 65.0, 51.0, 48.0, 77.0, 14.0]
2025-08-07 03:35:55,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 30 seconds)
2025-08-07 03:37:37,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:38,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.40053 ± 23.699
2025-08-07 03:37:38,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-13.361543, -73.48369, 14.645819, -6.728043, 8.543111, -17.588207, -6.9104104, -29.668188, -1.4304357, 1.976319]
2025-08-07 03:37:38,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 82.0, 44.0, 52.0, 22.0, 30.0, 47.0, 60.0, 59.0, 47.0]
2025-08-07 03:37:38,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 53 seconds)
2025-08-07 03:39:21,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:22,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -17.31736 ± 32.238
2025-08-07 03:39:22,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-17.603497, 15.812323, -17.21595, -1.8390462, -3.950611, 22.093243, -8.475279, -93.94478, -13.828271, -54.22173]
2025-08-07 03:39:22,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 51.0, 50.0, 40.0, 51.0, 42.0, 39.0, 111.0, 62.0, 55.0]
2025-08-07 03:39:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 7 seconds)
2025-08-07 03:41:03,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:04,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -12.52483 ± 17.982
2025-08-07 03:41:04,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-44.768906, -36.97775, -9.648008, -1.8853916, 7.2781243, 4.9187512, 10.746701, -27.839094, -13.551121, -13.521617]
2025-08-07 03:41:04,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 68.0, 43.0, 55.0, 41.0, 51.0, 52.0, 62.0, 55.0, 64.0]
2025-08-07 03:41:04,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 26 seconds)
2025-08-07 03:42:47,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -10.45752 ± 20.301
2025-08-07 03:42:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-32.130276, 7.483973, 1.557537, -33.800106, 7.557078, -49.85379, -8.9251585, 16.260456, -3.6862683, -9.038692]
2025-08-07 03:42:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 42.0, 59.0, 67.0, 45.0, 64.0, 54.0, 40.0, 41.0, 40.0]
2025-08-07 03:42:48,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 43 seconds)
2025-08-07 03:44:30,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:33,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1221 [DEBUG]: Total Reward: -6.73680 ± 22.672
2025-08-07 03:44:33,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1222 [DEBUG]: All rewards: [-4.5008254, -44.130997, -20.931, 14.065118, -33.536602, 9.009317, 33.87352, -22.483923, -7.454034, 8.721413]
2025-08-07 03:44:33,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 69.0, 40.0, 70.0, 154.0, 38.0, 1000.0, 48.0, 47.0, 12.0]
2025-08-07 03:44:33,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-ant):1251 [DEBUG]: Training session finished
