2025-08-07 00:47:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:47:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-ant/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:47:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14622d08fb90>}
2025-08-07 00:47:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1111 [DEBUG]: using device: cuda
2025-08-07 00:47:48,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1133 [INFO]: Creating new trainer
2025-08-07 00:47:48,203 baseline-bpql-noiseperc20-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=283, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:47:48,203 baseline-bpql-noiseperc20-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:47:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1194 [DEBUG]: Starting training session...
2025-08-07 00:47:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:37,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:42,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -332.95609 ± 493.075
2025-08-07 00:49:42,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-91.94329, 4.5150185, -83.98566, -25.306126, -27.306713, -21.889389, -1305.9805, -266.63385, -205.90045, -1305.1299]
2025-08-07 00:49:42,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [131.0, 39.0, 125.0, 59.0, 51.0, 53.0, 1000.0, 203.0, 301.0, 1000.0]
2025-08-07 00:49:42,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-332.96) for latency ExtremeSparseL4U32
2025-08-07 00:49:42,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 7 minutes, 42 seconds)
2025-08-07 00:51:27,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:30,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -112.97347 ± 280.827
2025-08-07 00:51:30,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.305306, -18.415377, -21.836018, 9.494356, -952.51184, -5.0907426, 2.3239698, -55.72033, -26.050652, -65.233376]
2025-08-07 00:51:30,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 90.0, 86.0, 28.0, 1000.0, 50.0, 39.0, 86.0, 57.0, 155.0]
2025-08-07 00:51:30,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-112.97) for latency ExtremeSparseL4U32
2025-08-07 00:51:30,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 32 seconds)
2025-08-07 00:53:19,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -122.48075 ± 301.121
2025-08-07 00:53:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [17.89568, -182.18379, 42.694283, -38.888397, -13.941661, 2.6013267, -11.5813055, -58.50895, 24.840456, -1007.7351]
2025-08-07 00:53:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 214.0, 75.0, 64.0, 44.0, 29.0, 52.0, 118.0, 47.0, 1000.0]
2025-08-07 00:53:22,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 59 minutes, 24 seconds)
2025-08-07 00:55:20,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:23,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -119.78562 ± 290.783
2025-08-07 00:55:23,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-22.518038, -13.722883, -5.158557, -96.10069, -25.362976, -11.219965, -987.83014, -48.563732, 0.78909606, 11.831617]
2025-08-07 00:55:23,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 46.0, 51.0, 95.0, 74.0, 46.0, 1000.0, 107.0, 48.0, 52.0]
2025-08-07 00:55:23,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 1 minute, 42 seconds)
2025-08-07 00:57:11,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:57:13,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -45.91824 ± 43.862
2025-08-07 00:57:13,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-37.644093, -131.84619, -6.970697, -79.78453, -111.396866, -6.2828918, -1.2299813, -19.732971, -22.088783, -42.20542]
2025-08-07 00:57:13,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 161.0, 54.0, 116.0, 137.0, 57.0, 56.0, 53.0, 108.0, 98.0]
2025-08-07 00:57:13,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-45.92) for latency ExtremeSparseL4U32
2025-08-07 00:57:13,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 58 minutes, 32 seconds)
2025-08-07 00:58:55,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:58,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -96.13169 ± 198.972
2025-08-07 00:58:58,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-33.82155, 14.443439, -683.1391, -94.32966, -81.00446, -40.523327, 2.2286718, -34.733047, 25.706852, -36.14471]
2025-08-07 00:58:58,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [123.0, 81.0, 1000.0, 147.0, 129.0, 76.0, 102.0, 83.0, 34.0, 132.0]
2025-08-07 00:58:58,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 54 minutes, 2 seconds)
2025-08-07 01:00:44,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:46,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -14.57790 ± 31.320
2025-08-07 01:00:46,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-79.27283, 37.090576, -19.975727, 5.3592286, -19.576735, -56.199646, -4.1091046, -16.074993, 1.8489437, 5.1313114]
2025-08-07 01:00:46,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 93.0, 106.0, 100.0, 72.0, 311.0, 196.0, 129.0, 79.0, 82.0]
2025-08-07 01:00:46,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-14.58) for latency ExtremeSparseL4U32
2025-08-07 01:00:46,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 52 minutes, 23 seconds)
2025-08-07 01:02:37,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:02:41,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -16.26439 ± 26.688
2025-08-07 01:02:41,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-10.528876, -36.020367, -54.161396, -12.794992, -0.71803784, 0.71734864, -63.917656, -14.54456, -1.6792015, 31.003813]
2025-08-07 01:02:41,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 229.0, 194.0, 229.0, 80.0, 129.0, 1000.0, 54.0, 49.0, 84.0]
2025-08-07 01:02:41,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 51 minutes, 23 seconds)
2025-08-07 01:04:35,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:40,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -15.94398 ± 54.272
2025-08-07 01:04:40,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [13.738724, 15.866834, 16.251507, -3.3049858, -36.3803, -57.986717, 28.696226, -11.894291, 32.361595, -156.78844]
2025-08-07 01:04:40,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [93.0, 47.0, 27.0, 218.0, 225.0, 414.0, 183.0, 166.0, 286.0, 1000.0]
2025-08-07 01:04:40,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 48 minutes, 58 seconds)
2025-08-07 01:06:27,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:28,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -2.55056 ± 37.367
2025-08-07 01:06:28,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [39.126614, -15.063737, 5.798042, 14.591952, -25.524107, 25.435425, 16.175844, -100.3377, 19.257969, -4.9659085]
2025-08-07 01:06:28,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 58.0, 46.0, 27.0, 108.0, 56.0, 62.0, 122.0, 46.0, 86.0]
2025-08-07 01:06:28,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-2.55) for latency ExtremeSparseL4U32
2025-08-07 01:06:28,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 46 minutes, 46 seconds)
2025-08-07 01:08:14,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:18,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -44.58833 ± 65.284
2025-08-07 01:08:18,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-117.29116, -34.294643, -35.946495, 4.25532, -110.199844, 30.384346, 20.364338, -179.73076, -1.7498335, -21.67451]
2025-08-07 01:08:18,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 98.0, 65.0, 83.0, 563.0, 64.0, 76.0, 176.0, 63.0, 67.0]
2025-08-07 01:08:18,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 46 minutes, 1 second)
2025-08-07 01:10:10,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 0.90002 ± 77.621
2025-08-07 01:10:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [195.74454, -3.0034575, 4.3756204, -125.393036, -4.713822, -69.36643, 8.749944, -27.10192, 22.06576, 7.6429715]
2025-08-07 01:10:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 68.0, 54.0, 1000.0, 78.0, 216.0, 53.0, 74.0, 73.0, 180.0]
2025-08-07 01:10:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (0.90) for latency ExtremeSparseL4U32
2025-08-07 01:10:15,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 46 minutes, 49 seconds)
2025-08-07 01:12:05,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:07,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -11.84176 ± 57.742
2025-08-07 01:12:07,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [11.572734, -1.785988, 26.229446, 33.008095, 13.062797, -16.010977, -2.672495, -1.4629343, -0.3797951, -179.97853]
2025-08-07 01:12:07,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 51.0, 47.0, 51.0, 46.0, 58.0, 105.0, 86.0, 81.0, 338.0]
2025-08-07 01:12:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 44 minutes, 16 seconds)
2025-08-07 01:13:56,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:00,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.72427 ± 84.928
2025-08-07 01:14:00,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [13.8636, -4.3399415, 17.915276, -10.059577, -131.53062, -34.18332, -21.021524, 213.0894, 90.8388, -17.329418]
2025-08-07 01:14:00,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 54.0, 36.0, 103.0, 229.0, 90.0, 75.0, 1000.0, 1000.0, 58.0]
2025-08-07 01:14:00,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (11.72) for latency ExtremeSparseL4U32
2025-08-07 01:14:00,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 40 minutes, 39 seconds)
2025-08-07 01:15:43,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:45,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -7.22491 ± 26.717
2025-08-07 01:15:45,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-17.156956, -40.212887, -2.9047081, 41.856735, -22.280352, -1.9525839, -17.720547, 2.921653, 31.689497, -46.488953]
2025-08-07 01:15:45,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 104.0, 48.0, 42.0, 77.0, 44.0, 89.0, 71.0, 77.0, 126.0]
2025-08-07 01:15:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 37 minutes, 33 seconds)
2025-08-07 01:17:33,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:36,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -11.77149 ± 54.078
2025-08-07 01:17:36,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-24.145823, 6.660509, 4.243604, 11.72125, -65.24558, -48.7744, -43.772953, 131.30862, -48.654922, -41.05526]
2025-08-07 01:17:36,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 52.0, 104.0, 114.0, 126.0, 103.0, 93.0, 1000.0, 169.0, 89.0]
2025-08-07 01:17:36,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 36 minutes, 24 seconds)
2025-08-07 01:19:25,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 18.20348 ± 77.315
2025-08-07 01:19:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [181.28166, -15.29427, -86.281586, -0.084211536, 31.732737, 131.65816, -1.5228343, -12.104276, 14.651348, -62.00194]
2025-08-07 01:19:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 66.0, 129.0, 29.0, 1000.0, 1000.0, 23.0, 151.0, 66.0, 89.0]
2025-08-07 01:19:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (18.20) for latency ExtremeSparseL4U32
2025-08-07 01:19:31,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 33 minutes, 58 seconds)
2025-08-07 01:21:20,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:23,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -17.67696 ± 36.873
2025-08-07 01:21:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-4.846269, -40.741257, 30.856491, -100.61467, 12.231345, -25.333689, -3.517114, -3.7224646, -54.459934, 13.377974]
2025-08-07 01:21:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 101.0, 57.0, 171.0, 56.0, 67.0, 140.0, 85.0, 1000.0, 62.0]
2025-08-07 01:21:23,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 31 minutes, 58 seconds)
2025-08-07 01:23:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:16,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -27.84991 ± 47.276
2025-08-07 01:23:16,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-21.736847, -41.917168, -10.414251, 21.752335, -5.065194, 5.884163, -60.421314, -1.4351599, -153.09196, -12.053736]
2025-08-07 01:23:16,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [104.0, 83.0, 66.0, 38.0, 45.0, 1000.0, 100.0, 64.0, 184.0, 41.0]
2025-08-07 01:23:16,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 29 minutes, 51 seconds)
2025-08-07 01:25:08,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:12,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -20.33015 ± 31.611
2025-08-07 01:25:12,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [16.858246, -60.149303, 14.529073, -54.70048, -10.2761, -56.3408, -58.872726, 1.1130397, -9.190437, 13.727994]
2025-08-07 01:25:12,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 1000.0, 46.0, 351.0, 80.0, 82.0, 126.0, 63.0, 84.0, 46.0]
2025-08-07 01:25:12,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 31 minutes, 16 seconds)
2025-08-07 01:26:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:57,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -4.97843 ± 20.465
2025-08-07 01:26:57,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [14.34422, 14.248856, -7.9284277, 21.649765, 20.119707, -35.387455, -13.338904, -24.507387, -31.741863, -7.2428107]
2025-08-07 01:26:57,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 47.0, 62.0, 46.0, 38.0, 130.0, 65.0, 64.0, 131.0, 67.0]
2025-08-07 01:26:57,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 27 minutes, 37 seconds)
2025-08-07 01:28:46,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -27.99335 ± 50.569
2025-08-07 01:28:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-47.80958, 0.6520269, 20.918625, -65.81733, -71.10043, 62.694035, 27.21006, -95.91165, -80.81106, -29.95821]
2025-08-07 01:28:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [122.0, 58.0, 45.0, 90.0, 177.0, 67.0, 63.0, 1000.0, 112.0, 99.0]
2025-08-07 01:28:50,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 25 minutes, 13 seconds)
2025-08-07 01:30:37,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:39,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -27.63995 ± 26.172
2025-08-07 01:30:39,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-26.297012, -31.715866, -73.61399, -2.694734, -1.0508944, -72.7926, -39.465317, -17.972004, 2.5741389, -13.371252]
2025-08-07 01:30:39,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 79.0, 156.0, 70.0, 77.0, 149.0, 110.0, 57.0, 40.0, 64.0]
2025-08-07 01:30:39,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 22 minutes, 37 seconds)
2025-08-07 01:32:29,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:30,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -12.48772 ± 25.815
2025-08-07 01:32:30,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-20.45072, 11.498805, 26.983131, -14.145336, 13.641314, -60.57159, -8.9537, -11.32338, -50.61404, -10.941664]
2025-08-07 01:32:30,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 48.0, 45.0, 65.0, 49.0, 196.0, 45.0, 43.0, 77.0, 45.0]
2025-08-07 01:32:30,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 20 minutes, 24 seconds)
2025-08-07 01:34:18,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 9.28427 ± 43.683
2025-08-07 01:34:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [8.80457, 36.04561, 12.285717, 22.399223, 45.431564, 86.68227, -31.690695, -81.398605, 15.717496, -21.434414]
2025-08-07 01:34:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 58.0, 72.0, 65.0, 1000.0, 1000.0, 140.0, 132.0, 44.0, 76.0]
2025-08-07 01:34:23,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 17 minutes, 42 seconds)
2025-08-07 01:36:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:16,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -15.01546 ± 44.509
2025-08-07 01:36:16,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-62.919193, -103.26806, -57.45598, -2.3756983, 36.288673, 23.363605, 13.211489, 37.35928, -9.7042055, -24.654558]
2025-08-07 01:36:16,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [107.0, 127.0, 70.0, 58.0, 52.0, 71.0, 87.0, 85.0, 112.0, 69.0]
2025-08-07 01:36:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 17 minutes, 57 seconds)
2025-08-07 01:38:00,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:04,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.59665 ± 54.134
2025-08-07 01:38:04,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [132.02571, 17.684416, 0.7727219, -0.45745248, 56.164482, 11.776639, 6.9568453, -97.4012, -3.5718286, -7.9838514]
2025-08-07 01:38:04,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 44.0, 48.0, 22.0, 1000.0, 47.0, 67.0, 157.0, 58.0, 77.0]
2025-08-07 01:38:04,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 14 minutes, 58 seconds)
2025-08-07 01:39:56,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:00,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.77949 ± 53.343
2025-08-07 01:40:00,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-43.16927, -42.956425, 146.89099, 2.5782597, 21.492285, 54.791744, -28.763966, -7.089831, 8.624139, 25.396944]
2025-08-07 01:40:00,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 106.0, 1000.0, 13.0, 51.0, 1000.0, 102.0, 70.0, 55.0, 28.0]
2025-08-07 01:40:00,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 14 minutes, 45 seconds)
2025-08-07 01:41:44,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:49,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -5.95187 ± 99.199
2025-08-07 01:41:49,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-43.828873, 178.48549, -116.59829, 151.42497, -50.23666, 18.128794, -44.83984, -152.5739, 12.776866, -12.257309]
2025-08-07 01:41:49,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [101.0, 1000.0, 148.0, 1000.0, 67.0, 50.0, 122.0, 196.0, 100.0, 161.0]
2025-08-07 01:41:49,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 12 minutes, 23 seconds)
2025-08-07 01:43:37,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:38,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -23.17065 ± 44.879
2025-08-07 01:43:38,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-91.52939, -56.61893, -26.931337, -105.55092, 0.9481552, 25.391668, 17.577646, 21.3343, 7.9155746, -24.243252]
2025-08-07 01:43:38,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [158.0, 75.0, 61.0, 142.0, 45.0, 62.0, 61.0, 20.0, 47.0, 80.0]
2025-08-07 01:43:38,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 9 minutes, 34 seconds)
2025-08-07 01:45:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:34,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -3.87328 ± 57.007
2025-08-07 01:45:34,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-118.10949, 22.58994, 80.67373, -7.7948527, -73.20549, -24.840935, 20.062994, 70.722824, 5.566448, -14.397971]
2025-08-07 01:45:34,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [295.0, 90.0, 1000.0, 53.0, 152.0, 55.0, 36.0, 415.0, 46.0, 59.0]
2025-08-07 01:45:34,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 8 minutes, 17 seconds)
2025-08-07 01:47:18,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:20,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -16.83748 ± 42.375
2025-08-07 01:47:20,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [0.7007237, 6.7360945, -47.942577, 13.603474, 20.842434, -14.910003, -8.389685, -13.569606, -131.57805, 6.1324034]
2025-08-07 01:47:20,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 195.0, 76.0, 42.0, 51.0, 99.0, 59.0, 84.0, 257.0, 73.0]
2025-08-07 01:47:20,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 5 minutes, 55 seconds)
2025-08-07 01:49:04,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:10,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 21.49641 ± 62.491
2025-08-07 01:49:10,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [102.158844, -16.722412, -14.417091, -2.5507207, 2.1386533, -38.27528, -56.26642, 4.8730326, 98.32177, 135.70374]
2025-08-07 01:49:10,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 85.0, 67.0, 49.0, 26.0, 75.0, 129.0, 34.0, 1000.0, 1000.0]
2025-08-07 01:49:10,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (21.50) for latency ExtremeSparseL4U32
2025-08-07 01:49:10,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 2 minutes, 52 seconds)
2025-08-07 01:51:07,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:10,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -2.01285 ± 53.582
2025-08-07 01:51:10,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [127.84134, -3.355241, 1.3585944, -19.911533, 20.900728, -9.10445, -99.38589, 8.039041, -38.33423, -8.176902]
2025-08-07 01:51:10,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 85.0, 51.0, 144.0, 38.0, 67.0, 324.0, 57.0, 76.0, 56.0]
2025-08-07 01:51:10,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 3 minutes, 22 seconds)
2025-08-07 01:52:47,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:51,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 0.67900 ± 61.898
2025-08-07 01:52:51,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [172.30412, 0.5168612, -44.928715, -14.909224, 18.583145, -38.3958, -42.454185, 4.9611063, 3.362925, -52.250267]
2025-08-07 01:52:51,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 100.0, 67.0, 68.0, 49.0, 147.0, 75.0, 42.0, 93.0, 109.0]
2025-08-07 01:52:51,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 59 minutes, 42 seconds)
2025-08-07 01:54:38,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:41,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 2.67458 ± 46.631
2025-08-07 01:54:41,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.869526, -15.619783, -37.22536, 20.236439, 117.43257, -64.73561, -27.862354, -8.929668, 21.47155, 0.10854087]
2025-08-07 01:54:41,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 54.0, 113.0, 46.0, 1000.0, 85.0, 134.0, 55.0, 40.0, 59.0]
2025-08-07 01:54:41,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 56 minutes, 39 seconds)
2025-08-07 01:56:27,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:56:29,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 0.73455 ± 26.133
2025-08-07 01:56:29,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.4311833, -1.1426969, 49.698387, 10.517194, -4.368767, -5.143842, -8.678598, -61.82868, 13.50918, 10.35217]
2025-08-07 01:56:29,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 42.0, 337.0, 49.0, 46.0, 44.0, 60.0, 165.0, 44.0, 199.0]
2025-08-07 01:56:29,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 55 minutes, 17 seconds)
2025-08-07 01:58:17,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 17.95181 ± 33.265
2025-08-07 01:58:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-10.484129, 9.362112, -22.351725, 19.962742, 28.27012, 6.240153, 8.334163, 20.687757, 11.183942, 108.31296]
2025-08-07 01:58:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 74.0, 123.0, 54.0, 54.0, 61.0, 61.0, 71.0, 58.0, 1000.0]
2025-08-07 01:58:20,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 53 minutes, 32 seconds)
2025-08-07 02:00:07,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:11,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -11.32981 ± 54.668
2025-08-07 02:00:11,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-124.97567, 5.3825083, -58.78195, 52.755363, -2.136311, 11.851841, -54.348816, 2.318161, 75.722336, -21.085512]
2025-08-07 02:00:11,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [402.0, 48.0, 210.0, 247.0, 51.0, 56.0, 133.0, 53.0, 1000.0, 55.0]
2025-08-07 02:00:11,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 50 minutes)
2025-08-07 02:01:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:59,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 0.44389 ± 24.333
2025-08-07 02:01:59,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-11.189456, -38.96914, -6.1803446, 48.97556, -2.567009, 4.218831, 17.91928, -8.682538, -26.196447, 27.110165]
2025-08-07 02:01:59,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 74.0, 69.0, 62.0, 66.0, 100.0, 50.0, 93.0, 57.0, 109.0]
2025-08-07 02:01:59,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 49 minutes, 40 seconds)
2025-08-07 02:03:46,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:48,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 8.02040 ± 40.685
2025-08-07 02:03:48,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [0.0901816, -14.219758, 12.037097, -7.157713, -83.2172, 21.249233, 25.24483, 89.74556, 12.485092, 23.946646]
2025-08-07 02:03:48,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 73.0, 25.0, 113.0, 173.0, 56.0, 52.0, 514.0, 20.0, 99.0]
2025-08-07 02:03:48,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 47 minutes, 43 seconds)
2025-08-07 02:05:34,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:36,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -5.46850 ± 32.163
2025-08-07 02:05:36,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-94.40881, -26.103073, 6.4093885, 13.502859, 11.187074, 11.270698, 1.4536028, 23.859472, -1.4161986, -0.4400443]
2025-08-07 02:05:36,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [173.0, 73.0, 43.0, 33.0, 51.0, 55.0, 135.0, 51.0, 82.0, 71.0]
2025-08-07 02:05:36,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 45 minutes, 42 seconds)
2025-08-07 02:07:31,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:35,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -13.02368 ± 66.320
2025-08-07 02:07:35,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.60967654, -5.1167903, -41.80517, -4.4878364, -64.4936, -8.454025, 1.884672, 131.30615, 10.842981, -149.30347]
2025-08-07 02:07:35,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 137.0, 85.0, 84.0, 188.0, 72.0, 30.0, 1000.0, 99.0, 306.0]
2025-08-07 02:07:35,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 45 minutes, 27 seconds)
2025-08-07 02:09:13,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:14,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -1.72983 ± 30.307
2025-08-07 02:09:14,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [10.787469, 0.5687707, -9.101683, -34.996006, -11.271323, 49.984066, 6.960278, 13.764707, 22.498278, -66.492836]
2025-08-07 02:09:14,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 110.0, 89.0, 110.0, 81.0, 110.0, 21.0, 30.0, 45.0, 150.0]
2025-08-07 02:09:14,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 41 minutes, 21 seconds)
2025-08-07 02:11:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:01,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 18.08545 ± 19.639
2025-08-07 02:11:01,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [38.499966, 23.201061, 8.496213, -4.4532027, 38.466595, 7.4497876, 54.36958, 2.3076394, -8.577255, 21.094128]
2025-08-07 02:11:01,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 83.0, 51.0, 91.0, 56.0, 51.0, 73.0, 56.0, 61.0, 48.0]
2025-08-07 02:11:01,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 39 minutes, 26 seconds)
2025-08-07 02:12:48,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:50,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -12.55275 ± 48.623
2025-08-07 02:12:50,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [17.462404, -26.758835, 25.486876, -145.25003, 27.657768, 14.265484, -22.90396, 16.779556, -24.932951, -7.333852]
2025-08-07 02:12:50,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 142.0, 57.0, 280.0, 68.0, 141.0, 69.0, 52.0, 48.0, 70.0]
2025-08-07 02:12:50,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 37 minutes, 28 seconds)
2025-08-07 02:14:46,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:14:49,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 8.81842 ± 52.538
2025-08-07 02:14:49,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [10.061471, 13.459411, -19.755272, 7.3986917, -17.69255, -18.395086, -33.145893, 10.260203, -22.97257, 158.96577]
2025-08-07 02:14:49,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 47.0, 64.0, 40.0, 58.0, 120.0, 71.0, 18.0, 72.0, 1000.0]
2025-08-07 02:14:49,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 37 minutes, 41 seconds)
2025-08-07 02:16:35,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:39,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 8.44468 ± 76.049
2025-08-07 02:16:39,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [27.31134, -3.53497, -61.3009, -28.912413, 13.934057, -39.589905, 213.81963, 12.737761, -71.39068, 21.372856]
2025-08-07 02:16:39,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 75.0, 154.0, 130.0, 47.0, 98.0, 1000.0, 50.0, 185.0, 56.0]
2025-08-07 02:16:39,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 34 minutes, 15 seconds)
2025-08-07 02:18:15,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:19,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 32.64234 ± 71.959
2025-08-07 02:18:19,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [19.780863, 186.07335, 0.03438191, -20.70036, 15.29231, 21.31189, 16.945784, 140.26088, 23.522972, -76.09868]
2025-08-07 02:18:19,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 1000.0, 82.0, 61.0, 56.0, 64.0, 51.0, 1000.0, 41.0, 73.0]
2025-08-07 02:18:19,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (32.64) for latency ExtremeSparseL4U32
2025-08-07 02:18:19,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 32 minutes, 39 seconds)
2025-08-07 02:20:10,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -4.85666 ± 25.429
2025-08-07 02:20:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-62.56613, -16.264496, 8.755549, -10.041972, 8.038648, 26.036987, 4.9656825, -34.722412, 12.612645, 14.618891]
2025-08-07 02:20:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [111.0, 71.0, 71.0, 44.0, 38.0, 55.0, 61.0, 68.0, 45.0, 48.0]
2025-08-07 02:20:11,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 31 minutes, 35 seconds)
2025-08-07 02:21:54,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -5.31438 ± 30.658
2025-08-07 02:21:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.6347895, -8.406894, -0.7043597, -36.75984, 7.6629934, -74.91594, 16.90141, -20.678848, 30.542803, 30.580107]
2025-08-07 02:21:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 64.0, 24.0, 96.0, 87.0, 178.0, 43.0, 59.0, 70.0, 1000.0]
2025-08-07 02:21:57,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 29 minutes, 17 seconds)
2025-08-07 02:23:45,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:48,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.69188 ± 65.410
2025-08-07 02:23:48,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [15.307225, 212.93434, 37.54953, 2.1565816, -3.651319, 7.2526436, 3.2819984, 7.2576046, -11.218407, -33.951435]
2025-08-07 02:23:48,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 1000.0, 66.0, 71.0, 75.0, 81.0, 56.0, 29.0, 46.0, 77.0]
2025-08-07 02:23:48,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 26 minutes, 15 seconds)
2025-08-07 02:25:33,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:34,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 4.74089 ± 14.147
2025-08-07 02:25:34,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-10.333903, 6.5419655, 25.260069, -25.540995, 21.938702, 4.0739007, 15.349311, 1.9608681, 6.8660846, 1.292897]
2025-08-07 02:25:34,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 46.0, 48.0, 50.0, 95.0, 69.0, 58.0, 72.0, 33.0, 66.0]
2025-08-07 02:25:34,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 23 minutes, 48 seconds)
2025-08-07 02:27:19,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:22,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 39.16826 ± 81.844
2025-08-07 02:27:22,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [19.615164, 26.964922, 1.7982205, -23.077784, 7.9168043, 22.209332, 22.338396, 3.109624, 30.279657, 280.5283]
2025-08-07 02:27:22,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 61.0, 39.0, 140.0, 50.0, 45.0, 83.0, 102.0, 73.0, 1000.0]
2025-08-07 02:27:22,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (39.17) for latency ExtremeSparseL4U32
2025-08-07 02:27:22,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2025-08-07 02:29:10,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:11,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 0.74114 ± 29.621
2025-08-07 02:29:11,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [20.242414, 13.3927965, -72.60686, -26.626024, 5.2421393, 1.1831018, 30.370321, -4.2359815, 5.688816, 34.760693]
2025-08-07 02:29:11,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 71.0, 102.0, 85.0, 14.0, 47.0, 57.0, 57.0, 57.0, 51.0]
2025-08-07 02:29:11,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 21 minutes, 2 seconds)
2025-08-07 02:31:04,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:05,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -4.39281 ± 16.814
2025-08-07 02:31:05,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [5.2052994, 2.1458223, -1.2630731, -23.49968, 12.99734, -31.67366, -14.178265, 15.468945, -24.32062, 15.189756]
2025-08-07 02:31:05,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 56.0, 68.0, 135.0, 60.0, 181.0, 50.0, 57.0, 77.0, 59.0]
2025-08-07 02:31:05,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 20 minutes, 29 seconds)
2025-08-07 02:32:49,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:50,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 10.31058 ± 17.322
2025-08-07 02:32:50,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-9.248841, -20.131758, 2.3095598, 39.54619, 26.448063, 8.302941, 26.90379, 19.92988, -1.521009, 10.566986]
2025-08-07 02:32:50,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 61.0, 76.0, 48.0, 98.0, 60.0, 47.0, 55.0, 75.0, 150.0]
2025-08-07 02:32:50,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 17 minutes, 41 seconds)
2025-08-07 02:34:36,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:37,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 12.65632 ± 14.668
2025-08-07 02:34:37,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [37.814777, 26.037745, 1.5910101, -2.6124415, 9.887567, -4.55089, -7.4516068, 19.448631, 21.274115, 25.124329]
2025-08-07 02:34:37,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 64.0, 58.0, 92.0, 73.0, 70.0, 74.0, 105.0, 49.0, 57.0]
2025-08-07 02:34:37,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 16 minutes, 5 seconds)
2025-08-07 02:36:25,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:28,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 2.89439 ± 12.635
2025-08-07 02:36:28,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [7.246246, 6.049772, -8.987545, -15.065222, 10.07515, 10.289661, -12.724816, -1.8195977, 3.9624481, 29.91785]
2025-08-07 02:36:28,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 93.0, 68.0, 73.0, 94.0, 1000.0, 87.0, 82.0, 48.0, 57.0]
2025-08-07 02:36:28,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 14 minutes, 37 seconds)
2025-08-07 02:38:09,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:11,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 12.62634 ± 22.536
2025-08-07 02:38:11,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.373106, 17.855131, 22.971584, 35.962475, -7.860726, 14.105908, 16.454794, -45.91946, 16.579916, 32.740704]
2025-08-07 02:38:11,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 57.0, 50.0, 94.0, 48.0, 36.0, 49.0, 138.0, 76.0, 682.0]
2025-08-07 02:38:11,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 58 seconds)
2025-08-07 02:39:58,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:01,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.79893 ± 41.130
2025-08-07 02:40:01,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [9.975953, 25.489616, 38.739017, -19.196682, -73.212524, 100.08755, 10.855626, 14.906211, 11.662058, 18.682533]
2025-08-07 02:40:01,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 51.0, 74.0, 94.0, 117.0, 1000.0, 43.0, 59.0, 91.0, 45.0]
2025-08-07 02:40:01,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 9 minutes, 35 seconds)
2025-08-07 02:41:48,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:50,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -4.47219 ± 38.897
2025-08-07 02:41:50,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-4.0732055, 9.2296715, -2.445068, -17.934757, -8.5226145, -106.56608, 55.725163, 10.845866, 15.31456, 3.7045472]
2025-08-07 02:41:50,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 57.0, 85.0, 77.0, 89.0, 209.0, 98.0, 69.0, 58.0, 59.0]
2025-08-07 02:41:50,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 8 minutes, 22 seconds)
2025-08-07 02:43:36,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:37,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 9.40539 ± 19.865
2025-08-07 02:43:37,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [28.651058, 36.34214, -17.58593, -15.157281, 7.8065715, 3.964384, -18.769018, 15.617159, 20.987043, 32.197727]
2025-08-07 02:43:37,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 61.0, 68.0, 62.0, 68.0, 113.0, 118.0, 45.0, 53.0, 90.0]
2025-08-07 02:43:37,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 minutes, 34 seconds)
2025-08-07 02:45:31,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:45:34,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 25.75420 ± 87.563
2025-08-07 02:45:34,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-3.3964927, 6.2563868, 24.386501, -6.564347, -30.899605, 282.97647, -13.000744, -16.183191, -16.017166, 29.984154]
2025-08-07 02:45:34,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 116.0, 74.0, 48.0, 84.0, 1000.0, 96.0, 164.0, 81.0, 42.0]
2025-08-07 02:45:34,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 5 minutes, 31 seconds)
2025-08-07 02:47:21,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:23,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -20.77286 ± 36.887
2025-08-07 02:47:23,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-17.456808, -3.4635282, -27.775177, -111.72847, -5.630477, -32.90496, -47.320393, 13.484594, 28.3502, -3.2836044]
2025-08-07 02:47:23,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 35.0, 115.0, 220.0, 124.0, 146.0, 111.0, 60.0, 56.0, 104.0]
2025-08-07 02:47:23,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 4 minutes, 26 seconds)
2025-08-07 02:49:02,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:06,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 50.13201 ± 88.442
2025-08-07 02:49:06,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-7.8546424, -9.472194, -0.9337643, 28.889772, 0.33854005, 12.275488, 229.00462, 15.625433, 222.26714, 11.179692]
2025-08-07 02:49:06,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 86.0, 105.0, 82.0, 67.0, 64.0, 1000.0, 35.0, 1000.0, 136.0]
2025-08-07 02:49:06,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (50.13) for latency ExtremeSparseL4U32
2025-08-07 02:49:06,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 1 minute, 50 seconds)
2025-08-07 02:50:54,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:57,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.46813 ± 42.481
2025-08-07 02:50:57,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-5.185605, 19.92182, 22.934246, 26.079304, 124.624306, -13.963373, -1.0498564, -49.335297, 7.8424716, 2.8133368]
2025-08-07 02:50:57,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 60.0, 50.0, 93.0, 1000.0, 91.0, 84.0, 221.0, 46.0, 63.0]
2025-08-07 02:50:57,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 12 seconds)
2025-08-07 02:52:48,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:49,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 7.83565 ± 27.601
2025-08-07 02:52:49,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-57.919586, -2.1921473, 45.877636, 29.760546, 15.81497, 34.36896, 20.379501, -0.024758687, -5.5214977, -2.1871197]
2025-08-07 02:52:49,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [129.0, 62.0, 50.0, 60.0, 79.0, 80.0, 81.0, 69.0, 207.0, 56.0]
2025-08-07 02:52:49,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 58 minutes, 55 seconds)
2025-08-07 02:54:30,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:54:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -7.78261 ± 46.851
2025-08-07 02:54:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.238999, -22.755863, 24.637047, 32.822495, -10.319702, -2.7258413, 38.373055, -10.286354, -136.45084, 5.6409073]
2025-08-07 02:54:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 62.0, 51.0, 89.0, 237.0, 58.0, 71.0, 47.0, 321.0, 42.0]
2025-08-07 02:54:31,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 55 minutes, 31 seconds)
2025-08-07 02:56:19,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:20,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 5.86387 ± 26.384
2025-08-07 02:56:20,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-11.445167, 5.2798533, 18.171772, -42.166695, 14.896028, 24.296621, -34.077038, 37.324413, 5.9025493, 40.45636]
2025-08-07 02:56:20,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 57.0, 49.0, 83.0, 46.0, 42.0, 54.0, 74.0, 73.0, 88.0]
2025-08-07 02:56:20,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 53 minutes, 38 seconds)
2025-08-07 02:58:16,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:58:19,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 28.83627 ± 56.623
2025-08-07 02:58:19,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-16.295101, 8.1551695, 33.577015, 23.729696, 12.110884, 36.866203, 191.764, 0.39479476, -7.0322976, 5.0922856]
2025-08-07 02:58:19,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 97.0, 94.0, 64.0, 115.0, 63.0, 1000.0, 34.0, 56.0, 136.0]
2025-08-07 02:58:19,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 23 seconds)
2025-08-07 03:00:04,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:00:06,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 0.02204 ± 39.413
2025-08-07 03:00:06,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.0571845, 47.718246, 20.929485, -75.82514, 3.1599894, 54.084534, -21.826351, 12.23342, 14.126682, -57.43765]
2025-08-07 03:00:06,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [319.0, 52.0, 42.0, 211.0, 90.0, 90.0, 84.0, 54.0, 69.0, 176.0]
2025-08-07 03:00:06,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 51 minutes, 16 seconds)
2025-08-07 03:01:45,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:48,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 9.90494 ± 61.654
2025-08-07 03:01:48,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-19.47853, 38.899517, 174.22215, 12.26669, -46.45329, -53.771137, 20.82403, -9.432021, -29.786718, 11.758676]
2025-08-07 03:01:48,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [149.0, 79.0, 1000.0, 91.0, 114.0, 108.0, 66.0, 119.0, 104.0, 59.0]
2025-08-07 03:01:48,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 48 minutes, 27 seconds)
2025-08-07 03:03:34,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:38,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -29.95285 ± 114.938
2025-08-07 03:03:38,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-368.80585, -17.211264, -9.361248, 29.54721, -22.907394, 5.0624022, 42.285847, 32.91802, -8.620564, 17.56438]
2025-08-07 03:03:38,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 87.0, 61.0, 43.0, 83.0, 1000.0, 70.0, 56.0, 84.0, 43.0]
2025-08-07 03:03:38,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 24 seconds)
2025-08-07 03:05:29,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:30,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 9.28321 ± 17.323
2025-08-07 03:05:30,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-5.5072694, 0.07716112, -22.588854, 7.204977, 46.460842, 24.954437, 15.307216, 6.2022767, 9.523079, 11.198199]
2025-08-07 03:05:30,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [126.0, 51.0, 42.0, 218.0, 56.0, 73.0, 96.0, 19.0, 46.0, 50.0]
2025-08-07 03:05:30,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 54 seconds)
2025-08-07 03:07:15,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:18,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 20.24204 ± 73.314
2025-08-07 03:07:18,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.737019, 228.65201, -2.3688254, 6.677895, -4.7780037, -16.477852, -58.22031, 24.952394, -17.837864, 18.08393]
2025-08-07 03:07:18,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 1000.0, 290.0, 74.0, 47.0, 89.0, 125.0, 184.0, 104.0, 52.0]
2025-08-07 03:07:18,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 43 minutes, 9 seconds)
2025-08-07 03:09:03,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:06,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 25.78142 ± 51.598
2025-08-07 03:09:06,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-30.38883, 36.30686, 37.074535, 61.49239, 40.81677, 149.10739, -38.98013, 7.113253, 11.652404, -16.38046]
2025-08-07 03:09:06,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 50.0, 129.0, 152.0, 60.0, 1000.0, 83.0, 41.0, 50.0, 100.0]
2025-08-07 03:09:06,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 22 seconds)
2025-08-07 03:10:57,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:02,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 38.68067 ± 93.890
2025-08-07 03:11:02,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [226.8269, -11.326797, -9.068928, 200.4689, 60.23694, 22.864145, -77.45919, 4.4599214, 2.773966, -32.96912]
2025-08-07 03:11:02,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 80.0, 66.0, 1000.0, 59.0, 17.0, 176.0, 136.0, 88.0, 295.0]
2025-08-07 03:11:02,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 40 minutes, 38 seconds)
2025-08-07 03:12:43,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:45,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 10.30993 ± 18.801
2025-08-07 03:12:45,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-18.606201, 24.855629, 5.073966, 38.843895, 13.130449, 20.419777, 13.03317, 11.915848, -27.097143, 21.529934]
2025-08-07 03:12:45,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 26.0, 133.0, 233.0, 55.0, 50.0, 97.0, 123.0, 128.0, 92.0]
2025-08-07 03:12:45,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 16 seconds)
2025-08-07 03:14:40,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:14:41,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.68700 ± 17.822
2025-08-07 03:14:41,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.2149477, 35.793743, 14.817288, 27.295383, 14.577347, -29.837048, 13.3434305, 16.085562, 6.8334913, 34.74582]
2025-08-07 03:14:41,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 213.0, 43.0, 45.0, 57.0, 99.0, 102.0, 67.0, 21.0, 101.0]
2025-08-07 03:14:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 42 seconds)
2025-08-07 03:16:19,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:21,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 8.60933 ± 26.310
2025-08-07 03:16:21,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-7.3378444, 44.591385, 26.877052, 13.742397, -17.782585, -42.62657, 17.042019, 47.514423, -1.1322781, 5.2052584]
2025-08-07 03:16:21,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [196.0, 100.0, 59.0, 49.0, 79.0, 142.0, 52.0, 67.0, 35.0, 50.0]
2025-08-07 03:16:21,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 20 seconds)
2025-08-07 03:18:08,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:18:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.80930 ± 65.118
2025-08-07 03:18:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [19.304377, 2.6551642, -20.082985, 15.3293915, 222.14023, 9.389787, 39.39879, 16.568993, -2.066963, 15.456249]
2025-08-07 03:18:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 41.0, 48.0, 78.0, 1000.0, 44.0, 48.0, 69.0, 60.0, 55.0]
2025-08-07 03:18:11,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 40 seconds)
2025-08-07 03:20:04,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:09,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 49.60958 ± 58.893
2025-08-07 03:20:09,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [24.710531, 120.80728, -11.471368, 38.42143, 2.6906903, 155.78456, 29.481274, 131.19278, 9.845938, -5.3673043]
2025-08-07 03:20:09,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 261.0, 200.0, 43.0, 133.0, 1000.0, 68.0, 1000.0, 92.0, 152.0]
2025-08-07 03:20:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 59 seconds)
2025-08-07 03:21:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:52,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 44.34629 ± 79.661
2025-08-07 03:21:52,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-19.191616, 3.275588, 36.465557, 275.3066, -8.929817, 38.48425, 37.263718, 41.837055, 28.508892, 10.442617]
2025-08-07 03:21:52,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [112.0, 23.0, 134.0, 1000.0, 47.0, 122.0, 82.0, 69.0, 53.0, 49.0]
2025-08-07 03:21:52,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 8 seconds)
2025-08-07 03:23:39,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 4.98829 ± 21.491
2025-08-07 03:23:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-36.13951, -9.510015, 27.01287, 29.018589, 8.760142, 19.021822, 31.063646, -0.6399986, -22.007236, 3.3026333]
2025-08-07 03:23:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 67.0, 172.0, 61.0, 61.0, 60.0, 124.0, 62.0, 226.0, 63.0]
2025-08-07 03:23:40,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 57 seconds)
2025-08-07 03:25:27,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:33,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 54.69713 ± 88.070
2025-08-07 03:25:33,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [210.83357, -12.774807, -29.909403, 27.701132, 37.297745, 239.33986, 20.350859, 21.503159, -7.366983, 39.99612]
2025-08-07 03:25:33,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 140.0, 148.0, 72.0, 75.0, 1000.0, 1000.0, 63.0, 56.0, 60.0]
2025-08-07 03:25:33,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (54.70) for latency ExtremeSparseL4U32
2025-08-07 03:25:33,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 47 seconds)
2025-08-07 03:27:19,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 29.32867 ± 65.556
2025-08-07 03:27:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [32.469147, -0.41289726, 215.57079, 18.560741, 20.92424, 19.349894, -45.317684, 27.133612, 9.352085, -4.3432107]
2025-08-07 03:27:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 56.0, 1000.0, 85.0, 44.0, 43.0, 87.0, 50.0, 15.0, 67.0]
2025-08-07 03:27:21,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 51 seconds)
2025-08-07 03:29:15,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:18,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.14236 ± 56.330
2025-08-07 03:29:18,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [64.592766, 7.493255, 169.28476, 41.560966, 9.503639, -10.066252, 51.860558, 9.755988, 24.39663, -56.958755]
2025-08-07 03:29:18,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 105.0, 1000.0, 94.0, 17.0, 89.0, 77.0, 50.0, 54.0, 142.0]
2025-08-07 03:29:18,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 57 seconds)
2025-08-07 03:31:02,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.68229 ± 73.332
2025-08-07 03:31:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-11.885058, 20.562843, -12.010464, -16.197876, -22.232374, 235.494, -4.7098355, 11.348341, -14.130707, 40.584003]
2025-08-07 03:31:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [100.0, 25.0, 110.0, 48.0, 74.0, 1000.0, 63.0, 161.0, 80.0, 59.0]
2025-08-07 03:31:05,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 16 seconds)
2025-08-07 03:32:47,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:52,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 67.77803 ± 122.294
2025-08-07 03:32:52,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.1932638, -15.057779, 12.803811, -34.818584, 8.882197, 5.650915, 40.41808, 265.4663, 46.54555, 345.69656]
2025-08-07 03:32:52,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 83.0, 229.0, 133.0, 43.0, 49.0, 367.0, 1000.0, 146.0, 1000.0]
2025-08-07 03:32:52,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (67.78) for latency ExtremeSparseL4U32
2025-08-07 03:32:52,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 24 seconds)
2025-08-07 03:34:38,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:39,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.70810 ± 14.232
2025-08-07 03:34:39,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [43.1584, 12.758376, 19.17006, 37.539314, 42.658997, 26.862999, 10.587117, -3.429318, 19.548405, 28.22662]
2025-08-07 03:34:39,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [119.0, 64.0, 129.0, 73.0, 45.0, 61.0, 50.0, 46.0, 43.0, 55.0]
2025-08-07 03:34:39,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 22 seconds)
2025-08-07 03:36:27,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:30,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 34.50044 ± 70.121
2025-08-07 03:36:30,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.3257937, 235.81523, 30.145662, 34.172577, 19.731148, -2.6554353, -36.2159, 18.066978, 7.7265787, 35.89177]
2025-08-07 03:36:30,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 1000.0, 47.0, 49.0, 86.0, 41.0, 148.0, 59.0, 88.0, 173.0]
2025-08-07 03:36:30,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 38 seconds)
2025-08-07 03:38:15,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:19,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 47.77705 ± 73.083
2025-08-07 03:38:19,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.66909, 247.35518, 20.432472, 104.92638, 23.17395, 4.696049, 33.974068, 26.334152, 14.430901, -21.221714]
2025-08-07 03:38:19,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 1000.0, 67.0, 416.0, 58.0, 83.0, 176.0, 110.0, 77.0, 108.0]
2025-08-07 03:38:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 37 seconds)
2025-08-07 03:40:06,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:11,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 57.44990 ± 81.480
2025-08-07 03:40:11,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [239.47427, -10.952778, 59.39729, 2.3592033, 23.504076, 35.092697, 27.622404, 20.452173, -10.387002, 187.9367]
2025-08-07 03:40:11,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 103.0, 172.0, 22.0, 46.0, 42.0, 45.0, 75.0, 163.0, 1000.0]
2025-08-07 03:40:11,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 55 seconds)
2025-08-07 03:42:06,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:09,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 29.28968 ± 73.816
2025-08-07 03:42:09,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [239.5927, 6.1375427, -3.3356466, 9.082408, -54.72132, 27.996908, 32.842587, 6.031014, 4.411113, 24.859543]
2025-08-07 03:42:09,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 109.0, 48.0, 43.0, 170.0, 52.0, 62.0, 40.0, 60.0, 99.0]
2025-08-07 03:42:09,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 16 seconds)
2025-08-07 03:43:49,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:52,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 1.49920 ± 22.253
2025-08-07 03:43:52,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [6.037136, 13.20317, 12.149247, 8.397915, 13.786378, -32.861645, -50.4069, 8.708496, 20.425976, 15.552276]
2025-08-07 03:43:52,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 52.0, 74.0, 179.0, 90.0, 58.0, 343.0, 292.0, 72.0, 41.0]
2025-08-07 03:43:52,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 21 seconds)
2025-08-07 03:45:43,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 17.16022 ± 20.272
2025-08-07 03:45:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.792664, 55.441917, -0.5446557, 16.670263, 51.26675, 8.493691, 1.2574615, -9.723017, 19.298101, 7.649077]
2025-08-07 03:45:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [101.0, 238.0, 37.0, 81.0, 118.0, 116.0, 56.0, 149.0, 66.0, 101.0]
2025-08-07 03:45:45,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 32 seconds)
2025-08-07 03:47:29,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:31,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -6.48403 ± 41.923
2025-08-07 03:47:31,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [15.379882, 22.258753, -26.735922, -16.359482, 26.141342, 12.238681, 7.780125, -105.787605, -45.687706, 45.931618]
2025-08-07 03:47:31,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 41.0, 188.0, 53.0, 52.0, 32.0, 54.0, 184.0, 270.0, 80.0]
2025-08-07 03:47:31,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 40 seconds)
2025-08-07 03:49:11,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:13,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 17.88530 ± 29.604
2025-08-07 03:49:13,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-4.9579644, -23.297728, -6.262537, 59.72175, 49.31973, 3.4841037, -16.130327, 57.337658, 32.88594, 26.752432]
2025-08-07 03:49:13,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 64.0, 125.0, 89.0, 61.0, 51.0, 108.0, 91.0, 112.0, 86.0]
2025-08-07 03:49:13,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 48 seconds)
2025-08-07 03:51:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 17.94126 ± 25.706
2025-08-07 03:51:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [18.975977, 8.707312, 14.035119, 12.302209, 50.05038, -13.402312, 12.961015, 73.37252, 21.252619, -18.842283]
2025-08-07 03:51:01,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 52.0, 168.0, 172.0, 76.0, 123.0, 16.0, 88.0, 49.0, 76.0]
2025-08-07 03:51:01,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1251 [DEBUG]: Training session finished
