2025-08-07 04:09:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:09:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-ant/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:09:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154eedcf1e90>}
2025-08-07 04:09:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1111 [DEBUG]: using device: cuda
2025-08-07 04:09:43,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1133 [INFO]: Creating new trainer
2025-08-07 04:09:43,883 baseline-bpql-noiseperc20-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=219, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:09:43,884 baseline-bpql-noiseperc20-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:09:44,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1194 [DEBUG]: Starting training session...
2025-08-07 04:09:44,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 1/100
2025-08-07 04:11:38,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:11:41,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -163.34673 ± 338.860
2025-08-07 04:11:41,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [0.4386053, -1167.6372, 2.5296643, -1.8758209, -45.127895, -109.640755, -163.27252, -95.1733, -11.359485, -42.34856]
2025-08-07 04:11:41,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 1000.0, 31.0, 31.0, 76.0, 91.0, 171.0, 123.0, 54.0, 76.0]
2025-08-07 04:11:41,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-163.35) for latency ExtremeClogL1U23
2025-08-07 04:11:41,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 12 minutes, 32 seconds)
2025-08-07 04:13:22,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:13:26,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -218.82829 ± 422.090
2025-08-07 04:13:26,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-22.076569, -13.029646, -1077.7333, -4.4391694, 20.64653, -1046.9164, 19.549498, -26.872152, -20.244383, -17.167524]
2025-08-07 04:13:26,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 53.0, 1000.0, 93.0, 37.0, 1000.0, 35.0, 97.0, 40.0, 42.0]
2025-08-07 04:13:26,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 37 seconds)
2025-08-07 04:15:21,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:15:22,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -1.43095 ± 13.567
2025-08-07 04:15:22,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.8792381, 11.337276, 0.13381164, -14.257551, -8.481937, 17.62913, 13.599526, -6.286348, -30.221062, -0.64156204]
2025-08-07 04:15:22,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 40.0, 67.0, 30.0, 57.0, 32.0, 46.0, 53.0, 53.0, 26.0]
2025-08-07 04:15:22,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-1.43) for latency ExtremeClogL1U23
2025-08-07 04:15:22,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 55 seconds)
2025-08-07 04:17:08,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:17:09,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -35.75111 ± 33.232
2025-08-07 04:17:09,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-2.4008262, -61.243385, -9.371703, -24.641273, -23.940155, -51.39091, -102.07415, -24.692078, -70.487526, 12.730884]
2025-08-07 04:17:09,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 85.0, 43.0, 51.0, 60.0, 109.0, 121.0, 76.0, 79.0, 41.0]
2025-08-07 04:17:09,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2025-08-07 04:18:58,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:18:59,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -35.66216 ± 36.324
2025-08-07 04:18:59,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-60.46453, 17.704851, -48.34255, -107.69818, -4.1486526, -49.409584, -33.558628, 2.9605956, -7.676163, -65.98871]
2025-08-07 04:18:59,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [93.0, 22.0, 115.0, 106.0, 84.0, 83.0, 132.0, 46.0, 39.0, 74.0]
2025-08-07 04:18:59,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 55 minutes, 35 seconds)
2025-08-07 04:20:50,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:20:55,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -184.91881 ± 235.799
2025-08-07 04:20:55,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-133.53404, -98.901245, -608.53656, -689.1196, -114.787704, -49.999184, -45.560688, -36.212383, 5.2830467, -77.819664]
2025-08-07 04:20:55,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [202.0, 337.0, 1000.0, 1000.0, 161.0, 116.0, 137.0, 134.0, 64.0, 224.0]
2025-08-07 04:20:55,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 53 minutes, 41 seconds)
2025-08-07 04:22:47,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:22:52,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -97.00901 ± 133.663
2025-08-07 04:22:52,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [16.945938, -140.01033, -124.069405, -386.33945, 1.9659628, 1.6529179, -38.497814, -17.641527, 9.724403, -293.82083]
2025-08-07 04:22:52,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [94.0, 390.0, 534.0, 1000.0, 82.0, 47.0, 152.0, 197.0, 86.0, 1000.0]
2025-08-07 04:22:52,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 55 minutes, 37 seconds)
2025-08-07 04:24:41,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:24:43,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 10.88515 ± 20.090
2025-08-07 04:24:43,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [53.475143, 1.4900873, 23.733902, 30.34669, 6.280652, 18.73976, -1.7850238, 4.351564, -7.7195005, -20.061804]
2025-08-07 04:24:43,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [144.0, 75.0, 65.0, 200.0, 91.0, 182.0, 107.0, 50.0, 87.0, 78.0]
2025-08-07 04:24:43,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (10.89) for latency ExtremeClogL1U23
2025-08-07 04:24:43,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 52 minutes, 1 second)
2025-08-07 04:26:34,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:26:35,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -11.34005 ± 29.483
2025-08-07 04:26:35,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-15.453433, 2.2396567, -76.248344, 38.911106, -2.5232558, -42.031, -19.377193, 7.891005, -12.696394, 5.8873415]
2025-08-07 04:26:35,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 81.0, 77.0, 62.0, 48.0, 76.0, 68.0, 36.0, 143.0, 48.0]
2025-08-07 04:26:35,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-08-07 04:28:32,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:28:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -23.07890 ± 43.221
2025-08-07 04:28:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-6.5967703, -130.41237, 10.216369, -64.69375, 13.480553, -26.270027, -41.06581, 2.6194851, 9.843835, 2.0895095]
2025-08-07 04:28:33,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 276.0, 36.0, 154.0, 65.0, 163.0, 73.0, 31.0, 44.0, 60.0]
2025-08-07 04:28:33,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 16 seconds)
2025-08-07 04:30:17,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:30:20,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -69.28648 ± 177.015
2025-08-07 04:30:20,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-590.6944, -53.977406, 13.148453, 4.1683846, 35.994785, -20.546682, -16.63915, -84.26249, 5.8927, 14.051021]
2025-08-07 04:30:20,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 89.0, 22.0, 89.0, 45.0, 53.0, 153.0, 169.0, 57.0, 61.0]
2025-08-07 04:30:20,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 47 minutes, 30 seconds)
2025-08-07 04:32:08,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:32:09,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 3.93162 ± 16.746
2025-08-07 04:32:09,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [22.628246, -4.9867706, -0.4601651, 13.32552, 2.5691936, 38.702057, 4.686442, -22.779165, -0.2669817, -14.102219]
2025-08-07 04:32:09,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 82.0, 50.0, 79.0, 84.0, 99.0, 46.0, 54.0, 55.0, 97.0]
2025-08-07 04:32:09,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 43 minutes, 27 seconds)
2025-08-07 04:34:00,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:34:04,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -93.54504 ± 212.268
2025-08-07 04:34:04,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [48.248962, -4.7182174, -74.16777, 50.369843, 30.804897, -579.3088, 1.171222, 17.449648, 11.404908, -436.70505]
2025-08-07 04:34:04,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 73.0, 108.0, 66.0, 49.0, 1000.0, 38.0, 55.0, 66.0, 1000.0]
2025-08-07 04:34:04,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 42 minutes, 45 seconds)
2025-08-07 04:35:54,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:57,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -59.53589 ± 100.236
2025-08-07 04:35:57,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.2743246, -51.434177, 2.0984633, -104.28868, -8.62341, -23.897799, -345.09247, -15.591046, -49.130455, 0.8749454]
2025-08-07 04:35:57,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [114.0, 133.0, 57.0, 164.0, 99.0, 60.0, 1000.0, 106.0, 164.0, 59.0]
2025-08-07 04:35:57,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 41 minutes, 1 second)
2025-08-07 04:37:45,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:37:48,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -55.35259 ± 130.021
2025-08-07 04:37:48,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [40.835793, -9.5491085, -415.87338, -105.07527, 28.429237, -2.8829634, -9.701858, -107.51648, -9.407759, 37.21587]
2025-08-07 04:37:48,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [120.0, 63.0, 1000.0, 112.0, 42.0, 69.0, 91.0, 199.0, 110.0, 51.0]
2025-08-07 04:37:48,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 37 minutes, 8 seconds)
2025-08-07 04:39:39,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:39:40,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -12.08362 ± 23.132
2025-08-07 04:39:40,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-30.340672, 17.98606, -14.775828, -34.570652, -2.504308, 25.381145, -13.550427, -10.20133, -56.682587, -1.5775644]
2025-08-07 04:39:40,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [96.0, 53.0, 78.0, 46.0, 85.0, 56.0, 44.0, 90.0, 93.0, 36.0]
2025-08-07 04:39:40,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 36 minutes, 47 seconds)
2025-08-07 04:41:31,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:41:34,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -25.23519 ± 26.155
2025-08-07 04:41:34,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-38.710682, -47.98312, -54.466614, -1.609897, -16.48654, -68.62493, -3.3758903, 17.32619, -4.926207, -33.494164]
2025-08-07 04:41:34,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 158.0, 66.0, 49.0, 61.0, 117.0, 71.0, 34.0, 1000.0, 65.0]
2025-08-07 04:41:34,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 36 minutes, 16 seconds)
2025-08-07 04:43:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:43:34,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -1.12610 ± 34.160
2025-08-07 04:43:34,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [26.184872, -3.5990071, -42.268555, 81.5027, 6.671373, -33.612164, -23.609097, 8.417784, -27.146097, -3.8028448]
2025-08-07 04:43:34,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 16.0, 128.0, 621.0, 47.0, 73.0, 106.0, 59.0, 104.0, 84.0]
2025-08-07 04:43:34,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 35 minutes, 39 seconds)
2025-08-07 04:45:18,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:45:21,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.73352 ± 54.300
2025-08-07 04:45:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-8.648018, -2.9492214, 170.08966, -12.461445, -9.855394, 4.6502714, 6.2663546, 8.365357, 0.14179723, -38.264133]
2025-08-07 04:45:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [105.0, 84.0, 1000.0, 44.0, 94.0, 60.0, 99.0, 58.0, 77.0, 95.0]
2025-08-07 04:45:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (11.73) for latency ExtremeClogL1U23
2025-08-07 04:45:21,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 32 minutes, 27 seconds)
2025-08-07 04:47:17,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:47:20,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 8.33019 ± 92.135
2025-08-07 04:47:20,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [13.127693, 265.10968, -23.810236, -51.93832, -15.684133, -4.300765, -102.14356, -18.54211, 27.925581, -6.4419637]
2025-08-07 04:47:20,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 1000.0, 71.0, 59.0, 73.0, 50.0, 212.0, 65.0, 101.0, 71.0]
2025-08-07 04:47:20,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 32 minutes, 36 seconds)
2025-08-07 04:49:11,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:49:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 58.27787 ± 95.744
2025-08-07 04:49:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [229.8097, -22.22822, 18.236496, 15.714195, 263.58282, -1.8565048, 39.555607, 18.685846, 2.698606, 18.580217]
2025-08-07 04:49:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 115.0, 22.0, 111.0, 1000.0, 34.0, 460.0, 32.0, 39.0, 86.0]
2025-08-07 04:49:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (58.28) for latency ExtremeClogL1U23
2025-08-07 04:49:16,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 31 minutes, 36 seconds)
2025-08-07 04:51:04,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:51:05,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -1.57409 ± 18.487
2025-08-07 04:51:05,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.3111386, 20.786936, -18.299858, 7.3726783, -43.890892, -12.736435, -8.536327, 15.539564, 11.487998, 8.224333]
2025-08-07 04:51:05,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 51.0, 106.0, 86.0, 128.0, 101.0, 60.0, 48.0, 64.0, 72.0]
2025-08-07 04:51:05,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 28 minutes, 20 seconds)
2025-08-07 04:53:00,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:53:05,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 40.19326 ± 107.459
2025-08-07 04:53:05,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-48.11174, -50.987072, -18.291033, -19.159166, 184.494, 7.9686217, 301.17575, 8.773608, 2.572584, 33.497025]
2025-08-07 04:53:05,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 306.0, 95.0, 55.0, 1000.0, 33.0, 1000.0, 45.0, 202.0, 100.0]
2025-08-07 04:53:05,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 26 minutes, 35 seconds)
2025-08-07 04:54:47,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:54:48,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -1.18549 ± 17.797
2025-08-07 04:54:48,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.788742, -12.220079, 0.98109293, 9.259226, 0.28595504, 8.468104, -15.479427, 16.414396, -0.9541583, -42.398746]
2025-08-07 04:54:48,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 44.0, 37.0, 46.0, 52.0, 103.0, 100.0, 48.0, 60.0, 114.0]
2025-08-07 04:54:48,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 23 minutes, 36 seconds)
2025-08-07 04:56:44,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:56:46,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 12.17559 ± 42.900
2025-08-07 04:56:46,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [29.306656, -20.193314, 19.73498, -34.612545, 122.52322, -6.50912, 32.49092, 8.036781, -29.2948, 0.27315125]
2025-08-07 04:56:46,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 60.0, 38.0, 103.0, 443.0, 167.0, 49.0, 50.0, 105.0, 109.0]
2025-08-07 04:56:46,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 21 minutes, 26 seconds)
2025-08-07 04:58:35,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:58:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 4.31032 ± 15.703
2025-08-07 04:58:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [30.277758, -0.20758064, 9.585403, -18.113853, 20.312607, -11.40261, -1.5038483, 8.549537, -15.650888, 21.25664]
2025-08-07 04:58:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 98.0, 32.0, 122.0, 91.0, 33.0, 49.0, 68.0, 108.0, 43.0]
2025-08-07 04:58:37,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 18 minutes, 19 seconds)
2025-08-07 05:00:22,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:00:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -23.81506 ± 54.169
2025-08-07 05:00:23,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-14.844955, -171.08241, 14.527619, 8.112452, 1.2008202, -37.751724, 15.733358, 9.599047, -58.80222, -4.8425508]
2025-08-07 05:00:23,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 284.0, 48.0, 43.0, 28.0, 89.0, 45.0, 51.0, 127.0, 51.0]
2025-08-07 05:00:23,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 15 minutes, 49 seconds)
2025-08-07 05:02:15,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:02:16,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 1.11878 ± 23.710
2025-08-07 05:02:16,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [10.802071, 28.471628, 1.3682357, 19.544802, 28.53517, -55.221527, 0.54377276, 4.9006014, -8.892406, -18.864582]
2025-08-07 05:02:16,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 73.0, 22.0, 76.0, 42.0, 114.0, 71.0, 50.0, 249.0, 104.0]
2025-08-07 05:02:16,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 12 minutes, 19 seconds)
2025-08-07 05:04:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:04:08,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.63154 ± 16.997
2025-08-07 05:04:08,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [25.71442, 3.719865, 10.010314, 17.926636, 5.403358, -20.532478, -9.104878, 15.061654, 39.755398, 28.361103]
2025-08-07 05:04:08,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 54.0, 26.0, 43.0, 45.0, 71.0, 82.0, 41.0, 43.0, 72.0]
2025-08-07 05:04:08,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 12 minutes, 25 seconds)
2025-08-07 05:06:07,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:06:09,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 12.56392 ± 64.014
2025-08-07 05:06:09,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-16.240278, 9.836783, 181.40929, 18.247688, 15.327079, -50.99419, -72.29819, 26.445707, 13.834918, 0.07033785]
2025-08-07 05:06:09,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 43.0, 1000.0, 39.0, 42.0, 117.0, 104.0, 42.0, 57.0, 18.0]
2025-08-07 05:06:09,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 11 minutes, 23 seconds)
2025-08-07 05:08:01,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:08:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -5.16443 ± 25.917
2025-08-07 05:08:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-41.521004, -3.9356549, 26.949358, -12.151322, 47.32709, -32.052853, -21.757774, -12.927378, -15.972342, 14.397608]
2025-08-07 05:08:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [138.0, 73.0, 77.0, 50.0, 63.0, 54.0, 95.0, 41.0, 81.0, 42.0]
2025-08-07 05:08:02,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 9 minutes, 59 seconds)
2025-08-07 05:09:48,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:09:52,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 6.06060 ± 86.890
2025-08-07 05:09:52,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [117.70774, 18.437614, 36.61679, -25.754616, -46.393013, -76.972115, -81.16727, 202.5212, -27.604025, -56.78627]
2025-08-07 05:09:52,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 49.0, 64.0, 107.0, 150.0, 72.0, 112.0, 1000.0, 99.0, 84.0]
2025-08-07 05:09:52,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 9 minutes, 2 seconds)
2025-08-07 05:11:42,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:11:43,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -1.11160 ± 24.853
2025-08-07 05:11:43,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-12.25566, -61.168728, 18.717707, 0.37575188, -10.651117, 24.959658, 21.681976, -10.671125, -6.3180375, 24.213606]
2025-08-07 05:11:43,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [94.0, 182.0, 45.0, 19.0, 68.0, 63.0, 50.0, 93.0, 72.0, 49.0]
2025-08-07 05:11:43,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 6 minutes, 42 seconds)
2025-08-07 05:13:28,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:13:29,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -10.93364 ± 32.863
2025-08-07 05:13:29,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-2.440918, -80.22178, -3.8930283, 2.9661825, -44.511856, 34.923504, -42.630775, -0.68694264, 22.00774, 5.151512]
2025-08-07 05:13:29,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 219.0, 40.0, 61.0, 96.0, 100.0, 57.0, 94.0, 52.0, 39.0]
2025-08-07 05:13:29,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 3 minutes, 28 seconds)
2025-08-07 05:15:30,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:15:32,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 25.13385 ± 70.299
2025-08-07 05:15:32,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [216.27129, 13.954822, -14.495498, -57.772594, 51.28967, -6.8843856, 29.132696, 18.256302, 26.858477, -25.272232]
2025-08-07 05:15:32,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 69.0, 37.0, 72.0, 57.0, 38.0, 36.0, 52.0, 45.0, 66.0]
2025-08-07 05:15:32,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 2 minutes, 2 seconds)
2025-08-07 05:17:14,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:17:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 33.23862 ± 60.065
2025-08-07 05:17:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.520063, 1.6208557, 25.927362, 208.88821, -2.4028, 24.697622, 12.964697, -2.4588268, 38.73179, 22.897276]
2025-08-07 05:17:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 96.0, 53.0, 1000.0, 36.0, 44.0, 43.0, 42.0, 59.0, 37.0]
2025-08-07 05:17:16,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 58 minutes, 12 seconds)
2025-08-07 05:19:07,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:19:11,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 42.28965 ± 99.775
2025-08-07 05:19:11,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.7755961, 17.347898, -14.255869, 233.6481, -27.7775, 33.045998, -18.29732, -35.236683, -9.838374, 242.48462]
2025-08-07 05:19:11,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 57.0, 80.0, 1000.0, 73.0, 60.0, 56.0, 107.0, 45.0, 1000.0]
2025-08-07 05:19:11,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 57 minutes, 17 seconds)
2025-08-07 05:21:02,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:21:04,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -7.05369 ± 64.353
2025-08-07 05:21:04,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-63.569733, -39.00803, -2.2239828, -7.123755, 170.86357, -37.099495, -14.012839, -12.94044, -72.81951, 7.397346]
2025-08-07 05:21:04,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [146.0, 92.0, 49.0, 69.0, 1000.0, 125.0, 118.0, 64.0, 227.0, 35.0]
2025-08-07 05:21:04,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 55 minutes, 56 seconds)
2025-08-07 05:22:59,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:23:02,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 34.79593 ± 64.106
2025-08-07 05:23:02,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [20.851568, 27.042486, 39.298054, 11.016246, 11.435311, -16.988638, 214.52856, -33.75176, 39.87653, 34.650948]
2025-08-07 05:23:02,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 113.0, 91.0, 86.0, 55.0, 42.0, 1000.0, 71.0, 135.0, 198.0]
2025-08-07 05:23:02,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 56 minutes, 34 seconds)
2025-08-07 05:24:48,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:50,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -4.43694 ± 28.671
2025-08-07 05:24:50,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-7.671021, -0.6458867, 7.7302485, -30.70057, 5.947418, -7.8767834, -67.84217, 43.96178, 24.282291, -11.5547]
2025-08-07 05:24:50,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 20.0, 47.0, 98.0, 35.0, 33.0, 148.0, 547.0, 85.0, 38.0]
2025-08-07 05:24:50,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 51 minutes, 32 seconds)
2025-08-07 05:26:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:41,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 6.42629 ± 52.553
2025-08-07 05:26:41,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [24.301382, -12.247034, 31.319223, -24.484795, 15.606771, -19.085835, -92.15573, 128.03792, -1.6442806, 14.615288]
2025-08-07 05:26:41,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 48.0, 56.0, 86.0, 89.0, 80.0, 208.0, 1000.0, 43.0, 111.0]
2025-08-07 05:26:42,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 51 minutes, 14 seconds)
2025-08-07 05:28:32,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:28:35,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 15.70910 ± 96.896
2025-08-07 05:28:35,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [12.909484, 1.6551874, -19.171452, 246.7427, 15.063025, 16.71737, -177.43817, 53.942574, 6.0245943, 0.64574003]
2025-08-07 05:28:35,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 86.0, 62.0, 1000.0, 67.0, 50.0, 268.0, 67.0, 18.0, 18.0]
2025-08-07 05:28:35,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 49 minutes, 3 seconds)
2025-08-07 05:30:26,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:30:28,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.37880 ± 51.862
2025-08-07 05:30:28,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [44.847137, 4.671048, 169.11801, -4.568472, -25.32659, 7.793177, 23.227716, -0.006566695, 1.5512931, 2.4811978]
2025-08-07 05:30:28,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 63.0, 1000.0, 127.0, 80.0, 42.0, 133.0, 75.0, 52.0, 74.0]
2025-08-07 05:30:28,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 47 minutes, 9 seconds)
2025-08-07 05:32:17,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:32:18,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -15.17186 ± 40.923
2025-08-07 05:32:18,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.391203, 16.39139, 27.149267, 8.319211, -13.979329, -13.153057, -24.895481, -115.07602, -57.315697, -0.5500599]
2025-08-07 05:32:18,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 39.0, 49.0, 71.0, 36.0, 59.0, 89.0, 192.0, 129.0, 74.0]
2025-08-07 05:32:18,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 43 minutes, 43 seconds)
2025-08-07 05:34:14,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:34:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 42.86321 ± 63.483
2025-08-07 05:34:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [200.00383, -4.3197794, 39.981663, 2.7615068, 52.53479, -1.9353209, 117.99924, 18.540812, 3.8902714, -0.8248897]
2025-08-07 05:34:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 125.0, 69.0, 127.0, 95.0, 83.0, 1000.0, 82.0, 39.0, 87.0]
2025-08-07 05:34:19,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 44 minutes, 20 seconds)
2025-08-07 05:36:01,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:36:04,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 17.32146 ± 55.100
2025-08-07 05:36:04,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-18.47614, -10.722159, -15.221491, 3.8632908, 4.332716, 19.396595, 153.34892, -57.159004, 23.74143, 70.11047]
2025-08-07 05:36:04,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 98.0, 27.0, 116.0, 127.0, 66.0, 1000.0, 215.0, 43.0, 482.0]
2025-08-07 05:36:04,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 41 minutes, 16 seconds)
2025-08-07 05:37:57,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:38:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 38.55278 ± 70.491
2025-08-07 05:38:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-2.3969731, 10.582255, 6.2787914, 29.991095, 3.2586424, 31.214231, 0.8434904, 245.47758, 44.87515, 15.403532]
2025-08-07 05:38:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 35.0, 45.0, 41.0, 40.0, 57.0, 57.0, 1000.0, 80.0, 99.0]
2025-08-07 05:38:00,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 39 minutes, 51 seconds)
2025-08-07 05:39:47,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:39:50,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 14.00587 ± 64.983
2025-08-07 05:39:50,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [35.76713, -2.86633, 177.18262, 23.893997, -22.580301, 4.3966746, -98.515656, 22.235031, 10.474437, -9.928857]
2025-08-07 05:39:50,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 52.0, 1000.0, 42.0, 65.0, 136.0, 257.0, 51.0, 247.0, 35.0]
2025-08-07 05:39:50,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 37 minutes, 20 seconds)
2025-08-07 05:41:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:41:46,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 51.09696 ± 64.581
2025-08-07 05:41:46,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [0.65232646, -15.820102, 11.847652, 37.71729, 44.87807, 178.27226, 17.315123, 29.02452, 172.90897, 34.173496]
2025-08-07 05:41:46,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 53.0, 41.0, 104.0, 75.0, 1000.0, 66.0, 35.0, 1000.0, 50.0]
2025-08-07 05:41:46,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 36 minutes, 30 seconds)
2025-08-07 05:43:41,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:43:44,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 40.83280 ± 53.308
2025-08-07 05:43:44,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [174.43741, 53.719147, 0.09873858, 20.726513, -3.771923, 87.52954, 2.2985227, 26.565184, -7.200908, 53.925777]
2025-08-07 05:43:44,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 87.0, 145.0, 47.0, 87.0, 159.0, 32.0, 88.0, 87.0, 92.0]
2025-08-07 05:43:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 34 minutes, 6 seconds)
2025-08-07 05:45:31,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:45:33,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 26.91695 ± 56.060
2025-08-07 05:45:33,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [5.756455, 4.4297924, 193.53342, 11.625897, 12.831979, -0.042033352, 22.038109, 12.595126, 13.189713, -6.788932]
2025-08-07 05:45:33,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 19.0, 1000.0, 31.0, 56.0, 16.0, 100.0, 43.0, 33.0, 51.0]
2025-08-07 05:45:34,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 33 minutes)
2025-08-07 05:47:30,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:47:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 30.57624 ± 65.519
2025-08-07 05:47:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [194.26036, 3.2237546, -5.8925853, 51.419865, -70.777725, 59.290363, 47.979687, 9.460435, 28.305485, -11.507256]
2025-08-07 05:47:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 45.0, 27.0, 72.0, 169.0, 1000.0, 50.0, 33.0, 90.0, 42.0]
2025-08-07 05:47:34,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 31 minutes, 49 seconds)
2025-08-07 05:49:18,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:21,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 26.02192 ± 59.364
2025-08-07 05:49:21,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [14.4257145, 12.863761, 5.8329935, 17.285963, -8.957806, 24.52919, 12.13828, -17.832884, -0.4611895, 200.39514]
2025-08-07 05:49:21,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 65.0, 51.0, 49.0, 91.0, 62.0, 74.0, 126.0, 67.0, 1000.0]
2025-08-07 05:49:21,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 29 minutes, 27 seconds)
2025-08-07 05:51:07,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:09,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.44184 ± 51.638
2025-08-07 05:51:09,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.4317958, 10.145391, 151.41086, 36.13175, 33.242393, 5.235784, 29.977114, -66.73671, -6.3351455, 28.915129]
2025-08-07 05:51:09,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 35.0, 1000.0, 62.0, 71.0, 18.0, 53.0, 186.0, 45.0, 44.0]
2025-08-07 05:51:10,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 26 minutes, 27 seconds)
2025-08-07 05:53:09,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 39.27741 ± 70.777
2025-08-07 05:53:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [6.3385825, 27.877544, 30.859789, 244.65395, 32.758736, 26.242172, 24.312239, -31.423943, 9.644436, 21.510622]
2025-08-07 05:53:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 67.0, 57.0, 1000.0, 66.0, 34.0, 128.0, 101.0, 68.0, 65.0]
2025-08-07 05:53:11,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 5 seconds)
2025-08-07 05:54:51,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:54:54,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 15.44824 ± 55.687
2025-08-07 05:54:54,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-55.79395, 18.180145, 67.96594, 30.68823, 18.547049, 135.7138, 6.588995, -32.296936, -61.86588, 26.754995]
2025-08-07 05:54:54,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 24.0, 87.0, 50.0, 33.0, 1000.0, 52.0, 87.0, 209.0, 46.0]
2025-08-07 05:54:54,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 22 minutes, 11 seconds)
2025-08-07 05:56:56,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:57:00,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 40.37177 ± 73.388
2025-08-07 05:57:00,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-20.652275, 28.984564, -14.34195, 222.43672, -3.061112, 14.406803, 6.716201, 43.626076, 130.45041, -4.8477387]
2025-08-07 05:57:00,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [72.0, 30.0, 76.0, 1000.0, 37.0, 34.0, 74.0, 42.0, 1000.0, 106.0]
2025-08-07 05:57:00,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 21 minutes, 9 seconds)
2025-08-07 05:58:44,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:58:45,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 17.52201 ± 32.989
2025-08-07 05:58:45,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [11.778726, 43.556477, -53.688572, 64.38909, 24.45542, -0.23041107, -9.2581215, 57.498398, 28.953238, 7.765811]
2025-08-07 05:58:45,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 56.0, 198.0, 73.0, 34.0, 31.0, 44.0, 97.0, 96.0, 87.0]
2025-08-07 05:58:45,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 18 minutes, 57 seconds)
2025-08-07 06:00:33,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:00:37,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 13.14924 ± 76.198
2025-08-07 06:00:37,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.537498, 2.4306667, 0.104514994, -81.36607, -69.566826, -35.848934, 6.378504, 208.17873, 43.317898, 36.326378]
2025-08-07 06:00:37,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [131.0, 222.0, 24.0, 132.0, 84.0, 212.0, 19.0, 1000.0, 77.0, 82.0]
2025-08-07 06:00:37,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 17 minutes, 30 seconds)
2025-08-07 06:02:34,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:02:37,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 57.56321 ± 75.476
2025-08-07 06:02:37,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [34.59918, 69.18252, 16.279589, 272.96555, 29.87966, 17.593603, 7.3164997, 34.75729, 10.823996, 82.23416]
2025-08-07 06:02:37,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 94.0, 128.0, 1000.0, 119.0, 45.0, 103.0, 63.0, 34.0, 130.0]
2025-08-07 06:02:37,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 15 minutes, 26 seconds)
2025-08-07 06:04:20,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:04:23,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 34.15593 ± 68.012
2025-08-07 06:04:23,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [12.69381, -46.08251, 25.425055, 47.47621, 11.307873, -1.4268069, 27.647442, 12.981996, 225.86484, 25.671413]
2025-08-07 06:04:23,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 118.0, 34.0, 108.0, 166.0, 95.0, 62.0, 69.0, 1000.0, 58.0]
2025-08-07 06:04:23,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 13 minutes, 55 seconds)
2025-08-07 06:06:17,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:06:18,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 24.90849 ± 24.627
2025-08-07 06:06:18,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [44.96163, -5.926118, 40.6929, 10.2700815, -15.413597, 35.119175, 2.349744, 23.841785, 56.97202, 56.21727]
2025-08-07 06:06:18,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 28.0, 53.0, 59.0, 90.0, 134.0, 16.0, 30.0, 67.0, 108.0]
2025-08-07 06:06:18,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 10 minutes, 38 seconds)
2025-08-07 06:08:05,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:08:07,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 9.22018 ± 20.539
2025-08-07 06:08:07,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-4.490497, 29.170065, 0.0073129945, 30.150059, -29.8149, -3.5987122, 22.42958, -9.383642, 20.83567, 36.89686]
2025-08-07 06:08:07,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 44.0, 14.0, 55.0, 172.0, 118.0, 204.0, 86.0, 59.0, 44.0]
2025-08-07 06:08:07,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 9 minutes, 15 seconds)
2025-08-07 06:10:02,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:10:04,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 33.05383 ± 34.164
2025-08-07 06:10:04,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [33.239437, 8.881154, 25.286837, 114.56029, -4.535893, 19.71804, 66.670044, 43.822296, 28.959454, -6.0633116]
2025-08-07 06:10:04,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [109.0, 28.0, 34.0, 318.0, 54.0, 32.0, 44.0, 42.0, 113.0, 127.0]
2025-08-07 06:10:04,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 8 minutes, 2 seconds)
2025-08-07 06:11:53,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:11:55,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 39.60205 ± 87.889
2025-08-07 06:11:55,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-52.1892, 30.16598, 8.310003, 28.946224, 9.417324, 5.8181067, 30.124304, 294.08823, 23.363962, 17.975552]
2025-08-07 06:11:55,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [90.0, 35.0, 38.0, 184.0, 53.0, 43.0, 64.0, 1000.0, 105.0, 33.0]
2025-08-07 06:11:55,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 5 minutes, 10 seconds)
2025-08-07 06:13:49,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:13:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.44202 ± 62.896
2025-08-07 06:13:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [53.58489, 35.33289, -45.738327, -40.909473, 30.413473, 196.8868, 15.397165, 22.764208, 26.899584, 19.789011]
2025-08-07 06:13:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [179.0, 75.0, 263.0, 97.0, 125.0, 1000.0, 45.0, 46.0, 56.0, 204.0]
2025-08-07 06:13:52,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 4 minutes, 31 seconds)
2025-08-07 06:15:36,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:15:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.18103 ± 22.027
2025-08-07 06:15:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-38.63137, 17.729053, 11.023099, 2.2757564, 29.06705, 20.523987, 46.991356, -9.841776, 21.971416, 10.701749]
2025-08-07 06:15:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 50.0, 60.0, 45.0, 32.0, 95.0, 89.0, 38.0, 48.0, 41.0]
2025-08-07 06:15:37,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 1 minute, 31 seconds)
2025-08-07 06:17:28,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:17:30,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 29.14087 ± 61.369
2025-08-07 06:17:30,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-19.130962, 16.573677, -35.824596, 24.091864, 16.001286, 2.2129183, -2.6189733, 35.950706, 197.04865, 57.104137]
2025-08-07 06:17:30,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [147.0, 52.0, 138.0, 106.0, 34.0, 42.0, 32.0, 69.0, 1000.0, 105.0]
2025-08-07 06:17:30,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 8 seconds)
2025-08-07 06:19:30,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:19:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 25.22344 ± 70.938
2025-08-07 06:19:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [5.1691203, -0.26361597, -7.072179, 8.37235, -16.535698, 18.503994, 18.760532, 5.549172, 235.16083, -15.410064]
2025-08-07 06:19:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 128.0, 87.0, 93.0, 106.0, 43.0, 63.0, 71.0, 1000.0, 204.0]
2025-08-07 06:19:32,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 46 seconds)
2025-08-07 06:21:17,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:21:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 20.75218 ± 21.910
2025-08-07 06:21:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-8.277912, 18.542585, 31.010612, 4.577206, 69.470375, 5.966246, 14.951375, 16.00726, 48.168705, 7.1053343]
2025-08-07 06:21:18,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [125.0, 56.0, 66.0, 58.0, 104.0, 58.0, 82.0, 50.0, 69.0, 53.0]
2025-08-07 06:21:18,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 14 seconds)
2025-08-07 06:23:05,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:23:08,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 55.17736 ± 60.807
2025-08-07 06:23:08,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-21.269375, 33.215828, 36.11543, 30.286524, 42.992607, 41.2202, 69.32308, 48.683037, 45.759777, 225.4465]
2025-08-07 06:23:08,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 56.0, 140.0, 92.0, 132.0, 252.0, 158.0, 81.0, 57.0, 1000.0]
2025-08-07 06:23:08,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 43 seconds)
2025-08-07 06:25:02,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:25:04,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 19.47640 ± 26.303
2025-08-07 06:25:04,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [20.315325, 19.010637, 52.293945, 19.13198, 12.289527, -39.595436, 17.680986, 63.216537, 2.0527494, 28.367723]
2025-08-07 06:25:04,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 79.0, 67.0, 34.0, 63.0, 183.0, 32.0, 281.0, 162.0, 62.0]
2025-08-07 06:25:04,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 52 seconds)
2025-08-07 06:26:59,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:27:00,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 21.90829 ± 31.416
2025-08-07 06:27:00,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-10.104439, 32.614017, 24.33512, 5.9634256, 51.227573, -41.21775, 13.404005, 16.376703, 55.974262, 70.50996]
2025-08-07 06:27:00,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 41.0, 53.0, 32.0, 107.0, 115.0, 35.0, 32.0, 84.0, 140.0]
2025-08-07 06:27:00,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 16 seconds)
2025-08-07 06:28:47,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:48,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 14.97689 ± 18.483
2025-08-07 06:28:48,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [29.898832, 27.331093, -21.75897, 10.903352, 17.460869, 9.198721, 23.048113, 48.79927, 9.929154, -5.041527]
2025-08-07 06:28:48,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 119.0, 91.0, 44.0, 135.0, 55.0, 48.0, 79.0, 80.0, 44.0]
2025-08-07 06:28:48,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 48 minutes, 8 seconds)
2025-08-07 06:30:31,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:34,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 40.11021 ± 67.033
2025-08-07 06:30:34,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.3156798, 50.967407, 17.105679, -8.6842575, 232.03133, 2.2787318, 46.306854, 36.2624, 27.169853, -3.6516087]
2025-08-07 06:30:34,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 70.0, 54.0, 65.0, 1000.0, 97.0, 137.0, 49.0, 40.0, 33.0]
2025-08-07 06:30:34,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 46 minutes, 19 seconds)
2025-08-07 06:32:25,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:29,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 60.40079 ± 74.709
2025-08-07 06:32:29,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [196.44498, 58.869164, 38.38618, 29.013302, -16.539127, 36.013634, 204.10715, 61.07241, -23.58249, 20.222746]
2025-08-07 06:32:29,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 134.0, 47.0, 46.0, 72.0, 1000.0, 119.0, 241.0, 60.0]
2025-08-07 06:32:29,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (60.40) for latency ExtremeClogL1U23
2025-08-07 06:32:29,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 44 minutes, 56 seconds)
2025-08-07 06:34:27,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:30,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.32731 ± 68.768
2025-08-07 06:34:30,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [60.759613, -13.068297, 228.08081, 1.2165161, -1.6860298, -12.275611, 7.5384583, 22.683725, -2.1083624, 22.132254]
2025-08-07 06:34:30,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 144.0, 1000.0, 34.0, 42.0, 34.0, 34.0, 78.0, 52.0, 36.0]
2025-08-07 06:34:30,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 23 seconds)
2025-08-07 06:36:24,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:25,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.55608 ± 17.226
2025-08-07 06:36:25,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [32.65275, 35.8303, 10.236361, 55.22917, 13.221401, 14.729145, -5.847684, 40.7688, 19.327196, 9.413385]
2025-08-07 06:36:25,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 40.0, 41.0, 109.0, 34.0, 101.0, 143.0, 44.0, 43.0, 73.0]
2025-08-07 06:36:25,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 26 seconds)
2025-08-07 06:38:17,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:38:18,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 37.62151 ± 39.460
2025-08-07 06:38:18,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [51.45922, 57.11232, 1.9632233, 73.232346, 13.639241, -28.727419, 85.08044, 96.518715, -3.545788, 29.482784]
2025-08-07 06:38:18,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 61.0, 28.0, 113.0, 55.0, 119.0, 83.0, 95.0, 33.0, 94.0]
2025-08-07 06:38:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 53 seconds)
2025-08-07 06:39:57,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:00,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.40265 ± 69.652
2025-08-07 06:40:00,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-69.399536, 50.87921, 27.71996, -7.869881, 12.977123, 8.367741, 28.23628, 40.25348, 217.84956, 5.0125175]
2025-08-07 06:40:00,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 119.0, 32.0, 70.0, 32.0, 34.0, 43.0, 43.0, 1000.0, 43.0]
2025-08-07 06:40:00,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 42 seconds)
2025-08-07 06:41:52,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:41:55,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 45.68246 ± 89.111
2025-08-07 06:41:55,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [18.784319, -72.108765, 37.306194, 29.36632, 35.62547, 73.8659, 20.767664, -0.12775926, 22.099632, 291.2456]
2025-08-07 06:41:55,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [159.0, 470.0, 56.0, 33.0, 123.0, 109.0, 64.0, 70.0, 34.0, 1000.0]
2025-08-07 06:41:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 48 seconds)
2025-08-07 06:43:49,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 50.90852 ± 66.546
2025-08-07 06:43:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [18.337957, 48.74613, 14.04197, 246.56625, 12.298365, 15.419059, 29.359964, 49.352524, 37.751896, 37.21113]
2025-08-07 06:43:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 430.0, 90.0, 1000.0, 70.0, 83.0, 189.0, 147.0, 50.0, 104.0]
2025-08-07 06:43:52,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 44 seconds)
2025-08-07 06:45:40,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 37.55178 ± 33.938
2025-08-07 06:45:42,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [22.735804, 68.85238, 2.8546746, 0.26516557, 24.44211, 33.44142, 90.37485, 98.59236, 7.6716924, 26.287317]
2025-08-07 06:45:42,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 115.0, 92.0, 81.0, 43.0, 51.0, 77.0, 143.0, 91.0, 45.0]
2025-08-07 06:45:42,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 31 seconds)
2025-08-07 06:47:24,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 34.44135 ± 76.971
2025-08-07 06:47:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.1780915, 4.23065, 8.0475, 8.978181, 7.243369, 263.25656, 33.111958, -3.670124, 21.33092, -2.2936642]
2025-08-07 06:47:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [117.0, 135.0, 94.0, 33.0, 29.0, 1000.0, 43.0, 39.0, 66.0, 81.0]
2025-08-07 06:47:27,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 17 seconds)
2025-08-07 06:49:15,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 58.09177 ± 82.957
2025-08-07 06:49:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [21.785648, -7.255147, 36.7547, 228.96477, 9.539567, 19.073301, 214.35022, 15.843328, 1.7744138, 40.086903]
2025-08-07 06:49:19,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 53.0, 47.0, 1000.0, 21.0, 41.0, 1000.0, 150.0, 18.0, 55.0]
2025-08-07 06:49:19,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 58 seconds)
2025-08-07 06:51:08,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 20.88531 ± 33.142
2025-08-07 06:51:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.0708315, 9.785764, 19.492697, 6.7312307, 4.167408, 27.18698, 116.4849, 19.072298, -5.951879, 9.812884]
2025-08-07 06:51:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 35.0, 45.0, 53.0, 21.0, 56.0, 1000.0, 51.0, 181.0, 49.0]
2025-08-07 06:51:10,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 54 seconds)
2025-08-07 06:53:11,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:12,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 8.26145 ± 15.954
2025-08-07 06:53:12,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [7.6791167, 9.796056, -23.46218, 27.836216, 11.951076, -17.01624, 28.125847, 7.013574, 13.927702, 16.763351]
2025-08-07 06:53:12,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 33.0, 72.0, 44.0, 77.0, 152.0, 72.0, 74.0, 47.0, 37.0]
2025-08-07 06:53:12,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 15 seconds)
2025-08-07 06:54:50,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:53,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 41.61083 ± 63.632
2025-08-07 06:54:53,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.564024, 22.7483, -37.240826, 207.25313, 30.625032, 66.80701, 52.66005, -15.697651, 62.971912, 2.4173124]
2025-08-07 06:54:53,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 37.0, 86.0, 1000.0, 89.0, 86.0, 82.0, 56.0, 224.0, 32.0]
2025-08-07 06:54:53,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 2 seconds)
2025-08-07 06:56:45,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:46,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 36.15916 ± 18.570
2025-08-07 06:56:46,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [11.573344, 45.829105, 78.20311, 37.144127, 30.345854, 16.62998, 42.44932, 27.605574, 50.96858, 20.842617]
2025-08-07 06:56:46,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 121.0, 138.0, 63.0, 60.0, 126.0, 100.0, 38.0, 132.0, 49.0]
2025-08-07 06:56:46,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 30 seconds)
2025-08-07 06:58:34,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:35,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 34.30146 ± 34.053
2025-08-07 06:58:35,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.5325313, 2.397073, 36.078144, 109.463524, 5.078945, 1.4874343, 50.29729, 59.3867, 61.530098, 13.762876]
2025-08-07 06:58:35,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 116.0, 102.0, 171.0, 43.0, 38.0, 126.0, 107.0, 87.0, 68.0]
2025-08-07 06:58:35,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 32 seconds)
2025-08-07 07:00:18,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 24.64420 ± 33.673
2025-08-07 07:00:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [20.204172, 14.959857, 26.318562, 108.47944, -2.938204, 28.520834, 36.909454, 18.096218, -32.050053, 27.941736]
2025-08-07 07:00:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 55.0, 63.0, 116.0, 76.0, 120.0, 65.0, 168.0, 56.0, 85.0]
2025-08-07 07:00:20,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 29 seconds)
2025-08-07 07:02:09,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:12,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 56.04721 ± 51.221
2025-08-07 07:02:12,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [0.9070889, 98.80999, 38.455963, 178.69267, 46.974266, 4.3053694, 45.797813, 22.543623, 93.276634, 30.708702]
2025-08-07 07:02:12,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [210.0, 147.0, 53.0, 1000.0, 135.0, 50.0, 87.0, 91.0, 226.0, 45.0]
2025-08-07 07:02:12,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 23 seconds)
2025-08-07 07:03:58,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:00,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 28.94115 ± 23.482
2025-08-07 07:04:00,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [5.5961065, 55.826134, 30.754833, 12.338139, 24.536354, -6.8970046, 23.589975, 60.829716, 15.857555, 66.97968]
2025-08-07 07:04:00,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 53.0, 48.0, 35.0, 181.0, 51.0, 91.0, 130.0, 42.0, 91.0]
2025-08-07 07:04:00,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 45 seconds)
2025-08-07 07:05:48,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:53,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 69.68575 ± 75.576
2025-08-07 07:05:53,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [161.33675, 19.21663, 189.67255, 7.8483014, 73.53117, -1.2146391, 173.90245, 24.875591, 77.86346, -30.17488]
2025-08-07 07:05:53,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [299.0, 48.0, 1000.0, 26.0, 90.0, 35.0, 1000.0, 41.0, 178.0, 73.0]
2025-08-07 07:05:53,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (69.69) for latency ExtremeClogL1U23
2025-08-07 07:05:53,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 55 seconds)
2025-08-07 07:07:49,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:50,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.53509 ± 40.554
2025-08-07 07:07:50,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [32.814243, 69.10492, 60.00958, -65.85629, 25.313553, 36.40458, 15.753837, 16.8129, 96.34565, 28.647896]
2025-08-07 07:07:50,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 101.0, 110.0, 100.0, 242.0, 43.0, 49.0, 78.0, 121.0, 69.0]
2025-08-07 07:07:50,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 14 seconds)
2025-08-07 07:09:32,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:34,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 47.29822 ± 64.041
2025-08-07 07:09:34,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [32.06848, 57.476524, 1.4055362, 0.22083066, 15.21545, 230.03662, 22.587593, 15.661574, 36.64483, 61.66476]
2025-08-07 07:09:34,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 158.0, 34.0, 43.0, 34.0, 1000.0, 68.0, 24.0, 60.0, 196.0]
2025-08-07 07:09:35,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 23 seconds)
2025-08-07 07:11:23,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:27,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 65.38022 ± 85.657
2025-08-07 07:11:27,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-13.920658, 12.876966, 53.83549, 244.61766, 62.676407, 35.603344, 18.810163, 20.183832, 2.1887329, 216.93028]
2025-08-07 07:11:27,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 34.0, 50.0, 1000.0, 225.0, 64.0, 35.0, 79.0, 32.0, 1000.0]
2025-08-07 07:11:27,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 33 seconds)
2025-08-07 07:13:14,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:16,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 37.26418 ± 28.549
2025-08-07 07:13:16,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [51.68163, 36.257248, 20.359713, 89.71234, 51.130905, 20.007387, 8.165762, -7.55365, 75.5018, 27.378721]
2025-08-07 07:13:16,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [266.0, 106.0, 42.0, 177.0, 132.0, 88.0, 55.0, 95.0, 84.0, 34.0]
2025-08-07 07:13:16,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 42 seconds)
2025-08-07 07:15:08,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 21.95094 ± 24.850
2025-08-07 07:15:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [46.318565, -5.7221084, 32.598705, 43.49396, 23.78316, 27.484667, 42.280956, -39.42441, 26.84404, 21.851917]
2025-08-07 07:15:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 145.0, 58.0, 220.0, 46.0, 140.0, 121.0, 211.0, 99.0, 34.0]
2025-08-07 07:15:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 51 seconds)
2025-08-07 07:16:59,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:03,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 56.03444 ± 52.852
2025-08-07 07:17:03,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [13.486693, -9.678513, 79.961624, 41.78882, 170.51997, 122.78473, 53.173717, 26.302214, 5.3024735, 56.702747]
2025-08-07 07:17:03,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 90.0, 172.0, 97.0, 1000.0, 1000.0, 83.0, 48.0, 78.0, 151.0]
2025-08-07 07:17:03,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1251 [DEBUG]: Training session finished
