2025-08-07 07:46:47,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 07:46:47,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 07:46:47,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x151e550f6e10>}
2025-08-07 07:46:47,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1111 [DEBUG]: using device: cuda
2025-08-07 07:46:47,731 baseline-bpql-noiseperc10-ant:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 07:46:47,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1133 [INFO]: Creating new trainer
2025-08-07 07:46:47,748 baseline-bpql-noiseperc10-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:46:47,748 baseline-bpql-noiseperc10-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:46:48,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1194 [DEBUG]: Starting training session...
2025-08-07 07:46:48,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 1/100
2025-08-07 07:48:29,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:48:31,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -155.08774 ± 377.099
2025-08-07 07:48:31,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [0.095824845, 0.872729, -134.72147, -43.021214, -0.17496884, -1278.4951, -9.765213, -88.70871, 5.933005, -2.892296]
2025-08-07 07:48:31,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 34.0, 117.0, 61.0, 30.0, 1000.0, 60.0, 81.0, 23.0, 45.0]
2025-08-07 07:48:31,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (-155.09) for latency MM1Queue_a033_s075
2025-08-07 07:48:31,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 50 minutes, 41 seconds)
2025-08-07 07:50:03,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:50:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -17.04424 ± 38.249
2025-08-07 07:50:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-14.258413, 13.627675, -18.322454, -60.891678, -0.39669558, 13.296267, -111.80373, 16.196835, 4.1715612, -12.061753]
2025-08-07 07:50:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [187.0, 76.0, 66.0, 106.0, 64.0, 55.0, 133.0, 35.0, 49.0, 55.0]
2025-08-07 07:50:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (-17.04) for latency MM1Queue_a033_s075
2025-08-07 07:50:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 39 minutes, 55 seconds)
2025-08-07 07:51:44,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:51:49,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -269.11258 ± 348.743
2025-08-07 07:51:49,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-53.140514, -47.2265, -131.67087, 16.83333, -820.1861, -54.393692, -12.826184, -755.5376, -819.0834, -13.894406]
2025-08-07 07:51:49,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [111.0, 144.0, 172.0, 30.0, 1000.0, 137.0, 64.0, 1000.0, 1000.0, 113.0]
2025-08-07 07:51:49,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 42 minutes, 20 seconds)
2025-08-07 07:53:27,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:53:29,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: -16.74367 ± 29.959
2025-08-07 07:53:29,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-69.17811, 28.947063, -12.220728, 10.783368, -10.258044, -1.8959821, -24.065666, 9.63741, -39.851566, -59.33448]
2025-08-07 07:53:29,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [278.0, 68.0, 89.0, 85.0, 85.0, 124.0, 218.0, 57.0, 233.0, 122.0]
2025-08-07 07:53:29,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (-16.74) for latency MM1Queue_a033_s075
2025-08-07 07:53:29,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 40 minutes, 22 seconds)
2025-08-07 07:55:07,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:55:11,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 45.04987 ± 39.297
2025-08-07 07:55:11,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [-25.743113, 110.79538, 26.061974, 36.16521, 86.2192, -1.1064644, 22.650743, 70.95442, 55.97194, 68.529434]
2025-08-07 07:55:11,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [184.0, 1000.0, 115.0, 61.0, 282.0, 262.0, 149.0, 160.0, 125.0, 131.0]
2025-08-07 07:55:11,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (45.05) for latency MM1Queue_a033_s075
2025-08-07 07:55:11,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 39 minutes, 9 seconds)
2025-08-07 07:56:57,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:57:05,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 307.56018 ± 173.427
2025-08-07 07:57:05,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [363.2968, 285.0478, 135.16878, 587.6276, 521.877, 114.478546, 426.02902, 132.10658, 421.4768, 88.49275]
2025-08-07 07:57:05,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [544.0, 399.0, 255.0, 1000.0, 1000.0, 322.0, 1000.0, 256.0, 1000.0, 167.0]
2025-08-07 07:57:05,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (307.56) for latency MM1Queue_a033_s075
2025-08-07 07:57:05,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 41 minutes, 2 seconds)
2025-08-07 07:58:38,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:58:46,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 281.18469 ± 180.218
2025-08-07 07:58:46,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [171.48802, 172.7169, 72.98368, 159.47462, 241.9045, 683.59753, 323.90823, 352.9182, 503.36603, 129.48936]
2025-08-07 07:58:46,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [535.0, 904.0, 163.0, 472.0, 434.0, 1000.0, 568.0, 492.0, 1000.0, 243.0]
2025-08-07 07:58:46,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 41 minutes, 57 seconds)
2025-08-07 08:00:27,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:00:35,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 317.97641 ± 265.119
2025-08-07 08:00:35,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [43.63499, 662.8908, 489.53513, 53.906586, 571.8299, 625.2411, 51.71456, 79.40155, 53.588753, 548.0204]
2025-08-07 08:00:35,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 1000.0, 1000.0, 82.0, 1000.0, 1000.0, 65.0, 163.0, 58.0, 1000.0]
2025-08-07 08:00:35,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (317.98) for latency MM1Queue_a033_s075
2025-08-07 08:00:35,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 41 minutes, 16 seconds)
2025-08-07 08:02:16,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:02:22,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 271.28833 ± 237.959
2025-08-07 08:02:22,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [73.631615, 114.31379, 243.6021, 34.034157, 666.03735, 629.36487, 575.2321, 178.49382, 122.89781, 75.27572]
2025-08-07 08:02:22,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 153.0, 375.0, 45.0, 1000.0, 1000.0, 1000.0, 241.0, 130.0, 163.0]
2025-08-07 08:02:22,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 41 minutes, 35 seconds)
2025-08-07 08:04:00,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:04:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 339.66095 ± 252.899
2025-08-07 08:04:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [355.33002, 95.2615, 128.22069, 830.24994, 220.0956, 53.77359, 101.2373, 654.985, 518.9389, 438.5165]
2025-08-07 08:04:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [738.0, 108.0, 289.0, 1000.0, 460.0, 61.0, 120.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:04:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (339.66) for latency MM1Queue_a033_s075
2025-08-07 08:04:08,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 41 minutes, 4 seconds)
2025-08-07 08:05:50,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 349.63245 ± 249.389
2025-08-07 08:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [646.317, 209.92038, 38.01316, 101.5727, 349.44012, 749.465, 103.777336, 682.5103, 410.42685, 204.88156]
2025-08-07 08:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 456.0, 50.0, 106.0, 254.0, 938.0, 95.0, 1000.0, 396.0, 188.0]
2025-08-07 08:05:56,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (349.63) for latency MM1Queue_a033_s075
2025-08-07 08:05:56,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 37 minutes, 24 seconds)
2025-08-07 08:07:32,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:07:39,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 404.11798 ± 262.141
2025-08-07 08:07:39,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [406.36792, 39.508297, 500.28055, 238.75578, 160.76038, 109.761826, 376.4739, 658.1243, 652.0145, 899.13184]
2025-08-07 08:07:39,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [361.0, 36.0, 647.0, 321.0, 187.0, 85.0, 345.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:07:39,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (404.12) for latency MM1Queue_a033_s075
2025-08-07 08:07:39,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 36 minutes, 9 seconds)
2025-08-07 08:09:19,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:09:26,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 439.51837 ± 281.452
2025-08-07 08:09:26,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [101.55276, 528.0846, 156.18205, 394.01352, 672.56805, 235.05023, 448.88718, 1073.4401, 599.21124, 186.19366]
2025-08-07 08:09:26,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 1000.0, 163.0, 382.0, 636.0, 207.0, 479.0, 1000.0, 1000.0, 196.0]
2025-08-07 08:09:26,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (439.52) for latency MM1Queue_a033_s075
2025-08-07 08:09:26,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 33 minutes, 57 seconds)
2025-08-07 08:11:03,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:11:07,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 256.69836 ± 209.316
2025-08-07 08:11:07,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [166.32283, 161.02647, 238.928, 284.4813, 47.980545, 770.5467, 431.593, 84.76441, 43.616253, 337.72415]
2025-08-07 08:11:07,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [154.0, 125.0, 216.0, 244.0, 44.0, 1000.0, 380.0, 68.0, 70.0, 303.0]
2025-08-07 08:11:07,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 30 minutes, 32 seconds)
2025-08-07 08:12:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:12:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 185.59262 ± 104.864
2025-08-07 08:12:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [194.29242, 44.47024, 208.74088, 326.5283, 367.176, 75.39216, 130.04567, 90.44773, 284.19678, 134.636]
2025-08-07 08:12:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [185.0, 52.0, 168.0, 1000.0, 263.0, 92.0, 107.0, 76.0, 212.0, 107.0]
2025-08-07 08:12:48,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 27 minutes, 25 seconds)
2025-08-07 08:14:27,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:14:31,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 299.82068 ± 166.782
2025-08-07 08:14:31,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [260.6492, 307.37103, 542.65717, 325.68195, 130.79553, 580.95074, 192.67825, 81.47241, 445.91968, 130.03076]
2025-08-07 08:14:31,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [215.0, 304.0, 525.0, 293.0, 147.0, 1000.0, 158.0, 119.0, 401.0, 102.0]
2025-08-07 08:14:31,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 24 minutes, 15 seconds)
2025-08-07 08:16:12,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:16:16,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 288.58789 ± 183.467
2025-08-07 08:16:16,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [291.28226, 523.04004, 71.38826, 172.28293, 361.9338, 68.2322, 238.88353, 348.92767, 663.5973, 146.31116]
2025-08-07 08:16:16,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [258.0, 1000.0, 53.0, 143.0, 312.0, 49.0, 227.0, 261.0, 731.0, 193.0]
2025-08-07 08:16:16,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 23 minutes, 12 seconds)
2025-08-07 08:17:58,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:18:09,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 682.18103 ± 306.009
2025-08-07 08:18:09,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [696.9343, 953.5973, 605.5622, 1137.6239, 361.41815, 982.5058, 942.0859, 267.5931, 657.1595, 217.33005]
2025-08-07 08:18:09,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 954.0, 333.0, 1000.0, 1000.0, 201.0, 1000.0, 205.0]
2025-08-07 08:18:09,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (682.18) for latency MM1Queue_a033_s075
2025-08-07 08:18:09,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 22 minutes, 54 seconds)
2025-08-07 08:19:42,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:19:46,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 305.26346 ± 274.184
2025-08-07 08:19:46,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [117.1205, 76.98863, 299.7376, 969.05725, 306.63663, 591.26794, 80.51984, 149.77975, 63.397076, 398.12933]
2025-08-07 08:19:46,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 65.0, 300.0, 851.0, 243.0, 1000.0, 66.0, 104.0, 45.0, 332.0]
2025-08-07 08:19:46,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 20 minutes, 10 seconds)
2025-08-07 08:21:28,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:21:31,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 220.24146 ± 231.375
2025-08-07 08:21:31,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [103.63331, 68.71238, 161.29233, 97.957794, 846.06525, 258.85175, 41.024883, 345.95404, 254.00253, 24.920465]
2025-08-07 08:21:31,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 58.0, 162.0, 86.0, 1000.0, 237.0, 36.0, 276.0, 188.0, 28.0]
2025-08-07 08:21:31,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 19 minutes, 26 seconds)
2025-08-07 08:23:10,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:23:15,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 360.35297 ± 354.504
2025-08-07 08:23:15,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [960.9119, 45.565166, 29.521898, 186.32329, 413.79608, 762.6425, 898.45605, 91.05616, 70.69651, 144.55997]
2025-08-07 08:23:15,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 39.0, 37.0, 143.0, 346.0, 1000.0, 790.0, 60.0, 50.0, 131.0]
2025-08-07 08:23:15,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 17 minutes, 49 seconds)
2025-08-07 08:24:51,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:24:58,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 543.28455 ± 422.645
2025-08-07 08:24:58,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1385.188, 660.5594, 951.24335, 217.73848, 721.59204, 204.92508, 890.2493, 22.027214, 151.90962, 227.41368]
2025-08-07 08:24:58,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 691.0, 159.0, 1000.0, 159.0, 787.0, 35.0, 141.0, 212.0]
2025-08-07 08:24:58,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 15 minutes, 39 seconds)
2025-08-07 08:26:34,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:26:40,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 542.41541 ± 381.972
2025-08-07 08:26:40,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [342.73132, 406.09225, 325.55185, 1261.6595, 899.0942, 1066.1168, 302.43607, 597.5985, 75.2389, 147.63487]
2025-08-07 08:26:40,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [316.0, 370.0, 319.0, 1000.0, 692.0, 879.0, 288.0, 449.0, 57.0, 113.0]
2025-08-07 08:26:40,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 11 minutes, 4 seconds)
2025-08-07 08:28:22,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:28:27,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 436.14868 ± 397.228
2025-08-07 08:28:27,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [138.42224, 318.78104, 61.2977, 224.08653, 174.34013, 425.38382, 441.6042, 1173.75, 184.22455, 1219.5968]
2025-08-07 08:28:27,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [100.0, 212.0, 48.0, 168.0, 195.0, 331.0, 300.0, 1000.0, 127.0, 886.0]
2025-08-07 08:28:27,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 11 minutes, 52 seconds)
2025-08-07 08:30:04,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:30:10,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 440.91852 ± 387.624
2025-08-07 08:30:10,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [169.50104, 61.372208, 520.30695, 485.32632, 136.10472, 1001.31433, 782.7367, 63.592148, 1125.1252, 63.805096]
2025-08-07 08:30:10,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [131.0, 59.0, 449.0, 359.0, 109.0, 1000.0, 1000.0, 47.0, 824.0, 50.0]
2025-08-07 08:30:10,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 9 minutes, 44 seconds)
2025-08-07 08:31:43,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:31:48,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 372.14795 ± 347.760
2025-08-07 08:31:48,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [66.65159, 175.12617, 260.34088, 95.57052, 122.762955, 39.7733, 358.33307, 692.2326, 1083.1726, 827.5157]
2025-08-07 08:31:48,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 146.0, 213.0, 82.0, 102.0, 34.0, 265.0, 1000.0, 749.0, 1000.0]
2025-08-07 08:31:48,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 40 seconds)
2025-08-07 08:33:25,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:33:29,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 354.97345 ± 283.048
2025-08-07 08:33:29,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [612.2536, 168.92667, 369.34467, 610.7929, 196.47968, 83.64804, 357.93393, 100.13079, 70.623886, 979.6002]
2025-08-07 08:33:29,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [516.0, 95.0, 250.0, 359.0, 142.0, 49.0, 268.0, 64.0, 58.0, 1000.0]
2025-08-07 08:33:29,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 4 minutes, 20 seconds)
2025-08-07 08:35:09,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:35:15,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 531.01202 ± 341.253
2025-08-07 08:35:15,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [252.86641, 152.82367, 1307.1069, 748.6264, 803.48785, 530.6925, 530.28925, 564.3397, 129.49345, 290.39407]
2025-08-07 08:35:15,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [197.0, 114.0, 824.0, 1000.0, 1000.0, 343.0, 411.0, 388.0, 93.0, 224.0]
2025-08-07 08:35:15,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 39 seconds)
2025-08-07 08:36:52,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:36:58,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 557.75555 ± 286.370
2025-08-07 08:36:58,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [857.3807, 275.2656, 904.2183, 887.8975, 764.2723, 736.4953, 203.51042, 408.13257, 140.55424, 399.82877]
2025-08-07 08:36:58,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 166.0, 520.0, 582.0, 530.0, 1000.0, 175.0, 271.0, 83.0, 279.0]
2025-08-07 08:36:58,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 7 seconds)
2025-08-07 08:38:35,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:38:39,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 506.69147 ± 432.704
2025-08-07 08:38:39,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1149.8552, 344.301, 481.7209, 94.020676, 940.07153, 491.18372, 176.44931, 1262.4589, 56.296116, 70.556946]
2025-08-07 08:38:39,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [684.0, 213.0, 388.0, 67.0, 600.0, 308.0, 138.0, 884.0, 40.0, 47.0]
2025-08-07 08:38:40,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 56 seconds)
2025-08-07 08:40:20,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:40:26,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 585.53986 ± 588.067
2025-08-07 08:40:26,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1592.1675, 1609.4062, 405.1825, 588.48706, 44.326317, 278.42188, 39.64709, 156.39009, 1067.8018, 73.56847]
2025-08-07 08:40:26,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 225.0, 340.0, 52.0, 221.0, 70.0, 107.0, 1000.0, 45.0]
2025-08-07 08:40:26,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 59 minutes, 2 seconds)
2025-08-07 08:42:03,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:42:12,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 679.55481 ± 273.140
2025-08-07 08:42:12,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [161.4047, 598.3421, 729.6081, 530.61755, 923.90485, 1244.1671, 659.7606, 802.8241, 693.44293, 451.4759]
2025-08-07 08:42:12,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [130.0, 370.0, 1000.0, 340.0, 1000.0, 1000.0, 462.0, 473.0, 1000.0, 273.0]
2025-08-07 08:42:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 58 minutes, 28 seconds)
2025-08-07 08:43:47,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:43:52,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 568.84094 ± 305.380
2025-08-07 08:43:52,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [488.69376, 822.69977, 709.8003, 45.027126, 86.864555, 714.9271, 768.7271, 328.94504, 1011.64484, 711.0797]
2025-08-07 08:43:52,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [273.0, 584.0, 481.0, 62.0, 61.0, 425.0, 580.0, 252.0, 583.0, 452.0]
2025-08-07 08:43:52,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 29 seconds)
2025-08-07 08:45:37,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:45:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 681.60382 ± 401.537
2025-08-07 08:45:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [411.1525, 920.96387, 688.36084, 209.75684, 222.34866, 721.3944, 998.63367, 376.85422, 661.4376, 1605.1356]
2025-08-07 08:45:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [219.0, 499.0, 1000.0, 150.0, 169.0, 1000.0, 645.0, 266.0, 380.0, 1000.0]
2025-08-07 08:45:44,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 55 minutes, 34 seconds)
2025-08-07 08:47:13,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:47:20,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 718.50500 ± 596.509
2025-08-07 08:47:20,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [401.7204, 1392.8381, 1558.8876, 199.21053, 1133.5658, 208.92047, 32.673782, 603.1125, 89.48226, 1564.6387]
2025-08-07 08:47:20,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [270.0, 1000.0, 1000.0, 143.0, 724.0, 155.0, 35.0, 379.0, 62.0, 1000.0]
2025-08-07 08:47:20,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (718.51) for latency MM1Queue_a033_s075
2025-08-07 08:47:20,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 52 minutes, 41 seconds)
2025-08-07 08:48:57,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:49:03,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 489.58813 ± 348.749
2025-08-07 08:49:03,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [326.94666, 261.60385, 96.806854, 799.09814, 972.739, 1104.1516, 658.2948, 232.60803, 360.96808, 82.66401]
2025-08-07 08:49:03,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [230.0, 158.0, 79.0, 461.0, 1000.0, 683.0, 1000.0, 169.0, 258.0, 69.0]
2025-08-07 08:49:03,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 50 minutes, 16 seconds)
2025-08-07 08:50:50,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:51:01,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 918.05743 ± 334.198
2025-08-07 08:51:01,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [908.3948, 1605.4509, 1270.6038, 665.0799, 306.67026, 967.0853, 760.3364, 756.4288, 862.22156, 1078.3031]
2025-08-07 08:51:01,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [565.0, 1000.0, 740.0, 1000.0, 183.0, 623.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:51:01,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (918.06) for latency MM1Queue_a033_s075
2025-08-07 08:51:01,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 51 minutes, 4 seconds)
2025-08-07 08:52:30,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:52:38,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 723.24377 ± 455.870
2025-08-07 08:52:38,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [377.77158, 220.86954, 276.47583, 688.9422, 805.3961, 1496.7034, 435.3246, 808.66925, 521.63873, 1600.6466]
2025-08-07 08:52:38,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [264.0, 150.0, 201.0, 1000.0, 1000.0, 1000.0, 315.0, 505.0, 367.0, 1000.0]
2025-08-07 08:52:38,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 48 minutes, 42 seconds)
2025-08-07 08:54:16,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:54:24,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 805.04749 ± 407.154
2025-08-07 08:54:24,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [973.48517, 59.86424, 255.86874, 541.8099, 1426.565, 988.9588, 1234.5166, 774.5004, 1098.2296, 696.6763]
2025-08-07 08:54:24,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [552.0, 43.0, 152.0, 302.0, 834.0, 1000.0, 1000.0, 448.0, 1000.0, 444.0]
2025-08-07 08:54:24,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 45 minutes, 49 seconds)
2025-08-07 08:56:04,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:56:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 618.98987 ± 486.028
2025-08-07 08:56:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [150.81383, 727.5183, 152.25879, 704.14594, 997.2556, 282.68405, 190.49104, 1779.4702, 846.68365, 358.5777]
2025-08-07 08:56:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 1000.0, 126.0, 380.0, 553.0, 199.0, 122.0, 1000.0, 1000.0, 272.0]
2025-08-07 08:56:10,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 46 minutes, 8 seconds)
2025-08-07 08:57:47,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:57:52,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 461.38663 ± 350.385
2025-08-07 08:57:52,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [203.05962, 338.5828, 164.63147, 332.19363, 167.12054, 790.428, 1144.8656, 372.00464, 971.9515, 129.02846]
2025-08-07 08:57:52,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [157.0, 272.0, 118.0, 223.0, 97.0, 446.0, 1000.0, 236.0, 537.0, 109.0]
2025-08-07 08:57:52,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 44 minutes, 1 second)
2025-08-07 08:59:33,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:59:42,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1090.39148 ± 598.544
2025-08-07 08:59:42,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1701.2937, 306.50073, 821.124, 1783.5695, 1040.8359, 723.3426, 421.94788, 464.6306, 1838.4733, 1802.1974]
2025-08-07 08:59:42,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 183.0, 486.0, 1000.0, 603.0, 385.0, 256.0, 299.0, 989.0, 1000.0]
2025-08-07 08:59:42,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1090.39) for latency MM1Queue_a033_s075
2025-08-07 08:59:42,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 45 seconds)
2025-08-07 09:01:19,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:01:23,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 556.97040 ± 547.904
2025-08-07 09:01:23,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1064.7772, 78.96254, 524.82355, 514.0308, 73.762505, 766.34827, 122.48889, 112.70101, 403.6329, 1908.1761]
2025-08-07 09:01:23,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [570.0, 49.0, 264.0, 270.0, 48.0, 429.0, 77.0, 82.0, 196.0, 979.0]
2025-08-07 09:01:23,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 39 minutes, 43 seconds)
2025-08-07 09:02:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:03:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1121.86841 ± 538.193
2025-08-07 09:03:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [345.9711, 1219.7247, 883.7078, 1806.4216, 1394.3303, 1203.2333, 152.92891, 1404.0914, 1904.5833, 903.6915]
2025-08-07 09:03:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [247.0, 757.0, 451.0, 1000.0, 749.0, 635.0, 121.0, 664.0, 948.0, 486.0]
2025-08-07 09:03:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1121.87) for latency MM1Queue_a033_s075
2025-08-07 09:03:05,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 37 minutes, 16 seconds)
2025-08-07 09:04:45,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:04:51,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 731.39642 ± 592.512
2025-08-07 09:04:51,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [770.8996, 808.8459, 429.537, 1697.3385, 1386.3953, 123.28505, 121.45035, 121.95871, 1582.9065, 271.34695]
2025-08-07 09:04:51,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [431.0, 457.0, 228.0, 1000.0, 1000.0, 70.0, 86.0, 66.0, 942.0, 151.0]
2025-08-07 09:04:51,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 30 seconds)
2025-08-07 09:06:31,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:06:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 775.06946 ± 468.764
2025-08-07 09:06:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [514.3539, 744.0932, 963.04224, 1093.8782, 559.7766, 100.397224, 1774.068, 976.6694, 907.8888, 116.52778]
2025-08-07 09:06:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [281.0, 1000.0, 538.0, 605.0, 405.0, 98.0, 1000.0, 1000.0, 526.0, 84.0]
2025-08-07 09:06:38,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 34 minutes, 48 seconds)
2025-08-07 09:08:14,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:08:22,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 915.81366 ± 525.920
2025-08-07 09:08:22,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [924.4151, 719.25476, 2069.5583, 520.2516, 1749.1038, 718.653, 451.34595, 641.5904, 912.2959, 451.6683]
2025-08-07 09:08:22,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 349.0, 981.0, 411.0, 234.0, 315.0, 507.0, 257.0]
2025-08-07 09:08:22,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 31 minutes, 49 seconds)
2025-08-07 09:09:59,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:10:07,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 949.07483 ± 630.856
2025-08-07 09:10:07,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1546.0336, 89.0715, 1897.8229, 305.78433, 188.92996, 822.2585, 900.773, 908.74854, 897.67834, 1933.6477]
2025-08-07 09:10:07,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [856.0, 51.0, 1000.0, 159.0, 123.0, 422.0, 1000.0, 500.0, 1000.0, 1000.0]
2025-08-07 09:10:07,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 52 seconds)
2025-08-07 09:11:52,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:11:58,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 832.43713 ± 687.936
2025-08-07 09:11:58,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [345.3342, 231.9414, 1825.7681, 1579.7644, 235.30638, 364.97464, 1454.7186, 73.43888, 435.44202, 1777.6824]
2025-08-07 09:11:58,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [189.0, 142.0, 1000.0, 799.0, 135.0, 169.0, 766.0, 55.0, 252.0, 1000.0]
2025-08-07 09:11:58,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 30 minutes, 38 seconds)
2025-08-07 09:13:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:13:37,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 712.12421 ± 641.498
2025-08-07 09:13:37,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1046.3229, 241.63718, 115.15588, 1939.1958, 827.66705, 1705.3519, 78.210686, 178.77017, 758.24365, 230.6872]
2025-08-07 09:13:37,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 120.0, 83.0, 1000.0, 436.0, 1000.0, 64.0, 99.0, 1000.0, 119.0]
2025-08-07 09:13:37,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 27 minutes, 37 seconds)
2025-08-07 09:15:18,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:15:22,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 500.99048 ± 237.568
2025-08-07 09:15:22,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [467.33035, 29.059387, 196.44623, 792.89484, 438.6577, 631.6662, 809.05615, 598.92017, 657.3941, 388.47968]
2025-08-07 09:15:22,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [286.0, 28.0, 115.0, 1000.0, 254.0, 396.0, 499.0, 334.0, 347.0, 248.0]
2025-08-07 09:15:22,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 36 seconds)
2025-08-07 09:16:58,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:17:04,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 787.48181 ± 456.302
2025-08-07 09:17:04,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [760.0206, 1358.877, 834.3393, 723.24994, 410.3845, 1204.303, 335.58624, 55.74477, 594.9586, 1597.3547]
2025-08-07 09:17:04,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [414.0, 1000.0, 1000.0, 419.0, 291.0, 624.0, 211.0, 49.0, 318.0, 796.0]
2025-08-07 09:17:05,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 39 seconds)
2025-08-07 09:18:41,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:18:50,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1102.80884 ± 601.830
2025-08-07 09:18:50,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [58.91873, 1844.7975, 903.7463, 1322.2878, 245.23996, 696.36957, 1118.5015, 1460.4973, 1987.3738, 1390.3564]
2025-08-07 09:18:50,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 1000.0, 1000.0, 702.0, 115.0, 1000.0, 585.0, 1000.0, 1000.0, 781.0]
2025-08-07 09:18:50,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 58 seconds)
2025-08-07 09:20:31,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:20:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 637.22589 ± 462.707
2025-08-07 09:20:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [365.33713, 1153.3329, 415.54358, 139.16574, 86.9253, 654.9625, 1059.0303, 1464.5594, 909.6901, 123.7119]
2025-08-07 09:20:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [194.0, 650.0, 213.0, 73.0, 58.0, 366.0, 1000.0, 1000.0, 481.0, 96.0]
2025-08-07 09:20:36,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 24 seconds)
2025-08-07 09:22:17,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:22:25,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1102.15479 ± 571.392
2025-08-07 09:22:25,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1369.1674, 1966.3173, 204.39178, 1388.3231, 752.10925, 1175.7512, 1845.8385, 742.44135, 1313.9163, 263.29193]
2025-08-07 09:22:25,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [734.0, 917.0, 99.0, 737.0, 397.0, 575.0, 1000.0, 408.0, 1000.0, 141.0]
2025-08-07 09:22:25,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 10 seconds)
2025-08-07 09:23:56,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:24:06,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1410.52515 ± 463.764
2025-08-07 09:24:06,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1587.4839, 878.6331, 1911.372, 1770.8359, 1058.1128, 437.12927, 1622.1895, 1765.3168, 1235.8729, 1838.3052]
2025-08-07 09:24:06,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [919.0, 475.0, 1000.0, 913.0, 547.0, 273.0, 812.0, 1000.0, 589.0, 1000.0]
2025-08-07 09:24:06,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1410.53) for latency MM1Queue_a033_s075
2025-08-07 09:24:06,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 48 seconds)
2025-08-07 09:25:44,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:25:52,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1002.68298 ± 663.368
2025-08-07 09:25:52,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [742.1649, 1851.3569, 713.6706, 148.00761, 133.77063, 979.90405, 1904.5978, 650.2451, 858.9817, 2044.1311]
2025-08-07 09:25:52,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [398.0, 1000.0, 1000.0, 78.0, 68.0, 1000.0, 1000.0, 357.0, 436.0, 1000.0]
2025-08-07 09:25:52,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 38 seconds)
2025-08-07 09:27:36,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:27:45,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 869.41321 ± 469.292
2025-08-07 09:27:45,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [898.8624, 1326.119, 721.0158, 855.7484, 782.13104, 701.68915, 438.29773, 246.1094, 700.5912, 2023.5675]
2025-08-07 09:27:45,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [530.0, 708.0, 1000.0, 445.0, 1000.0, 331.0, 241.0, 116.0, 1000.0, 1000.0]
2025-08-07 09:27:45,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 14 minutes, 51 seconds)
2025-08-07 09:29:25,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:34,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1307.18896 ± 485.817
2025-08-07 09:29:34,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [868.5008, 1492.8864, 1204.6576, 1381.5247, 1180.8123, 544.54205, 718.936, 2131.1858, 1604.1556, 1944.6895]
2025-08-07 09:29:34,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 728.0, 626.0, 671.0, 584.0, 301.0, 344.0, 1000.0, 900.0, 1000.0]
2025-08-07 09:29:34,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 13 minutes, 33 seconds)
2025-08-07 09:31:11,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:21,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1241.42419 ± 553.173
2025-08-07 09:31:21,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [878.1314, 1914.4152, 1322.5963, 1276.0446, 712.65375, 1558.697, 889.53107, 1875.8641, 1840.9285, 145.3794]
2025-08-07 09:31:21,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [446.0, 1000.0, 729.0, 1000.0, 408.0, 1000.0, 1000.0, 938.0, 1000.0, 94.0]
2025-08-07 09:31:21,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 32 seconds)
2025-08-07 09:32:57,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:03,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 978.73212 ± 631.627
2025-08-07 09:33:03,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [640.5705, 518.9454, 1912.865, 2060.1904, 323.287, 666.31305, 1526.64, 800.5403, 134.87247, 1203.0975]
2025-08-07 09:33:03,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [301.0, 239.0, 991.0, 998.0, 155.0, 329.0, 755.0, 437.0, 79.0, 622.0]
2025-08-07 09:33:03,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 9 minutes, 47 seconds)
2025-08-07 09:34:47,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:55,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1158.16064 ± 567.547
2025-08-07 09:34:55,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1252.304, 972.5836, 1472.614, 913.7029, 1372.5712, 2066.242, 336.2592, 1988.3656, 893.71765, 313.24615]
2025-08-07 09:34:55,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [675.0, 493.0, 725.0, 512.0, 1000.0, 1000.0, 152.0, 983.0, 533.0, 193.0]
2025-08-07 09:34:55,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 8 minutes, 48 seconds)
2025-08-07 09:36:27,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:34,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1080.44067 ± 803.848
2025-08-07 09:36:34,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1647.5878, 765.6522, 264.8892, 2047.2266, 353.7424, 27.280485, 1540.5884, 1935.5293, 146.81004, 2075.101]
2025-08-07 09:36:34,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 338.0, 176.0, 1000.0, 217.0, 28.0, 768.0, 1000.0, 73.0, 1000.0]
2025-08-07 09:36:34,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 5 minutes, 14 seconds)
2025-08-07 09:38:14,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:23,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1240.84094 ± 622.661
2025-08-07 09:38:23,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1611.2441, 1829.6226, 140.62056, 818.2608, 1924.2887, 1576.6438, 1052.0797, 1351.7505, 244.1755, 1859.7228]
2025-08-07 09:38:23,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [782.0, 1000.0, 70.0, 400.0, 1000.0, 1000.0, 568.0, 1000.0, 157.0, 1000.0]
2025-08-07 09:38:23,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 26 seconds)
2025-08-07 09:39:59,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:40:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 959.66473 ± 579.819
2025-08-07 09:40:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1247.9512, 92.4798, 1954.2905, 1637.019, 835.28217, 554.3767, 1490.9171, 230.58948, 872.68066, 681.06195]
2025-08-07 09:40:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [544.0, 56.0, 1000.0, 1000.0, 423.0, 270.0, 1000.0, 131.0, 421.0, 331.0]
2025-08-07 09:40:06,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 1 minute, 14 seconds)
2025-08-07 09:41:51,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:42:00,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1010.85950 ± 783.782
2025-08-07 09:42:00,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [278.91925, 1807.7189, 2018.307, 516.9625, 341.63727, 905.3388, 223.98656, 1924.518, 1993.8073, 97.39904]
2025-08-07 09:42:00,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [146.0, 951.0, 1000.0, 294.0, 178.0, 540.0, 136.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:42:00,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 48 seconds)
2025-08-07 09:43:35,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:44,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1256.20483 ± 651.279
2025-08-07 09:43:44,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1944.9264, 1991.7942, 980.75714, 428.66653, 266.2726, 419.5046, 1646.982, 1616.1362, 1302.9592, 1964.0496]
2025-08-07 09:43:44,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 568.0, 241.0, 186.0, 276.0, 1000.0, 1000.0, 677.0, 932.0]
2025-08-07 09:43:44,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 58 minutes, 9 seconds)
2025-08-07 09:45:21,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:31,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1479.57104 ± 639.135
2025-08-07 09:45:31,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1885.5985, 1158.0358, 1992.6434, 1974.2494, 1798.7097, 80.15927, 2031.0077, 597.64496, 1398.0602, 1879.6003]
2025-08-07 09:45:31,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 567.0, 1000.0, 1000.0, 1000.0, 62.0, 1000.0, 347.0, 682.0, 1000.0]
2025-08-07 09:45:31,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1479.57) for latency MM1Queue_a033_s075
2025-08-07 09:45:31,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 57 minutes, 19 seconds)
2025-08-07 09:47:09,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:47:16,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1104.67908 ± 609.801
2025-08-07 09:47:16,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1071.4651, 417.42203, 600.55383, 1163.0143, 91.3664, 1949.8684, 1739.7817, 1030.9666, 972.0875, 2010.2645]
2025-08-07 09:47:16,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [550.0, 209.0, 298.0, 637.0, 116.0, 1000.0, 835.0, 503.0, 547.0, 1000.0]
2025-08-07 09:47:16,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 55 minutes, 5 seconds)
2025-08-07 09:48:51,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:57,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 875.94073 ± 631.122
2025-08-07 09:48:57,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [524.0028, 976.1366, 1799.0103, 2153.5044, 1146.6422, 242.49915, 605.0438, 52.335217, 535.43774, 724.7951]
2025-08-07 09:48:57,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [276.0, 506.0, 938.0, 1000.0, 564.0, 115.0, 357.0, 36.0, 275.0, 383.0]
2025-08-07 09:48:57,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 53 minutes, 5 seconds)
2025-08-07 09:50:37,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:44,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 897.36511 ± 692.189
2025-08-07 09:50:44,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [807.2537, 42.26601, 196.94408, 582.6736, 615.9364, 1761.7999, 1111.9044, 1817.2831, 93.02281, 1944.567]
2025-08-07 09:50:44,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [380.0, 36.0, 135.0, 300.0, 319.0, 1000.0, 610.0, 1000.0, 67.0, 1000.0]
2025-08-07 09:50:44,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 40 seconds)
2025-08-07 09:52:27,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:35,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1023.87878 ± 786.628
2025-08-07 09:52:35,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [279.21573, 1793.8147, 2057.7542, 301.0004, 79.7025, 1891.4473, 924.52966, 106.951965, 824.23553, 1980.136]
2025-08-07 09:52:35,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [160.0, 1000.0, 1000.0, 208.0, 46.0, 1000.0, 1000.0, 73.0, 454.0, 1000.0]
2025-08-07 09:52:35,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 30 seconds)
2025-08-07 09:54:12,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:20,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1095.71802 ± 780.378
2025-08-07 09:54:20,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [145.99777, 1980.2507, 428.19302, 1132.8492, 1649.5927, 1978.6794, 107.92904, 699.8654, 541.2948, 2292.5269]
2025-08-07 09:54:20,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [94.0, 1000.0, 218.0, 620.0, 767.0, 1000.0, 65.0, 372.0, 375.0, 1000.0]
2025-08-07 09:54:20,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 32 seconds)
2025-08-07 09:55:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:56:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1680.23474 ± 307.703
2025-08-07 09:56:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1766.0176, 1974.0024, 1761.4529, 1832.2551, 1459.2748, 1778.9893, 1590.959, 906.4405, 1655.7986, 2077.1567]
2025-08-07 09:56:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 755.0, 1000.0, 1000.0, 496.0, 1000.0, 1000.0]
2025-08-07 09:56:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1226 [INFO]: New best (1680.23) for latency MM1Queue_a033_s075
2025-08-07 09:56:04,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 46 seconds)
2025-08-07 09:57:45,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1318.50012 ± 579.578
2025-08-07 09:57:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1976.8435, 836.6624, 1211.5363, 2033.1869, 897.1036, 1933.4269, 671.5156, 2087.6235, 751.69434, 785.4078]
2025-08-07 09:57:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 653.0, 1000.0, 424.0, 1000.0, 324.0, 1000.0, 1000.0, 454.0]
2025-08-07 09:57:56,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 53 seconds)
2025-08-07 09:59:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:59:49,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1237.39697 ± 618.491
2025-08-07 09:59:49,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1943.784, 1930.3177, 706.24866, 232.91946, 1354.7786, 2034.543, 583.298, 1116.7653, 754.2666, 1717.0492]
2025-08-07 09:59:49,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 96.0, 702.0, 1000.0, 337.0, 548.0, 1000.0, 1000.0]
2025-08-07 09:59:49,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 43 minutes, 38 seconds)
2025-08-07 10:01:22,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:01:32,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1257.69336 ± 687.215
2025-08-07 10:01:32,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1843.2355, 193.44077, 1107.1854, 922.70135, 2017.8147, 1669.8679, 1963.6769, 585.7007, 1968.9191, 304.3919]
2025-08-07 10:01:32,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 107.0, 1000.0, 521.0, 1000.0, 1000.0, 1000.0, 334.0, 1000.0, 196.0]
2025-08-07 10:01:32,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 11 seconds)
2025-08-07 10:03:10,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:03:15,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 694.28357 ± 605.335
2025-08-07 10:03:15,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [380.46982, 74.424126, 1882.429, 537.53375, 338.97607, 1430.6989, 216.21545, 135.19762, 537.66925, 1409.2222]
2025-08-07 10:03:15,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [192.0, 44.0, 1000.0, 309.0, 179.0, 749.0, 122.0, 98.0, 299.0, 750.0]
2025-08-07 10:03:15,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 16 seconds)
2025-08-07 10:04:50,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:04:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 828.85205 ± 459.319
2025-08-07 10:04:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [615.0628, 437.29544, 192.65405, 448.9827, 612.5071, 867.1317, 1098.8219, 1147.6973, 983.63086, 1884.7372]
2025-08-07 10:04:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [305.0, 208.0, 104.0, 211.0, 317.0, 1000.0, 533.0, 596.0, 518.0, 1000.0]
2025-08-07 10:04:57,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 14 seconds)
2025-08-07 10:06:43,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:53,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1284.66846 ± 600.656
2025-08-07 10:06:53,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1923.9607, 1082.2998, 1955.2, 302.66998, 1312.7507, 348.01187, 1004.94507, 1151.6328, 2048.7112, 1716.5033]
2025-08-07 10:06:53,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [933.0, 1000.0, 1000.0, 182.0, 1000.0, 200.0, 505.0, 632.0, 1000.0, 867.0]
2025-08-07 10:06:53,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 49 seconds)
2025-08-07 10:08:26,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:37,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1429.20886 ± 491.914
2025-08-07 10:08:37,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1733.6646, 2032.5812, 950.8082, 1931.706, 811.51074, 940.15796, 1316.7706, 1908.3903, 1881.1, 785.3992]
2025-08-07 10:08:37,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 478.0, 1000.0, 1000.0, 512.0, 726.0, 1000.0, 1000.0, 460.0]
2025-08-07 10:08:37,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 23 seconds)
2025-08-07 10:10:17,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:26,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1168.93933 ± 684.220
2025-08-07 10:10:26,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1858.0494, 1501.7151, 1227.7798, 869.68274, 1908.8499, 116.343765, 130.15373, 1960.1428, 502.47443, 1614.2024]
2025-08-07 10:10:26,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 445.0, 1000.0, 67.0, 73.0, 1000.0, 292.0, 1000.0]
2025-08-07 10:10:27,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 4 seconds)
2025-08-07 10:12:01,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:10,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 994.19250 ± 634.127
2025-08-07 10:12:10,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1247.2434, 369.4915, 579.03284, 288.89532, 1970.4873, 1143.3169, 735.4049, 2152.6277, 297.17374, 1158.251]
2025-08-07 10:12:10,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 184.0, 290.0, 145.0, 1000.0, 629.0, 1000.0, 1000.0, 143.0, 1000.0]
2025-08-07 10:12:10,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 16 seconds)
2025-08-07 10:13:46,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:13:53,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 846.32654 ± 699.460
2025-08-07 10:13:53,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [2226.2563, 1026.355, 236.62521, 134.55948, 202.75534, 1098.9554, 903.1269, 224.3312, 515.2752, 1895.0253]
2025-08-07 10:13:53,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 540.0, 123.0, 74.0, 134.0, 528.0, 1000.0, 113.0, 272.0, 902.0]
2025-08-07 10:13:53,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 35 seconds)
2025-08-07 10:15:38,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:48,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1399.13330 ± 714.282
2025-08-07 10:15:48,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1388.6698, 1930.7789, 1972.3955, 1975.5514, 2000.9674, 2080.1445, 1020.6027, 1342.27, 226.28322, 53.670197]
2025-08-07 10:15:48,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [763.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 731.0, 139.0, 41.0]
2025-08-07 10:15:48,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 45 seconds)
2025-08-07 10:17:22,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:31,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1289.98279 ± 736.267
2025-08-07 10:17:31,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [361.57114, 2065.4143, 197.18135, 1319.9009, 2055.6497, 519.1641, 2012.5013, 2034.8483, 1654.0989, 679.4988]
2025-08-07 10:17:31,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [191.0, 960.0, 102.0, 1000.0, 1000.0, 310.0, 1000.0, 1000.0, 770.0, 307.0]
2025-08-07 10:17:31,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 55 seconds)
2025-08-07 10:19:06,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:14,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 972.21143 ± 663.927
2025-08-07 10:19:14,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [777.97925, 1617.3022, 456.4269, 795.62115, 1143.633, 497.42505, 267.44916, 2010.3378, 127.88318, 2028.056]
2025-08-07 10:19:14,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [395.0, 825.0, 241.0, 419.0, 1000.0, 276.0, 126.0, 1000.0, 107.0, 1000.0]
2025-08-07 10:19:14,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 50 seconds)
2025-08-07 10:20:56,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:06,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1283.68787 ± 727.327
2025-08-07 10:21:06,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1583.9684, 1850.0834, 912.9374, 447.71307, 1619.0667, 333.3057, 1988.5159, 73.676155, 2015.2704, 2012.3413]
2025-08-07 10:21:06,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 576.0, 237.0, 885.0, 152.0, 1000.0, 49.0, 1000.0, 1000.0]
2025-08-07 10:21:06,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 26 seconds)
2025-08-07 10:22:49,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:00,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1459.36938 ± 621.551
2025-08-07 10:23:00,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1242.3507, 47.94247, 909.8605, 1585.1035, 1932.737, 1160.3682, 1449.5726, 2133.983, 2096.3867, 2035.3888]
2025-08-07 10:23:00,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [611.0, 34.0, 1000.0, 805.0, 1000.0, 1000.0, 773.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:23:00,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 4 seconds)
2025-08-07 10:24:29,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:41,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1595.32581 ± 502.059
2025-08-07 10:24:41,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1966.6301, 559.8676, 1725.3905, 1005.74207, 1847.6971, 1891.1469, 1948.9357, 2002.4177, 1996.145, 1009.28516]
2025-08-07 10:24:41,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 287.0, 1000.0, 497.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 505.0]
2025-08-07 10:24:41,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 44 seconds)
2025-08-07 10:26:25,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1294.71753 ± 629.004
2025-08-07 10:26:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1941.3219, 215.35516, 1814.7462, 1861.4, 1590.295, 834.8597, 1933.8813, 1418.8251, 1038.4308, 298.06076]
2025-08-07 10:26:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 128.0, 1000.0, 1000.0, 872.0, 1000.0, 1000.0, 723.0, 587.0, 181.0]
2025-08-07 10:26:35,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 19 seconds)
2025-08-07 10:28:10,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:19,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1380.99463 ± 709.528
2025-08-07 10:28:19,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [734.7179, 123.99523, 208.0684, 1929.7219, 1648.8546, 1513.0013, 1991.8048, 2040.9906, 2027.1123, 1591.6793]
2025-08-07 10:28:19,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [360.0, 71.0, 141.0, 1000.0, 821.0, 823.0, 1000.0, 1000.0, 1000.0, 825.0]
2025-08-07 10:28:19,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 32 seconds)
2025-08-07 10:29:59,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:07,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1243.45264 ± 790.880
2025-08-07 10:30:07,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [2125.7766, 1668.3273, 1231.0232, 1887.2699, 179.7856, 50.160347, 1224.9401, 2004.1917, 1941.752, 121.30011]
2025-08-07 10:30:07,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 582.0, 1000.0, 118.0, 43.0, 583.0, 1000.0, 1000.0, 91.0]
2025-08-07 10:30:07,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 38 seconds)
2025-08-07 10:31:42,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:51,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1109.87183 ± 699.995
2025-08-07 10:31:51,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1991.362, 436.56747, 1612.2218, 1752.6934, 88.46304, 1931.5806, 1479.0388, 112.34511, 757.2178, 937.228]
2025-08-07 10:31:51,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 230.0, 850.0, 1000.0, 61.0, 1000.0, 719.0, 63.0, 1000.0, 450.0]
2025-08-07 10:31:51,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 36 seconds)
2025-08-07 10:33:31,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:41,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1368.82446 ± 564.177
2025-08-07 10:33:41,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1731.5474, 792.11786, 738.40533, 2051.4258, 2032.6415, 1972.2314, 511.79977, 1455.7996, 845.91, 1556.3665]
2025-08-07 10:33:41,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [905.0, 442.0, 1000.0, 1000.0, 1000.0, 1000.0, 263.0, 708.0, 435.0, 1000.0]
2025-08-07 10:33:41,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes)
2025-08-07 10:35:22,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:30,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1082.06958 ± 726.827
2025-08-07 10:35:30,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1715.9164, 371.5466, 2004.5244, 1531.198, 1877.3208, 87.06912, 1731.5653, 903.882, 400.2791, 197.39334]
2025-08-07 10:35:30,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [859.0, 198.0, 1000.0, 732.0, 1000.0, 62.0, 882.0, 1000.0, 194.0, 96.0]
2025-08-07 10:35:30,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 7 seconds)
2025-08-07 10:37:09,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:21,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1599.21887 ± 462.660
2025-08-07 10:37:21,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1813.7157, 589.9414, 1022.6058, 1400.3032, 1930.3732, 1512.0288, 2108.0566, 1962.1204, 1611.3093, 2041.7338]
2025-08-07 10:37:21,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [889.0, 273.0, 552.0, 639.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 972.0]
2025-08-07 10:37:21,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 24 seconds)
2025-08-07 10:38:53,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:02,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1160.15320 ± 631.815
2025-08-07 10:39:02,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [1954.5547, 1229.0144, 790.95636, 1724.065, 996.3806, 205.44768, 694.6604, 2038.2728, 1660.0048, 308.17618]
2025-08-07 10:39:02,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 588.0, 368.0, 1000.0, 513.0, 115.0, 1000.0, 1000.0, 825.0, 189.0]
2025-08-07 10:39:02,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 33 seconds)
2025-08-07 10:40:40,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:50,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 1358.12720 ± 591.845
2025-08-07 10:40:50,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [324.68164, 2160.4714, 777.80096, 1563.5306, 1851.2615, 2003.2849, 1907.7933, 874.779, 921.44434, 1196.2231]
2025-08-07 10:40:50,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [165.0, 1000.0, 343.0, 1000.0, 1000.0, 965.0, 1000.0, 438.0, 540.0, 621.0]
2025-08-07 10:40:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 47 seconds)
2025-08-07 10:42:35,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:41,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1221 [DEBUG]: Total Reward: 889.45233 ± 720.417
2025-08-07 10:42:41,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1222 [DEBUG]: All rewards: [2183.269, 929.6181, 424.84814, 237.16583, 387.1767, 229.12502, 920.3021, 1530.2611, 94.79407, 1957.9633]
2025-08-07 10:42:41,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 476.0, 186.0, 152.0, 184.0, 119.0, 456.0, 764.0, 50.0, 1000.0]
2025-08-07 10:42:41,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-ant):1251 [DEBUG]: Training session finished
