2025-08-07 07:43:01,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 07:43:01,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 07:43:01,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x149f4a263710>}
2025-08-07 07:43:01,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1111 [DEBUG]: using device: cuda
2025-08-07 07:43:01,393 baseline-bpql-noiseperc5-ant:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 07:43:01,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1133 [INFO]: Creating new trainer
2025-08-07 07:43:01,409 baseline-bpql-noiseperc5-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:43:01,409 baseline-bpql-noiseperc5-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:43:02,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1194 [DEBUG]: Starting training session...
2025-08-07 07:43:02,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 1/100
2025-08-07 07:44:55,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:44:55,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -18.59352 ± 40.573
2025-08-07 07:44:55,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-35.926273, -4.973548, -133.71252, -8.534764, 1.1999568, 8.127324, 4.188861, 8.937468, -21.780249, -3.461504]
2025-08-07 07:44:55,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 30.0, 100.0, 41.0, 24.0, 22.0, 42.0, 28.0, 63.0, 27.0]
2025-08-07 07:44:55,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (-18.59) for latency MM1Queue_a033_s075
2025-08-07 07:44:55,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 6 minutes, 12 seconds)
2025-08-07 07:46:34,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:46:36,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -65.74295 ± 185.760
2025-08-07 07:46:36,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [6.8941946, -10.473203, -16.168774, 15.22118, 1.3986089, -43.95051, 6.2437673, -3.373113, 7.7425427, -620.9642]
2025-08-07 07:46:36,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 59.0, 57.0, 75.0, 84.0, 107.0, 45.0, 52.0, 74.0, 1000.0]
2025-08-07 07:46:36,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 54 minutes, 42 seconds)
2025-08-07 07:48:23,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:48:27,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: -117.47567 ± 203.545
2025-08-07 07:48:27,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [-71.00627, 12.693457, -462.44003, -132.21277, 2.372996, 11.630647, -11.003053, 22.43441, -561.08673, 13.860671]
2025-08-07 07:48:27,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 40.0, 1000.0, 221.0, 66.0, 100.0, 73.0, 46.0, 1000.0, 55.0]
2025-08-07 07:48:27,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 55 minutes)
2025-08-07 07:50:21,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:50:22,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 9.33052 ± 13.743
2025-08-07 07:50:22,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [20.996088, 19.7686, 24.169386, -6.7352076, 6.943997, 9.334518, -8.401619, 14.443258, -13.628422, 26.414593]
2025-08-07 07:50:22,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 99.0, 59.0, 58.0, 86.0, 44.0, 72.0, 28.0, 103.0, 110.0]
2025-08-07 07:50:22,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (9.33) for latency MM1Queue_a033_s075
2025-08-07 07:50:22,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 55 minutes, 47 seconds)
2025-08-07 07:52:06,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:52:14,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 92.50545 ± 65.489
2025-08-07 07:52:14,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [80.10503, 262.8786, 18.104406, 106.90039, 91.48775, 103.707886, 11.46598, 64.342064, 77.57949, 108.48289]
2025-08-07 07:52:14,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 1000.0, 57.0, 516.0, 196.0, 1000.0, 415.0, 1000.0, 169.0, 981.0]
2025-08-07 07:52:14,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (92.51) for latency MM1Queue_a033_s075
2025-08-07 07:52:14,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 54 minutes, 35 seconds)
2025-08-07 07:54:08,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:54:22,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 579.36182 ± 41.613
2025-08-07 07:54:22,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [555.6959, 635.1566, 574.1736, 588.3905, 551.3605, 502.74985, 533.9574, 613.72614, 603.96454, 634.4432]
2025-08-07 07:54:22,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 998.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:54:22,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (579.36) for latency MM1Queue_a033_s075
2025-08-07 07:54:22,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 57 minutes, 37 seconds)
2025-08-07 07:56:14,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:56:27,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 673.88898 ± 171.405
2025-08-07 07:56:27,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [688.2178, 721.4771, 683.1318, 446.52945, 900.07764, 272.7254, 756.0559, 761.35333, 756.5362, 752.78534]
2025-08-07 07:56:27,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 933.0, 482.0, 1000.0, 327.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:56:27,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (673.89) for latency MM1Queue_a033_s075
2025-08-07 07:56:27,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 2 minutes, 58 seconds)
2025-08-07 07:58:09,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:58:17,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 418.56738 ± 228.434
2025-08-07 07:58:17,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [201.68774, 498.4056, 674.8218, 710.2271, 425.94336, 95.8807, 657.07715, 526.9637, 44.0799, 350.5872]
2025-08-07 07:58:17,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [228.0, 615.0, 844.0, 1000.0, 567.0, 228.0, 1000.0, 553.0, 33.0, 407.0]
2025-08-07 07:58:17,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 44 seconds)
2025-08-07 07:59:58,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:00:03,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 231.29391 ± 190.707
2025-08-07 08:00:03,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [135.7596, 298.7667, 47.478245, 258.71375, 112.78896, 432.39743, 43.50757, 668.9317, 49.49223, 265.10303]
2025-08-07 08:00:03,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [330.0, 367.0, 56.0, 295.0, 219.0, 451.0, 80.0, 1000.0, 82.0, 377.0]
2025-08-07 08:00:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 56 minutes, 11 seconds)
2025-08-07 08:01:54,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:02:03,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 576.38800 ± 323.583
2025-08-07 08:02:03,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [844.7108, 184.24133, 739.3652, 906.0988, 198.413, 865.4779, 790.4613, 284.38458, 82.88914, 867.83777]
2025-08-07 08:02:03,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 189.0, 1000.0, 1000.0, 179.0, 1000.0, 1000.0, 305.0, 104.0, 1000.0]
2025-08-07 08:02:03,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 50 seconds)
2025-08-07 08:03:49,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:04:00,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 772.29480 ± 237.310
2025-08-07 08:04:00,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [788.20807, 769.9925, 253.87198, 1079.6049, 842.98096, 856.206, 779.9395, 486.35385, 1097.5999, 768.1906]
2025-08-07 08:04:00,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [670.0, 1000.0, 236.0, 1000.0, 743.0, 1000.0, 1000.0, 394.0, 1000.0, 1000.0]
2025-08-07 08:04:00,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (772.29) for latency MM1Queue_a033_s075
2025-08-07 08:04:00,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 51 minutes, 26 seconds)
2025-08-07 08:05:55,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:06:04,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 591.52075 ± 366.338
2025-08-07 08:06:04,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [88.24527, 630.0361, 149.21202, 848.2775, 903.6208, 1119.3335, 847.2303, 401.18292, 861.2669, 66.80152]
2025-08-07 08:06:04,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [93.0, 458.0, 129.0, 703.0, 1000.0, 1000.0, 1000.0, 374.0, 1000.0, 52.0]
2025-08-07 08:06:04,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 49 minutes, 14 seconds)
2025-08-07 08:07:42,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:07:49,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 572.43201 ± 489.795
2025-08-07 08:07:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1111.3428, 228.78632, 244.39519, 496.37897, 109.50055, 1328.0736, 60.40508, 1228.1528, 890.9294, 26.355936]
2025-08-07 08:07:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 168.0, 173.0, 540.0, 71.0, 1000.0, 50.0, 1000.0, 1000.0, 33.0]
2025-08-07 08:07:49,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 2 seconds)
2025-08-07 08:09:38,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:09:42,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 324.80026 ± 246.956
2025-08-07 08:09:42,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [394.64066, 208.81876, 52.08889, 657.0324, 777.5125, 124.41233, 204.67874, 164.37573, 578.61127, 85.83125]
2025-08-07 08:09:42,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [268.0, 222.0, 34.0, 490.0, 1000.0, 96.0, 109.0, 122.0, 400.0, 70.0]
2025-08-07 08:09:42,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 45 minutes, 59 seconds)
2025-08-07 08:11:31,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:11:34,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 371.15222 ± 263.993
2025-08-07 08:11:34,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [347.06473, 39.65909, 42.747856, 453.9109, 655.64777, 102.79162, 342.8736, 350.1156, 929.52374, 447.18756]
2025-08-07 08:11:34,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [244.0, 34.0, 39.0, 315.0, 458.0, 83.0, 242.0, 263.0, 635.0, 313.0]
2025-08-07 08:11:34,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 41 minutes, 49 seconds)
2025-08-07 08:13:23,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:13:33,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 673.77454 ± 441.418
2025-08-07 08:13:33,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1398.4349, 539.7733, 829.0776, 1491.772, 361.9415, 242.4248, 479.051, 870.41406, 179.13487, 345.7215]
2025-08-07 08:13:33,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 217.0, 221.0, 346.0, 692.0, 135.0, 214.0]
2025-08-07 08:13:33,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 40 minutes, 15 seconds)
2025-08-07 08:15:16,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:15:25,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 811.84412 ± 544.917
2025-08-07 08:15:25,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1533.8394, 138.10048, 1517.3091, 463.00525, 308.50366, 1373.9971, 365.05893, 715.54944, 1396.2245, 306.85294]
2025-08-07 08:15:25,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 121.0, 1000.0, 348.0, 205.0, 860.0, 288.0, 489.0, 1000.0, 171.0]
2025-08-07 08:15:25,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (811.84) for latency MM1Queue_a033_s075
2025-08-07 08:15:25,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 35 minutes, 14 seconds)
2025-08-07 08:17:16,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:17:22,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 658.81042 ± 512.854
2025-08-07 08:17:22,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1003.91534, 653.37634, 106.0674, 1377.1532, 139.03946, 961.5825, 24.32052, 38.253654, 1332.8522, 951.5435]
2025-08-07 08:17:22,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [642.0, 424.0, 99.0, 1000.0, 127.0, 685.0, 23.0, 33.0, 905.0, 640.0]
2025-08-07 08:17:22,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 36 minutes, 35 seconds)
2025-08-07 08:19:06,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:19:09,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 373.71552 ± 329.888
2025-08-07 08:19:09,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1263.7712, 344.7574, 452.00363, 380.56143, 283.26688, 295.2802, 479.231, 67.04789, 43.570343, 127.66538]
2025-08-07 08:19:09,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [715.0, 211.0, 319.0, 220.0, 149.0, 226.0, 335.0, 49.0, 34.0, 77.0]
2025-08-07 08:19:09,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 33 minutes, 12 seconds)
2025-08-07 08:21:03,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:21:11,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 819.41846 ± 609.340
2025-08-07 08:21:11,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1572.5897, 218.94571, 1573.5969, 1576.9634, 181.4104, 1182.7814, 76.278305, 818.0333, 82.70277, 910.8827]
2025-08-07 08:21:11,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [963.0, 163.0, 1000.0, 1000.0, 118.0, 786.0, 53.0, 509.0, 63.0, 1000.0]
2025-08-07 08:21:11,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (819.42) for latency MM1Queue_a033_s075
2025-08-07 08:21:11,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 33 minutes, 44 seconds)
2025-08-07 08:22:55,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:23:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 839.09534 ± 646.742
2025-08-07 08:23:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [954.71277, 36.631966, 1716.8572, 1667.5066, 650.54877, 881.3936, 542.258, 1750.4629, 39.755497, 150.8248]
2025-08-07 08:23:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [616.0, 35.0, 1000.0, 1000.0, 361.0, 1000.0, 340.0, 1000.0, 33.0, 104.0]
2025-08-07 08:23:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (839.10) for latency MM1Queue_a033_s075
2025-08-07 08:23:03,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 30 minutes, 6 seconds)
2025-08-07 08:24:53,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:24:59,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 641.77155 ± 366.965
2025-08-07 08:24:59,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [481.50858, 498.8948, 246.77887, 153.61513, 960.1764, 941.1064, 1158.1881, 112.9469, 924.0182, 940.48224]
2025-08-07 08:24:59,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [294.0, 357.0, 166.0, 95.0, 546.0, 615.0, 690.0, 68.0, 1000.0, 580.0]
2025-08-07 08:24:59,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 29 minutes, 14 seconds)
2025-08-07 08:26:45,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:26:51,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 715.72900 ± 347.462
2025-08-07 08:26:51,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1338.3097, 1126.7349, 770.79706, 509.7819, 675.047, 592.41425, 500.65765, 60.444656, 566.48596, 1016.61725]
2025-08-07 08:26:51,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 576.0, 395.0, 316.0, 369.0, 382.0, 281.0, 40.0, 305.0, 617.0]
2025-08-07 08:26:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 26 minutes, 5 seconds)
2025-08-07 08:28:43,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:28:45,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 281.74310 ± 199.413
2025-08-07 08:28:45,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [547.53424, 107.30042, 27.577288, 394.9136, 192.5124, 199.2884, 660.1999, 52.96818, 261.54575, 373.5911]
2025-08-07 08:28:45,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [379.0, 84.0, 29.0, 237.0, 196.0, 126.0, 361.0, 36.0, 163.0, 236.0]
2025-08-07 08:28:45,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 51 seconds)
2025-08-07 08:30:30,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:30:38,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 744.24243 ± 435.331
2025-08-07 08:30:38,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [873.267, 781.54004, 1580.1385, 343.4748, 258.82678, 300.9865, 868.8563, 357.40305, 687.7174, 1390.2139]
2025-08-07 08:30:38,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [469.0, 489.0, 1000.0, 168.0, 175.0, 177.0, 1000.0, 191.0, 378.0, 1000.0]
2025-08-07 08:30:38,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 21 minutes, 47 seconds)
2025-08-07 08:32:30,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:32:38,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 983.11884 ± 571.011
2025-08-07 08:32:38,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [486.45102, 1172.7253, 1804.5916, 1807.773, 407.35825, 880.1889, 720.075, 808.1419, 111.54288, 1632.3407]
2025-08-07 08:32:38,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [274.0, 592.0, 1000.0, 1000.0, 214.0, 489.0, 351.0, 1000.0, 76.0, 845.0]
2025-08-07 08:32:38,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (983.12) for latency MM1Queue_a033_s075
2025-08-07 08:32:38,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 21 minutes, 59 seconds)
2025-08-07 08:34:23,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:34:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 812.09558 ± 598.359
2025-08-07 08:34:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [307.4139, 1245.5615, 65.40783, 309.14246, 829.0825, 1564.0098, 1983.9961, 562.41437, 268.2626, 985.66455]
2025-08-07 08:34:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [162.0, 749.0, 55.0, 180.0, 1000.0, 1000.0, 1000.0, 298.0, 137.0, 1000.0]
2025-08-07 08:34:31,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 11 seconds)
2025-08-07 08:36:16,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:36:22,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 768.71997 ± 466.608
2025-08-07 08:36:22,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [739.5015, 872.2347, 1598.9298, 650.2451, 256.39633, 1332.894, 115.26816, 437.272, 433.6095, 1250.8488]
2025-08-07 08:36:22,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [302.0, 437.0, 940.0, 283.0, 230.0, 1000.0, 73.0, 257.0, 184.0, 624.0]
2025-08-07 08:36:22,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 17 minutes)
2025-08-07 08:38:16,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:38:25,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1361.60034 ± 645.671
2025-08-07 08:38:25,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1995.6471, 465.09708, 2245.8315, 2121.9631, 614.4278, 823.2885, 1451.458, 734.3464, 1940.0708, 1223.8733]
2025-08-07 08:38:25,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 237.0, 1000.0, 1000.0, 275.0, 378.0, 718.0, 352.0, 914.0, 595.0]
2025-08-07 08:38:25,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1361.60) for latency MM1Queue_a033_s075
2025-08-07 08:38:25,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 14 seconds)
2025-08-07 08:40:17,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:40:25,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1012.82147 ± 556.485
2025-08-07 08:40:25,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [182.68463, 1298.0549, 1128.7223, 1644.8026, 647.66077, 2069.9963, 672.1413, 376.04553, 1316.283, 791.823]
2025-08-07 08:40:25,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 672.0, 571.0, 787.0, 330.0, 1000.0, 358.0, 196.0, 1000.0, 362.0]
2025-08-07 08:40:25,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 51 seconds)
2025-08-07 08:42:06,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:42:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1483.03748 ± 664.608
2025-08-07 08:42:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [737.8341, 1424.9012, 1427.0437, 538.2771, 1336.9136, 1439.6777, 2197.1516, 2488.5732, 789.0938, 2450.909]
2025-08-07 08:42:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 605.0, 615.0, 237.0, 602.0, 593.0, 961.0, 1000.0, 310.0, 1000.0]
2025-08-07 08:42:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1483.04) for latency MM1Queue_a033_s075
2025-08-07 08:42:16,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 50 seconds)
2025-08-07 08:44:04,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:44:12,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1028.72791 ± 582.810
2025-08-07 08:44:12,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [999.30286, 1835.1664, 270.11432, 725.30334, 383.47867, 598.06116, 1381.4283, 850.378, 2176.9443, 1067.1019]
2025-08-07 08:44:12,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [437.0, 828.0, 124.0, 355.0, 210.0, 325.0, 593.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:44:12,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 44 seconds)
2025-08-07 08:46:09,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:46:16,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1018.37286 ± 580.128
2025-08-07 08:46:16,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1162.1727, 183.3101, 567.5045, 680.3711, 181.23946, 933.0999, 1850.235, 1685.8645, 1665.5132, 1274.4183]
2025-08-07 08:46:16,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [586.0, 120.0, 319.0, 297.0, 80.0, 495.0, 974.0, 797.0, 855.0, 602.0]
2025-08-07 08:46:16,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 12 minutes, 44 seconds)
2025-08-07 08:47:59,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:48:06,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 828.33435 ± 397.872
2025-08-07 08:48:06,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [108.25235, 918.41614, 1101.109, 1366.576, 876.3293, 762.7059, 1165.4778, 974.9806, 917.7128, 91.78409]
2025-08-07 08:48:06,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 1000.0, 1000.0, 583.0, 1000.0, 335.0, 471.0, 439.0, 391.0, 46.0]
2025-08-07 08:48:06,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 50 seconds)
2025-08-07 08:49:56,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:50:00,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 650.48700 ± 468.694
2025-08-07 08:50:00,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [438.22766, 850.9971, 180.44293, 240.53447, 75.173065, 988.76874, 1657.8315, 1031.9728, 316.77625, 724.1455]
2025-08-07 08:50:00,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [204.0, 329.0, 84.0, 121.0, 44.0, 453.0, 713.0, 1000.0, 162.0, 306.0]
2025-08-07 08:50:00,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 4 minutes, 41 seconds)
2025-08-07 08:51:54,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:52:06,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1898.73022 ± 594.686
2025-08-07 08:52:06,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [833.8437, 2296.9749, 2406.2278, 2390.0483, 1122.3131, 2316.8782, 2377.2913, 1706.9465, 2346.1165, 1190.6624]
2025-08-07 08:52:06,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [371.0, 1000.0, 1000.0, 1000.0, 505.0, 1000.0, 1000.0, 1000.0, 1000.0, 599.0]
2025-08-07 08:52:06,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (1898.73) for latency MM1Queue_a033_s075
2025-08-07 08:52:06,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 52 seconds)
2025-08-07 08:53:47,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:53:54,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 793.19696 ± 709.404
2025-08-07 08:53:54,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1421.979, 1394.4165, 123.13879, 191.33202, 2182.4055, 1234.4551, 206.67018, 134.87755, 66.00573, 976.6896]
2025-08-07 08:53:54,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [620.0, 1000.0, 66.0, 105.0, 1000.0, 1000.0, 118.0, 71.0, 46.0, 494.0]
2025-08-07 08:53:54,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 10 seconds)
2025-08-07 08:55:46,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:56:01,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2121.99878 ± 342.465
2025-08-07 08:56:01,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2372.7844, 2104.19, 2279.1787, 1555.0492, 1781.0032, 2349.0947, 1559.1799, 2525.239, 2433.3916, 2260.879]
2025-08-07 08:56:01,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 823.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:56:01,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2122.00) for latency MM1Queue_a033_s075
2025-08-07 08:56:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 48 seconds)
2025-08-07 08:57:50,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:58:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1547.39087 ± 864.817
2025-08-07 08:58:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1211.0056, 2576.9927, 2467.3877, 597.2726, 2475.7637, 527.80316, 1961.2139, 384.78577, 2383.0867, 888.59753]
2025-08-07 08:58:00,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 253.0, 1000.0, 219.0, 759.0, 176.0, 1000.0, 363.0]
2025-08-07 08:58:00,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 41 seconds)
2025-08-07 08:59:47,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:59:56,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1549.35767 ± 969.291
2025-08-07 08:59:56,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [497.15482, 2429.4197, 1449.6672, 2377.0693, 2558.716, 298.70035, 2460.7148, 2495.6106, 737.5298, 188.99402]
2025-08-07 08:59:56,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [251.0, 1000.0, 553.0, 1000.0, 1000.0, 143.0, 1000.0, 1000.0, 302.0, 102.0]
2025-08-07 08:59:56,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 8 seconds)
2025-08-07 09:01:41,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:01:55,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2351.69287 ± 235.291
2025-08-07 09:01:55,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2553.8062, 2287.1182, 2413.3147, 2137.083, 2528.4539, 2461.7134, 2375.5251, 2410.6992, 1757.0974, 2592.116]
2025-08-07 09:01:55,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 954.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:01:55,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2351.69) for latency MM1Queue_a033_s075
2025-08-07 09:01:55,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 46 seconds)
2025-08-07 09:03:46,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:03:59,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2331.58398 ± 244.419
2025-08-07 09:03:59,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2497.9597, 2410.365, 2480.9243, 2504.673, 2360.726, 2342.5312, 2330.2998, 2348.3242, 2417.4216, 1622.6155]
2025-08-07 09:03:59,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 650.0]
2025-08-07 09:03:59,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes)
2025-08-07 09:05:46,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:06:00,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2312.38159 ± 191.913
2025-08-07 09:06:00,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2535.6006, 2153.9824, 2495.9768, 2261.3862, 2005.2517, 1997.0056, 2419.264, 2510.738, 2423.9817, 2320.626]
2025-08-07 09:06:00,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 903.0, 1000.0, 1000.0, 1000.0, 944.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:06:00,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 48 seconds)
2025-08-07 09:07:49,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:08:02,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2199.62891 ± 313.381
2025-08-07 09:08:02,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2238.6606, 1377.3958, 2383.5723, 2329.907, 2382.5728, 2321.6118, 2125.9907, 2351.2534, 2536.9517, 1948.3737]
2025-08-07 09:08:02,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 607.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:08:02,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 31 seconds)
2025-08-07 09:09:48,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:10:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2262.97363 ± 343.044
2025-08-07 09:10:01,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2469.3699, 2384.497, 2241.2996, 2446.727, 2400.3643, 2503.228, 2285.3867, 2386.149, 1266.6648, 2246.051]
2025-08-07 09:10:01,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 585.0, 1000.0]
2025-08-07 09:10:01,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 56 seconds)
2025-08-07 09:11:51,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:12:04,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2056.43823 ± 681.836
2025-08-07 09:12:04,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1022.5385, 2648.3384, 2477.8418, 482.55606, 2388.7458, 2231.833, 2464.8958, 2013.801, 2365.5032, 2468.3271]
2025-08-07 09:12:04,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 243.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:12:04,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 43 seconds)
2025-08-07 09:13:54,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:14:03,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1787.56580 ± 950.793
2025-08-07 09:14:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1767.4387, 2622.0974, 214.67792, 2550.678, 1147.7882, 91.9193, 1690.1965, 2464.219, 2687.257, 2639.3857]
2025-08-07 09:14:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [660.0, 1000.0, 122.0, 1000.0, 412.0, 52.0, 580.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:14:03,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 43 seconds)
2025-08-07 09:15:53,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:16:04,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1688.96252 ± 740.417
2025-08-07 09:16:04,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2316.1582, 2544.3677, 1017.6099, 1270.0732, 2379.7756, 853.7667, 1027.9867, 2468.6858, 2355.312, 655.88934]
2025-08-07 09:16:04,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [928.0, 1000.0, 421.0, 1000.0, 1000.0, 385.0, 454.0, 1000.0, 867.0, 1000.0]
2025-08-07 09:16:04,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 46 seconds)
2025-08-07 09:17:56,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:18:06,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1755.11462 ± 913.391
2025-08-07 09:18:06,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [911.60657, 2463.3447, 2413.4307, 563.50397, 109.96077, 1843.2197, 1318.2174, 2601.314, 2609.9253, 2716.6226]
2025-08-07 09:18:06,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [364.0, 1000.0, 894.0, 245.0, 60.0, 736.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:18:06,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 34 seconds)
2025-08-07 09:19:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:20:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1923.52319 ± 955.989
2025-08-07 09:20:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2588.7537, 2644.341, 539.5739, 2628.6243, 2644.8499, 2391.6995, 2629.07, 2282.4536, 358.8571, 527.0104]
2025-08-07 09:20:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 221.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 170.0, 204.0]
2025-08-07 09:20:04,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 24 seconds)
2025-08-07 09:21:45,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:21:58,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2339.07495 ± 482.370
2025-08-07 09:21:58,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2377.9712, 2372.9233, 2507.4048, 2506.1824, 2487.7957, 2515.6145, 2367.6155, 2707.9768, 925.61914, 2621.6455]
2025-08-07 09:21:58,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 408.0, 1000.0]
2025-08-07 09:21:58,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 36 minutes, 56 seconds)
2025-08-07 09:23:46,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:23:55,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1744.32520 ± 923.654
2025-08-07 09:23:55,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2434.229, 2229.3135, 2734.7307, 121.939354, 2669.577, 2466.7913, 942.90656, 2290.619, 689.09875, 864.04755]
2025-08-07 09:23:55,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 900.0, 1000.0, 71.0, 1000.0, 1000.0, 393.0, 1000.0, 297.0, 331.0]
2025-08-07 09:23:56,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 47 seconds)
2025-08-07 09:25:46,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:25:58,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2336.53784 ± 591.376
2025-08-07 09:25:58,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2744.9924, 2782.7637, 2657.821, 2449.8232, 2627.4631, 2512.8777, 2660.3464, 809.0658, 1667.2593, 2452.9666]
2025-08-07 09:25:58,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 975.0, 1000.0, 1000.0, 1000.0, 320.0, 662.0, 1000.0]
2025-08-07 09:25:58,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 1 second)
2025-08-07 09:27:52,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:28:01,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1909.99121 ± 811.021
2025-08-07 09:28:01,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1793.6188, 444.36008, 2685.7708, 1832.2533, 1713.8047, 2668.1575, 2740.3108, 2001.4867, 533.7284, 2686.4219]
2025-08-07 09:28:01,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [649.0, 190.0, 1000.0, 679.0, 660.0, 1000.0, 1000.0, 703.0, 228.0, 1000.0]
2025-08-07 09:28:01,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 18 seconds)
2025-08-07 09:29:42,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:56,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2403.61841 ± 614.385
2025-08-07 09:29:56,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2508.8281, 2537.3909, 573.3336, 2580.5115, 2606.1692, 2507.1614, 2684.6096, 2740.4426, 2644.1565, 2653.5808]
2025-08-07 09:29:56,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:29:56,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2403.62) for latency MM1Queue_a033_s075
2025-08-07 09:29:56,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 55 seconds)
2025-08-07 09:31:48,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:59,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1867.56079 ± 922.489
2025-08-07 09:31:59,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [990.8489, 541.08716, 2431.931, 2807.561, 2801.2654, 2737.958, 851.6396, 2599.279, 2246.1938, 667.8446]
2025-08-07 09:31:59,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [355.0, 231.0, 950.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 856.0, 298.0]
2025-08-07 09:31:59,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 11 seconds)
2025-08-07 09:33:51,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:05,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2417.24146 ± 382.498
2025-08-07 09:34:05,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2593.3718, 2777.6213, 2683.0657, 2225.9824, 2243.857, 2695.2273, 1435.575, 2264.6008, 2546.0125, 2707.102]
2025-08-07 09:34:05,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 844.0, 1000.0, 1000.0, 561.0, 933.0, 1000.0, 1000.0]
2025-08-07 09:34:05,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2417.24) for latency MM1Queue_a033_s075
2025-08-07 09:34:05,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 23 seconds)
2025-08-07 09:35:51,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2510.40991 ± 154.354
2025-08-07 09:36:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2478.7412, 2486.1975, 2420.8564, 2638.292, 2600.1172, 2150.8057, 2468.207, 2566.8904, 2519.239, 2774.7544]
2025-08-07 09:36:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 857.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:36:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2510.41) for latency MM1Queue_a033_s075
2025-08-07 09:36:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 50 seconds)
2025-08-07 09:37:58,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:05,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1432.86707 ± 954.725
2025-08-07 09:38:05,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [165.16556, 2594.7212, 547.1766, 1433.641, 2634.08, 429.49496, 953.0947, 2504.9084, 2406.2136, 660.1744]
2025-08-07 09:38:05,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 1000.0, 245.0, 563.0, 1000.0, 154.0, 381.0, 1000.0, 928.0, 256.0]
2025-08-07 09:38:05,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 32 seconds)
2025-08-07 09:39:46,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:39:57,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2113.42065 ± 751.810
2025-08-07 09:39:57,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2611.6545, 1401.266, 660.0391, 2831.773, 1472.8572, 1401.9751, 2805.052, 2590.2317, 2729.5024, 2629.8545]
2025-08-07 09:39:57,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 556.0, 242.0, 1000.0, 550.0, 538.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:39:57,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 1 second)
2025-08-07 09:41:49,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:42:01,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1563.28638 ± 931.117
2025-08-07 09:42:01,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2714.966, 888.2645, 2418.2566, 101.88641, 979.58014, 2627.2944, 784.36456, 2799.324, 1368.7677, 950.15936]
2025-08-07 09:42:01,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 50.0, 1000.0, 1000.0, 1000.0, 1000.0, 634.0, 1000.0]
2025-08-07 09:42:01,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 17 seconds)
2025-08-07 09:43:47,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:44:01,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2539.44727 ± 248.485
2025-08-07 09:44:01,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2301.8962, 1979.7273, 2639.4958, 2748.2307, 2345.31, 2805.7075, 2792.5132, 2607.011, 2508.668, 2665.9146]
2025-08-07 09:44:01,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [931.0, 780.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:44:01,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2539.45) for latency MM1Queue_a033_s075
2025-08-07 09:44:01,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 26 seconds)
2025-08-07 09:45:57,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:09,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2326.48877 ± 644.492
2025-08-07 09:46:09,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2685.3728, 2626.1382, 943.03534, 1711.3842, 2817.9517, 1504.8949, 2760.8015, 2721.2554, 2628.529, 2865.525]
2025-08-07 09:46:09,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 357.0, 649.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:46:09,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 37 seconds)
2025-08-07 09:47:46,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2472.90576 ± 460.764
2025-08-07 09:48:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2533.5205, 2572.3616, 2739.2139, 1130.525, 2786.7915, 2714.7148, 2563.7612, 2739.584, 2464.5056, 2484.081]
2025-08-07 09:48:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 420.0, 995.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:48:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 29 seconds)
2025-08-07 09:49:58,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:11,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1989.60425 ± 831.721
2025-08-07 09:50:11,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2122.9043, 412.60562, 2421.762, 2741.6494, 1364.1785, 761.559, 2793.7695, 1824.0907, 2741.5557, 2711.9692]
2025-08-07 09:50:11,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [824.0, 187.0, 1000.0, 1000.0, 495.0, 1000.0, 1000.0, 744.0, 1000.0, 1000.0]
2025-08-07 09:50:11,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 11 minutes, 38 seconds)
2025-08-07 09:51:55,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2644.76001 ± 178.511
2025-08-07 09:52:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2266.3357, 2702.7847, 2518.281, 2587.377, 2859.7563, 2873.6743, 2611.4773, 2744.2627, 2788.7078, 2494.9436]
2025-08-07 09:52:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:52:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2644.76) for latency MM1Queue_a033_s075
2025-08-07 09:52:09,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 51 seconds)
2025-08-07 09:54:03,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:16,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2434.13818 ± 452.880
2025-08-07 09:54:16,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2599.4692, 2669.001, 2726.9714, 2501.9353, 1203.2158, 2070.0547, 2786.5479, 2537.393, 2513.9888, 2732.8057]
2025-08-07 09:54:16,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 447.0, 764.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:54:16,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 minutes, 40 seconds)
2025-08-07 09:56:06,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:56:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2137.13574 ± 711.001
2025-08-07 09:56:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2627.7444, 2612.2322, 2815.3179, 2904.4595, 1452.857, 1592.0128, 2633.4817, 959.0658, 2599.6416, 1174.5442]
2025-08-07 09:56:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 631.0, 1000.0, 331.0, 1000.0, 518.0]
2025-08-07 09:56:17,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 50 seconds)
2025-08-07 09:58:02,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:13,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2136.39404 ± 752.413
2025-08-07 09:58:13,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [879.719, 2712.5095, 2714.1912, 2564.774, 2517.2905, 2554.9114, 2332.18, 501.02664, 2567.7903, 2019.5477]
2025-08-07 09:58:13,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [338.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 184.0, 1000.0, 775.0]
2025-08-07 09:58:13,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 3 minutes, 16 seconds)
2025-08-07 10:00:04,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:15,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2078.06689 ± 843.534
2025-08-07 10:00:15,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1273.359, 2742.4985, 2712.8452, 2666.94, 204.8711, 2784.2993, 2472.3328, 1571.8601, 2823.2593, 1528.4038]
2025-08-07 10:00:15,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [474.0, 1000.0, 1000.0, 1000.0, 88.0, 1000.0, 1000.0, 1000.0, 1000.0, 605.0]
2025-08-07 10:00:15,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 27 seconds)
2025-08-07 10:02:03,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:14,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2125.63135 ± 988.773
2025-08-07 10:02:14,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2787.2783, 2866.3262, 2758.624, 123.47261, 1399.0802, 2789.1626, 2792.9617, 2565.922, 2633.8503, 539.6372]
2025-08-07 10:02:14,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 59.0, 536.0, 1000.0, 1000.0, 1000.0, 1000.0, 220.0]
2025-08-07 10:02:14,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 28 seconds)
2025-08-07 10:03:55,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:04:06,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2052.85352 ± 878.036
2025-08-07 10:04:06,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2753.7493, 1912.1926, 1292.4379, 2554.7908, 2607.3315, 2803.8499, 242.65942, 907.9493, 2747.1885, 2706.3875]
2025-08-07 10:04:06,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 746.0, 549.0, 897.0, 1000.0, 1000.0, 116.0, 411.0, 1000.0, 1000.0]
2025-08-07 10:04:06,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 2 seconds)
2025-08-07 10:05:57,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:10,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2085.31055 ± 678.044
2025-08-07 10:06:10,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2633.9832, 2453.438, 1143.6741, 1277.4633, 2545.192, 1358.0889, 2653.4165, 2705.2615, 2806.2397, 1276.3469]
2025-08-07 10:06:10,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 444.0, 426.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 557.0]
2025-08-07 10:06:10,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 22 seconds)
2025-08-07 10:07:57,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:06,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1403.52698 ± 1037.536
2025-08-07 10:08:06,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [752.9686, 309.60583, 2647.2522, 2263.3008, 2586.1292, 394.9671, 2805.9055, 458.53104, 169.07092, 1647.5385]
2025-08-07 10:08:06,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [271.0, 147.0, 1000.0, 1000.0, 1000.0, 165.0, 1000.0, 261.0, 88.0, 645.0]
2025-08-07 10:08:06,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 20 seconds)
2025-08-07 10:09:58,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2216.50146 ± 672.195
2025-08-07 10:10:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2187.9443, 2661.5334, 2600.239, 2813.9314, 2593.4622, 2069.1816, 2703.9192, 2590.7468, 674.915, 1269.1432]
2025-08-07 10:10:10,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [838.0, 1000.0, 1000.0, 1000.0, 956.0, 808.0, 1000.0, 1000.0, 284.0, 476.0]
2025-08-07 10:10:10,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 33 seconds)
2025-08-07 10:11:49,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:03,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2367.64136 ± 579.499
2025-08-07 10:12:03,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2700.298, 2671.8333, 1144.9856, 2851.981, 2667.8542, 1308.6855, 2653.0723, 2611.6995, 2437.3882, 2628.6143]
2025-08-07 10:12:03,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 463.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 985.0]
2025-08-07 10:12:03,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 7 seconds)
2025-08-07 10:13:55,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:10,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2709.95459 ± 105.749
2025-08-07 10:14:10,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2688.22, 2730.3357, 2876.5762, 2818.8281, 2591.527, 2511.2153, 2700.9646, 2655.931, 2695.2207, 2830.7285]
2025-08-07 10:14:10,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:14:10,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2709.95) for latency MM1Queue_a033_s075
2025-08-07 10:14:10,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 21 seconds)
2025-08-07 10:15:54,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:06,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2417.84766 ± 476.874
2025-08-07 10:16:06,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2481.191, 2566.5115, 2629.481, 2618.2964, 2851.244, 1354.9833, 2702.9634, 2777.151, 1634.1534, 2562.5022]
2025-08-07 10:16:06,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 547.0, 1000.0, 1000.0, 594.0, 1000.0]
2025-08-07 10:16:06,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 42 seconds)
2025-08-07 10:17:57,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:10,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2328.49243 ± 735.418
2025-08-07 10:18:10,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1319.5079, 2969.6199, 2764.609, 2927.7249, 2983.4482, 2791.6023, 2577.71, 2544.8323, 1471.1852, 934.6833]
2025-08-07 10:18:10,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 329.0]
2025-08-07 10:18:10,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 19 seconds)
2025-08-07 10:19:56,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:08,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2296.77490 ± 646.458
2025-08-07 10:20:08,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2696.6785, 2667.6284, 828.64343, 2668.679, 2525.3828, 2668.8755, 2559.3555, 2589.4966, 2551.4475, 1211.5635]
2025-08-07 10:20:08,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 311.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 479.0]
2025-08-07 10:20:08,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 51 seconds)
2025-08-07 10:21:58,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2386.12354 ± 562.458
2025-08-07 10:22:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2806.7773, 2827.4763, 2576.488, 2595.9128, 1154.164, 1450.7235, 2760.2537, 2331.4773, 2632.0076, 2725.9548]
2025-08-07 10:22:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 398.0, 565.0, 1000.0, 814.0, 1000.0, 1000.0]
2025-08-07 10:22:10,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 27 seconds)
2025-08-07 10:23:57,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:10,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2553.96631 ± 288.399
2025-08-07 10:24:10,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2588.4373, 1882.2094, 2727.8906, 2515.8413, 2164.946, 2887.0898, 2652.2915, 2655.1755, 2758.7966, 2706.9846]
2025-08-07 10:24:10,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 827.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:24:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 58 seconds)
2025-08-07 10:25:51,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:04,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2323.81494 ± 744.383
2025-08-07 10:26:04,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2509.1682, 2608.1611, 2826.8228, 2814.9587, 1554.9274, 2273.4177, 389.6597, 2840.7378, 2835.0713, 2585.2224]
2025-08-07 10:26:04,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 589.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:26:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 54 seconds)
2025-08-07 10:27:47,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:58,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2015.87537 ± 1024.032
2025-08-07 10:27:58,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2551.271, 55.276543, 1441.2125, 2731.288, 2783.2463, 2515.3374, 149.09262, 2620.91, 2600.2766, 2710.8416]
2025-08-07 10:27:58,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 42.0, 1000.0, 1000.0, 1000.0, 1000.0, 75.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:27:58,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 20 seconds)
2025-08-07 10:29:52,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:04,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2314.60181 ± 823.257
2025-08-07 10:30:04,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2892.5146, 2681.2134, 2944.381, 2641.694, 1674.9982, 2874.858, 2931.5315, 864.6316, 792.85504, 2847.3396]
2025-08-07 10:30:04,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 314.0, 295.0, 1000.0]
2025-08-07 10:30:04,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 47 seconds)
2025-08-07 10:31:47,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:01,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2479.28369 ± 466.431
2025-08-07 10:32:01,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2600.3665, 1130.695, 2690.0464, 2715.5476, 2774.0686, 2311.027, 2690.3438, 2619.193, 2539.2336, 2722.3135]
2025-08-07 10:32:01,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:32:01,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 33 seconds)
2025-08-07 10:33:45,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:56,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2218.77148 ± 884.920
2025-08-07 10:33:56,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [660.463, 2750.0962, 463.5867, 2895.8901, 2759.0386, 1771.0529, 2803.5278, 2751.3171, 2843.0874, 2489.655]
2025-08-07 10:33:56,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [275.0, 1000.0, 194.0, 1000.0, 1000.0, 667.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:33:56,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 24 seconds)
2025-08-07 10:35:39,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:51,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2498.07349 ± 469.002
2025-08-07 10:35:51,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2662.4858, 2633.351, 2867.9783, 1340.0311, 1881.1063, 2758.2654, 2901.9185, 2644.231, 2642.0093, 2649.3564]
2025-08-07 10:35:51,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 464.0, 715.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:35:51,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 28 seconds)
2025-08-07 10:37:43,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:55,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2328.44263 ± 647.653
2025-08-07 10:37:55,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2612.4062, 2716.0486, 2478.6875, 2574.9229, 2724.9656, 2712.3418, 618.65314, 2624.831, 1626.1381, 2595.43]
2025-08-07 10:37:55,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 241.0, 1000.0, 618.0, 1000.0]
2025-08-07 10:37:55,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 53 seconds)
2025-08-07 10:39:41,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:52,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2028.92749 ± 901.940
2025-08-07 10:39:52,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1031.7595, 994.12415, 2719.1428, 311.086, 2501.9917, 2823.742, 2908.2585, 2667.9646, 1632.7137, 2698.4917]
2025-08-07 10:39:52,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [405.0, 349.0, 1000.0, 135.0, 1000.0, 1000.0, 1000.0, 1000.0, 610.0, 1000.0]
2025-08-07 10:39:52,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 36 seconds)
2025-08-07 10:41:33,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:47,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2407.96826 ± 560.958
2025-08-07 10:41:47,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1210.3822, 2731.54, 2702.3018, 2583.3086, 2828.8276, 2787.851, 2633.479, 2599.9648, 1388.6077, 2613.418]
2025-08-07 10:41:47,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 512.0, 1000.0]
2025-08-07 10:41:47,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 36 seconds)
2025-08-07 10:43:35,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:49,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2572.66089 ± 407.480
2025-08-07 10:43:49,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1385.8696, 2748.9524, 2785.0566, 2736.4087, 2751.6606, 2451.287, 2750.9238, 2703.9053, 2799.6555, 2612.8896]
2025-08-07 10:43:49,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [515.0, 1000.0, 1000.0, 1000.0, 1000.0, 903.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:43:49,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 48 seconds)
2025-08-07 10:45:28,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:42,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2771.27563 ± 127.958
2025-08-07 10:45:42,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2913.6018, 2770.233, 2879.7998, 2584.8872, 2730.097, 2938.3262, 2733.0059, 2536.3088, 2873.0747, 2753.4233]
2025-08-07 10:45:42,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:45:42,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1226 [INFO]: New best (2771.28) for latency MM1Queue_a033_s075
2025-08-07 10:45:42,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 46 seconds)
2025-08-07 10:47:22,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 1898.96606 ± 807.434
2025-08-07 10:47:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2669.4285, 1150.6649, 959.5987, 729.5458, 1114.6239, 2423.8882, 2563.6804, 2745.816, 1717.0201, 2915.3943]
2025-08-07 10:47:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 417.0, 364.0, 269.0, 1000.0, 822.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:47:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 33 seconds)
2025-08-07 10:49:20,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:31,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2242.87915 ± 1009.253
2025-08-07 10:49:31,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [402.04004, 2799.3862, 2701.3867, 2850.7524, 2773.5083, 2779.0217, 70.56981, 2582.916, 2836.1428, 2633.0671]
2025-08-07 10:49:31,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [169.0, 1000.0, 1000.0, 989.0, 1000.0, 1000.0, 54.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:49:31,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 39 seconds)
2025-08-07 10:51:19,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:29,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2093.55371 ± 801.207
2025-08-07 10:51:29,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [1264.0875, 1159.6515, 2845.0735, 2845.6, 2893.081, 2578.925, 2249.6877, 2731.613, 1803.4054, 564.4113]
2025-08-07 10:51:29,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [510.0, 406.0, 1000.0, 1000.0, 1000.0, 927.0, 833.0, 1000.0, 610.0, 250.0]
2025-08-07 10:51:29,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 45 seconds)
2025-08-07 10:53:09,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:21,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2495.38550 ± 565.801
2025-08-07 10:53:21,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2702.178, 1453.7273, 1298.4263, 2699.957, 2691.9988, 2887.9736, 2831.9116, 2915.7847, 2750.5117, 2721.3875]
2025-08-07 10:53:21,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 530.0, 444.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:53:21,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 42 seconds)
2025-08-07 10:55:09,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:22,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2739.24243 ± 193.042
2025-08-07 10:55:22,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2916.981, 2799.6025, 2952.8352, 2710.1892, 2268.687, 2730.5261, 2847.4426, 2661.5645, 2592.0303, 2912.564]
2025-08-07 10:55:22,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 835.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:55:22,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 52 seconds)
2025-08-07 10:57:01,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2246.43115 ± 846.459
2025-08-07 10:57:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2798.589, 2810.6804, 1073.8419, 2014.1873, 262.16306, 2707.0796, 2378.857, 2864.364, 2765.3948, 2789.157]
2025-08-07 10:57:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 427.0, 840.0, 114.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:57:12,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 55 seconds)
2025-08-07 10:59:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1221 [DEBUG]: Total Reward: 2361.14771 ± 816.307
2025-08-07 10:59:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1222 [DEBUG]: All rewards: [2709.9739, 2787.8796, 2844.7065, 1254.7404, 2877.2874, 2781.6565, 2468.867, 2578.5125, 2957.6536, 350.19778]
2025-08-07 10:59:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 483.0, 1000.0, 1000.0, 855.0, 1000.0, 1000.0, 130.0]
2025-08-07 10:59:12,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-ant):1251 [DEBUG]: Training session finished
