2025-08-07 08:21:10,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 08:21:10,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 08:21:10,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x149408313e90>}
2025-08-07 08:21:10,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1111 [DEBUG]: using device: cuda
2025-08-07 08:21:10,559 baseline-bpql-noiseperc15-ant:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 08:21:10,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1133 [INFO]: Creating new trainer
2025-08-07 08:21:10,576 baseline-bpql-noiseperc15-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 08:21:10,576 baseline-bpql-noiseperc15-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 08:21:13,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1194 [DEBUG]: Starting training session...
2025-08-07 08:21:13,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 1/100
2025-08-07 08:22:54,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:22:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -54.78869 ± 55.842
2025-08-07 08:22:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-78.05283, -71.12016, -36.400063, -2.0528367, -115.64799, -176.40408, -56.43758, 15.867896, 0.50540215, -28.14471]
2025-08-07 08:22:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 51.0, 43.0, 41.0, 100.0, 126.0, 79.0, 30.0, 27.0, 42.0]
2025-08-07 08:22:55,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (-54.79) for latency MM1Queue_a033_s075
2025-08-07 08:22:55,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 48 minutes, 49 seconds)
2025-08-07 08:24:35,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:24:39,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -191.55492 ± 308.860
2025-08-07 08:24:39,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-7.368313, -782.5042, -9.096605, -2.2316136, -116.332985, -812.678, -184.49663, -16.736128, 7.0375338, 8.857922]
2025-08-07 08:24:39,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 1000.0, 51.0, 87.0, 182.0, 1000.0, 232.0, 83.0, 21.0, 38.0]
2025-08-07 08:24:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 48 minutes, 14 seconds)
2025-08-07 08:26:14,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:26:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -198.87204 ± 305.854
2025-08-07 08:26:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-16.834484, -736.1526, -51.58672, -2.6829855, -872.31165, -61.316074, -16.20627, -33.148624, -86.06678, -112.41415]
2025-08-07 08:26:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 1000.0, 96.0, 61.0, 1000.0, 146.0, 64.0, 79.0, 123.0, 166.0]
2025-08-07 08:26:18,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 44 minutes, 34 seconds)
2025-08-07 08:27:59,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:28:02,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: -85.42864 ± 173.241
2025-08-07 08:28:02,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-218.23743, 10.796248, -8.7402315, -15.038164, -559.2081, 6.253792, 24.25178, -99.61985, 32.457134, -27.201614]
2025-08-07 08:28:02,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [427.0, 79.0, 36.0, 60.0, 1000.0, 52.0, 43.0, 186.0, 65.0, 86.0]
2025-08-07 08:28:02,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 43 minutes, 40 seconds)
2025-08-07 08:29:43,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:29:48,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 6.99672 ± 24.295
2025-08-07 08:29:48,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-43.785168, 4.4589314, 15.807155, 2.3399782, 26.517193, 6.626286, -1.8280511, -12.761472, 54.34691, 18.245417]
2025-08-07 08:29:48,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 73.0, 97.0, 120.0, 198.0, 65.0, 1000.0, 735.0, 168.0, 461.0]
2025-08-07 08:29:48,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (7.00) for latency MM1Queue_a033_s075
2025-08-07 08:29:48,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 43 minutes, 4 seconds)
2025-08-07 08:31:31,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:31:36,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 76.35838 ± 59.666
2025-08-07 08:31:36,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [131.48082, 40.973495, 230.3277, 51.54094, 36.772655, 96.477615, 26.566399, 33.231277, 60.342327, 55.870483]
2025-08-07 08:31:36,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 208.0, 1000.0, 91.0, 106.0, 340.0, 37.0, 105.0, 292.0, 226.0]
2025-08-07 08:31:36,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (76.36) for latency MM1Queue_a033_s075
2025-08-07 08:31:36,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 43 minutes, 12 seconds)
2025-08-07 08:33:16,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:33:20,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 28.33545 ± 44.675
2025-08-07 08:33:20,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [18.096607, 15.910933, 46.467567, -63.448097, 41.883675, 96.47371, 6.5740776, 29.276114, -4.0251966, 96.14512]
2025-08-07 08:33:20,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 56.0, 160.0, 301.0, 80.0, 1000.0, 113.0, 45.0, 72.0, 1000.0]
2025-08-07 08:33:20,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 41 minutes, 39 seconds)
2025-08-07 08:35:02,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:35:06,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 27.41318 ± 41.011
2025-08-07 08:35:06,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [49.493248, 43.350998, 27.404915, 41.437397, 27.655588, 15.548543, -5.211679, 4.9480233, -49.1012, 118.60599]
2025-08-07 08:35:06,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [190.0, 137.0, 1000.0, 149.0, 66.0, 68.0, 103.0, 21.0, 245.0, 1000.0]
2025-08-07 08:35:06,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 42 minutes)
2025-08-07 08:36:56,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:37:00,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 52.03181 ± 40.218
2025-08-07 08:37:00,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [43.988064, 161.07268, 54.083805, 31.18511, 55.320343, 65.22409, 48.003414, 1.5455976, 24.659224, 35.235756]
2025-08-07 08:37:00,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [172.0, 1000.0, 100.0, 47.0, 1000.0, 240.0, 238.0, 177.0, 184.0, 56.0]
2025-08-07 08:37:00,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 43 minutes, 22 seconds)
2025-08-07 08:38:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:38:36,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 56.69909 ± 110.820
2025-08-07 08:38:36,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [46.142994, 38.2082, 38.857204, -53.886055, 19.6912, 21.056099, 57.116573, -16.95661, 41.452324, 375.30893]
2025-08-07 08:38:36,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 87.0, 193.0, 173.0, 195.0, 141.0, 96.0, 124.0, 77.0, 1000.0]
2025-08-07 08:38:36,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 38 minutes, 34 seconds)
2025-08-07 08:40:22,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:40:23,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 23.61446 ± 21.523
2025-08-07 08:40:23,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-25.573565, 38.41411, 13.610667, 18.155745, 11.77958, 15.415968, 28.980225, 53.748272, 32.78968, 48.82392]
2025-08-07 08:40:23,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [124.0, 51.0, 81.0, 73.0, 55.0, 34.0, 37.0, 162.0, 280.0, 64.0]
2025-08-07 08:40:23,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 36 minutes, 30 seconds)
2025-08-07 08:42:08,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:42:09,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 21.20885 ± 13.601
2025-08-07 08:42:09,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [41.679436, -5.07106, 25.205935, 19.53851, 12.751021, 29.900068, 8.968022, 30.145563, 37.199547, 11.771419]
2025-08-07 08:42:09,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 110.0, 106.0, 117.0, 61.0, 156.0, 12.0, 51.0, 154.0, 67.0]
2025-08-07 08:42:09,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 35 minutes, 17 seconds)
2025-08-07 08:43:41,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:43:46,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 133.44699 ± 145.584
2025-08-07 08:43:46,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [18.371784, 285.78653, 61.678715, 37.08676, 386.91354, 380.11105, 66.57215, 57.655945, 26.551563, 13.741881]
2025-08-07 08:43:46,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 1000.0, 156.0, 184.0, 1000.0, 1000.0, 116.0, 76.0, 62.0, 93.0]
2025-08-07 08:43:46,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (133.45) for latency MM1Queue_a033_s075
2025-08-07 08:43:46,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 30 minutes, 50 seconds)
2025-08-07 08:45:28,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:45:32,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 102.42698 ± 168.988
2025-08-07 08:45:32,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [52.81964, 472.32275, 12.114249, 3.1365604, 8.054305, 2.55207, 397.09723, 73.81681, -23.690992, 26.047209]
2025-08-07 08:45:32,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [319.0, 1000.0, 73.0, 83.0, 102.0, 35.0, 1000.0, 121.0, 82.0, 35.0]
2025-08-07 08:45:32,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 26 minutes, 35 seconds)
2025-08-07 08:47:13,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:47:17,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 125.99330 ± 168.696
2025-08-07 08:47:17,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [489.18497, 33.076775, 22.45868, 13.528356, 393.9122, 208.80305, 13.39809, 7.523308, 29.099953, 48.947495]
2025-08-07 08:47:17,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 45.0, 61.0, 40.0, 1000.0, 639.0, 40.0, 83.0, 62.0, 180.0]
2025-08-07 08:47:17,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 27 minutes, 37 seconds)
2025-08-07 08:48:59,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:49:02,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 83.43288 ± 139.737
2025-08-07 08:49:02,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [10.143935, -32.440166, 275.0108, -4.89542, 423.8443, 4.808157, 58.89583, 60.540833, 35.41137, 3.0091789]
2025-08-07 08:49:02,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 139.0, 685.0, 65.0, 1000.0, 136.0, 91.0, 180.0, 86.0, 91.0]
2025-08-07 08:49:02,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 25 minutes, 20 seconds)
2025-08-07 08:50:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:50:47,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 138.52570 ± 135.000
2025-08-07 08:50:47,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [172.31152, -4.2697906, 30.7116, 318.01843, 64.59832, 36.519302, 304.8435, 374.1146, 39.30493, 49.10452]
2025-08-07 08:50:47,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [333.0, 81.0, 293.0, 1000.0, 76.0, 72.0, 1000.0, 1000.0, 88.0, 75.0]
2025-08-07 08:50:47,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (138.53) for latency MM1Queue_a033_s075
2025-08-07 08:50:47,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 23 minutes, 13 seconds)
2025-08-07 08:52:27,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:52:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 64.46878 ± 87.952
2025-08-07 08:52:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [-19.063921, 50.0342, 88.75391, 36.886116, 312.19244, 47.057304, 29.514456, 77.56496, 8.545118, 13.20326]
2025-08-07 08:52:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [206.0, 113.0, 324.0, 175.0, 1000.0, 252.0, 180.0, 505.0, 62.0, 573.0]
2025-08-07 08:52:32,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 23 minutes, 39 seconds)
2025-08-07 08:54:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:54:16,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 34.15343 ± 29.327
2025-08-07 08:54:16,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [109.13002, 22.4305, 19.17091, 32.336826, 42.5524, 57.791958, -0.32387948, 22.302807, 9.752423, 26.39037]
2025-08-07 08:54:16,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [241.0, 45.0, 93.0, 167.0, 104.0, 81.0, 237.0, 144.0, 85.0, 113.0]
2025-08-07 08:54:16,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 21 minutes, 35 seconds)
2025-08-07 08:55:57,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:56:00,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 96.25375 ± 99.667
2025-08-07 08:56:00,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [37.052055, 76.70067, 36.29698, 10.550253, 111.4892, 57.684635, 223.12053, 60.594303, 11.478101, 337.57077]
2025-08-07 08:56:00,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 153.0, 238.0, 52.0, 184.0, 90.0, 415.0, 239.0, 41.0, 1000.0]
2025-08-07 08:56:00,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 19 minutes, 32 seconds)
2025-08-07 08:57:41,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:57:48,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 184.12468 ± 189.051
2025-08-07 08:57:48,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [301.6591, -2.259294, 89.49816, 476.62335, 21.776733, 450.9308, 127.050285, 7.449597, 387.14493, -18.626846]
2025-08-07 08:57:48,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 92.0, 313.0, 1000.0, 35.0, 1000.0, 467.0, 62.0, 1000.0, 63.0]
2025-08-07 08:57:48,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (184.12) for latency MM1Queue_a033_s075
2025-08-07 08:57:48,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 18 minutes, 30 seconds)
2025-08-07 08:59:34,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:59:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 164.76616 ± 185.058
2025-08-07 08:59:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [436.34125, 91.2771, 410.38568, 40.17856, 481.4948, -5.9101396, 48.04691, 87.782326, 57.45014, 0.6149207]
2025-08-07 08:59:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 237.0, 1000.0, 147.0, 1000.0, 76.0, 69.0, 170.0, 129.0, 31.0]
2025-08-07 08:59:40,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 18 minutes, 25 seconds)
2025-08-07 09:01:14,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:01:18,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 120.27934 ± 145.128
2025-08-07 09:01:18,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [415.88556, 61.32678, 37.489017, 366.8532, -4.178527, 193.08257, 58.528664, 29.60081, 20.21041, 23.994917]
2025-08-07 09:01:18,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 117.0, 51.0, 1000.0, 91.0, 433.0, 167.0, 115.0, 65.0, 34.0]
2025-08-07 09:01:18,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 15 minutes)
2025-08-07 09:03:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:03:04,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 77.40704 ± 117.868
2025-08-07 09:03:04,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [421.80603, 87.16592, 38.60576, 5.5158334, 56.51005, 36.934288, 80.18422, 10.953037, 13.79936, 22.5958]
2025-08-07 09:03:04,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 211.0, 120.0, 60.0, 190.0, 146.0, 113.0, 39.0, 54.0, 67.0]
2025-08-07 09:03:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 13 minutes, 50 seconds)
2025-08-07 09:04:50,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:04:54,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 146.54115 ± 169.230
2025-08-07 09:04:54,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [32.442574, 105.871254, 57.990253, 393.9018, 551.4734, 50.63049, 43.49706, 121.3661, 29.79654, 78.44218]
2025-08-07 09:04:54,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 278.0, 230.0, 1000.0, 1000.0, 99.0, 95.0, 320.0, 190.0, 190.0]
2025-08-07 09:04:55,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 13 minutes, 31 seconds)
2025-08-07 09:06:32,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:06:38,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 161.45096 ± 176.024
2025-08-07 09:06:38,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [37.117264, 33.183792, 69.93784, 46.712734, 340.84137, 37.828167, 399.36118, -41.051895, 197.4247, 493.15436]
2025-08-07 09:06:38,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 98.0, 452.0, 148.0, 1000.0, 80.0, 1000.0, 284.0, 403.0, 1000.0]
2025-08-07 09:06:38,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 10 minutes, 40 seconds)
2025-08-07 09:08:18,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:08:20,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 110.40678 ± 85.660
2025-08-07 09:08:20,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [258.5834, 282.866, 83.827385, 125.83719, 81.27152, 34.629936, 37.382015, 64.66393, 28.543718, 106.462715]
2025-08-07 09:08:20,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [457.0, 478.0, 175.0, 173.0, 173.0, 41.0, 57.0, 80.0, 36.0, 255.0]
2025-08-07 09:08:20,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 6 minutes, 39 seconds)
2025-08-07 09:09:56,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:10:01,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 160.55307 ± 145.460
2025-08-07 09:10:01,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [83.4518, 119.47241, 23.413668, 29.535166, 64.8245, 132.0335, 28.989248, 328.9641, 389.72833, 405.11783]
2025-08-07 09:10:01,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [124.0, 162.0, 55.0, 48.0, 138.0, 213.0, 32.0, 554.0, 1000.0, 1000.0]
2025-08-07 09:10:01,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 5 minutes, 30 seconds)
2025-08-07 09:11:45,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:11:48,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 97.84361 ± 87.768
2025-08-07 09:11:48,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [148.48598, 80.842995, 98.46059, 41.80585, 337.16168, 53.21375, 59.480583, 24.946095, 106.30909, 27.729536]
2025-08-07 09:11:48,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [212.0, 195.0, 221.0, 84.0, 1000.0, 167.0, 108.0, 41.0, 140.0, 43.0]
2025-08-07 09:11:48,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 3 minutes, 58 seconds)
2025-08-07 09:13:28,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:13:29,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 63.16430 ± 47.540
2025-08-07 09:13:29,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [89.270454, 76.18689, 133.95848, 4.1224604, 59.41524, 14.007536, 39.243576, 21.112787, 152.6997, 41.62583]
2025-08-07 09:13:29,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [125.0, 107.0, 267.0, 31.0, 142.0, 39.0, 120.0, 60.0, 332.0, 85.0]
2025-08-07 09:13:29,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 6 seconds)
2025-08-07 09:15:05,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:15:08,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 110.45387 ± 145.904
2025-08-07 09:15:08,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [25.672504, 27.244503, 83.77209, 208.62965, 33.834858, 46.17208, 4.530359, 510.59583, 29.765648, 134.32123]
2025-08-07 09:15:08,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [69.0, 51.0, 79.0, 328.0, 67.0, 64.0, 116.0, 1000.0, 88.0, 152.0]
2025-08-07 09:15:08,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 57 minutes, 13 seconds)
2025-08-07 09:16:52,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:16:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 79.02613 ± 105.034
2025-08-07 09:16:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [32.942947, 50.54056, 69.23648, 30.423326, 60.450974, 391.55966, 56.347153, 29.936432, 33.465202, 35.358566]
2025-08-07 09:16:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 81.0, 141.0, 50.0, 108.0, 1000.0, 80.0, 69.0, 69.0, 65.0]
2025-08-07 09:16:55,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 37 seconds)
2025-08-07 09:18:33,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:18:37,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 198.38588 ± 139.335
2025-08-07 09:18:37,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [163.51535, 34.247936, 105.17974, 74.08343, 186.54211, 193.7914, 115.04085, 235.3189, 534.1893, 341.94983]
2025-08-07 09:18:37,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [242.0, 54.0, 156.0, 151.0, 373.0, 348.0, 151.0, 346.0, 1000.0, 522.0]
2025-08-07 09:18:37,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (198.39) for latency MM1Queue_a033_s075
2025-08-07 09:18:37,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 17 seconds)
2025-08-07 09:20:17,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:20:23,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 191.12679 ± 172.201
2025-08-07 09:20:23,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [44.68394, 91.84931, 138.55539, 456.31522, 97.55023, 406.48593, 87.47573, 101.94495, 480.77777, 5.629452]
2025-08-07 09:20:23,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 177.0, 290.0, 1000.0, 107.0, 1000.0, 106.0, 188.0, 1000.0, 30.0]
2025-08-07 09:20:23,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 14 seconds)
2025-08-07 09:22:04,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:22:08,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 174.48077 ± 150.854
2025-08-07 09:22:08,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [211.63481, 124.859856, 467.9769, 43.379074, 92.76807, 170.81036, 18.539148, 75.56974, 444.99118, 94.27865]
2025-08-07 09:22:08,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [428.0, 215.0, 1000.0, 87.0, 85.0, 268.0, 20.0, 91.0, 1000.0, 159.0]
2025-08-07 09:22:08,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 52 minutes, 25 seconds)
2025-08-07 09:23:48,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:23:54,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 262.37051 ± 174.238
2025-08-07 09:23:54,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [31.703058, 83.27899, 239.78258, 514.4655, 199.84908, 523.4704, 183.98843, 107.52823, 490.1375, 249.5012]
2025-08-07 09:23:54,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 138.0, 385.0, 1000.0, 282.0, 1000.0, 289.0, 141.0, 1000.0, 310.0]
2025-08-07 09:23:54,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (262.37) for latency MM1Queue_a033_s075
2025-08-07 09:23:54,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 52 minutes, 14 seconds)
2025-08-07 09:25:44,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:25:51,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 310.58627 ± 209.540
2025-08-07 09:25:51,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [45.176723, 506.87732, 190.87398, 498.84927, 121.63606, 23.105408, 592.0678, 161.10803, 504.18567, 461.9824]
2025-08-07 09:25:51,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 1000.0, 286.0, 1000.0, 260.0, 52.0, 1000.0, 277.0, 1000.0, 708.0]
2025-08-07 09:25:51,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (310.59) for latency MM1Queue_a033_s075
2025-08-07 09:25:51,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 52 minutes, 42 seconds)
2025-08-07 09:27:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:27:35,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 204.51974 ± 195.314
2025-08-07 09:27:35,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [92.58393, 624.8919, 357.94626, 97.78949, 70.65399, 440.76468, 5.401747, 59.495354, 57.10361, 238.56651]
2025-08-07 09:27:35,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [146.0, 1000.0, 474.0, 143.0, 100.0, 1000.0, 24.0, 68.0, 154.0, 376.0]
2025-08-07 09:27:35,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 51 minutes, 3 seconds)
2025-08-07 09:29:09,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:14,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 213.25452 ± 150.042
2025-08-07 09:29:14,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [242.36389, 193.77576, 85.7659, 349.94333, 66.11358, 110.50179, 2.5498822, 183.23698, 458.2107, 440.08347]
2025-08-07 09:29:14,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [296.0, 328.0, 142.0, 609.0, 91.0, 149.0, 26.0, 232.0, 1000.0, 1000.0]
2025-08-07 09:29:14,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 48 minutes, 4 seconds)
2025-08-07 09:30:55,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:30:58,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 145.05269 ± 126.067
2025-08-07 09:30:58,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [43.90342, 319.53644, 405.49652, 58.30366, 222.87581, 75.18984, 34.29961, 104.88396, 13.444236, 172.59332]
2025-08-07 09:30:58,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [89.0, 518.0, 375.0, 62.0, 305.0, 160.0, 103.0, 212.0, 99.0, 259.0]
2025-08-07 09:30:58,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 46 minutes)
2025-08-07 09:32:39,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:32:42,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 148.10428 ± 77.463
2025-08-07 09:32:42,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [241.4791, 114.74301, 143.12793, 243.4062, 50.02715, 132.22966, 168.30275, 122.275154, 254.47745, 10.974426]
2025-08-07 09:32:42,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [206.0, 185.0, 165.0, 329.0, 78.0, 236.0, 227.0, 214.0, 326.0, 105.0]
2025-08-07 09:32:42,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 43 minutes, 52 seconds)
2025-08-07 09:34:20,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:34:24,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 200.65227 ± 172.900
2025-08-07 09:34:24,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [79.83499, 155.92001, 144.40582, 131.54208, 27.165325, 21.666887, 476.2152, 462.99902, 77.38918, 429.3842]
2025-08-07 09:34:24,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 153.0, 177.0, 131.0, 32.0, 84.0, 1000.0, 612.0, 103.0, 427.0]
2025-08-07 09:34:24,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 39 minutes, 5 seconds)
2025-08-07 09:36:05,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:11,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 240.28760 ± 211.834
2025-08-07 09:36:11,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [49.397587, 148.60954, 217.59291, 59.760857, 369.94675, 257.67276, 15.89981, 474.45996, 93.01441, 716.52155]
2025-08-07 09:36:11,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 274.0, 291.0, 80.0, 1000.0, 332.0, 21.0, 1000.0, 133.0, 1000.0]
2025-08-07 09:36:11,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 9 seconds)
2025-08-07 09:37:55,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:00,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 235.39639 ± 187.541
2025-08-07 09:38:00,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [74.10564, 19.834524, 333.0748, 550.8575, 81.353195, 420.36722, 437.44702, 87.050545, 29.86338, 320.00998]
2025-08-07 09:38:00,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 111.0, 293.0, 1000.0, 103.0, 518.0, 1000.0, 104.0, 49.0, 372.0]
2025-08-07 09:38:00,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 38 minutes, 9 seconds)
2025-08-07 09:39:42,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:39:45,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 140.27060 ± 132.611
2025-08-07 09:39:45,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [412.78482, 252.62141, 71.13912, 296.45584, 197.33934, 48.522373, 21.482767, 23.334118, 35.378323, 43.647984]
2025-08-07 09:39:45,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 313.0, 91.0, 316.0, 243.0, 74.0, 54.0, 58.0, 56.0, 52.0]
2025-08-07 09:39:45,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 36 minutes, 37 seconds)
2025-08-07 09:41:21,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:41:30,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 369.71344 ± 200.022
2025-08-07 09:41:30,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [38.20127, 563.62177, 555.2611, 442.73932, 510.112, 520.39935, 188.91342, 165.95503, 574.85596, 137.0749]
2025-08-07 09:41:30,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 1000.0, 1000.0, 1000.0, 603.0, 1000.0, 282.0, 260.0, 1000.0, 224.0]
2025-08-07 09:41:30,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (369.71) for latency MM1Queue_a033_s075
2025-08-07 09:41:30,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 35 minutes, 5 seconds)
2025-08-07 09:43:15,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:19,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 182.53217 ± 141.162
2025-08-07 09:43:19,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [201.98117, 24.423597, 15.644199, 182.32707, 65.63175, 78.40479, 455.81927, 395.83063, 219.0742, 186.18495]
2025-08-07 09:43:19,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [225.0, 36.0, 36.0, 307.0, 110.0, 98.0, 1000.0, 402.0, 361.0, 233.0]
2025-08-07 09:43:19,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 34 minutes, 28 seconds)
2025-08-07 09:45:04,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:08,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 218.76010 ± 153.763
2025-08-07 09:45:08,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [264.8233, 12.521727, 378.94247, 148.4372, 549.9053, 311.4065, 49.726177, 159.43565, 117.39498, 195.00784]
2025-08-07 09:45:08,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [281.0, 27.0, 396.0, 163.0, 1000.0, 480.0, 66.0, 183.0, 138.0, 252.0]
2025-08-07 09:45:08,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 33 minutes, 1 second)
2025-08-07 09:46:40,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:48,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 318.18964 ± 226.725
2025-08-07 09:46:48,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [110.23966, 596.2614, 166.62328, 167.58237, 480.66458, 538.74786, 462.92407, 28.029673, 25.91414, 604.9093]
2025-08-07 09:46:48,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [166.0, 1000.0, 174.0, 214.0, 1000.0, 1000.0, 1000.0, 50.0, 38.0, 760.0]
2025-08-07 09:46:48,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 29 minutes, 40 seconds)
2025-08-07 09:48:36,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 278.43924 ± 249.115
2025-08-07 09:48:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [110.94165, 293.53357, 141.7471, 875.11395, 139.9968, 105.056755, 258.97812, 177.57709, 66.811485, 614.63574]
2025-08-07 09:48:41,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [200.0, 303.0, 152.0, 1000.0, 217.0, 89.0, 398.0, 239.0, 132.0, 1000.0]
2025-08-07 09:48:41,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 29 minutes, 18 seconds)
2025-08-07 09:50:15,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 294.94061 ± 221.255
2025-08-07 09:50:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [17.799683, 364.43893, 512.3225, 96.14244, 534.1456, 71.24822, 158.21663, 657.7633, 92.38062, 444.94794]
2025-08-07 09:50:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 402.0, 655.0, 163.0, 1000.0, 94.0, 256.0, 1000.0, 198.0, 469.0]
2025-08-07 09:50:21,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 26 minutes, 38 seconds)
2025-08-07 09:52:00,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:09,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 476.56674 ± 263.413
2025-08-07 09:52:09,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [134.6178, 425.40762, 793.06335, 857.4812, 539.9061, 496.53766, 659.7263, 84.62209, 140.69662, 633.60876]
2025-08-07 09:52:09,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [147.0, 542.0, 952.0, 1000.0, 694.0, 1000.0, 1000.0, 72.0, 172.0, 1000.0]
2025-08-07 09:52:09,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (476.57) for latency MM1Queue_a033_s075
2025-08-07 09:52:09,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 24 minutes, 52 seconds)
2025-08-07 09:53:59,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:06,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 366.31671 ± 257.672
2025-08-07 09:54:06,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [76.52147, 630.86755, 284.42862, 597.53094, 528.7547, 826.97644, 331.2471, 90.49444, 9.001304, 287.3444]
2025-08-07 09:54:06,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [82.0, 659.0, 293.0, 1000.0, 1000.0, 1000.0, 330.0, 114.0, 31.0, 233.0]
2025-08-07 09:54:06,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 24 minutes, 14 seconds)
2025-08-07 09:55:39,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:55:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 221.55698 ± 207.795
2025-08-07 09:55:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [726.98425, 16.678717, 250.40015, 32.52441, 181.74704, 409.05417, 243.44466, 21.22591, 251.70927, 81.801346]
2025-08-07 09:55:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [638.0, 25.0, 282.0, 49.0, 164.0, 464.0, 251.0, 34.0, 306.0, 133.0]
2025-08-07 09:55:42,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 21 minutes, 56 seconds)
2025-08-07 09:57:21,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:28,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 402.77686 ± 262.549
2025-08-07 09:57:28,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [578.4464, 51.4041, 666.28546, 852.70447, 100.32424, 576.6184, 380.0067, 208.42717, 512.5711, 100.98056]
2025-08-07 09:57:28,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [722.0, 73.0, 668.0, 1000.0, 117.0, 1000.0, 368.0, 168.0, 1000.0, 139.0]
2025-08-07 09:57:28,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 6 seconds)
2025-08-07 09:59:08,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:59:12,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 227.71236 ± 129.752
2025-08-07 09:59:12,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [128.50284, 59.265827, 260.8249, 19.702679, 304.76822, 307.24887, 311.2276, 146.63304, 270.05096, 468.89862]
2025-08-07 09:59:12,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [314.0, 71.0, 288.0, 30.0, 352.0, 303.0, 316.0, 155.0, 301.0, 562.0]
2025-08-07 09:59:12,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 17 minutes, 52 seconds)
2025-08-07 10:00:54,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:01:01,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 493.47354 ± 403.647
2025-08-07 10:01:01,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [215.03862, 77.84239, 973.1678, 1015.0938, 64.20552, 45.23633, 95.90907, 765.3234, 747.9062, 935.0121]
2025-08-07 10:01:01,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [240.0, 75.0, 1000.0, 1000.0, 66.0, 63.0, 91.0, 927.0, 781.0, 1000.0]
2025-08-07 10:01:01,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (493.47) for latency MM1Queue_a033_s075
2025-08-07 10:01:01,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 16 minutes, 15 seconds)
2025-08-07 10:02:48,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:53,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 307.27808 ± 215.409
2025-08-07 10:02:53,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [871.9797, 198.70024, 274.15103, 396.0096, 311.28653, 52.46184, 162.73276, 408.13995, 247.41527, 149.90392]
2025-08-07 10:02:53,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 274.0, 283.0, 377.0, 302.0, 43.0, 186.0, 388.0, 270.0, 159.0]
2025-08-07 10:02:53,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 48 seconds)
2025-08-07 10:04:31,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:04:35,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 301.53271 ± 195.530
2025-08-07 10:04:35,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [222.68889, 157.54619, 44.6387, 145.44598, 418.85782, 350.0348, 317.06384, 620.977, 631.2351, 106.83878]
2025-08-07 10:04:35,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [269.0, 175.0, 56.0, 168.0, 406.0, 324.0, 369.0, 734.0, 684.0, 106.0]
2025-08-07 10:04:35,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 47 seconds)
2025-08-07 10:06:10,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:18,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 443.55103 ± 309.326
2025-08-07 10:06:18,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [305.12003, 772.3275, 243.35152, 954.9973, 171.29753, 400.7923, 936.7339, 142.86461, 406.44595, 101.57971]
2025-08-07 10:06:18,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [365.0, 1000.0, 249.0, 1000.0, 196.0, 1000.0, 1000.0, 142.0, 457.0, 123.0]
2025-08-07 10:06:18,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 10 minutes, 36 seconds)
2025-08-07 10:08:03,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:08,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 272.81461 ± 198.956
2025-08-07 10:08:08,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [686.29675, 56.267643, 168.08679, 173.09546, 296.35782, 135.70195, 469.50296, 229.29863, 36.9825, 476.55554]
2025-08-07 10:08:08,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [873.0, 62.0, 229.0, 210.0, 354.0, 120.0, 1000.0, 339.0, 45.0, 578.0]
2025-08-07 10:08:08,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 9 minutes, 43 seconds)
2025-08-07 10:09:51,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:57,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 393.70984 ± 276.126
2025-08-07 10:09:57,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [251.70679, 139.14322, 16.956598, 205.95683, 706.3439, 266.79922, 949.6285, 663.4899, 409.4676, 327.60596]
2025-08-07 10:09:57,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [273.0, 135.0, 35.0, 210.0, 707.0, 244.0, 937.0, 1000.0, 438.0, 376.0]
2025-08-07 10:09:57,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 51 seconds)
2025-08-07 10:11:32,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:11:39,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 452.64313 ± 349.406
2025-08-07 10:11:39,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [868.41766, 35.872383, 705.1508, 152.6838, 68.548676, 837.8242, 60.536804, 262.7035, 631.7933, 902.90027]
2025-08-07 10:11:39,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 34.0, 750.0, 210.0, 75.0, 813.0, 84.0, 303.0, 732.0, 1000.0]
2025-08-07 10:11:39,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 4 minutes, 52 seconds)
2025-08-07 10:13:20,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:13:27,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 409.94427 ± 302.615
2025-08-07 10:13:27,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [23.42806, 153.42937, 1003.15247, 440.73688, 63.055706, 684.5738, 488.2878, 258.9286, 725.9965, 257.85358]
2025-08-07 10:13:27,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 186.0, 1000.0, 464.0, 93.0, 1000.0, 522.0, 231.0, 839.0, 271.0]
2025-08-07 10:13:27,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 49 seconds)
2025-08-07 10:15:08,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:15,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 356.79486 ± 268.185
2025-08-07 10:15:15,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [105.88767, 547.58105, 590.487, 865.0051, 285.7537, 327.82324, 151.7133, 50.916817, 35.930286, 606.8507]
2025-08-07 10:15:15,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 1000.0, 1000.0, 867.0, 278.0, 356.0, 176.0, 63.0, 54.0, 1000.0]
2025-08-07 10:15:15,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 2 minutes, 36 seconds)
2025-08-07 10:16:52,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:59,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 367.58563 ± 213.452
2025-08-07 10:16:59,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [179.97276, 160.00116, 737.756, 187.70732, 187.02356, 107.34583, 522.2728, 500.72598, 577.55743, 515.4938]
2025-08-07 10:16:59,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [178.0, 175.0, 1000.0, 203.0, 191.0, 182.0, 1000.0, 1000.0, 544.0, 513.0]
2025-08-07 10:16:59,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 seconds)
2025-08-07 10:18:48,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:55,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 405.23578 ± 327.868
2025-08-07 10:18:55,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [51.009747, 107.78173, 67.34113, 1029.3273, 465.0614, 365.14633, 24.877882, 767.15704, 515.9371, 658.71826]
2025-08-07 10:18:55,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 92.0, 95.0, 976.0, 447.0, 327.0, 38.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:18:55,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 59 minutes, 10 seconds)
2025-08-07 10:20:28,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:36,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 561.18384 ± 307.929
2025-08-07 10:20:36,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [492.31647, 393.99713, 937.3295, 20.632801, 871.11755, 868.1631, 931.8239, 492.13126, 214.9633, 389.36325]
2025-08-07 10:20:36,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [508.0, 457.0, 784.0, 42.0, 1000.0, 1000.0, 1000.0, 541.0, 213.0, 423.0]
2025-08-07 10:20:36,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (561.18) for latency MM1Queue_a033_s075
2025-08-07 10:20:36,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 57 minutes, 16 seconds)
2025-08-07 10:22:17,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:21,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 289.63901 ± 276.347
2025-08-07 10:22:21,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [88.53903, 139.9601, 57.84289, 281.12286, 949.47626, 393.926, 615.3288, 195.75655, 42.883636, 131.55421]
2025-08-07 10:22:21,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [100.0, 148.0, 59.0, 248.0, 934.0, 322.0, 1000.0, 216.0, 70.0, 140.0]
2025-08-07 10:22:21,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 55 minutes, 12 seconds)
2025-08-07 10:24:02,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 494.98169 ± 375.356
2025-08-07 10:24:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [154.38686, 1098.3646, 412.80515, 175.13893, 215.30951, 543.34924, 280.29486, 1051.9314, 946.4698, 71.7668]
2025-08-07 10:24:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 1000.0, 365.0, 190.0, 211.0, 464.0, 264.0, 1000.0, 1000.0, 60.0]
2025-08-07 10:24:09,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 53 minutes, 24 seconds)
2025-08-07 10:25:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:58,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 369.67511 ± 315.707
2025-08-07 10:25:58,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [239.71606, 57.23046, 279.35794, 49.80281, 204.94899, 339.69464, 790.64154, 981.03265, 66.83777, 687.48785]
2025-08-07 10:25:58,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [222.0, 58.0, 318.0, 58.0, 222.0, 397.0, 778.0, 1000.0, 106.0, 1000.0]
2025-08-07 10:25:58,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 52 minutes, 8 seconds)
2025-08-07 10:27:37,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:45,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 594.78436 ± 229.672
2025-08-07 10:27:45,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [385.85138, 912.4179, 832.52216, 455.96313, 547.77997, 563.34595, 856.84265, 551.4557, 126.817696, 714.8473]
2025-08-07 10:27:45,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [344.0, 883.0, 817.0, 386.0, 468.0, 460.0, 843.0, 1000.0, 121.0, 626.0]
2025-08-07 10:27:45,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (594.78) for latency MM1Queue_a033_s075
2025-08-07 10:27:45,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 26 seconds)
2025-08-07 10:29:31,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:38,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 472.81543 ± 280.381
2025-08-07 10:29:38,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [640.46204, 181.6944, 824.0443, 331.73843, 14.676674, 278.4555, 454.0084, 468.9603, 1000.31586, 533.7984]
2025-08-07 10:29:38,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [595.0, 229.0, 1000.0, 349.0, 26.0, 269.0, 456.0, 489.0, 1000.0, 499.0]
2025-08-07 10:29:38,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 48 minutes, 47 seconds)
2025-08-07 10:31:14,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:19,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 442.46069 ± 296.729
2025-08-07 10:31:19,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [50.187263, 242.63931, 358.9724, 558.71765, 693.5421, 198.75969, 14.356034, 781.0174, 621.60504, 904.8099]
2025-08-07 10:31:19,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [56.0, 248.0, 287.0, 519.0, 691.0, 209.0, 26.0, 748.0, 740.0, 854.0]
2025-08-07 10:31:19,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 46 minutes, 40 seconds)
2025-08-07 10:33:06,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:14,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 537.56720 ± 237.297
2025-08-07 10:33:14,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [665.808, 144.3657, 267.00018, 629.42267, 650.9582, 393.28806, 580.23303, 1014.78784, 363.01883, 666.78925]
2025-08-07 10:33:14,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 141.0, 255.0, 713.0, 536.0, 366.0, 576.0, 1000.0, 420.0, 1000.0]
2025-08-07 10:33:14,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 27 seconds)
2025-08-07 10:34:56,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:01,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 352.65076 ± 307.772
2025-08-07 10:35:01,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [139.71828, 126.712364, 99.2749, 459.5784, 840.6797, 788.56573, 188.06792, 15.541991, 123.54412, 744.8243]
2025-08-07 10:35:01,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [149.0, 231.0, 83.0, 409.0, 711.0, 827.0, 157.0, 29.0, 126.0, 657.0]
2025-08-07 10:35:01,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 43 minutes, 23 seconds)
2025-08-07 10:36:34,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:41,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 382.83881 ± 296.019
2025-08-07 10:36:41,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [171.97623, 68.80175, 502.34985, 191.03477, 397.01526, 221.01147, 1037.8605, 778.62756, 122.11343, 337.5973]
2025-08-07 10:36:41,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [127.0, 72.0, 1000.0, 155.0, 1000.0, 243.0, 1000.0, 1000.0, 122.0, 299.0]
2025-08-07 10:36:41,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 7 seconds)
2025-08-07 10:38:23,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:28,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 381.25528 ± 322.257
2025-08-07 10:38:28,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [90.11993, 143.5225, 508.82938, 759.8197, 46.140224, 786.33887, 881.4929, 470.69092, 53.5354, 72.06303]
2025-08-07 10:38:28,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 173.0, 390.0, 1000.0, 91.0, 674.0, 1000.0, 551.0, 69.0, 88.0]
2025-08-07 10:38:28,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 53 seconds)
2025-08-07 10:40:11,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 301.90924 ± 236.791
2025-08-07 10:40:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [281.83875, 825.44727, 615.7577, 280.60266, 153.45944, 31.205017, 329.44437, 111.065636, 321.53418, 68.737305]
2025-08-07 10:40:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [324.0, 718.0, 486.0, 255.0, 114.0, 37.0, 346.0, 108.0, 309.0, 52.0]
2025-08-07 10:40:14,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 26 seconds)
2025-08-07 10:41:51,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:56,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 339.82025 ± 264.350
2025-08-07 10:41:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [817.27454, 508.12677, 142.49022, 65.58793, 438.33514, 33.464672, 461.01398, 659.6081, 24.568846, 247.73218]
2025-08-07 10:41:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [800.0, 1000.0, 111.0, 66.0, 398.0, 54.0, 399.0, 593.0, 36.0, 224.0]
2025-08-07 10:41:56,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 48 seconds)
2025-08-07 10:43:41,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:46,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 343.12808 ± 229.147
2025-08-07 10:43:46,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [153.30147, 104.00928, 87.98933, 645.9827, 130.26968, 728.20996, 618.4179, 292.5752, 296.5634, 373.9619]
2025-08-07 10:43:46,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [227.0, 93.0, 96.0, 673.0, 103.0, 735.0, 1000.0, 332.0, 259.0, 384.0]
2025-08-07 10:43:46,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 15 seconds)
2025-08-07 10:45:29,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:38,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 698.45032 ± 358.001
2025-08-07 10:45:38,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [556.04193, 634.8465, 860.74725, 1282.9033, 790.8549, 1019.8227, 987.2822, 196.64412, 616.99207, 38.367508]
2025-08-07 10:45:38,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [466.0, 669.0, 768.0, 1000.0, 639.0, 1000.0, 922.0, 186.0, 545.0, 85.0]
2025-08-07 10:45:38,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (698.45) for latency MM1Queue_a033_s075
2025-08-07 10:45:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 12 seconds)
2025-08-07 10:47:13,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:18,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 400.42044 ± 272.602
2025-08-07 10:47:18,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [79.46781, 488.93008, 1006.7004, 488.27997, 509.1292, 41.451096, 359.3728, 514.1284, 433.48843, 83.256004]
2025-08-07 10:47:18,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 456.0, 918.0, 497.0, 475.0, 57.0, 319.0, 547.0, 378.0, 84.0]
2025-08-07 10:47:18,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 1 second)
2025-08-07 10:49:00,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:05,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 333.95343 ± 267.558
2025-08-07 10:49:05,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [228.16656, 541.8196, 18.347637, 104.34402, 443.5216, 113.61129, 945.7977, 460.42307, 98.155, 385.34753]
2025-08-07 10:49:05,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [188.0, 390.0, 25.0, 100.0, 396.0, 113.0, 1000.0, 506.0, 117.0, 331.0]
2025-08-07 10:49:05,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 17 seconds)
2025-08-07 10:50:43,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:47,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 277.98724 ± 158.348
2025-08-07 10:50:47,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [353.92804, 76.53056, 341.2719, 385.235, 403.45117, 29.346798, 212.6606, 85.31067, 528.0, 364.1377]
2025-08-07 10:50:47,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [387.0, 106.0, 290.0, 334.0, 306.0, 42.0, 195.0, 75.0, 1000.0, 299.0]
2025-08-07 10:50:47,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 32 seconds)
2025-08-07 10:52:32,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:35,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 276.07513 ± 166.430
2025-08-07 10:52:35,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [348.02972, 539.64856, 155.73691, 445.9494, 65.28618, 364.75113, 426.09927, 244.76077, 13.182456, 157.30734]
2025-08-07 10:52:35,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [288.0, 515.0, 174.0, 339.0, 83.0, 318.0, 450.0, 254.0, 36.0, 162.0]
2025-08-07 10:52:35,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 42 seconds)
2025-08-07 10:54:16,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:23,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 485.19708 ± 414.267
2025-08-07 10:54:23,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [109.125984, 499.71503, 239.22026, 29.68857, 845.3275, 1296.5428, 215.64896, 59.590332, 993.785, 563.3259]
2025-08-07 10:54:23,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [90.0, 466.0, 196.0, 35.0, 1000.0, 1000.0, 227.0, 69.0, 1000.0, 1000.0]
2025-08-07 10:54:23,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 44 seconds)
2025-08-07 10:56:04,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:11,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 403.25824 ± 275.470
2025-08-07 10:56:11,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [282.79453, 212.01329, 655.7377, 11.693298, 579.68506, 842.00507, 737.7873, 421.92648, 246.58517, 42.354786]
2025-08-07 10:56:11,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [243.0, 162.0, 1000.0, 28.0, 1000.0, 752.0, 1000.0, 318.0, 203.0, 38.0]
2025-08-07 10:56:11,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 18 seconds)
2025-08-07 10:57:48,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 595.50183 ± 338.900
2025-08-07 10:57:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [711.46826, 734.4769, 290.68585, 464.27322, 119.11015, 96.31242, 1178.7977, 961.8137, 828.7308, 569.3488]
2025-08-07 10:57:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [706.0, 623.0, 299.0, 1000.0, 164.0, 79.0, 1000.0, 793.0, 868.0, 1000.0]
2025-08-07 10:57:57,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 30 seconds)
2025-08-07 10:59:42,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 764.00238 ± 295.304
2025-08-07 10:59:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [544.5433, 1136.9882, 1123.0042, 788.06683, 657.97473, 497.19623, 648.2513, 1081.1403, 963.68256, 199.1764]
2025-08-07 10:59:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 659.0, 1000.0, 1000.0, 224.0]
2025-08-07 10:59:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1226 [INFO]: New best (764.00) for latency MM1Queue_a033_s075
2025-08-07 10:59:54,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 13 seconds)
2025-08-07 11:01:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:35,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 367.69006 ± 279.397
2025-08-07 11:01:35,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [71.057556, 205.77669, 1099.5, 331.8072, 58.182205, 469.2094, 434.601, 434.27295, 311.84317, 260.65042]
2025-08-07 11:01:35,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 218.0, 1000.0, 320.0, 71.0, 426.0, 354.0, 443.0, 324.0, 232.0]
2025-08-07 11:01:35,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 12 seconds)
2025-08-07 11:03:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 573.23474 ± 375.500
2025-08-07 11:03:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [329.4155, 102.17818, 307.86148, 1043.0115, 761.83374, 1313.3075, 456.1903, 696.6377, 100.94167, 620.9703]
2025-08-07 11:03:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [299.0, 87.0, 306.0, 1000.0, 1000.0, 1000.0, 391.0, 1000.0, 87.0, 1000.0]
2025-08-07 11:03:27,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 31 seconds)
2025-08-07 11:05:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:07,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 208.58664 ± 182.100
2025-08-07 11:05:07,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [177.86177, 413.63525, 148.99727, 21.721151, 30.051548, 278.9904, 170.30893, 626.6639, 201.42407, 16.212234]
2025-08-07 11:05:07,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [236.0, 317.0, 135.0, 36.0, 45.0, 197.0, 138.0, 520.0, 167.0, 25.0]
2025-08-07 11:05:07,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 31 seconds)
2025-08-07 11:06:50,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:56,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 487.48993 ± 376.719
2025-08-07 11:06:56,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [907.77625, 535.1356, 485.12692, 328.0866, 337.53302, 47.15822, 116.748184, 849.1204, 47.911484, 1220.3029]
2025-08-07 11:06:56,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 440.0, 1000.0, 259.0, 275.0, 53.0, 106.0, 642.0, 60.0, 1000.0]
2025-08-07 11:06:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 47 seconds)
2025-08-07 11:08:44,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:50,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 497.99487 ± 393.278
2025-08-07 11:08:50,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [1197.2811, 222.51385, 576.0693, 255.85594, 1208.1438, 9.706296, 153.77347, 554.44525, 531.2017, 270.9578]
2025-08-07 11:08:50,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 242.0, 515.0, 210.0, 1000.0, 24.0, 151.0, 474.0, 540.0, 245.0]
2025-08-07 11:08:50,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 56 seconds)
2025-08-07 11:10:24,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:32,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 571.26489 ± 331.341
2025-08-07 11:10:32,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [517.1662, 1021.1852, 500.62112, 124.73355, 560.3113, 391.03928, 199.89749, 985.5005, 312.60812, 1099.5864]
2025-08-07 11:10:32,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 910.0, 400.0, 117.0, 544.0, 418.0, 152.0, 1000.0, 250.0, 1000.0]
2025-08-07 11:10:32,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 9 seconds)
2025-08-07 11:12:13,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:19,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 386.89883 ± 263.007
2025-08-07 11:12:19,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [827.0761, 173.26645, 513.05457, 579.6625, 746.19934, 23.198322, 191.33047, 102.28138, 264.87167, 448.04767]
2025-08-07 11:12:19,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [730.0, 174.0, 1000.0, 566.0, 1000.0, 58.0, 144.0, 88.0, 232.0, 430.0]
2025-08-07 11:12:19,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 19 seconds)
2025-08-07 11:14:04,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:13,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 652.05762 ± 373.970
2025-08-07 11:14:13,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [273.05286, 417.9851, 338.66977, 407.87408, 891.7636, 914.582, 1169.505, 853.5022, 76.23892, 1177.4028]
2025-08-07 11:14:13,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [201.0, 354.0, 361.0, 1000.0, 749.0, 806.0, 1000.0, 755.0, 90.0, 1000.0]
2025-08-07 11:14:13,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 38 seconds)
2025-08-07 11:15:57,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 421.60211 ± 397.357
2025-08-07 11:16:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [1131.8511, 607.37274, 1147.006, 186.99666, 109.150116, 111.81116, 441.78555, 358.53372, 73.28066, 48.233444]
2025-08-07 11:16:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 440.0, 814.0, 164.0, 94.0, 127.0, 381.0, 336.0, 54.0, 63.0]
2025-08-07 11:16:02,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 49 seconds)
2025-08-07 11:17:35,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:38,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1221 [DEBUG]: Total Reward: 303.99811 ± 337.042
2025-08-07 11:17:38,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1222 [DEBUG]: All rewards: [110.36089, 312.1514, 415.40396, 100.49493, 282.9309, 1257.2494, 15.512234, 195.56532, 235.44421, 114.8677]
2025-08-07 11:17:38,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 262.0, 344.0, 99.0, 196.0, 1000.0, 27.0, 176.0, 196.0, 118.0]
2025-08-07 11:17:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-ant):1251 [DEBUG]: Training session finished
