2025-08-07 08:31:51,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 08:31:51,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 08:31:51,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14664a032b50>}
2025-08-07 08:31:51,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1111 [DEBUG]: using device: cuda
2025-08-07 08:31:51,988 baseline-bpql-noiseperc20-ant:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 08:31:51,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1133 [INFO]: Creating new trainer
2025-08-07 08:31:52,006 baseline-bpql-noiseperc20-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 08:31:52,006 baseline-bpql-noiseperc20-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 08:31:53,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1194 [DEBUG]: Starting training session...
2025-08-07 08:31:53,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 1/100
2025-08-07 08:33:29,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:33:31,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -154.34871 ± 387.261
2025-08-07 08:33:31,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-22.258553, -23.799026, -44.588005, -42.535725, -1315.0017, -19.929918, -3.0965347, -2.0924015, -12.9484625, -57.23673]
2025-08-07 08:33:31,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 32.0, 74.0, 48.0, 1000.0, 75.0, 35.0, 21.0, 29.0, 64.0]
2025-08-07 08:33:31,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-154.35) for latency MM1Queue_a033_s075
2025-08-07 08:33:31,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 40 minutes, 14 seconds)
2025-08-07 08:35:15,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:35:16,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -41.89381 ± 36.616
2025-08-07 08:35:16,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-71.85216, 1.5502496, -16.537209, 5.117973, -7.782549, -114.485085, -47.379787, -75.544334, -37.642258, -54.382973]
2025-08-07 08:35:16,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 30.0, 58.0, 32.0, 37.0, 154.0, 59.0, 97.0, 47.0, 76.0]
2025-08-07 08:35:16,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-41.89) for latency MM1Queue_a033_s075
2025-08-07 08:35:16,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 45 minutes, 3 seconds)
2025-08-07 08:36:59,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:36:59,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -9.51946 ± 14.260
2025-08-07 08:36:59,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [0.38150966, -3.264696, -0.95838195, 8.879401, -7.1654515, -22.641117, -12.435117, -27.37753, 6.0514317, -36.664646]
2025-08-07 08:36:59,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 43.0, 33.0, 61.0, 26.0, 38.0, 31.0, 55.0, 92.0, 67.0]
2025-08-07 08:36:59,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-9.52) for latency MM1Queue_a033_s075
2025-08-07 08:36:59,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 44 minutes, 52 seconds)
2025-08-07 08:38:37,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:38:40,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -128.30466 ± 251.793
2025-08-07 08:38:40,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-33.305584, -86.985634, -100.09617, 1.130999, -5.695925, -69.078156, -17.113136, -15.917907, -79.485916, -876.4992]
2025-08-07 08:38:40,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 142.0, 146.0, 49.0, 32.0, 134.0, 71.0, 77.0, 167.0, 1000.0]
2025-08-07 08:38:40,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 42 minutes, 29 seconds)
2025-08-07 08:40:21,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:40:22,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -14.74875 ± 42.925
2025-08-07 08:40:22,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-5.6672606, -2.3153086, 4.5138283, 2.709956, -3.7232888, 20.665842, -88.33128, 3.5325289, 28.274162, -107.146736]
2025-08-07 08:40:22,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [88.0, 71.0, 23.0, 81.0, 99.0, 36.0, 158.0, 83.0, 70.0, 177.0]
2025-08-07 08:40:22,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 41 minutes, 1 second)
2025-08-07 08:42:04,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:42:08,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -60.91040 ± 90.910
2025-08-07 08:42:08,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-31.532349, -112.37595, 35.173576, -0.30173337, -51.005512, -7.3163266, -261.61453, -0.58379436, -183.76872, 4.221376]
2025-08-07 08:42:08,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 292.0, 156.0, 78.0, 249.0, 83.0, 1000.0, 32.0, 1000.0, 27.0]
2025-08-07 08:42:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 42 minutes, 6 seconds)
2025-08-07 08:43:50,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:43:53,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -59.38883 ± 108.959
2025-08-07 08:43:53,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-21.499441, -71.682655, -50.32858, -31.420153, 21.167494, -8.791974, -46.738853, -377.00446, -3.7842171, -3.805403]
2025-08-07 08:43:53,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 146.0, 213.0, 138.0, 123.0, 57.0, 58.0, 1000.0, 98.0, 66.0]
2025-08-07 08:43:53,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 40 minutes, 27 seconds)
2025-08-07 08:45:33,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:45:36,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -87.05832 ± 142.342
2025-08-07 08:45:36,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-495.17416, -11.350633, -95.77772, -141.70346, -33.456264, -11.507028, -45.625668, -18.072832, -7.845477, -10.069918]
2025-08-07 08:45:36,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 130.0, 154.0, 236.0, 91.0, 72.0, 112.0, 47.0, 37.0, 85.0]
2025-08-07 08:45:36,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 38 minutes, 24 seconds)
2025-08-07 08:47:19,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:47:21,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -53.12538 ± 164.482
2025-08-07 08:47:21,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-17.840202, 29.096306, 35.075195, -72.182655, 12.871946, 6.542132, 1.0054636, 9.1399765, 4.3421335, -539.3041]
2025-08-07 08:47:21,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 73.0, 59.0, 196.0, 27.0, 32.0, 28.0, 47.0, 19.0, 1000.0]
2025-08-07 08:47:21,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 38 minutes, 2 seconds)
2025-08-07 08:49:01,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:49:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -87.02851 ± 159.587
2025-08-07 08:49:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-79.32662, -35.335457, -125.574165, 10.884368, -550.70764, -54.925755, 0.9126986, -19.110928, 0.98363537, -18.085203]
2025-08-07 08:49:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 123.0, 309.0, 33.0, 1000.0, 153.0, 80.0, 66.0, 26.0, 134.0]
2025-08-07 08:49:04,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 36 minutes, 37 seconds)
2025-08-07 08:50:51,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:50:52,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -8.41570 ± 14.958
2025-08-07 08:50:52,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-7.2115254, -39.51803, -11.14587, -8.187195, -17.517607, -22.136663, -5.0798473, 16.219099, 5.8159494, 4.604681]
2025-08-07 08:50:52,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [136.0, 68.0, 28.0, 62.0, 57.0, 86.0, 62.0, 74.0, 62.0, 38.0]
2025-08-07 08:50:52,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-8.42) for latency MM1Queue_a033_s075
2025-08-07 08:50:52,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 35 minutes, 33 seconds)
2025-08-07 08:52:27,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:52:29,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -20.13785 ± 21.024
2025-08-07 08:52:29,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-4.549941, -18.168009, -10.477785, 15.967588, -59.349094, 0.2573062, -15.783449, -35.35599, -35.168907, -38.750164]
2025-08-07 08:52:29,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [75.0, 71.0, 56.0, 80.0, 66.0, 132.0, 115.0, 102.0, 97.0, 77.0]
2025-08-07 08:52:29,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 31 minutes, 11 seconds)
2025-08-07 08:54:19,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:54:20,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -7.59096 ± 25.256
2025-08-07 08:54:20,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-13.005938, -53.681374, 24.56809, -20.95977, -39.903156, -2.4939919, 5.404318, 6.811352, -14.088121, 31.438993]
2025-08-07 08:54:20,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 103.0, 30.0, 112.0, 53.0, 30.0, 22.0, 47.0, 48.0, 93.0]
2025-08-07 08:54:20,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (-7.59) for latency MM1Queue_a033_s075
2025-08-07 08:54:20,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 31 minutes, 59 seconds)
2025-08-07 08:55:58,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:56:01,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -45.12315 ± 92.071
2025-08-07 08:56:01,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [18.19976, -6.6994386, -0.4307213, 14.501892, 20.34712, -4.656023, -146.72943, -270.99612, 18.946302, -93.71488]
2025-08-07 08:56:01,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 53.0, 41.0, 42.0, 72.0, 36.0, 1000.0, 1000.0, 41.0, 118.0]
2025-08-07 08:56:01,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 29 minutes, 12 seconds)
2025-08-07 08:57:37,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:57:38,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -17.86141 ± 46.434
2025-08-07 08:57:38,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-8.04874, -151.8925, 5.3087087, -12.875695, 19.218456, 12.705726, -24.742205, 3.5562944, -7.250869, -14.59329]
2025-08-07 08:57:38,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 200.0, 15.0, 76.0, 69.0, 119.0, 40.0, 31.0, 54.0, 53.0]
2025-08-07 08:57:38,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 25 minutes, 39 seconds)
2025-08-07 08:59:20,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:59:23,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 6.08911 ± 71.209
2025-08-07 08:59:23,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-0.4043646, 14.465323, -20.010885, -84.512856, 114.266685, 18.52149, -97.39104, -10.561894, 141.7655, -15.246817]
2025-08-07 08:59:23,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 31.0, 37.0, 95.0, 1000.0, 43.0, 150.0, 37.0, 1000.0, 78.0]
2025-08-07 08:59:23,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (6.09) for latency MM1Queue_a033_s075
2025-08-07 08:59:23,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 23 minutes, 2 seconds)
2025-08-07 09:01:10,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:01:11,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -25.34255 ± 43.252
2025-08-07 09:01:11,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.3523202, -16.085648, -4.308766, 25.919394, -99.027534, -72.02249, 30.311972, -7.7115555, -86.223236, -27.629946]
2025-08-07 09:01:11,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 63.0, 96.0, 68.0, 97.0, 131.0, 73.0, 29.0, 101.0, 58.0]
2025-08-07 09:01:11,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 24 minutes, 31 seconds)
2025-08-07 09:02:50,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:02:52,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 6.65802 ± 57.470
2025-08-07 09:02:52,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-28.002138, -21.397615, -63.316166, 6.313246, -23.41203, 37.39951, 9.822847, 8.716939, -19.763609, 160.21918]
2025-08-07 09:02:52,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 32.0, 166.0, 81.0, 56.0, 56.0, 57.0, 81.0, 36.0, 1000.0]
2025-08-07 09:02:52,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (6.66) for latency MM1Queue_a033_s075
2025-08-07 09:02:52,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 20 minutes, 4 seconds)
2025-08-07 09:04:30,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:04:32,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 4.17351 ± 31.334
2025-08-07 09:04:32,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.3679957, -26.433523, 38.17468, 22.085522, 8.118611, -62.42397, 13.1317625, 7.5887065, -16.422365, 53.547665]
2025-08-07 09:04:32,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 124.0, 1000.0, 35.0, 31.0, 163.0, 21.0, 47.0, 140.0, 59.0]
2025-08-07 09:04:32,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 17 minutes, 59 seconds)
2025-08-07 09:06:19,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:06:20,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -5.96251 ± 26.854
2025-08-07 09:06:20,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [10.353657, -12.276231, 15.155833, 44.53368, -10.765437, 2.3580139, -41.90793, -55.469486, 0.28466174, -11.8918705]
2025-08-07 09:06:20,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [82.0, 72.0, 28.0, 125.0, 182.0, 50.0, 204.0, 205.0, 45.0, 96.0]
2025-08-07 09:06:20,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 19 minutes, 12 seconds)
2025-08-07 09:07:59,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:07:59,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -17.09647 ± 27.758
2025-08-07 09:07:59,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-13.557479, -54.909527, -7.5863967, 7.4414096, -5.5088844, -62.017754, -54.22273, -4.507033, 2.1216846, 21.78206]
2025-08-07 09:07:59,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 106.0, 35.0, 48.0, 33.0, 120.0, 80.0, 67.0, 25.0, 32.0]
2025-08-07 09:07:59,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 58 seconds)
2025-08-07 09:09:34,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:09:35,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 2.12917 ± 17.677
2025-08-07 09:09:35,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [31.438656, -6.460615, -3.4700875, 28.961323, -11.605687, -27.882936, -11.525059, 1.102308, 14.120353, 6.6134973]
2025-08-07 09:09:35,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 29.0, 28.0, 60.0, 61.0, 95.0, 41.0, 35.0, 30.0, 164.0]
2025-08-07 09:09:35,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 10 minutes, 57 seconds)
2025-08-07 09:11:19,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:11:20,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -9.08733 ± 21.742
2025-08-07 09:11:20,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-2.3587763, 6.184327, -26.318377, -58.744335, 19.136177, 11.2996645, -6.4100003, -4.0065813, -1.1352936, -28.520102]
2025-08-07 09:11:20,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 78.0, 111.0, 164.0, 28.0, 124.0, 57.0, 26.0, 25.0, 125.0]
2025-08-07 09:11:20,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 20 seconds)
2025-08-07 09:12:58,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:13:00,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.15589 ± 40.126
2025-08-07 09:13:00,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.969721, 0.81520903, 118.542076, 22.209684, 10.205813, 5.5283127, -51.5037, 0.4491533, 2.6619275, 0.6807533]
2025-08-07 09:13:00,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [143.0, 28.0, 1000.0, 34.0, 86.0, 30.0, 126.0, 28.0, 25.0, 26.0]
2025-08-07 09:13:00,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (11.16) for latency MM1Queue_a033_s075
2025-08-07 09:13:00,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 35 seconds)
2025-08-07 09:14:37,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:14:39,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.54163 ± 89.620
2025-08-07 09:14:39,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [5.823628, 11.0576725, 270.06122, -45.576385, 50.110344, 16.500538, 28.473703, -17.157211, -82.47356, -1.4036299]
2025-08-07 09:14:39,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 25.0, 1000.0, 195.0, 75.0, 37.0, 46.0, 73.0, 129.0, 70.0]
2025-08-07 09:14:39,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (23.54) for latency MM1Queue_a033_s075
2025-08-07 09:14:39,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 4 minutes, 43 seconds)
2025-08-07 09:16:19,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:16:23,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 40.54447 ± 65.817
2025-08-07 09:16:23,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [157.48293, -22.779465, 29.880127, 4.571183, 14.161582, 17.430096, 38.69116, -0.95882344, -10.27827, 177.24416]
2025-08-07 09:16:23,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 34.0, 33.0, 31.0, 49.0, 60.0, 152.0, 32.0, 71.0, 1000.0]
2025-08-07 09:16:23,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (40.54) for latency MM1Queue_a033_s075
2025-08-07 09:16:23,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 10 seconds)
2025-08-07 09:18:02,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:18:03,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 4.58575 ± 18.981
2025-08-07 09:18:03,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.5878502, 4.2418337, 28.664747, 0.40711772, 3.3163764, -37.53906, 20.694824, -16.157104, 24.525787, 16.115107]
2025-08-07 09:18:03,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 34.0, 66.0, 108.0, 53.0, 90.0, 65.0, 47.0, 38.0, 34.0]
2025-08-07 09:18:03,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 40 seconds)
2025-08-07 09:19:51,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:19:53,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 23.36908 ± 54.295
2025-08-07 09:19:53,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [14.910247, -3.5747051, 20.483633, 182.48584, 10.818796, -3.3794389, 25.477377, -14.095955, -1.7185694, 2.283587]
2025-08-07 09:19:53,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 29.0, 28.0, 1000.0, 23.0, 37.0, 39.0, 36.0, 81.0, 32.0]
2025-08-07 09:19:53,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 6 seconds)
2025-08-07 09:21:28,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:21:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 18.45098 ± 69.661
2025-08-07 09:21:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-10.595687, 3.3724298, -4.163188, 28.508795, -47.698677, 24.42198, 216.96786, -29.707191, 10.138311, -6.7348166]
2025-08-07 09:21:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 24.0, 51.0, 34.0, 138.0, 40.0, 1000.0, 33.0, 27.0, 62.0]
2025-08-07 09:21:30,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 42 seconds)
2025-08-07 09:23:06,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:23:08,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 10.23547 ± 54.850
2025-08-07 09:23:08,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-13.935585, -1.4180416, -1.4873048, -34.766033, 166.41168, -18.480284, -19.614023, -1.5556573, -8.018596, 35.21855]
2025-08-07 09:23:08,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 27.0, 103.0, 91.0, 1000.0, 77.0, 92.0, 36.0, 45.0, 83.0]
2025-08-07 09:23:08,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 44 seconds)
2025-08-07 09:24:56,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:25:00,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 44.05086 ± 78.794
2025-08-07 09:25:00,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [25.32457, 0.6382057, 207.51585, 13.901628, 5.769669, 19.395054, -28.371634, 18.846704, -11.933268, 189.4218]
2025-08-07 09:25:00,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 67.0, 1000.0, 51.0, 31.0, 104.0, 99.0, 70.0, 29.0, 1000.0]
2025-08-07 09:25:00,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (44.05) for latency MM1Queue_a033_s075
2025-08-07 09:25:00,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 54 seconds)
2025-08-07 09:26:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:26:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 47.15949 ± 55.173
2025-08-07 09:26:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [26.633678, 113.40085, 22.167511, 6.3114524, 98.254555, 20.887339, 17.002516, 166.0092, -17.45014, 18.377924]
2025-08-07 09:26:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [93.0, 1000.0, 120.0, 83.0, 1000.0, 50.0, 126.0, 1000.0, 88.0, 129.0]
2025-08-07 09:26:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (47.16) for latency MM1Queue_a033_s075
2025-08-07 09:26:38,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 45 seconds)
2025-08-07 09:28:23,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:28:24,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 15.87656 ± 27.381
2025-08-07 09:28:24,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [75.13528, -1.2822975, -12.372242, 8.71858, 10.913249, 15.530745, -17.183226, 54.10291, 23.593618, 1.60902]
2025-08-07 09:28:24,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 47.0, 61.0, 105.0, 50.0, 37.0, 91.0, 90.0, 105.0, 24.0]
2025-08-07 09:28:24,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 54 minutes)
2025-08-07 09:29:59,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:30:01,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 14.82951 ± 19.442
2025-08-07 09:30:01,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [40.71154, 5.8454566, -21.593979, 43.51284, -4.755423, 30.810596, 7.453902, 12.283495, 7.7796416, 26.247034]
2025-08-07 09:30:01,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 62.0, 40.0, 50.0, 64.0, 39.0, 51.0, 42.0, 37.0, 120.0]
2025-08-07 09:30:01,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 52 minutes, 26 seconds)
2025-08-07 09:31:47,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:49,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 24.46871 ± 36.294
2025-08-07 09:31:49,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [27.722033, -10.274329, 13.123688, 1.2165041, 123.736694, 16.393522, 18.006788, 14.559021, -4.2702093, 44.473362]
2025-08-07 09:31:49,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 36.0, 35.0, 22.0, 1000.0, 31.0, 27.0, 69.0, 64.0, 57.0]
2025-08-07 09:31:49,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 52 minutes, 54 seconds)
2025-08-07 09:33:20,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:20,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.36768 ± 32.032
2025-08-07 09:33:20,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [23.329922, 36.462555, 7.6341105, -78.10749, 35.712105, 24.130703, 18.134268, 0.9778993, 34.98795, 10.414819]
2025-08-07 09:33:20,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [132.0, 53.0, 29.0, 67.0, 59.0, 52.0, 45.0, 61.0, 30.0, 86.0]
2025-08-07 09:33:20,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 46 minutes, 46 seconds)
2025-08-07 09:35:00,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:35:01,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -3.53812 ± 34.845
2025-08-07 09:35:01,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [20.599413, -39.68793, 0.16551994, 19.80454, -2.5804968, 5.850655, -85.137215, 50.690315, -3.934367, -1.151644]
2025-08-07 09:35:01,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 97.0, 27.0, 55.0, 38.0, 25.0, 127.0, 72.0, 24.0, 25.0]
2025-08-07 09:35:01,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 45 minutes, 35 seconds)
2025-08-07 09:36:47,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:49,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: -0.11073 ± 50.364
2025-08-07 09:36:49,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-6.9355793, 26.0033, 4.9345007, -131.90372, 59.81465, -32.682358, 22.486328, 44.32788, 9.538507, 3.309156]
2025-08-07 09:36:49,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 36.0, 35.0, 174.0, 202.0, 110.0, 111.0, 782.0, 82.0, 25.0]
2025-08-07 09:36:49,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 20 seconds)
2025-08-07 09:38:30,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:33,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 65.63040 ± 52.139
2025-08-07 09:38:33,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [180.96848, 17.44863, 21.408148, 96.31651, 58.594975, 6.762004, 96.72335, 106.38564, 55.1685, 16.527773]
2025-08-07 09:38:33,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 50.0, 28.0, 114.0, 132.0, 44.0, 181.0, 1000.0, 73.0, 34.0]
2025-08-07 09:38:33,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (65.63) for latency MM1Queue_a033_s075
2025-08-07 09:38:33,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 44 minutes, 11 seconds)
2025-08-07 09:40:11,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:40:13,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 28.56571 ± 63.705
2025-08-07 09:40:13,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.8589787, 1.8531274, 6.8540063, 44.08975, 7.4869485, -17.264572, -19.34675, 211.9375, 26.391533, 19.796595]
2025-08-07 09:40:13,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 129.0, 33.0, 73.0, 28.0, 69.0, 73.0, 1000.0, 66.0, 57.0]
2025-08-07 09:40:13,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 40 minutes, 47 seconds)
2025-08-07 09:41:49,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:41:51,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.83274 ± 57.895
2025-08-07 09:41:51,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-5.8193707, 61.45342, 12.548323, -6.601291, 45.321182, 22.84382, 191.96208, 10.60299, -8.573567, -5.410113]
2025-08-07 09:41:51,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 103.0, 91.0, 33.0, 60.0, 61.0, 1000.0, 39.0, 98.0, 22.0]
2025-08-07 09:41:51,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 27 seconds)
2025-08-07 09:43:27,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:28,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 14.62662 ± 30.350
2025-08-07 09:43:28,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [11.8368435, -3.0837047, 53.713634, 31.921812, -23.851519, -17.923468, 14.36455, -14.130013, 75.32005, 18.098055]
2025-08-07 09:43:28,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 16.0, 132.0, 52.0, 96.0, 177.0, 28.0, 101.0, 80.0, 79.0]
2025-08-07 09:43:28,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 7 seconds)
2025-08-07 09:45:08,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:09,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 11.08013 ± 13.899
2025-08-07 09:45:09,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [8.373706, -4.866633, 34.57105, 19.769812, 6.0140615, 3.4461904, -2.8322713, -4.047031, 33.510574, 16.861872]
2025-08-07 09:45:09,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 148.0, 29.0, 44.0, 41.0, 30.0, 27.0, 28.0, 57.0, 25.0]
2025-08-07 09:45:09,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 4 seconds)
2025-08-07 09:46:48,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:49,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 22.55407 ± 31.610
2025-08-07 09:46:49,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [22.63267, 18.086203, 5.6192727, 5.2050953, -3.5465317, 104.00403, 17.732084, 0.24013104, 0.46065372, 55.107075]
2025-08-07 09:46:49,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 44.0, 14.0, 59.0, 22.0, 172.0, 75.0, 78.0, 55.0, 104.0]
2025-08-07 09:46:49,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 32 minutes, 28 seconds)
2025-08-07 09:48:30,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:33,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 36.20499 ± 73.241
2025-08-07 09:48:33,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [74.02356, -18.741661, -10.075671, 17.545301, -19.637402, 238.62016, -2.4220045, 3.403656, 42.874817, 36.459164]
2025-08-07 09:48:33,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [170.0, 30.0, 206.0, 46.0, 123.0, 1000.0, 44.0, 15.0, 68.0, 64.0]
2025-08-07 09:48:33,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 31 minutes, 34 seconds)
2025-08-07 09:50:10,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 95.98027 ± 105.029
2025-08-07 09:50:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [37.308872, 14.414408, 9.827441, 263.39053, 226.79797, 270.68436, 18.332382, 15.036479, 33.525406, 70.48483]
2025-08-07 09:50:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 27.0, 54.0, 1000.0, 1000.0, 1000.0, 120.0, 71.0, 81.0, 148.0]
2025-08-07 09:50:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (95.98) for latency MM1Queue_a033_s075
2025-08-07 09:50:15,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 30 minutes, 37 seconds)
2025-08-07 09:51:55,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:51:58,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 14.77413 ± 28.090
2025-08-07 09:51:58,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [4.5796824, 10.92388, -27.52655, 25.565351, 22.071154, 51.78523, 19.474258, 60.694824, 13.4645815, -33.291126]
2025-08-07 09:51:58,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 33.0, 129.0, 36.0, 144.0, 81.0, 41.0, 940.0, 82.0, 194.0]
2025-08-07 09:51:58,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 29 minutes, 57 seconds)
2025-08-07 09:53:37,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:53:39,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 38.19915 ± 65.042
2025-08-07 09:53:39,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [16.248755, -38.314316, -4.8233023, 13.850715, -13.231709, 48.29025, 51.781868, 23.714844, 208.32637, 76.148056]
2025-08-07 09:53:39,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [122.0, 94.0, 165.0, 112.0, 94.0, 64.0, 43.0, 27.0, 1000.0, 143.0]
2025-08-07 09:53:39,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 28 seconds)
2025-08-07 09:55:22,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:55:26,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 55.01539 ± 69.524
2025-08-07 09:55:26,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [12.238594, 18.690737, 173.83571, 16.498426, 67.678116, -3.6144018, 48.345947, 9.428244, 5.6538014, 201.39876]
2025-08-07 09:55:26,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 51.0, 1000.0, 90.0, 97.0, 26.0, 173.0, 42.0, 82.0, 1000.0]
2025-08-07 09:55:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2025-08-07 09:57:02,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:04,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 38.51604 ± 67.626
2025-08-07 09:57:04,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [47.182297, -4.0281153, 14.195594, 15.250352, 13.469781, 15.900957, 20.364521, 37.599007, -10.40209, 235.6281]
2025-08-07 09:57:04,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [82.0, 124.0, 27.0, 86.0, 40.0, 30.0, 152.0, 68.0, 39.0, 1000.0]
2025-08-07 09:57:04,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes, 11 seconds)
2025-08-07 09:58:53,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:53,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 25.50948 ± 14.181
2025-08-07 09:58:53,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [17.736183, 22.709743, 14.076251, 46.04203, 1.6366457, 25.345097, 21.436615, 52.609573, 20.986204, 32.51646]
2025-08-07 09:58:53,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [92.0, 55.0, 84.0, 66.0, 21.0, 64.0, 47.0, 103.0, 51.0, 55.0]
2025-08-07 09:58:53,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 24 minutes, 43 seconds)
2025-08-07 10:00:33,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:37,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 63.55529 ± 74.078
2025-08-07 10:00:37,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [80.254524, 20.289118, 184.36995, 45.51269, 6.124046, 0.4607172, 17.417427, 21.82216, 35.006264, 224.29607]
2025-08-07 10:00:37,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [203.0, 74.0, 1000.0, 139.0, 55.0, 62.0, 53.0, 124.0, 61.0, 1000.0]
2025-08-07 10:00:37,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 7 seconds)
2025-08-07 10:02:09,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:11,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 43.57464 ± 60.615
2025-08-07 10:02:11,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [3.9412823, -1.6678067, 46.36681, 46.70403, 12.542626, 29.908916, 207.4614, 7.0136933, 85.06768, -1.592259]
2025-08-07 10:02:11,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [170.0, 35.0, 123.0, 141.0, 51.0, 60.0, 1000.0, 221.0, 71.0, 34.0]
2025-08-07 10:02:11,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes, 12 seconds)
2025-08-07 10:03:55,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:03:56,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 42.84501 ± 48.226
2025-08-07 10:03:56,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [28.946531, 8.714183, 96.79238, -2.6950855, 11.7141485, 8.709207, 47.893223, 162.92288, 40.846943, 24.605707]
2025-08-07 10:03:56,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [215.0, 53.0, 92.0, 52.0, 53.0, 356.0, 165.0, 192.0, 74.0, 106.0]
2025-08-07 10:03:56,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 17 seconds)
2025-08-07 10:05:34,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:05:36,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 31.19888 ± 62.890
2025-08-07 10:05:36,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [-9.269644, 213.6902, 17.134535, 23.816362, 9.98538, 21.305738, 23.674177, 34.98442, -5.9548244, -17.377514]
2025-08-07 10:05:36,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 1000.0, 133.0, 39.0, 51.0, 36.0, 40.0, 55.0, 128.0, 103.0]
2025-08-07 10:05:36,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 50 seconds)
2025-08-07 10:07:17,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:07:20,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 53.27306 ± 79.265
2025-08-07 10:07:20,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [11.766455, 24.4747, 24.5574, 42.727875, 58.14691, 6.2421412, 273.3206, 98.80234, 4.4078298, -11.7156]
2025-08-07 10:07:20,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 58.0, 30.0, 83.0, 277.0, 51.0, 1000.0, 124.0, 48.0, 303.0]
2025-08-07 10:07:20,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 20 seconds)
2025-08-07 10:08:56,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:59,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 60.59999 ± 97.180
2025-08-07 10:08:59,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [30.676922, 71.2192, -3.3000402, -9.795357, 22.338299, 2.4154594, 220.21214, 275.4706, 7.189498, -10.426824]
2025-08-07 10:08:59,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 71.0, 61.0, 21.0, 82.0, 44.0, 1000.0, 1000.0, 122.0, 27.0]
2025-08-07 10:08:59,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 58 seconds)
2025-08-07 10:10:38,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:39,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 20.31577 ± 29.984
2025-08-07 10:10:39,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [38.895058, -19.635248, 4.16643, 54.739258, 54.659306, -11.705023, 22.95982, 66.97702, -8.248673, 0.34971276]
2025-08-07 10:10:39,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 102.0, 54.0, 118.0, 85.0, 42.0, 50.0, 74.0, 21.0, 33.0]
2025-08-07 10:10:39,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 1 second)
2025-08-07 10:12:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:24,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 51.98112 ± 93.319
2025-08-07 10:12:24,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [38.74814, 50.54236, 14.489764, 15.794156, 12.412166, 10.273222, 48.631992, 323.07236, 38.778008, -32.930973]
2025-08-07 10:12:24,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 243.0, 67.0, 38.0, 66.0, 21.0, 98.0, 1000.0, 48.0, 170.0]
2025-08-07 10:12:24,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 24 seconds)
2025-08-07 10:14:00,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:04,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 80.81700 ± 65.957
2025-08-07 10:14:04,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [70.482, 40.68654, 13.158359, 48.013824, 208.00688, 30.600056, 197.78694, 112.16557, 42.426334, 44.843483]
2025-08-07 10:14:04,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 38.0, 32.0, 74.0, 1000.0, 87.0, 1000.0, 1000.0, 56.0, 38.0]
2025-08-07 10:14:05,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 47 seconds)
2025-08-07 10:15:45,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:46,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 51.33312 ± 33.723
2025-08-07 10:15:46,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [130.25612, 45.037464, 42.57359, 48.225014, -0.88800246, 70.60688, 53.729027, 38.0122, 14.667929, 71.11104]
2025-08-07 10:15:46,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [153.0, 200.0, 143.0, 89.0, 15.0, 169.0, 68.0, 89.0, 40.0, 102.0]
2025-08-07 10:15:46,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 46 seconds)
2025-08-07 10:17:33,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:34,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 18.13428 ± 38.621
2025-08-07 10:17:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [119.99208, -16.424194, -5.786931, 45.51414, 24.362936, -2.9175549, 10.788394, 22.743547, -13.590067, -3.3395417]
2025-08-07 10:17:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [138.0, 51.0, 34.0, 51.0, 65.0, 36.0, 35.0, 82.0, 89.0, 80.0]
2025-08-07 10:17:34,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 13 seconds)
2025-08-07 10:19:06,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:08,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 68.07761 ± 34.346
2025-08-07 10:19:08,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [116.450294, 23.202723, 49.97288, 106.99925, 44.194798, 23.666948, 80.09321, 117.33651, 72.41097, 46.44848]
2025-08-07 10:19:08,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [233.0, 109.0, 63.0, 184.0, 67.0, 146.0, 267.0, 194.0, 93.0, 117.0]
2025-08-07 10:19:08,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 49 seconds)
2025-08-07 10:20:56,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:58,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 70.78582 ± 95.887
2025-08-07 10:20:58,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [28.455921, 325.55884, -13.395841, 8.496808, 36.111702, 21.303497, 147.9403, 43.210598, 96.43399, 13.742386]
2025-08-07 10:20:58,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 1000.0, 95.0, 29.0, 99.0, 32.0, 337.0, 77.0, 149.0, 144.0]
2025-08-07 10:20:58,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 41 seconds)
2025-08-07 10:22:33,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:35,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 49.18027 ± 83.906
2025-08-07 10:22:35,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [7.317704, 25.447962, 109.49792, -11.995013, 282.2268, 38.13579, 3.8383873, 22.587692, 4.766299, 9.9791765]
2025-08-07 10:22:35,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 113.0, 174.0, 30.0, 1000.0, 77.0, 15.0, 30.0, 18.0, 28.0]
2025-08-07 10:22:36,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 37 seconds)
2025-08-07 10:24:12,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:15,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 63.31487 ± 57.955
2025-08-07 10:24:15,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [189.14539, 31.973656, 150.22493, 41.563496, 29.077728, 42.339478, 15.37557, 8.588268, 92.75625, 32.10394]
2025-08-07 10:24:15,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 62.0, 266.0, 55.0, 42.0, 54.0, 32.0, 62.0, 112.0, 68.0]
2025-08-07 10:24:15,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 37 seconds)
2025-08-07 10:25:54,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:57,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 64.17644 ± 62.029
2025-08-07 10:25:57,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [171.89095, 86.470566, 21.243713, 64.63194, 27.834223, 42.907394, 180.96259, 49.7547, 7.0744534, -11.006161]
2025-08-07 10:25:57,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 121.0, 76.0, 83.0, 52.0, 63.0, 1000.0, 49.0, 95.0, 31.0]
2025-08-07 10:25:57,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 20 seconds)
2025-08-07 10:27:39,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:40,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 44.33658 ± 34.947
2025-08-07 10:27:40,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [19.196053, 17.49308, 63.246853, 0.33516967, 85.19008, 99.19156, 74.18466, 64.10372, -1.5795927, 22.0042]
2025-08-07 10:27:40,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 61.0, 84.0, 62.0, 158.0, 90.0, 151.0, 146.0, 25.0, 73.0]
2025-08-07 10:27:40,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 34 seconds)
2025-08-07 10:29:19,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:23,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 80.16331 ± 93.126
2025-08-07 10:29:23,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [15.11889, 12.288805, 7.7282877, 30.222527, -2.9015214, 263.6795, 45.15537, 66.06742, 121.52887, 242.745]
2025-08-07 10:29:23,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 36.0, 25.0, 49.0, 89.0, 1000.0, 99.0, 167.0, 265.0, 1000.0]
2025-08-07 10:29:23,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 8 seconds)
2025-08-07 10:31:08,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:12,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 78.13519 ± 90.907
2025-08-07 10:31:12,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [106.82038, 195.39742, 66.25484, 3.8820574, 16.157373, 12.07344, 294.25613, 17.85105, 38.045734, 30.613503]
2025-08-07 10:31:12,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [350.0, 1000.0, 100.0, 126.0, 89.0, 27.0, 1000.0, 83.0, 58.0, 164.0]
2025-08-07 10:31:12,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 37 seconds)
2025-08-07 10:32:49,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:50,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 56.61849 ± 48.907
2025-08-07 10:32:50,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [55.20636, 6.247429, 7.4290876, 33.030266, 27.769468, 90.98906, 19.268513, 175.93918, 69.19323, 81.112335]
2025-08-07 10:32:50,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [74.0, 26.0, 17.0, 39.0, 52.0, 132.0, 29.0, 293.0, 188.0, 176.0]
2025-08-07 10:32:50,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 52 seconds)
2025-08-07 10:34:31,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:35,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 62.98162 ± 80.022
2025-08-07 10:34:35,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [137.17317, 20.225018, 234.28252, 156.5588, 12.775803, 18.353153, -4.7239475, 59.020123, 21.144346, -24.992807]
2025-08-07 10:34:35,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [290.0, 34.0, 1000.0, 1000.0, 28.0, 26.0, 34.0, 270.0, 103.0, 61.0]
2025-08-07 10:34:35,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 20 seconds)
2025-08-07 10:36:09,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:11,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 44.79472 ± 27.580
2025-08-07 10:36:11,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [25.645416, -7.795282, 35.92174, 54.886612, 58.050907, 54.86565, 100.42293, 65.988594, 25.866955, 34.093613]
2025-08-07 10:36:11,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [219.0, 62.0, 45.0, 210.0, 127.0, 68.0, 126.0, 102.0, 33.0, 100.0]
2025-08-07 10:36:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 58 seconds)
2025-08-07 10:37:55,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:56,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 29.44315 ± 47.139
2025-08-07 10:37:56,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [18.32628, 150.8378, 63.883316, -0.42550972, -37.385666, 18.92778, 8.121763, 13.605368, 28.720911, 29.819448]
2025-08-07 10:37:56,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [101.0, 343.0, 85.0, 75.0, 134.0, 108.0, 29.0, 26.0, 70.0, 83.0]
2025-08-07 10:37:56,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 27 seconds)
2025-08-07 10:39:38,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:41,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 59.06139 ± 62.619
2025-08-07 10:39:41,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [20.373892, 130.99437, 9.919885, 29.094429, 35.456295, 93.52566, 17.744232, 19.9789, 210.5406, 22.985634]
2025-08-07 10:39:41,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 165.0, 50.0, 86.0, 116.0, 206.0, 27.0, 45.0, 1000.0, 40.0]
2025-08-07 10:39:41,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 25 seconds)
2025-08-07 10:41:14,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:16,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 86.69009 ± 74.934
2025-08-07 10:41:16,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [129.67812, 130.6047, 118.560036, 40.050877, 27.28827, 80.9922, 25.298237, 268.86612, 7.0223227, 38.53996]
2025-08-07 10:41:16,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [221.0, 160.0, 134.0, 34.0, 99.0, 123.0, 45.0, 1000.0, 60.0, 96.0]
2025-08-07 10:41:16,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 28 seconds)
2025-08-07 10:42:58,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:00,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 44.86972 ± 62.786
2025-08-07 10:43:00,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [6.117147, 36.032837, 47.251595, 25.902176, 3.473877, 49.878128, 225.26186, 44.96454, 5.817778, 3.997255]
2025-08-07 10:43:00,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [46.0, 69.0, 71.0, 56.0, 118.0, 102.0, 1000.0, 61.0, 23.0, 26.0]
2025-08-07 10:43:00,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 41 seconds)
2025-08-07 10:44:40,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:42,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 66.67024 ± 90.604
2025-08-07 10:44:42,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [1.8720336, 41.028896, -10.573937, 15.75887, 325.59167, 86.400185, 46.89587, 59.507275, 61.40131, 38.820175]
2025-08-07 10:44:42,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 35.0, 77.0, 25.0, 1000.0, 144.0, 76.0, 65.0, 139.0, 101.0]
2025-08-07 10:44:42,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 30 seconds)
2025-08-07 10:46:21,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:24,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 71.16855 ± 63.911
2025-08-07 10:46:24,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [66.84564, 36.50101, 52.303497, 16.388964, 209.20787, 46.77023, 73.543755, 17.807955, 175.26698, 17.049643]
2025-08-07 10:46:24,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [89.0, 66.0, 52.0, 24.0, 353.0, 206.0, 141.0, 203.0, 1000.0, 34.0]
2025-08-07 10:46:24,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 35 seconds)
2025-08-07 10:48:03,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 47.74628 ± 32.150
2025-08-07 10:48:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [70.95552, 77.09872, 8.506701, 17.32104, 31.045292, 112.57222, 64.00736, 48.775063, 40.274548, 6.906373]
2025-08-07 10:48:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [222.0, 137.0, 25.0, 36.0, 52.0, 161.0, 57.0, 91.0, 33.0, 20.0]
2025-08-07 10:48:04,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 34 seconds)
2025-08-07 10:49:44,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:47,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 61.73641 ± 80.524
2025-08-07 10:49:47,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [14.85389, 213.448, 38.057774, 23.22876, 69.22998, -8.052778, 25.77925, 14.501751, 3.9698248, 222.34767]
2025-08-07 10:49:47,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [63.0, 1000.0, 52.0, 75.0, 85.0, 27.0, 37.0, 18.0, 35.0, 1000.0]
2025-08-07 10:49:47,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 20 seconds)
2025-08-07 10:51:27,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:28,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 54.33758 ± 57.700
2025-08-07 10:51:28,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [105.55067, 12.834542, 29.875238, 46.879436, 13.398038, 203.66872, 62.611977, 33.59708, -5.9962487, 40.956326]
2025-08-07 10:51:28,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [129.0, 33.0, 84.0, 34.0, 60.0, 394.0, 131.0, 53.0, 57.0, 93.0]
2025-08-07 10:51:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 29 seconds)
2025-08-07 10:53:13,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:14,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 55.45648 ± 41.624
2025-08-07 10:53:14,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [2.6177018, 56.775845, 34.441486, 61.72837, 128.36874, 72.463684, 26.604664, 127.87658, 29.219032, 14.4686985]
2025-08-07 10:53:14,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 59.0, 98.0, 119.0, 128.0, 78.0, 58.0, 213.0, 69.0, 65.0]
2025-08-07 10:53:14,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 1 second)
2025-08-07 10:54:58,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:03,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 99.62152 ± 98.132
2025-08-07 10:55:03,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [8.825329, 21.913048, 56.76265, 275.83264, 59.266815, 189.24722, -22.599884, 142.1361, 30.999165, 233.83215]
2025-08-07 10:55:03,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 69.0, 1000.0, 136.0, 1000.0, 151.0, 160.0, 79.0, 1000.0]
2025-08-07 10:55:03,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (99.62) for latency MM1Queue_a033_s075
2025-08-07 10:55:03,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 38 seconds)
2025-08-07 10:56:35,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:37,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 38.83285 ± 31.755
2025-08-07 10:56:37,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [34.844364, 5.216897, 77.04356, 70.58934, 26.273258, 10.495247, 14.329557, 14.34963, 104.022736, 31.163872]
2025-08-07 10:56:37,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 20.0, 138.0, 205.0, 87.0, 29.0, 25.0, 149.0, 254.0, 95.0]
2025-08-07 10:56:37,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 36 seconds)
2025-08-07 10:58:16,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:17,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 56.46705 ± 40.335
2025-08-07 10:58:17,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [85.75768, 137.27614, 26.353012, 43.0315, 54.151894, 17.080284, 3.1652734, 111.54257, 45.15111, 41.161034]
2025-08-07 10:58:17,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [228.0, 331.0, 32.0, 106.0, 102.0, 24.0, 76.0, 269.0, 86.0, 108.0]
2025-08-07 10:58:17,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 49 seconds)
2025-08-07 11:00:02,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:05,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 79.08948 ± 81.530
2025-08-07 11:00:05,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [41.53423, 166.62183, 19.265594, 21.776104, 12.80611, 39.297367, 256.66483, 6.929394, 165.52802, 60.47132]
2025-08-07 11:00:05,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 258.0, 33.0, 26.0, 160.0, 50.0, 1000.0, 28.0, 168.0, 139.0]
2025-08-07 11:00:05,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 24 seconds)
2025-08-07 11:01:38,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:41,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 88.72421 ± 96.814
2025-08-07 11:01:41,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [152.18993, 127.075035, 49.953396, 76.08368, 74.017715, -18.384884, 62.669125, 15.299968, 338.80225, 9.535956]
2025-08-07 11:01:41,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [233.0, 337.0, 89.0, 286.0, 133.0, 54.0, 74.0, 66.0, 1000.0, 112.0]
2025-08-07 11:01:41,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 16 seconds)
2025-08-07 11:03:27,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:28,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 41.35314 ± 29.521
2025-08-07 11:03:28,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [24.21388, 42.283524, 44.673965, 43.28247, 15.609078, 22.820896, 74.22379, 35.0837, 1.4959962, 109.84404]
2025-08-07 11:03:28,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 72.0, 90.0, 73.0, 123.0, 64.0, 136.0, 91.0, 79.0, 176.0]
2025-08-07 11:03:28,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 31 seconds)
2025-08-07 11:05:04,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:06,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 76.04191 ± 101.816
2025-08-07 11:05:06,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [93.99539, 358.38208, 27.750475, 51.126575, 71.361916, 110.919815, -19.293673, -10.106877, 33.082825, 43.20053]
2025-08-07 11:05:06,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [218.0, 1000.0, 30.0, 207.0, 132.0, 223.0, 59.0, 78.0, 52.0, 59.0]
2025-08-07 11:05:06,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 59 seconds)
2025-08-07 11:06:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:46,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 69.23739 ± 59.323
2025-08-07 11:06:46,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [141.33795, 41.821842, 4.3807874, 49.464367, 6.6608906, 199.16023, 34.127636, 30.222767, 93.77723, 91.4202]
2025-08-07 11:06:46,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [170.0, 76.0, 29.0, 68.0, 24.0, 200.0, 98.0, 50.0, 149.0, 325.0]
2025-08-07 11:06:46,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 16 seconds)
2025-08-07 11:08:26,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:28,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 78.12669 ± 86.533
2025-08-07 11:08:28,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [16.893131, 43.60817, 271.39267, 0.23960684, 18.81338, 110.56734, 35.92176, 203.33727, 11.782487, 68.71117]
2025-08-07 11:08:28,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 50.0, 1000.0, 25.0, 69.0, 287.0, 71.0, 194.0, 19.0, 112.0]
2025-08-07 11:08:28,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 25 seconds)
2025-08-07 11:10:08,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:09,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 45.69719 ± 42.510
2025-08-07 11:10:09,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [37.53048, 34.123158, 153.9194, 50.670048, 14.079669, 16.99868, -3.7977152, 33.934204, 85.51726, 33.9967]
2025-08-07 11:10:09,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [100.0, 69.0, 114.0, 216.0, 27.0, 141.0, 31.0, 97.0, 76.0, 62.0]
2025-08-07 11:10:09,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 51 seconds)
2025-08-07 11:11:49,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:51,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 57.28788 ± 52.231
2025-08-07 11:11:51,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [32.355602, 27.405392, 198.7837, 55.62492, 30.59851, -11.5097475, 47.783722, 57.06733, 66.1831, 68.58629]
2025-08-07 11:11:51,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [58.0, 44.0, 185.0, 53.0, 49.0, 54.0, 135.0, 159.0, 191.0, 138.0]
2025-08-07 11:11:51,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 3 seconds)
2025-08-07 11:13:39,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 39.78921 ± 32.776
2025-08-07 11:13:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [16.555735, 43.270134, 1.0592244, 7.1564503, 57.746513, 10.98584, 58.459267, 42.173107, 117.96904, 42.516808]
2025-08-07 11:13:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [127.0, 87.0, 37.0, 55.0, 70.0, 25.0, 604.0, 63.0, 197.0, 98.0]
2025-08-07 11:13:41,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 34 seconds)
2025-08-07 11:15:12,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:14,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 102.98743 ± 64.804
2025-08-07 11:15:14,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [185.58727, 190.2474, 32.749863, 41.184822, 133.76971, 102.217926, 20.736492, 189.70349, 88.73078, 44.94649]
2025-08-07 11:15:14,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [189.0, 183.0, 52.0, 67.0, 136.0, 141.0, 35.0, 1000.0, 81.0, 73.0]
2025-08-07 11:15:14,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (102.99) for latency MM1Queue_a033_s075
2025-08-07 11:15:14,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 46 seconds)
2025-08-07 11:16:54,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:56,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 60.20599 ± 56.600
2025-08-07 11:16:56,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [114.38325, 148.65453, -20.0991, 80.47094, 127.14556, 66.45399, 5.3476586, -9.71704, 17.15518, 72.26502]
2025-08-07 11:16:56,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [160.0, 230.0, 38.0, 108.0, 252.0, 125.0, 37.0, 61.0, 56.0, 86.0]
2025-08-07 11:16:56,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 4 seconds)
2025-08-07 11:18:34,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:36,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 52.76528 ± 49.676
2025-08-07 11:18:36,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [68.30499, 32.960007, -4.9192085, 157.64093, 6.33592, 2.6109087, 49.8463, 104.076126, 20.82884, 89.96806]
2025-08-07 11:18:36,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [199.0, 77.0, 26.0, 341.0, 54.0, 19.0, 176.0, 138.0, 70.0, 137.0]
2025-08-07 11:18:36,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 22 seconds)
2025-08-07 11:20:13,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:17,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 123.26787 ± 122.097
2025-08-07 11:20:17,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [51.411324, 128.18481, 33.57813, 303.89597, 17.116224, 105.58411, 134.66582, 396.08, 37.589363, 24.573025]
2025-08-07 11:20:17,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 186.0, 49.0, 1000.0, 36.0, 70.0, 302.0, 1000.0, 45.0, 64.0]
2025-08-07 11:20:17,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1226 [INFO]: New best (123.27) for latency MM1Queue_a033_s075
2025-08-07 11:20:17,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2025-08-07 11:22:05,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:08,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1221 [DEBUG]: Total Reward: 84.19346 ± 91.295
2025-08-07 11:22:08,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1222 [DEBUG]: All rewards: [18.046482, 39.905746, 170.08566, 251.45563, 22.772001, 215.7703, -29.06465, 48.262394, 5.311095, 99.38987]
2025-08-07 11:22:08,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 72.0, 183.0, 1000.0, 36.0, 1000.0, 41.0, 91.0, 9.0, 110.0]
2025-08-07 11:22:08,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-ant):1251 [DEBUG]: Training session finished
