2025-08-07 07:36:59,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 07:36:59,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-ant/MM1Queue_a033_s075-bpql-mem16
2025-08-07 07:36:59,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x145e5c7ff990>}
2025-08-07 07:36:59,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1111 [DEBUG]: using device: cuda
2025-08-07 07:36:59,640 baseline-bpql-noiseperc0-ant:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 07:36:59,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1133 [INFO]: Creating new trainer
2025-08-07 07:36:59,657 baseline-bpql-noiseperc0-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:36:59,657 baseline-bpql-noiseperc0-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:37:02,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1194 [DEBUG]: Starting training session...
2025-08-07 07:37:02,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 1/100
2025-08-07 07:38:35,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:38:38,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -252.56343 ± 525.177
2025-08-07 07:38:38,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [-183.88762, -95.42883, -45.42288, -56.10319, -48.109135, -74.08212, -64.48587, -132.55766, -3.932555, -1821.6244]
2025-08-07 07:38:38,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 72.0, 56.0, 71.0, 47.0, 65.0, 56.0, 85.0, 30.0, 1000.0]
2025-08-07 07:38:38,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (-252.56) for latency MM1Queue_a033_s075
2025-08-07 07:38:38,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 37 minutes, 51 seconds)
2025-08-07 07:40:16,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:40:19,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -119.26693 ± 208.684
2025-08-07 07:40:19,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [15.330873, -180.84409, 2.1745439, -25.325022, -720.53064, 6.193952, -118.6312, -84.15685, -40.63832, -46.24254]
2025-08-07 07:40:19,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 232.0, 51.0, 75.0, 1000.0, 42.0, 169.0, 114.0, 88.0, 80.0]
2025-08-07 07:40:19,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (-119.27) for latency MM1Queue_a033_s075
2025-08-07 07:40:19,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 40 minutes, 55 seconds)
2025-08-07 07:42:03,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:42:07,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -145.55656 ± 267.920
2025-08-07 07:42:07,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3.6471035, -31.229147, 6.415335, 0.016463913, 39.011562, -40.974777, -663.0477, 17.903889, -97.36972, -689.93854]
2025-08-07 07:42:07,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 75.0, 79.0, 57.0, 80.0, 191.0, 1000.0, 41.0, 122.0, 1000.0]
2025-08-07 07:42:07,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 44 minutes, 25 seconds)
2025-08-07 07:43:39,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:43:41,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: -11.38413 ± 67.147
2025-08-07 07:43:41,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [0.036677077, 48.845913, -205.57327, 13.329145, -2.83145, 25.888706, -17.20811, 22.309574, 7.2251415, -5.863655]
2025-08-07 07:43:41,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 78.0, 1000.0, 149.0, 52.0, 60.0, 224.0, 57.0, 61.0, 78.0]
2025-08-07 07:43:41,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (-11.38) for latency MM1Queue_a033_s075
2025-08-07 07:43:41,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 39 minutes, 45 seconds)
2025-08-07 07:45:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:45:30,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 344.99585 ± 191.449
2025-08-07 07:45:30,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [214.95761, 512.261, 27.052921, 356.36713, 267.06464, 578.6173, 59.838734, 370.70523, 595.54535, 467.54822]
2025-08-07 07:45:30,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [443.0, 1000.0, 24.0, 1000.0, 814.0, 1000.0, 78.0, 738.0, 1000.0, 1000.0]
2025-08-07 07:45:30,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (345.00) for latency MM1Queue_a033_s075
2025-08-07 07:45:30,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 40 minutes, 59 seconds)
2025-08-07 07:47:09,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:47:23,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 602.06171 ± 55.781
2025-08-07 07:47:23,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [557.1794, 584.3957, 629.7216, 654.2739, 657.985, 597.2306, 530.9391, 689.96893, 505.92972, 612.99304]
2025-08-07 07:47:23,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:47:23,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (602.06) for latency MM1Queue_a033_s075
2025-08-07 07:47:23,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 44 minutes, 35 seconds)
2025-08-07 07:49:05,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:49:17,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 593.35291 ± 309.217
2025-08-07 07:49:17,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [31.13268, 31.369173, 589.983, 792.70654, 588.9757, 902.03625, 780.10486, 984.7006, 623.17084, 609.3491]
2025-08-07 07:49:17,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:49:17,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 46 minutes, 41 seconds)
2025-08-07 07:50:57,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:51:04,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 489.29404 ± 327.217
2025-08-07 07:51:04,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [462.70566, 885.6895, 195.10454, 103.77931, 270.69138, 961.3527, 618.2259, 430.27948, 42.737797, 922.374]
2025-08-07 07:51:04,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [476.0, 1000.0, 210.0, 88.0, 230.0, 914.0, 647.0, 449.0, 32.0, 1000.0]
2025-08-07 07:51:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 44 minutes, 42 seconds)
2025-08-07 07:52:43,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:52:51,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 501.06104 ± 320.142
2025-08-07 07:52:51,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [170.12044, 739.50134, 435.2341, 80.54688, 887.2451, 808.66486, 219.25099, 77.68518, 776.7678, 815.5936]
2025-08-07 07:52:51,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [204.0, 1000.0, 359.0, 64.0, 1000.0, 1000.0, 214.0, 86.0, 1000.0, 1000.0]
2025-08-07 07:52:51,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 46 minutes, 51 seconds)
2025-08-07 07:54:30,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:54:40,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 640.85608 ± 331.666
2025-08-07 07:54:40,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [866.29236, 169.1119, 1019.6013, 850.81165, 263.34485, 585.69055, 834.8435, 57.236427, 871.53217, 890.09546]
2025-08-07 07:54:40,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 131.0, 1000.0, 1000.0, 255.0, 1000.0, 1000.0, 52.0, 1000.0, 872.0]
2025-08-07 07:54:40,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (640.86) for latency MM1Queue_a033_s075
2025-08-07 07:54:40,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 44 minutes, 50 seconds)
2025-08-07 07:56:14,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:56:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 360.30179 ± 335.211
2025-08-07 07:56:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [214.3396, 944.6995, 222.12772, 54.430557, 994.26465, 318.94547, 542.7828, 84.83106, 195.0546, 31.54174]
2025-08-07 07:56:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [165.0, 1000.0, 250.0, 55.0, 1000.0, 370.0, 515.0, 66.0, 145.0, 28.0]
2025-08-07 07:56:19,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 39 minutes, 3 seconds)
2025-08-07 07:57:54,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:58:04,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 693.14282 ± 331.328
2025-08-07 07:58:04,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [765.53796, 896.4837, 188.43445, 510.97522, 844.57916, 938.89246, 127.668755, 1277.9869, 595.2076, 785.6621]
2025-08-07 07:58:04,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [610.0, 1000.0, 231.0, 383.0, 1000.0, 1000.0, 96.0, 1000.0, 429.0, 1000.0]
2025-08-07 07:58:04,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (693.14) for latency MM1Queue_a033_s075
2025-08-07 07:58:04,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 34 minutes, 33 seconds)
2025-08-07 07:59:47,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 07:59:56,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 740.82343 ± 363.143
2025-08-07 07:59:56,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [878.8262, 1078.704, 38.645546, 980.8611, 1027.3965, 1044.0331, 629.3282, 791.6342, 855.6932, 83.11244]
2025-08-07 07:59:56,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 38.0, 1000.0, 1000.0, 1000.0, 438.0, 544.0, 1000.0, 64.0]
2025-08-07 07:59:56,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (740.82) for latency MM1Queue_a033_s075
2025-08-07 07:59:56,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 34 minutes, 21 seconds)
2025-08-07 08:01:30,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:01:39,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 612.18817 ± 419.576
2025-08-07 08:01:39,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [952.453, 71.600105, 884.779, 986.1574, 143.36862, 987.38104, 110.47802, 1060.831, 88.351944, 836.4821]
2025-08-07 08:01:39,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 51.0, 1000.0, 1000.0, 86.0, 1000.0, 65.0, 1000.0, 68.0, 1000.0]
2025-08-07 08:01:39,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 31 minutes, 18 seconds)
2025-08-07 08:03:20,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:03:25,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 454.33820 ± 282.006
2025-08-07 08:03:25,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [373.70538, 300.1013, 946.9201, 814.2538, 110.490906, 705.73944, 627.6647, 135.5111, 202.8516, 326.1433]
2025-08-07 08:03:25,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [211.0, 240.0, 640.0, 651.0, 76.0, 529.0, 443.0, 113.0, 121.0, 205.0]
2025-08-07 08:03:25,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 28 minutes, 41 seconds)
2025-08-07 08:05:01,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:05:05,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 431.98111 ± 323.443
2025-08-07 08:05:05,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1033.6311, 582.4391, 887.83936, 400.90576, 628.22516, 189.84581, 298.81958, 139.10376, 122.09614, 36.90532]
2025-08-07 08:05:05,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [836.0, 435.0, 1000.0, 237.0, 522.0, 108.0, 216.0, 80.0, 94.0, 38.0]
2025-08-07 08:05:05,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 27 minutes, 21 seconds)
2025-08-07 08:06:45,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:06:51,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 504.41595 ± 373.394
2025-08-07 08:06:51,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1018.92505, 358.6068, 1069.4141, 963.52045, 64.483025, 139.54117, 416.43353, 242.70773, 656.3286, 114.199265]
2025-08-07 08:06:51,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 368.0, 734.0, 1000.0, 50.0, 96.0, 265.0, 188.0, 444.0, 100.0]
2025-08-07 08:06:51,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 25 minutes, 48 seconds)
2025-08-07 08:08:26,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:08:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 848.79803 ± 379.574
2025-08-07 08:08:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [688.74976, 1474.0209, 663.67365, 900.35474, 52.360165, 883.5474, 940.45966, 756.7138, 716.0221, 1412.0784]
2025-08-07 08:08:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [454.0, 1000.0, 407.0, 1000.0, 40.0, 488.0, 506.0, 446.0, 496.0, 1000.0]
2025-08-07 08:08:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (848.80) for latency MM1Queue_a033_s075
2025-08-07 08:08:34,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 21 minutes, 21 seconds)
2025-08-07 08:10:18,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:10:21,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 292.08344 ± 234.638
2025-08-07 08:10:21,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [224.13719, 253.44864, 791.4106, 457.26212, 34.80351, 278.96686, 62.893383, 575.22296, 193.6792, 49.00966]
2025-08-07 08:10:21,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [129.0, 172.0, 458.0, 294.0, 29.0, 179.0, 41.0, 354.0, 102.0, 31.0]
2025-08-07 08:10:21,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 20 minutes, 48 seconds)
2025-08-07 08:11:57,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:12:03,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 677.51855 ± 501.558
2025-08-07 08:12:03,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [221.99799, 1539.0258, 780.66876, 686.9058, 1570.0887, 538.8335, 181.79915, 73.4057, 829.6439, 352.81592]
2025-08-07 08:12:03,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [139.0, 941.0, 387.0, 1000.0, 862.0, 306.0, 100.0, 66.0, 442.0, 222.0]
2025-08-07 08:12:03,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 18 minutes, 14 seconds)
2025-08-07 08:13:33,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:13:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1366.76733 ± 579.883
2025-08-07 08:13:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1954.7249, 1980.801, 428.91226, 1792.3485, 1777.0575, 676.2541, 1901.4076, 695.50586, 947.12177, 1513.5393]
2025-08-07 08:13:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 250.0, 1000.0, 1000.0, 335.0, 1000.0, 367.0, 545.0, 808.0]
2025-08-07 08:13:43,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1366.77) for latency MM1Queue_a033_s075
2025-08-07 08:13:43,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 16 minutes, 22 seconds)
2025-08-07 08:15:23,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:15:28,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 714.04962 ± 584.448
2025-08-07 08:15:28,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2151.1472, 128.24287, 429.05258, 957.8165, 875.8012, 41.78972, 590.8582, 244.27463, 1094.9838, 626.5292]
2025-08-07 08:15:28,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 70.0, 193.0, 546.0, 543.0, 41.0, 342.0, 174.0, 557.0, 312.0]
2025-08-07 08:15:28,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 14 minutes, 33 seconds)
2025-08-07 08:17:11,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:17:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1125.44556 ± 707.805
2025-08-07 08:17:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1997.2006, 1173.4463, 526.5938, 576.5531, 997.0812, 164.6283, 1913.977, 2071.7715, 1638.9285, 194.27406]
2025-08-07 08:17:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [962.0, 583.0, 241.0, 281.0, 489.0, 78.0, 968.0, 1000.0, 725.0, 98.0]
2025-08-07 08:17:19,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 14 minutes, 47 seconds)
2025-08-07 08:18:50,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:18:59,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1539.68604 ± 630.658
2025-08-07 08:18:59,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2024.5018, 682.8206, 1994.3691, 2159.2517, 450.09265, 1222.0297, 1232.2112, 2199.1292, 1197.7806, 2234.6736]
2025-08-07 08:18:59,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [973.0, 378.0, 1000.0, 1000.0, 227.0, 595.0, 608.0, 1000.0, 552.0, 1000.0]
2025-08-07 08:18:59,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1539.69) for latency MM1Queue_a033_s075
2025-08-07 08:18:59,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 11 minutes, 23 seconds)
2025-08-07 08:20:41,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:20:52,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1908.78381 ± 584.683
2025-08-07 08:20:52,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2440.9336, 559.75287, 2229.049, 2146.6353, 1444.5801, 2282.4697, 2224.3887, 2165.1775, 2350.1733, 1244.6783]
2025-08-07 08:20:52,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 259.0, 1000.0, 1000.0, 640.0, 1000.0, 1000.0, 1000.0, 1000.0, 544.0]
2025-08-07 08:20:52,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (1908.78) for latency MM1Queue_a033_s075
2025-08-07 08:20:52,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 12 minutes, 22 seconds)
2025-08-07 08:22:24,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:22:31,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 1292.69409 ± 753.449
2025-08-07 08:22:31,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1645.9025, 488.408, 2190.9304, 935.0184, 2367.4421, 201.88658, 502.68488, 2274.8677, 1132.8485, 1186.9515]
2025-08-07 08:22:31,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [729.0, 237.0, 1000.0, 466.0, 1000.0, 98.0, 201.0, 1000.0, 458.0, 566.0]
2025-08-07 08:22:31,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 10 minutes, 19 seconds)
2025-08-07 08:24:15,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:24:29,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2266.40723 ± 135.737
2025-08-07 08:24:29,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2120.1436, 2320.3418, 2231.1868, 2279.1804, 1967.9187, 2389.272, 2338.4006, 2445.8923, 2375.8472, 2195.8884]
2025-08-07 08:24:29,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:24:29,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2266.41) for latency MM1Queue_a033_s075
2025-08-07 08:24:29,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 11 minutes, 38 seconds)
2025-08-07 08:26:05,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:26:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2293.31055 ± 386.960
2025-08-07 08:26:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2677.6013, 2350.6792, 2405.8328, 2424.5798, 2365.3213, 2343.5613, 2403.8809, 2328.619, 2464.846, 1168.1826]
2025-08-07 08:26:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 976.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:26:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2293.31) for latency MM1Queue_a033_s075
2025-08-07 08:26:19,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 9 minutes, 35 seconds)
2025-08-07 08:27:56,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:28:07,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2105.13232 ± 772.680
2025-08-07 08:28:07,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2498.9941, 2379.244, 2638.2363, 2432.0898, 2514.247, 2464.1184, 2505.9622, 1061.5377, 174.78056, 2382.1113]
2025-08-07 08:28:07,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 407.0, 94.0, 1000.0]
2025-08-07 08:28:07,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 9 minutes, 42 seconds)
2025-08-07 08:29:50,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:30:00,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2029.89587 ± 911.691
2025-08-07 08:30:00,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2482.3486, 301.06622, 2623.1602, 2556.4846, 857.7832, 2780.517, 2565.236, 828.5132, 2497.0715, 2806.7766]
2025-08-07 08:30:00,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 113.0, 1000.0, 1000.0, 314.0, 1000.0, 1000.0, 358.0, 1000.0, 1000.0]
2025-08-07 08:30:00,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 7 minutes, 52 seconds)
2025-08-07 08:31:37,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:31:50,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2274.48584 ± 686.586
2025-08-07 08:31:50,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2406.1836, 2364.8093, 2432.8547, 2818.0337, 2453.7622, 2549.3003, 248.2593, 2588.2554, 2428.436, 2454.9636]
2025-08-07 08:31:50,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:31:50,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 8 minutes, 23 seconds)
2025-08-07 08:33:26,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:33:39,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2449.78613 ± 123.042
2025-08-07 08:33:39,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2521.9565, 2539.6414, 2616.642, 2482.0833, 2443.067, 2208.509, 2309.5547, 2315.215, 2538.7595, 2522.4336]
2025-08-07 08:33:39,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 910.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:33:39,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2449.79) for latency MM1Queue_a033_s075
2025-08-07 08:33:39,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 4 minutes, 39 seconds)
2025-08-07 08:35:16,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:35:26,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2070.82471 ± 974.624
2025-08-07 08:35:26,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2569.9468, 652.2136, 2513.3726, 2808.1675, 2899.183, 2537.571, 57.257446, 2690.745, 1234.2695, 2745.5195]
2025-08-07 08:35:26,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 311.0, 958.0, 1000.0, 1000.0, 1000.0, 47.0, 1000.0, 562.0, 1000.0]
2025-08-07 08:35:26,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 2 minutes, 19 seconds)
2025-08-07 08:37:06,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:37:17,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2263.18335 ± 651.622
2025-08-07 08:37:17,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1920.1628, 2457.3818, 1805.0817, 541.94714, 2725.2224, 2591.8718, 2709.881, 2632.3704, 2640.6511, 2607.263]
2025-08-07 08:37:17,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [787.0, 1000.0, 672.0, 249.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:37:17,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 1 minute)
2025-08-07 08:39:02,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:39:13,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2292.34814 ± 916.303
2025-08-07 08:39:13,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [733.6584, 2787.4253, 2461.5461, 2723.188, 2513.4495, 2956.6711, 275.7886, 2578.7605, 2866.9253, 3026.067]
2025-08-07 08:39:13,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [252.0, 1000.0, 1000.0, 1000.0, 940.0, 1000.0, 124.0, 872.0, 1000.0, 1000.0]
2025-08-07 08:39:13,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 59 minutes, 39 seconds)
2025-08-07 08:40:50,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:41:03,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2584.01318 ± 849.760
2025-08-07 08:41:03,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [63.00317, 2705.6328, 2969.351, 2923.6753, 2717.4082, 2859.7998, 2738.7087, 3024.5454, 3077.9094, 2760.0955]
2025-08-07 08:41:03,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:41:03,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2584.01) for latency MM1Queue_a033_s075
2025-08-07 08:41:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 58 minutes, 2 seconds)
2025-08-07 08:42:39,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:42:50,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2066.00635 ± 898.180
2025-08-07 08:42:50,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2789.2063, 2505.916, 2649.6265, 1013.83386, 2554.233, 2514.7212, 261.10077, 2753.2842, 914.28577, 2703.855]
2025-08-07 08:42:50,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 357.0, 1000.0, 1000.0, 106.0, 1000.0, 349.0, 1000.0]
2025-08-07 08:42:50,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 55 minutes, 38 seconds)
2025-08-07 08:44:31,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:44:44,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2609.98315 ± 577.227
2025-08-07 08:44:44,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [925.88214, 2814.2124, 2589.1099, 2988.0437, 2940.949, 2597.555, 2735.4536, 2692.3613, 2925.8357, 2890.4297]
2025-08-07 08:44:44,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [347.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:44:44,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2609.98) for latency MM1Queue_a033_s075
2025-08-07 08:44:44,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 55 minutes, 15 seconds)
2025-08-07 08:46:25,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:46:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2647.40674 ± 502.958
2025-08-07 08:46:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2815.6348, 2814.8384, 1153.1184, 2826.9775, 2682.6943, 2738.2642, 2932.0415, 2756.9905, 2891.946, 2861.5645]
2025-08-07 08:46:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 405.0, 1000.0, 1000.0, 1000.0, 1000.0, 943.0, 1000.0, 1000.0]
2025-08-07 08:46:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2647.41) for latency MM1Queue_a033_s075
2025-08-07 08:46:38,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 54 minutes, 1 second)
2025-08-07 08:48:18,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:48:31,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2739.82178 ± 250.498
2025-08-07 08:48:31,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3015.3643, 2906.7607, 2568.7986, 2766.0176, 2126.8494, 2546.8037, 2942.7444, 2778.0178, 2908.3303, 2838.5295]
2025-08-07 08:48:31,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 785.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:48:31,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2739.82) for latency MM1Queue_a033_s075
2025-08-07 08:48:31,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 51 minutes, 40 seconds)
2025-08-07 08:50:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:50:25,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2716.28271 ± 491.384
2025-08-07 08:50:25,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2920.1023, 2940.1626, 2985.7966, 3013.5466, 2961.5215, 1943.8768, 2916.7834, 1594.9604, 3171.9058, 2714.172]
2025-08-07 08:50:25,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 588.0, 1000.0, 1000.0]
2025-08-07 08:50:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 50 minutes, 30 seconds)
2025-08-07 08:51:59,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:52:13,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2974.00464 ± 151.575
2025-08-07 08:52:13,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2787.3792, 2809.5422, 2990.163, 3162.205, 2802.6528, 3197.4495, 2868.5522, 2980.692, 3178.028, 2963.3833]
2025-08-07 08:52:13,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:52:13,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (2974.00) for latency MM1Queue_a033_s075
2025-08-07 08:52:13,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 48 minutes, 52 seconds)
2025-08-07 08:53:59,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:54:13,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2955.62622 ± 106.170
2025-08-07 08:54:13,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3070.4316, 2796.8174, 2839.0542, 3057.948, 3116.9775, 2921.7188, 2956.3596, 2976.8706, 3006.1436, 2813.942]
2025-08-07 08:54:13,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:54:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 48 minutes, 6 seconds)
2025-08-07 08:55:44,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:55:58,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2944.56665 ± 126.509
2025-08-07 08:55:58,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2836.772, 2839.3015, 3042.3694, 2765.9197, 2873.3086, 2893.9287, 3029.514, 3205.2698, 3052.373, 2906.9094]
2025-08-07 08:55:58,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:55:58,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 44 minutes, 31 seconds)
2025-08-07 08:57:41,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:57:54,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2870.67480 ± 938.918
2025-08-07 08:57:54,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3271.8696, 3164.7124, 2962.944, 3394.9287, 3040.81, 3266.547, 3174.685, 3209.9985, 74.92029, 3145.332]
2025-08-07 08:57:54,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 991.0, 1000.0, 1000.0, 62.0, 1000.0]
2025-08-07 08:57:54,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 43 minutes, 10 seconds)
2025-08-07 08:59:33,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 08:59:47,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3130.63135 ± 160.201
2025-08-07 08:59:47,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3328.3933, 3299.5093, 3291.5146, 3071.1072, 3217.5745, 2844.0398, 3084.3596, 3229.7947, 2913.7896, 3026.2312]
2025-08-07 08:59:47,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:59:47,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3130.63) for latency MM1Queue_a033_s075
2025-08-07 08:59:47,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 41 minutes, 9 seconds)
2025-08-07 09:01:25,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:01:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2849.53076 ± 641.373
2025-08-07 09:01:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [996.4535, 3210.7192, 2990.2131, 3173.2712, 2861.0378, 2692.7969, 3304.5623, 2981.861, 3097.0642, 3187.3284]
2025-08-07 09:01:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [327.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:01:38,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 39 minutes, 49 seconds)
2025-08-07 09:03:17,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:03:31,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3125.27197 ± 194.769
2025-08-07 09:03:31,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3268.8967, 3092.9397, 3153.5847, 3317.6248, 3386.8032, 3174.8704, 2755.6143, 3241.068, 2822.9236, 3038.3948]
2025-08-07 09:03:31,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 877.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:03:31,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 36 minutes, 40 seconds)
2025-08-07 09:05:15,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:05:29,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3006.43750 ± 205.377
2025-08-07 09:05:29,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2868.9019, 2967.2341, 2911.0854, 3274.3975, 3199.5823, 2735.1877, 3377.2932, 2750.65, 2934.9314, 3045.111]
2025-08-07 09:05:29,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:05:29,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 36 minutes, 58 seconds)
2025-08-07 09:07:08,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:07:20,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2889.59668 ± 915.377
2025-08-07 09:07:20,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3072.245, 3512.7893, 3426.3477, 190.97066, 3047.8416, 3249.0872, 3194.7678, 3098.301, 3198.6787, 2904.9358]
2025-08-07 09:07:20,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 104.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:07:20,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 34 minutes, 27 seconds)
2025-08-07 09:09:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:09:14,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3137.34180 ± 192.561
2025-08-07 09:09:14,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3073.064, 2921.2742, 3079.247, 3180.541, 3379.4595, 3207.7083, 3230.0674, 3188.8152, 3393.436, 2719.8064]
2025-08-07 09:09:14,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:09:14,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3137.34) for latency MM1Queue_a033_s075
2025-08-07 09:09:14,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 32 minutes, 35 seconds)
2025-08-07 09:10:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:11:07,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3226.48828 ± 111.939
2025-08-07 09:11:07,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3336.7212, 3206.2852, 3241.4575, 3202.8516, 3365.9285, 3350.6467, 3127.6953, 2999.5774, 3305.3674, 3128.3506]
2025-08-07 09:11:07,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:11:07,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3226.49) for latency MM1Queue_a033_s075
2025-08-07 09:11:07,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 31 minutes, 3 seconds)
2025-08-07 09:12:44,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:12:58,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3260.21729 ± 177.259
2025-08-07 09:12:58,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2999.763, 3046.8982, 3093.121, 3443.347, 3416.3577, 3494.1743, 3368.42, 3179.9866, 3423.8628, 3136.242]
2025-08-07 09:12:58,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:12:58,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3260.22) for latency MM1Queue_a033_s075
2025-08-07 09:12:58,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 28 minutes, 49 seconds)
2025-08-07 09:14:37,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:14:50,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2925.25635 ± 659.617
2025-08-07 09:14:50,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3012.2017, 3176.4192, 2959.429, 2979.6016, 3137.1257, 3583.0295, 2920.789, 1028.1676, 3137.1597, 3318.643]
2025-08-07 09:14:50,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 376.0, 1000.0, 1000.0]
2025-08-07 09:14:50,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 26 minutes, 6 seconds)
2025-08-07 09:16:30,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:16:43,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3378.98877 ± 141.736
2025-08-07 09:16:43,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3154.787, 3332.619, 3304.1265, 3265.2327, 3343.6985, 3663.702, 3451.5024, 3291.343, 3422.9094, 3559.966]
2025-08-07 09:16:43,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:16:43,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3378.99) for latency MM1Queue_a033_s075
2025-08-07 09:16:43,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 24 minutes, 25 seconds)
2025-08-07 09:18:23,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:18:36,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3221.53638 ± 130.688
2025-08-07 09:18:36,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3185.891, 2981.7393, 3300.866, 3326.146, 3154.9966, 3399.1267, 3095.0103, 3185.9106, 3418.0815, 3167.5928]
2025-08-07 09:18:36,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:18:36,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 22 minutes, 30 seconds)
2025-08-07 09:20:16,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:20:30,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3085.41357 ± 209.744
2025-08-07 09:20:30,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3091.9185, 2581.119, 3051.858, 3071.0588, 3110.0476, 3308.1516, 3021.683, 3436.524, 3033.3662, 3148.4084]
2025-08-07 09:20:30,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:20:30,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 20 minutes, 39 seconds)
2025-08-07 09:22:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:22:19,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3321.00317 ± 164.055
2025-08-07 09:22:19,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3505.2744, 3409.7114, 3546.6768, 3535.036, 3178.724, 3052.1394, 3237.4563, 3214.6682, 3182.9355, 3347.4087]
2025-08-07 09:22:19,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:22:19,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 18 minutes, 31 seconds)
2025-08-07 09:23:58,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:24:11,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2968.48706 ± 971.395
2025-08-07 09:24:11,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3358.3696, 3407.6409, 3369.4583, 3328.6433, 3252.7876, 3195.9111, 3358.81, 3070.0261, 68.21908, 3275.003]
2025-08-07 09:24:11,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 45.0, 1000.0]
2025-08-07 09:24:11,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 16 minutes, 35 seconds)
2025-08-07 09:25:50,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:26:03,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3362.85669 ± 161.599
2025-08-07 09:26:03,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [2955.272, 3469.2185, 3442.1345, 3354.4663, 3356.9282, 3597.5813, 3388.694, 3339.99, 3464.677, 3259.6052]
2025-08-07 09:26:03,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:26:03,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 14 minutes, 39 seconds)
2025-08-07 09:27:42,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:27:56,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3302.82935 ± 171.375
2025-08-07 09:27:56,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3302.709, 3112.2517, 3305.8228, 3491.6902, 3606.2317, 3330.67, 3133.6936, 3406.2124, 3005.154, 3333.8557]
2025-08-07 09:27:56,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:27:56,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 12 minutes, 46 seconds)
2025-08-07 09:29:36,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:50,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3279.05518 ± 191.132
2025-08-07 09:29:50,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3377.8152, 3175.0117, 3333.2434, 3151.6748, 3369.5483, 2799.9744, 3375.266, 3542.0933, 3290.4905, 3375.4324]
2025-08-07 09:29:50,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:29:50,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 10 minutes, 54 seconds)
2025-08-07 09:31:28,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:42,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3362.54883 ± 133.596
2025-08-07 09:31:42,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3424.5725, 3111.5312, 3411.1567, 3402.9517, 3294.3901, 3556.4226, 3436.0015, 3452.8032, 3137.6194, 3398.0393]
2025-08-07 09:31:42,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:31:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 9 minutes, 27 seconds)
2025-08-07 09:33:22,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:34,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2976.90308 ± 588.102
2025-08-07 09:33:34,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1309.4572, 3147.4573, 3016.9927, 3037.4739, 3164.1794, 3274.7417, 3515.766, 3126.0254, 2786.3005, 3390.6372]
2025-08-07 09:33:34,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [418.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:33:34,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 7 minutes, 39 seconds)
2025-08-07 09:35:14,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:35:27,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3413.89185 ± 161.895
2025-08-07 09:35:27,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3516.5127, 3515.6323, 3427.9146, 3190.3826, 3665.47, 3498.508, 3305.4478, 3524.471, 3103.914, 3390.6648]
2025-08-07 09:35:27,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:35:27,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3413.89) for latency MM1Queue_a033_s075
2025-08-07 09:35:27,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 5 minutes, 48 seconds)
2025-08-07 09:37:07,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:37:21,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3467.91211 ± 176.230
2025-08-07 09:37:21,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3386.6606, 3509.5886, 3394.3667, 3297.3623, 3185.4739, 3707.097, 3285.9397, 3717.7927, 3626.187, 3568.653]
2025-08-07 09:37:21,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:37:21,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3467.91) for latency MM1Queue_a033_s075
2025-08-07 09:37:21,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 3 minutes, 58 seconds)
2025-08-07 09:39:03,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:39:17,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3353.90112 ± 97.233
2025-08-07 09:39:17,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3547.0437, 3361.0327, 3223.274, 3379.7095, 3393.9246, 3403.1433, 3204.9492, 3430.617, 3320.3806, 3274.9368]
2025-08-07 09:39:17,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:39:17,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 2 minutes, 25 seconds)
2025-08-07 09:40:57,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:41:10,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3003.59253 ± 938.695
2025-08-07 09:41:10,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3301.2957, 235.89827, 3469.2554, 3458.4414, 3435.5344, 3127.7527, 2967.8982, 3114.9727, 3428.9788, 3495.8984]
2025-08-07 09:41:10,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 114.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:41:10,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 33 seconds)
2025-08-07 09:42:49,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:43:03,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3411.96680 ± 141.320
2025-08-07 09:43:03,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3195.7314, 3458.4465, 3411.7883, 3320.0437, 3603.9453, 3669.3967, 3304.629, 3515.6714, 3323.8523, 3316.1638]
2025-08-07 09:43:03,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:43:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 46 seconds)
2025-08-07 09:44:43,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:44:56,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3444.64380 ± 212.530
2025-08-07 09:44:56,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3671.7332, 3520.0593, 3322.1414, 3575.1826, 3510.8657, 3468.1138, 3149.3645, 3627.7163, 2988.1692, 3613.0889]
2025-08-07 09:44:56,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:44:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 55 seconds)
2025-08-07 09:46:36,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:50,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3407.26880 ± 131.100
2025-08-07 09:46:50,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3356.9849, 3685.842, 3162.3774, 3418.452, 3458.052, 3309.2104, 3416.0273, 3520.9026, 3417.3096, 3327.5295]
2025-08-07 09:46:50,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:46:50,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 2 seconds)
2025-08-07 09:48:30,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:43,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3297.37109 ± 164.777
2025-08-07 09:48:43,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3215.635, 3585.3833, 3300.8882, 3186.178, 3638.0684, 3289.9246, 3146.8662, 3241.5642, 3151.619, 3217.582]
2025-08-07 09:48:43,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:48:43,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 51 seconds)
2025-08-07 09:50:23,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:37,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3510.09253 ± 118.811
2025-08-07 09:50:37,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3503.0327, 3407.0378, 3392.5608, 3431.5854, 3618.6072, 3336.5076, 3689.184, 3698.9507, 3532.468, 3490.9907]
2025-08-07 09:50:37,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:50:37,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3510.09) for latency MM1Queue_a033_s075
2025-08-07 09:50:37,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 2 seconds)
2025-08-07 09:52:21,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:34,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3405.31396 ± 121.380
2025-08-07 09:52:34,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3245.2375, 3234.111, 3472.4705, 3596.7766, 3427.6008, 3290.136, 3321.476, 3556.429, 3417.2283, 3491.6743]
2025-08-07 09:52:34,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:52:34,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 30 seconds)
2025-08-07 09:54:09,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:22,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3527.62451 ± 145.941
2025-08-07 09:54:22,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3706.8992, 3775.5989, 3362.6016, 3445.3108, 3659.949, 3455.9111, 3401.8389, 3335.3513, 3507.6582, 3625.1252]
2025-08-07 09:54:22,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:54:22,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3527.62) for latency MM1Queue_a033_s075
2025-08-07 09:54:22,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 9 seconds)
2025-08-07 09:56:02,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:56:16,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3538.59521 ± 107.362
2025-08-07 09:56:16,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3512.266, 3608.4172, 3533.865, 3441.7344, 3654.896, 3743.331, 3542.3855, 3559.8872, 3432.6965, 3356.4736]
2025-08-07 09:56:16,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:56:16,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3538.60) for latency MM1Queue_a033_s075
2025-08-07 09:56:16,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 16 seconds)
2025-08-07 09:57:59,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:58:13,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3432.62378 ± 174.687
2025-08-07 09:58:13,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3577.052, 3428.681, 2987.9841, 3515.984, 3549.481, 3553.7498, 3437.1892, 3608.6362, 3314.4438, 3353.0386]
2025-08-07 09:58:13,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 871.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:58:13,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 39 seconds)
2025-08-07 09:59:47,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:00,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3337.51367 ± 252.355
2025-08-07 10:00:00,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3444.5989, 3521.9236, 3368.3872, 3260.1187, 2949.398, 3439.3323, 3565.8203, 2790.6436, 3456.3691, 3578.5464]
2025-08-07 10:00:00,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 817.0, 1000.0, 1000.0]
2025-08-07 10:00:00,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 18 seconds)
2025-08-07 10:01:41,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:01:52,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 2942.18848 ± 1087.754
2025-08-07 10:01:52,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [1884.5004, 3126.9944, 3359.8894, 3427.2432, 3740.5435, 3219.8662, 36.0182, 3605.6956, 3403.7463, 3617.3887]
2025-08-07 10:01:52,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [538.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 30.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:01:53,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 4 seconds)
2025-08-07 10:03:36,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:03:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3532.38818 ± 133.013
2025-08-07 10:03:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3552.5022, 3413.2822, 3754.6775, 3493.2546, 3469.7185, 3428.9028, 3434.2073, 3676.53, 3731.6328, 3369.175]
2025-08-07 10:03:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:03:49,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 48 seconds)
2025-08-07 10:05:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:05:43,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3688.14648 ± 169.942
2025-08-07 10:05:43,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3371.6892, 3805.8623, 3517.2402, 3477.2402, 3714.0479, 3649.5464, 3872.1755, 3826.2092, 3755.6917, 3891.7605]
2025-08-07 10:05:43,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:05:43,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3688.15) for latency MM1Queue_a033_s075
2025-08-07 10:05:43,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 54 seconds)
2025-08-07 10:07:22,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:07:36,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3497.84229 ± 168.378
2025-08-07 10:07:36,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3374.919, 3315.9758, 3791.3037, 3582.517, 3566.6863, 3730.6223, 3547.866, 3361.9287, 3454.1758, 3252.427]
2025-08-07 10:07:36,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:07:36,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 47 seconds)
2025-08-07 10:09:16,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:29,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3650.07300 ± 95.852
2025-08-07 10:09:29,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3616.8308, 3698.3457, 3809.2354, 3545.8486, 3736.4468, 3504.3926, 3757.7334, 3656.3777, 3633.4033, 3542.116]
2025-08-07 10:09:29,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:09:29,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 15 seconds)
2025-08-07 10:11:08,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:11:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3686.59229 ± 81.669
2025-08-07 10:11:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3582.1736, 3548.3196, 3724.43, 3703.644, 3648.4236, 3666.3994, 3671.4622, 3713.0469, 3756.1282, 3851.8945]
2025-08-07 10:11:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:11:22,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 21 seconds)
2025-08-07 10:13:01,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:13:15,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3618.54443 ± 152.515
2025-08-07 10:13:15,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3702.1946, 3300.2793, 3426.0974, 3833.5854, 3730.8518, 3694.8604, 3706.0928, 3545.5276, 3692.1445, 3553.8135]
2025-08-07 10:13:15,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:13:15,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 17 seconds)
2025-08-07 10:14:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:09,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3584.74268 ± 278.918
2025-08-07 10:15:09,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3104.4731, 3950.2986, 3515.521, 3623.5308, 3479.264, 3598.3384, 3897.1194, 3771.3298, 3792.418, 3115.132]
2025-08-07 10:15:09,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:15:09,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 26 seconds)
2025-08-07 10:16:49,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3547.21729 ± 137.401
2025-08-07 10:17:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3604.5278, 3643.8708, 3758.6995, 3454.9714, 3439.707, 3263.1555, 3484.7297, 3625.454, 3510.8154, 3686.2434]
2025-08-07 10:17:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:17:03,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 32 seconds)
2025-08-07 10:18:36,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:49,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3589.57935 ± 137.883
2025-08-07 10:18:49,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3746.8562, 3775.474, 3574.926, 3681.7996, 3497.7617, 3285.845, 3480.4443, 3549.6602, 3640.4822, 3662.5454]
2025-08-07 10:18:49,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:18:49,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 23 seconds)
2025-08-07 10:20:29,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:43,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3533.51807 ± 158.247
2025-08-07 10:20:43,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3710.8782, 3633.4424, 3370.004, 3371.887, 3602.9292, 3817.426, 3490.007, 3569.2139, 3276.9138, 3492.4795]
2025-08-07 10:20:43,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:20:43,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 34 seconds)
2025-08-07 10:22:23,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3506.67188 ± 131.254
2025-08-07 10:22:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3754.1016, 3591.289, 3290.3003, 3459.629, 3650.1038, 3450.2627, 3361.7615, 3433.1052, 3575.0085, 3501.1577]
2025-08-07 10:22:36,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:22:36,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 42 seconds)
2025-08-07 10:24:16,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:30,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3512.62939 ± 136.074
2025-08-07 10:24:30,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3382.936, 3542.864, 3531.5486, 3578.394, 3515.5068, 3675.8245, 3521.5352, 3280.468, 3748.0261, 3349.1907]
2025-08-07 10:24:30,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:24:30,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 48 seconds)
2025-08-07 10:26:15,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:29,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3717.33862 ± 125.794
2025-08-07 10:26:29,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3614.4666, 3757.1887, 3754.9946, 3534.6172, 3775.4385, 3557.806, 3824.9236, 3834.04, 3923.6824, 3596.228]
2025-08-07 10:26:29,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:26:29,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1226 [INFO]: New best (3717.34) for latency MM1Queue_a033_s075
2025-08-07 10:26:29,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 6 seconds)
2025-08-07 10:28:09,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:22,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3605.52026 ± 140.700
2025-08-07 10:28:22,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3725.2722, 3663.0098, 3749.3462, 3485.1965, 3387.1304, 3460.8098, 3431.391, 3792.5938, 3684.786, 3675.666]
2025-08-07 10:28:22,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:28:22,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 22 seconds)
2025-08-07 10:30:02,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:16,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3246.16846 ± 323.107
2025-08-07 10:30:16,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3593.193, 3633.481, 3052.2742, 2465.9617, 3108.5073, 3296.0232, 3248.1875, 3153.3271, 3385.933, 3524.7964]
2025-08-07 10:30:16,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 910.0, 776.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:30:16,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 27 seconds)
2025-08-07 10:31:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3657.67114 ± 104.113
2025-08-07 10:32:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3560.7065, 3660.1663, 3671.7744, 3445.9548, 3668.194, 3807.7136, 3621.2917, 3772.2969, 3596.3875, 3772.2283]
2025-08-07 10:32:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:32:09,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 32 seconds)
2025-08-07 10:33:42,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3231.02368 ± 1072.123
2025-08-07 10:33:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3544.072, 3761.2395, 3774.2737, 3699.8186, 3696.473, 3643.3264, 48.10811, 3339.4663, 3481.8074, 3321.6526]
2025-08-07 10:33:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 41.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:33:54,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 31 seconds)
2025-08-07 10:35:34,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:48,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3617.88599 ± 148.021
2025-08-07 10:35:48,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3454.6084, 3942.1238, 3453.3599, 3651.3066, 3538.1948, 3566.9385, 3737.4167, 3466.289, 3729.5203, 3639.103]
2025-08-07 10:35:48,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:35:48,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 35 seconds)
2025-08-07 10:37:27,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:41,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3634.14404 ± 129.185
2025-08-07 10:37:41,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3639.646, 3636.8667, 3814.3054, 3740.3535, 3425.7104, 3509.5066, 3602.4817, 3612.1465, 3509.3323, 3851.0947]
2025-08-07 10:37:41,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:37:41,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 43 seconds)
2025-08-07 10:39:21,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:35,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3666.30933 ± 195.987
2025-08-07 10:39:35,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3543.2566, 3242.2246, 3628.1082, 3897.789, 3436.4746, 3853.6921, 3743.29, 3727.2622, 3756.0068, 3834.9868]
2025-08-07 10:39:35,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:39:35,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 51 seconds)
2025-08-07 10:41:14,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:28,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1221 [DEBUG]: Total Reward: 3505.18872 ± 209.333
2025-08-07 10:41:28,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1222 [DEBUG]: All rewards: [3774.9512, 3290.9866, 3572.0383, 3210.3245, 3827.9336, 3214.7551, 3468.5613, 3493.5796, 3498.383, 3700.3752]
2025-08-07 10:41:28,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:41:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-ant):1251 [DEBUG]: Training session finished
