2025-05-10 22:03:44,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4
2025-05-10 22:03:44,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4
2025-05-10 22:03:44,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x741f2f5c4c70>}
2025-05-10 22:03:44,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1111 [DEBUG]: using device: cpu
2025-05-10 22:03:44,112 baseline-bpql-noisy-ant:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-10 22:03:44,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-10 22:03:44,117 baseline-bpql-noisy-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-10 22:03:44,118 baseline-bpql-noisy-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 22:03:44,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-10 22:03:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-10 22:07:04,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:07:11,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -310.64435 ± 273.325
2025-05-10 22:07:11,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-99.19358, -1039.1423, -417.40308, -174.35237, -317.47955, -367.77075, -104.600815, -137.00497, -384.43265, -65.063354]
2025-05-10 22:07:11,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 1000.0, 435.0, 239.0, 274.0, 351.0, 107.0, 123.0, 301.0, 77.0]
2025-05-10 22:07:11,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (-310.64) for latency MM1Queue_a033_s075
2025-05-10 22:07:11,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:07:11,450 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:07:11,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 41 minutes, 46 seconds)
2025-05-10 22:10:27,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:10:35,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -51.79665 ± 64.247
2025-05-10 22:10:35,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-2.1340888, -4.6247797, -36.758205, -13.737112, 5.617106, -97.7976, -26.355871, -88.73602, -36.555695, -216.88423]
2025-05-10 22:10:35,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 57.0, 337.0, 268.0, 170.0, 1000.0, 96.0, 1000.0, 112.0, 1000.0]
2025-05-10 22:10:35,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (-51.80) for latency MM1Queue_a033_s075
2025-05-10 22:10:35,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:10:35,625 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:10:35,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 35 minutes, 54 seconds)
2025-05-10 22:13:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:13:33,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 95.31892 ± 88.325
2025-05-10 22:13:33,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [23.917425, 38.28495, 138.40706, 40.59876, 106.51429, 336.76788, 30.64209, 110.21265, 66.403725, 61.44041]
2025-05-10 22:13:33,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [170.0, 1000.0, 1000.0, 134.0, 1000.0, 1000.0, 194.0, 588.0, 646.0, 167.0]
2025-05-10 22:13:33,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (95.32) for latency MM1Queue_a033_s075
2025-05-10 22:13:33,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:13:33,461 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:13:33,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 17 minutes, 29 seconds)
2025-05-10 22:16:47,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:16:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 219.44646 ± 149.498
2025-05-10 22:16:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [11.0428705, 392.17743, 321.08734, 146.81479, 154.0213, 138.81985, 120.951996, 534.22705, 106.93902, 268.383]
2025-05-10 22:16:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 1000.0, 1000.0, 201.0, 423.0, 181.0, 174.0, 1000.0, 400.0, 1000.0]
2025-05-10 22:16:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (219.45) for latency MM1Queue_a033_s075
2025-05-10 22:16:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:16:58,025 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:16:58,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 17 minutes, 29 seconds)
2025-05-10 22:19:49,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:20:06,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 486.24048 ± 238.871
2025-05-10 22:20:06,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [511.27823, 621.43195, 626.9806, 283.3516, 678.3699, 39.55022, 620.3951, 693.2358, 96.940956, 690.8704]
2025-05-10 22:20:06,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 959.0, 214.0, 1000.0, 932.0, 138.0, 1000.0]
2025-05-10 22:20:06,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (486.24) for latency MM1Queue_a033_s075
2025-05-10 22:20:06,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:20:06,128 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:20:06,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 10 minutes, 54 seconds)
2025-05-10 22:23:02,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:23:12,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 314.13324 ± 138.317
2025-05-10 22:23:12,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [88.72429, 422.1854, 163.9051, 367.44382, 147.67746, 259.38678, 377.55292, 400.70374, 355.7587, 557.9942]
2025-05-10 22:23:12,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 1000.0, 237.0, 1000.0, 192.0, 289.0, 1000.0, 438.0, 377.0, 762.0]
2025-05-10 22:23:12,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 1 minute, 3 seconds)
2025-05-10 22:26:12,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:26:20,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 330.00433 ± 262.713
2025-05-10 22:26:20,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [30.811401, 543.70276, 64.10009, 240.4647, 197.58105, 247.41602, 148.49455, 932.17773, 327.89728, 567.39777]
2025-05-10 22:26:20,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 1000.0, 43.0, 299.0, 222.0, 262.0, 167.0, 1000.0, 358.0, 1000.0]
2025-05-10 22:26:20,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 52 minutes, 59 seconds)
2025-05-10 22:29:17,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:29:31,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 685.11267 ± 343.888
2025-05-10 22:29:31,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [656.14905, 1090.7892, 1075.9678, 549.31036, 633.5587, 750.7346, 1146.8468, 130.26646, 121.35202, 696.1519]
2025-05-10 22:29:31,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 545.0, 1000.0, 565.0, 1000.0, 127.0, 93.0, 1000.0]
2025-05-10 22:29:31,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (685.11) for latency MM1Queue_a033_s075
2025-05-10 22:29:31,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:29:31,769 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:29:31,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 53 minutes, 52 seconds)
2025-05-10 22:32:44,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:32:53,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 536.43677 ± 326.246
2025-05-10 22:32:53,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [224.7499, 214.41594, 696.6312, 207.55832, 900.0434, 1041.898, 181.46817, 917.8988, 308.00305, 671.70087]
2025-05-10 22:32:53,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [171.0, 175.0, 597.0, 136.0, 705.0, 796.0, 132.0, 747.0, 223.0, 1000.0]
2025-05-10 22:32:53,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 49 minutes, 54 seconds)
2025-05-10 22:35:37,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:35:49,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 600.25000 ± 397.835
2025-05-10 22:35:49,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [113.76426, 616.6448, 68.32733, 845.17944, 908.96674, 461.83533, 1225.773, 822.82574, 924.2529, 14.930442]
2025-05-10 22:35:49,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 432.0, 55.0, 1000.0, 1000.0, 297.0, 1000.0, 1000.0, 1000.0, 17.0]
2025-05-10 22:35:49,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 42 minutes, 57 seconds)
2025-05-10 22:38:52,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:38:58,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 423.14325 ± 394.287
2025-05-10 22:38:58,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [81.55697, 827.6286, 852.79486, 159.39261, 1254.2585, 134.03311, 477.91644, 78.72576, 117.557594, 247.56827]
2025-05-10 22:38:58,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 585.0, 600.0, 115.0, 942.0, 105.0, 309.0, 56.0, 64.0, 160.0]
2025-05-10 22:38:58,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 40 minutes, 36 seconds)
2025-05-10 22:41:57,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:42:06,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 412.73926 ± 266.691
2025-05-10 22:42:06,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [459.84607, 242.9361, 190.66238, 711.891, 590.8561, 175.33606, 84.43368, 173.5655, 925.35895, 572.507]
2025-05-10 22:42:06,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [272.0, 181.0, 111.0, 1000.0, 1000.0, 131.0, 59.0, 129.0, 1000.0, 367.0]
2025-05-10 22:42:06,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 37 minutes, 17 seconds)
2025-05-10 22:45:08,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:45:27,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1068.76294 ± 333.012
2025-05-10 22:45:27,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1085.4158, 1487.0779, 885.734, 665.3451, 1257.6381, 878.65594, 781.14703, 641.21313, 1611.1409, 1394.2627]
2025-05-10 22:45:27,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 453.0, 875.0, 1000.0, 1000.0, 1000.0, 1000.0, 936.0]
2025-05-10 22:45:27,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1068.76) for latency MM1Queue_a033_s075
2025-05-10 22:45:27,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:45:27,242 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:45:27,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 37 minutes, 5 seconds)
2025-05-10 22:48:24,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:48:38,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1091.54260 ± 471.917
2025-05-10 22:48:38,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [251.34383, 1517.3514, 1182.4015, 476.76367, 1459.366, 1367.1562, 917.48425, 1644.9535, 600.8533, 1497.752]
2025-05-10 22:48:38,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [161.0, 1000.0, 796.0, 281.0, 1000.0, 1000.0, 506.0, 1000.0, 367.0, 948.0]
2025-05-10 22:48:38,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1091.54) for latency MM1Queue_a033_s075
2025-05-10 22:48:38,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:48:38,050 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:48:38,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 30 minutes, 42 seconds)
2025-05-10 22:51:55,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:52:12,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1487.30371 ± 327.507
2025-05-10 22:52:12,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1320.248, 991.48944, 1762.571, 892.0609, 1870.9686, 1664.5613, 1599.3074, 1722.0695, 1769.5027, 1280.2589]
2025-05-10 22:52:12,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [821.0, 1000.0, 1000.0, 481.0, 1000.0, 1000.0, 882.0, 1000.0, 1000.0, 685.0]
2025-05-10 22:52:12,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1487.30) for latency MM1Queue_a033_s075
2025-05-10 22:52:12,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:52:12,184 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:52:12,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 38 minutes, 28 seconds)
2025-05-10 22:55:04,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:55:20,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1155.59888 ± 453.667
2025-05-10 22:55:20,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1638.1707, 736.6201, 1611.6754, 1130.7031, 374.6984, 1286.7592, 1781.8732, 1479.1509, 783.46045, 732.8783]
2025-05-10 22:55:20,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [919.0, 1000.0, 899.0, 640.0, 225.0, 639.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:55:20,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 34 minutes, 58 seconds)
2025-05-10 22:58:19,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:58:32,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1091.15100 ± 733.449
2025-05-10 22:58:32,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [269.68246, 155.44736, 2002.3966, 1177.9988, 2229.5527, 355.2538, 1196.2051, 534.4361, 1998.3193, 992.21686]
2025-05-10 22:58:32,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [141.0, 81.0, 1000.0, 1000.0, 1000.0, 213.0, 1000.0, 284.0, 1000.0, 1000.0]
2025-05-10 22:58:32,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 33 minutes, 1 second)
2025-05-10 23:01:45,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:01:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 873.96716 ± 377.560
2025-05-10 23:01:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [407.52774, 1121.8353, 454.73108, 788.93164, 931.27997, 1584.883, 894.4866, 900.6769, 353.6287, 1301.6904]
2025-05-10 23:01:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [229.0, 1000.0, 273.0, 383.0, 1000.0, 803.0, 1000.0, 444.0, 198.0, 1000.0]
2025-05-10 23:01:57,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 30 minutes, 38 seconds)
2025-05-10 23:04:43,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:04:58,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1124.89697 ± 469.187
2025-05-10 23:04:58,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1197.1361, 876.34216, 1155.409, 1602.924, 1590.7806, 1016.5651, 496.6465, 409.7224, 924.3021, 1979.1412]
2025-05-10 23:04:58,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [647.0, 1000.0, 1000.0, 807.0, 1000.0, 1000.0, 226.0, 226.0, 470.0, 1000.0]
2025-05-10 23:04:58,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 24 minutes, 40 seconds)
2025-05-10 23:08:03,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:08:15,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1047.64624 ± 535.537
2025-05-10 23:08:15,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [647.7331, 1735.0608, 1142.8396, 535.9959, 465.74475, 272.2038, 1257.8003, 1041.9855, 1406.5442, 1970.5543]
2025-05-10 23:08:15,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [311.0, 739.0, 1000.0, 259.0, 205.0, 146.0, 1000.0, 508.0, 1000.0, 984.0]
2025-05-10 23:08:15,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 16 minutes, 53 seconds)
2025-05-10 23:11:18,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:11:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 669.31226 ± 431.828
2025-05-10 23:11:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [910.8507, 810.80164, 159.41197, 1422.869, 164.94856, 1199.2577, 59.501373, 708.7334, 780.9283, 475.82037]
2025-05-10 23:11:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [471.0, 349.0, 61.0, 1000.0, 107.0, 1000.0, 38.0, 314.0, 1000.0, 237.0]
2025-05-10 23:11:26,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 14 minutes, 30 seconds)
2025-05-10 23:14:28,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:14:39,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 830.95575 ± 491.171
2025-05-10 23:14:39,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1291.6902, 655.8798, 296.63025, 840.00494, 450.03946, 1420.2129, 110.88692, 1670.8281, 1072.921, 500.464]
2025-05-10 23:14:39,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 246.0, 143.0, 1000.0, 197.0, 644.0, 59.0, 1000.0, 1000.0, 214.0]
2025-05-10 23:14:39,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 11 minutes, 20 seconds)
2025-05-10 23:17:45,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:18:05,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1422.11060 ± 533.506
2025-05-10 23:18:05,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2018.7751, 870.3492, 2367.8914, 788.57684, 1891.6809, 1838.9463, 1117.5872, 1235.0055, 839.6545, 1252.64]
2025-05-10 23:18:05,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 583.0, 1000.0, 1000.0]
2025-05-10 23:18:05,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 8 minutes, 30 seconds)
2025-05-10 23:21:08,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:21:19,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 955.26990 ± 728.496
2025-05-10 23:21:19,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2137.601, 729.6253, -1.4359815, 341.37704, 2245.0247, 948.90326, 120.2836, 1096.67, 663.1082, 1271.5415]
2025-05-10 23:21:19,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 348.0, 17.0, 204.0, 1000.0, 1000.0, 56.0, 492.0, 305.0, 1000.0]
2025-05-10 23:21:19,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 8 minutes, 27 seconds)
2025-05-10 23:24:15,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:24:29,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1245.39856 ± 576.389
2025-05-10 23:24:29,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1308.1232, 1413.3083, 574.94244, 2458.9707, 2027.8345, 643.6227, 1247.8795, 1097.0006, 691.741, 990.5621]
2025-05-10 23:24:29,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [535.0, 678.0, 277.0, 1000.0, 1000.0, 325.0, 556.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:24:29,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 3 minutes, 35 seconds)
2025-05-10 23:27:36,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:27:48,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1414.68921 ± 774.358
2025-05-10 23:27:48,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2469.691, 2039.2866, 2259.6677, 2220.075, 734.4487, 688.1154, 324.97144, 1278.7002, 1653.6245, 478.30927]
2025-05-10 23:27:48,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 986.0, 357.0, 302.0, 152.0, 540.0, 780.0, 216.0]
2025-05-10 23:27:48,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 2 minutes, 5 seconds)
2025-05-10 23:30:41,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:30:55,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1465.09290 ± 853.679
2025-05-10 23:30:55,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1966.5022, 2488.571, 191.99588, 2196.0842, 181.63869, 2439.8645, 742.8285, 2142.7327, 981.2132, 1319.4977]
2025-05-10 23:30:55,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 81.0, 1000.0, 85.0, 1000.0, 1000.0, 1000.0, 435.0, 608.0]
2025-05-10 23:30:55,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 57 minutes, 27 seconds)
2025-05-10 23:33:45,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:34:03,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2132.46655 ± 429.819
2025-05-10 23:34:03,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2586.5815, 1767.8068, 2380.8032, 1522.554, 2555.3357, 1887.3632, 2329.4539, 1371.0413, 2469.4155, 2454.3105]
2025-05-10 23:34:03,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 638.0, 1000.0, 1000.0, 1000.0, 833.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:34:03,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2132.47) for latency MM1Queue_a033_s075
2025-05-10 23:34:03,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:34:03,376 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:34:03,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 49 minutes, 52 seconds)
2025-05-10 23:36:56,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:37:05,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1033.83301 ± 636.974
2025-05-10 23:37:05,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [383.37186, 1026.6534, 1111.5474, 1285.5403, 503.52274, 89.19006, 2451.5503, 749.0297, 1174.5667, 1563.3579]
2025-05-10 23:37:05,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 379.0, 414.0, 1000.0, 163.0, 36.0, 1000.0, 288.0, 1000.0, 625.0]
2025-05-10 23:37:05,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 44 minutes, 5 seconds)
2025-05-10 23:40:16,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:40:26,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1194.73865 ± 929.772
2025-05-10 23:40:26,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1598.1661, 247.74901, 256.03592, 313.93762, 2525.4001, 87.21851, 836.1854, 2255.5, 1346.2838, 2480.9106]
2025-05-10 23:40:26,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [675.0, 126.0, 118.0, 100.0, 1000.0, 51.0, 377.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:40:26,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 43 minutes, 14 seconds)
2025-05-10 23:43:21,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:43:36,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1497.79688 ± 873.658
2025-05-10 23:43:36,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1161.2303, 1276.705, 655.2055, 787.0749, 2850.7612, 1254.1484, 2589.0073, 43.344906, 1927.2437, 2433.248]
2025-05-10 23:43:36,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [396.0, 1000.0, 271.0, 1000.0, 1000.0, 1000.0, 1000.0, 25.0, 1000.0, 1000.0]
2025-05-10 23:43:36,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 38 minutes, 7 seconds)
2025-05-10 23:46:22,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:46:39,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1760.01880 ± 945.246
2025-05-10 23:46:39,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1994.6835, 842.5926, 2403.9548, 2606.7026, 2886.5447, 2261.2722, 32.643078, 462.65375, 1532.1633, 2576.9785]
2025-05-10 23:46:39,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 19.0, 174.0, 1000.0, 1000.0]
2025-05-10 23:46:39,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 33 minutes, 52 seconds)
2025-05-10 23:49:49,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:50:09,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1691.65393 ± 610.828
2025-05-10 23:50:09,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2240.0132, 968.1157, 1435.885, 1021.3784, 1237.2434, 1437.739, 2483.9998, 2795.305, 2016.9783, 1279.881]
2025-05-10 23:50:09,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [857.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:50:09,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 35 minutes, 44 seconds)
2025-05-10 23:52:53,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:53:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1847.77905 ± 948.184
2025-05-10 23:53:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1209.897, 304.7697, 2611.2566, 2699.6555, 1950.649, 442.02258, 2683.6348, 2660.2573, 1084.7755, 2830.8733]
2025-05-10 23:53:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 120.0, 1000.0, 1000.0, 741.0, 207.0, 1000.0, 1000.0, 427.0, 1000.0]
2025-05-10 23:53:08,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 31 minutes, 48 seconds)
2025-05-10 23:56:13,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:56:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1577.86938 ± 722.933
2025-05-10 23:56:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1157.2104, 1408.4258, 1098.7039, 2024.1693, 1409.5897, 2033.3088, 1573.1072, 10.345272, 2730.1091, 2333.7258]
2025-05-10 23:56:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [457.0, 516.0, 1000.0, 769.0, 546.0, 789.0, 1000.0, 17.0, 1000.0, 1000.0]
2025-05-10 23:56:27,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 28 minutes, 13 seconds)
2025-05-10 23:59:32,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:59:43,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 948.30725 ± 677.564
2025-05-10 23:59:43,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [833.89307, 1937.3256, 1661.8646, 1110.0266, 618.0158, 351.392, 3.069061, 213.13112, 2008.5024, 745.8513]
2025-05-10 23:59:43,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 680.0, 1000.0, 241.0, 143.0, 14.0, 98.0, 705.0, 1000.0]
2025-05-10 23:59:43,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 26 minutes, 22 seconds)
2025-05-11 00:02:28,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:02:42,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1765.58923 ± 978.049
2025-05-11 00:02:42,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1731.1957, 2661.9692, 1791.0028, 408.04077, 659.1238, 1576.7483, 2674.8752, 3031.6323, 306.8678, 2814.4358]
2025-05-11 00:02:42,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 153.0, 267.0, 582.0, 1000.0, 1000.0, 121.0, 1000.0]
2025-05-11 00:02:42,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 22 minutes, 23 seconds)
2025-05-11 00:05:42,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:05:54,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1330.63342 ± 863.151
2025-05-11 00:05:54,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2614.2554, 1196.787, 1288.8022, 1057.1865, 2880.779, 234.05557, 249.8423, 1296.1682, 1921.9132, 566.54407]
2025-05-11 00:05:54,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 516.0, 389.0, 1000.0, 94.0, 115.0, 1000.0, 727.0, 249.0]
2025-05-11 00:05:54,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 15 minutes, 21 seconds)
2025-05-11 00:08:51,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:09:06,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1204.62561 ± 422.212
2025-05-11 00:09:06,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1235.8942, 1212.2538, 1445.5676, 702.06647, 1026.0566, 983.9509, 2216.6536, 766.5915, 1522.8589, 934.3614]
2025-05-11 00:09:06,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [431.0, 1000.0, 655.0, 294.0, 1000.0, 1000.0, 841.0, 265.0, 1000.0, 1000.0]
2025-05-11 00:09:06,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 14 minutes, 48 seconds)
2025-05-11 00:12:07,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:12:21,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1946.11560 ± 895.116
2025-05-11 00:12:21,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [203.8738, 893.6106, 1749.1394, 1457.5775, 2539.074, 2931.9966, 2584.0952, 1430.6129, 2754.853, 2916.3232]
2025-05-11 00:12:21,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 345.0, 723.0, 569.0, 1000.0, 1000.0, 1000.0, 535.0, 1000.0, 1000.0]
2025-05-11 00:12:21,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 10 minutes, 46 seconds)
2025-05-11 00:15:16,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:15:28,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1434.45715 ± 993.710
2025-05-11 00:15:28,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [481.27686, 2416.9363, 216.04846, 2658.4282, 409.42545, 2020.9639, 2243.3057, 1018.0312, 257.7556, 2622.3992]
2025-05-11 00:15:28,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [170.0, 1000.0, 89.0, 1000.0, 185.0, 1000.0, 1000.0, 362.0, 158.0, 1000.0]
2025-05-11 00:15:28,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 5 minutes, 44 seconds)
2025-05-11 00:18:37,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:18:53,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1982.01660 ± 978.157
2025-05-11 00:18:53,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1258.993, 1703.8281, 2913.785, 2088.9785, 3242.976, 2864.9136, 2180.4656, 2735.2444, 96.22728, 734.7556]
2025-05-11 00:18:53,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 519.0, 1000.0, 712.0, 990.0, 945.0, 823.0, 1000.0, 59.0, 1000.0]
2025-05-11 00:18:53,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 7 minutes, 37 seconds)
2025-05-11 00:21:46,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:21:57,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1489.60120 ± 932.774
2025-05-11 00:21:57,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [988.24414, 2209.411, 468.97934, 2087.577, 352.80743, 30.262562, 2939.5994, 2446.607, 1770.7228, 1601.8029]
2025-05-11 00:21:57,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 767.0, 164.0, 746.0, 197.0, 29.0, 1000.0, 787.0, 679.0, 631.0]
2025-05-11 00:21:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 3 minutes, 2 seconds)
2025-05-11 00:25:03,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:25:08,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 655.35181 ± 572.223
2025-05-11 00:25:08,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [69.63458, 74.82043, 1083.4775, 832.9723, 508.34164, 464.77545, 73.33529, 316.18433, 1238.7336, 1891.2426]
2025-05-11 00:25:08,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 53.0, 1000.0, 302.0, 197.0, 132.0, 63.0, 111.0, 468.0, 534.0]
2025-05-11 00:25:08,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 59 minutes, 32 seconds)
2025-05-11 00:28:06,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:28:20,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1913.13733 ± 1084.303
2025-05-11 00:28:20,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2682.8696, 2197.3667, 2036.8464, 137.04851, 432.91, 2760.729, 2937.9875, 2728.206, 362.80045, 2854.6094]
2025-05-11 00:28:20,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 714.0, 59.0, 185.0, 1000.0, 1000.0, 1000.0, 140.0, 1000.0]
2025-05-11 00:28:20,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 55 minutes, 51 seconds)
2025-05-11 00:31:19,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:31:31,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1565.74829 ± 782.806
2025-05-11 00:31:31,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1980.2849, 1791.3479, 990.61633, 1499.8639, 638.85614, 432.08267, 1485.0745, 2918.354, 2759.8638, 1161.1399]
2025-05-11 00:31:31,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [697.0, 725.0, 376.0, 567.0, 275.0, 182.0, 1000.0, 1000.0, 1000.0, 402.0]
2025-05-11 00:31:31,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 53 minutes, 18 seconds)
2025-05-11 00:34:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:34:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1241.25562 ± 996.193
2025-05-11 00:34:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [423.1077, 150.64014, 2447.5352, 887.84717, 3126.7827, 536.29236, 1005.79333, 33.05178, 2055.5527, 1745.9535]
2025-05-11 00:34:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [165.0, 51.0, 752.0, 322.0, 1000.0, 177.0, 1000.0, 38.0, 658.0, 570.0]
2025-05-11 00:34:42,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 47 minutes, 44 seconds)
2025-05-11 00:37:38,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:37:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1540.89514 ± 859.951
2025-05-11 00:37:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [821.9895, 2847.3987, 295.05908, 2431.0388, 1440.263, 1049.3403, 623.3602, 1082.8416, 2500.6465, 2317.0132]
2025-05-11 00:37:51,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [271.0, 1000.0, 95.0, 1000.0, 480.0, 430.0, 232.0, 1000.0, 985.0, 839.0]
2025-05-11 00:37:51,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 45 minutes, 13 seconds)
2025-05-11 00:40:52,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:41:07,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2050.66504 ± 1026.143
2025-05-11 00:41:07,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1941.3917, 236.78912, 643.0579, 3127.8137, 2844.5198, 3234.8274, 2570.1057, 2800.3389, 2157.3052, 950.5021]
2025-05-11 00:41:07,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [573.0, 72.0, 215.0, 993.0, 1000.0, 1000.0, 1000.0, 1000.0, 739.0, 1000.0]
2025-05-11 00:41:07,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 42 minutes, 57 seconds)
2025-05-11 00:44:16,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:44:29,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1144.36841 ± 807.470
2025-05-11 00:44:29,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [930.8322, 609.37494, 1187.6758, 1495.0211, 3024.9368, 1119.3389, 224.58025, 140.54556, 1895.7297, 815.64935]
2025-05-11 00:44:29,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 276.0, 400.0, 1000.0, 1000.0, 1000.0, 81.0, 81.0, 556.0, 1000.0]
2025-05-11 00:44:29,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 41 minutes, 27 seconds)
2025-05-11 00:47:20,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:47:26,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1008.67596 ± 878.189
2025-05-11 00:47:26,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1680.0371, 383.18405, 673.29083, 107.073654, 2561.715, 52.266155, 607.1712, 2508.779, 557.12036, 956.12244]
2025-05-11 00:47:26,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [529.0, 158.0, 224.0, 54.0, 830.0, 28.0, 208.0, 756.0, 172.0, 358.0]
2025-05-11 00:47:26,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 36 minutes, 1 second)
2025-05-11 00:50:36,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:50:50,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2035.86206 ± 960.960
2025-05-11 00:50:50,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3016.768, 1004.8976, 2759.709, 1754.7023, 659.5436, 3347.5999, 1910.1841, 2207.359, 649.6874, 3048.1719]
2025-05-11 00:50:50,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [964.0, 1000.0, 1000.0, 571.0, 218.0, 1000.0, 619.0, 780.0, 199.0, 1000.0]
2025-05-11 00:50:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 34 minutes, 50 seconds)
2025-05-11 00:53:42,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:53:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1992.80603 ± 906.992
2025-05-11 00:53:55,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1971.9207, 1036.4282, 643.39575, 2771.604, 3320.5793, 1103.3302, 3338.8584, 2462.8196, 1485.7197, 1793.4048]
2025-05-11 00:53:55,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 304.0, 182.0, 896.0, 1000.0, 361.0, 1000.0, 829.0, 422.0, 569.0]
2025-05-11 00:53:55,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 31 minutes, 2 seconds)
2025-05-11 00:57:03,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:57:12,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1467.89465 ± 1018.995
2025-05-11 00:57:12,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2191.2847, 2886.0544, 384.84836, 2093.131, 474.28595, 1198.491, 552.59625, 1903.9509, 48.762234, 2945.5413]
2025-05-11 00:57:12,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [791.0, 1000.0, 146.0, 735.0, 193.0, 356.0, 165.0, 689.0, 29.0, 1000.0]
2025-05-11 00:57:12,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 28 minutes, 3 seconds)
2025-05-11 01:00:00,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:00:14,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1787.67346 ± 1210.950
2025-05-11 01:00:14,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [640.015, 3198.3105, 3451.1233, 154.17413, 2932.2625, 3113.1753, 1074.9149, 1570.6494, 314.70493, 1427.4052]
2025-05-11 01:00:14,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [229.0, 908.0, 1000.0, 61.0, 1000.0, 1000.0, 1000.0, 530.0, 121.0, 1000.0]
2025-05-11 01:00:14,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 21 minutes, 40 seconds)
2025-05-11 01:03:16,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:03:29,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1736.16235 ± 952.543
2025-05-11 01:03:29,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [761.7834, 1037.4043, 818.8314, 921.8058, 3223.2563, 2844.54, 1669.7288, 3289.201, 1246.872, 1548.1995]
2025-05-11 01:03:29,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [230.0, 342.0, 1000.0, 1000.0, 1000.0, 926.0, 521.0, 1000.0, 361.0, 502.0]
2025-05-11 01:03:29,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 21 minutes, 18 seconds)
2025-05-11 01:06:33,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:06:41,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1345.87964 ± 1354.682
2025-05-11 01:06:41,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3185.1787, 3158.5427, 155.53099, 2410.4143, 12.461754, 3138.837, 636.4798, 145.7659, 164.21683, 451.3678]
2025-05-11 01:06:41,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 71.0, 772.0, 18.0, 1000.0, 215.0, 62.0, 66.0, 134.0]
2025-05-11 01:06:41,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 16 minutes, 21 seconds)
2025-05-11 01:09:41,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:09:52,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1566.19604 ± 991.474
2025-05-11 01:09:52,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1538.2349, 1905.6935, 297.42963, 533.9251, 1774.906, 1537.7742, 3071.0798, 945.72015, 629.1072, 3428.0898]
2025-05-11 01:09:52,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [509.0, 606.0, 80.0, 208.0, 618.0, 489.0, 1000.0, 1000.0, 230.0, 1000.0]
2025-05-11 01:09:52,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 13 minutes, 57 seconds)
2025-05-11 01:12:51,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:13:02,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1779.64221 ± 1137.925
2025-05-11 01:13:02,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1169.8746, 267.81818, 3126.1858, 3302.0813, 1877.4076, 209.10242, 544.9295, 1676.2826, 2911.4128, 2711.3274]
2025-05-11 01:13:02,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [374.0, 88.0, 1000.0, 989.0, 527.0, 80.0, 161.0, 560.0, 874.0, 1000.0]
2025-05-11 01:13:02,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 9 minutes, 44 seconds)
2025-05-11 01:16:04,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:16:22,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2434.98511 ± 895.751
2025-05-11 01:16:22,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3072.6917, 2804.2266, 3205.5078, 1346.6746, 3328.857, 2005.8973, 2996.4695, 1179.7966, 1038.0834, 3371.6475]
2025-05-11 01:16:22,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 703.0, 1000.0, 366.0, 1000.0, 1000.0]
2025-05-11 01:16:22,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2434.99) for latency MM1Queue_a033_s075
2025-05-11 01:16:22,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:16:22,402 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 01:16:22,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 9 minutes, 6 seconds)
2025-05-11 01:19:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:19:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2280.22998 ± 1168.538
2025-05-11 01:19:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2786.0815, 3014.2356, 3185.7102, 3219.088, 3172.1155, 2948.7847, 32.15201, 527.24347, 1068.928, 2847.9612]
2025-05-11 01:19:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 900.0, 1000.0, 1000.0, 1000.0, 1000.0, 19.0, 228.0, 330.0, 1000.0]
2025-05-11 01:19:36,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 5 minutes, 41 seconds)
2025-05-11 01:22:33,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:22:45,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1995.60291 ± 1111.208
2025-05-11 01:22:45,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3140.0637, 2633.0383, 3055.5415, 756.102, 3055.5828, 1590.7454, 716.5207, 519.8762, 1072.3329, 3416.2268]
2025-05-11 01:22:45,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 794.0, 1000.0, 251.0, 1000.0, 478.0, 235.0, 161.0, 339.0, 1000.0]
2025-05-11 01:22:45,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 2 minutes, 6 seconds)
2025-05-11 01:25:46,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:25:59,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2124.97339 ± 997.513
2025-05-11 01:25:59,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1400.5189, 372.412, 3271.822, 1406.6263, 2993.3596, 2178.144, 3031.707, 3067.451, 2711.0913, 816.60175]
2025-05-11 01:25:59,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [463.0, 124.0, 1000.0, 404.0, 1000.0, 718.0, 866.0, 1000.0, 822.0, 301.0]
2025-05-11 01:25:59,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 59 minutes, 17 seconds)
2025-05-11 01:29:10,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:29:28,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2440.09741 ± 862.125
2025-05-11 01:29:28,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2980.3958, 3053.4648, 2204.5183, 953.1599, 1101.355, 3158.5337, 3271.736, 2954.6267, 3142.9062, 1580.2776]
2025-05-11 01:29:28,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 401.0, 1000.0, 1000.0, 1000.0, 980.0, 527.0]
2025-05-11 01:29:28,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2440.10) for latency MM1Queue_a033_s075
2025-05-11 01:29:28,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:29:28,590 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 01:29:28,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 58 minutes, 22 seconds)
2025-05-11 01:32:17,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:32:31,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1886.99390 ± 887.089
2025-05-11 01:32:31,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [871.4676, 1484.8331, 3462.435, 874.4005, 2931.2495, 1289.5865, 1291.5306, 3050.4775, 1821.7172, 1792.2417]
2025-05-11 01:32:31,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [287.0, 460.0, 1000.0, 323.0, 1000.0, 397.0, 1000.0, 1000.0, 1000.0, 569.0]
2025-05-11 01:32:31,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 53 minutes, 5 seconds)
2025-05-11 01:35:33,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:35:43,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1754.07983 ± 1257.265
2025-05-11 01:35:43,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3135.3508, 1422.5576, 3278.173, 1749.5459, 138.52647, 3118.0781, 508.47427, 3222.6575, 657.0138, 310.42184]
2025-05-11 01:35:43,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 423.0, 1000.0, 506.0, 72.0, 911.0, 184.0, 1000.0, 205.0, 90.0]
2025-05-11 01:35:43,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 49 minutes, 37 seconds)
2025-05-11 01:38:53,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:39:06,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1665.11292 ± 1102.897
2025-05-11 01:39:06,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [187.21732, 2729.7788, 1127.676, 3084.2578, 3349.5842, 199.26913, 2242.8235, 1902.6537, 775.27234, 1052.5961]
2025-05-11 01:39:06,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 964.0, 368.0, 1000.0, 1000.0, 62.0, 666.0, 1000.0, 254.0, 1000.0]
2025-05-11 01:39:06,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 47 minutes, 49 seconds)
2025-05-11 01:41:56,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:42:10,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2170.46777 ± 990.688
2025-05-11 01:42:10,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2937.0552, 3078.5728, 3416.0564, 861.4179, 1107.7902, 710.6684, 2151.3674, 3100.912, 2881.3403, 1459.4983]
2025-05-11 01:42:10,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 261.0, 343.0, 255.0, 758.0, 1000.0, 1000.0, 393.0]
2025-05-11 01:42:10,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 43 minutes, 33 seconds)
2025-05-11 01:45:16,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:45:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1395.85962 ± 962.611
2025-05-11 01:45:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1249.9396, 755.1755, 3316.0273, 850.9449, 1249.8063, 120.210976, 2583.531, 2357.0933, 709.8857, 765.9823]
2025-05-11 01:45:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 268.0, 1000.0, 328.0, 355.0, 54.0, 768.0, 692.0, 238.0, 263.0]
2025-05-11 01:45:26,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 38 minutes, 57 seconds)
2025-05-11 01:48:27,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:48:35,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1471.08850 ± 1211.156
2025-05-11 01:48:35,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2352.613, 343.78766, 2917.6616, 1747.5972, 384.30286, 512.143, 3412.0168, 47.606365, 365.96365, 2627.1926]
2025-05-11 01:48:35,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [751.0, 100.0, 1000.0, 513.0, 148.0, 144.0, 1000.0, 35.0, 101.0, 812.0]
2025-05-11 01:48:35,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 36 minutes, 25 seconds)
2025-05-11 01:51:27,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:51:42,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2491.73438 ± 909.036
2025-05-11 01:51:42,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3401.8276, 2626.4124, 3195.7017, 3214.7683, 2888.8638, 1061.0361, 1135.2843, 3151.2256, 3027.888, 1214.3341]
2025-05-11 01:51:42,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 955.0, 982.0, 374.0, 383.0, 1000.0, 1000.0, 377.0]
2025-05-11 01:51:42,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2491.73) for latency MM1Queue_a033_s075
2025-05-11 01:51:42,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:51:42,891 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 01:51:42,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 32 minutes, 41 seconds)
2025-05-11 01:54:48,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:54:58,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1465.38074 ± 1218.757
2025-05-11 01:54:58,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [982.00836, 3131.352, 2984.6306, 125.24077, 47.198284, 1192.1788, 212.43758, 1525.8423, 1017.61536, 3435.3035]
2025-05-11 01:54:58,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [293.0, 920.0, 1000.0, 55.0, 40.0, 1000.0, 81.0, 528.0, 365.0, 1000.0]
2025-05-11 01:54:58,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 28 minutes, 52 seconds)
2025-05-11 01:57:51,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:58:01,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1695.84631 ± 1367.572
2025-05-11 01:58:01,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2474.9836, 3214.2253, 3547.5195, 3559.9604, 165.4127, 272.10657, 291.479, 1952.227, 168.91617, 1311.6312]
2025-05-11 01:58:01,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 79.0, 85.0, 109.0, 610.0, 92.0, 452.0]
2025-05-11 01:58:01,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 25 minutes, 39 seconds)
2025-05-11 02:00:56,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:01:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2248.66919 ± 888.464
2025-05-11 02:01:11,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3114.2595, 1561.5498, 1219.4092, 701.77264, 3443.2087, 2871.7117, 3012.9434, 1948.949, 2908.2327, 1704.6533]
2025-05-11 02:01:11,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 486.0, 392.0, 278.0, 1000.0, 1000.0, 1000.0, 598.0, 1000.0, 1000.0]
2025-05-11 02:01:11,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 21 minutes, 54 seconds)
2025-05-11 02:04:10,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:04:21,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1829.48889 ± 1059.103
2025-05-11 02:04:21,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [268.13177, 3225.94, 1791.4272, 1053.6904, 2253.9988, 199.55368, 1853.6335, 1468.955, 3291.8125, 2887.7468]
2025-05-11 02:04:21,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 1000.0, 1000.0, 307.0, 695.0, 64.0, 554.0, 499.0, 1000.0, 1000.0]
2025-05-11 02:04:21,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 18 minutes, 48 seconds)
2025-05-11 02:07:26,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:07:33,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1397.05725 ± 1141.391
2025-05-11 02:07:33,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [825.0756, 2572.5583, 169.49644, 3312.0557, 441.04504, 1154.1312, 3247.6729, 1191.1595, 215.3439, 842.03326]
2025-05-11 02:07:33,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [227.0, 777.0, 63.0, 1000.0, 168.0, 324.0, 1000.0, 334.0, 75.0, 287.0]
2025-05-11 02:07:33,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 16 minutes, 3 seconds)
2025-05-11 02:10:33,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:10:45,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2160.88721 ± 1196.836
2025-05-11 02:10:45,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3244.1377, 3215.9395, 3176.4365, 1033.7212, 2883.823, 2956.3137, 3188.4663, 1212.5847, 179.37865, 518.0719]
2025-05-11 02:10:45,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 319.0, 1000.0, 921.0, 1000.0, 353.0, 59.0, 150.0]
2025-05-11 02:10:45,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 12 minutes, 38 seconds)
2025-05-11 02:13:37,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:13:51,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2266.19409 ± 1249.872
2025-05-11 02:13:51,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3224.6057, 2992.1782, 3024.0657, 2834.4683, 3352.7793, 213.6667, 43.457798, 3146.6326, 934.4764, 2895.6084]
2025-05-11 02:13:51,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 952.0, 1000.0, 74.0, 29.0, 1000.0, 284.0, 1000.0]
2025-05-11 02:13:51,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 9 minutes, 36 seconds)
2025-05-11 02:16:59,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:17:10,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2047.53394 ± 1235.874
2025-05-11 02:17:10,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3428.0884, 3485.0195, 2315.8306, 1380.6371, 535.7958, 3595.2466, 225.83919, 1716.914, 3042.7693, 749.1995]
2025-05-11 02:17:10,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 719.0, 439.0, 194.0, 1000.0, 91.0, 542.0, 868.0, 246.0]
2025-05-11 02:17:10,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 7 minutes, 9 seconds)
2025-05-11 02:20:17,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:20:27,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1463.89355 ± 1093.120
2025-05-11 02:20:27,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1806.9326, 2989.11, 601.5596, 626.5379, 175.5452, 2504.7605, 112.61693, 1277.1682, 3308.9546, 1235.7504]
2025-05-11 02:20:27,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [532.0, 951.0, 217.0, 178.0, 58.0, 767.0, 67.0, 383.0, 1000.0, 1000.0]
2025-05-11 02:20:27,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 4 minutes, 23 seconds)
2025-05-11 02:23:17,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:23:29,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2084.56079 ± 1134.279
2025-05-11 02:23:29,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3399.501, 1512.2341, 3523.5469, 1832.9545, 3294.6006, 1071.2751, 2358.139, 2875.4746, 931.62036, 46.25928]
2025-05-11 02:23:29,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 460.0, 1000.0, 1000.0, 1000.0, 340.0, 688.0, 881.0, 253.0, 53.0]
2025-05-11 02:23:29,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 33 seconds)
2025-05-11 02:26:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:26:37,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2123.50244 ± 938.365
2025-05-11 02:26:37,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [378.87326, 3304.516, 3257.1013, 2560.6885, 1747.3823, 819.94775, 2129.9963, 2972.0378, 2427.936, 1636.5446]
2025-05-11 02:26:37,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [119.0, 1000.0, 1000.0, 797.0, 537.0, 1000.0, 632.0, 1000.0, 698.0, 541.0]
2025-05-11 02:26:37,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 57 minutes, 4 seconds)
2025-05-11 02:29:43,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:29:58,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2189.48706 ± 1083.109
2025-05-11 02:29:58,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3302.2844, 1448.0027, 3188.2522, 1318.5028, 649.4011, 365.88074, 3177.261, 2340.1873, 3025.7893, 3079.3103]
2025-05-11 02:29:58,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 470.0, 1000.0, 1000.0, 230.0, 153.0, 1000.0, 721.0, 951.0, 1000.0]
2025-05-11 02:29:58,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 54 minutes, 48 seconds)
2025-05-11 02:32:47,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:33:00,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2247.85889 ± 897.307
2025-05-11 02:33:00,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3012.8884, 2384.901, 1868.6315, 3272.5234, 3162.6367, 568.60675, 1866.5463, 2242.186, 960.5161, 3139.156]
2025-05-11 02:33:00,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 718.0, 558.0, 1000.0, 1000.0, 235.0, 619.0, 1000.0, 314.0, 1000.0]
2025-05-11 02:33:00,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 50 minutes, 40 seconds)
2025-05-11 02:36:07,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:36:19,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2116.11401 ± 1438.097
2025-05-11 02:36:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [475.94562, 3294.6814, 3415.0867, 141.38742, 3421.496, 3256.8066, 741.7228, 2994.109, 125.62, 3294.2854]
2025-05-11 02:36:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [161.0, 1000.0, 1000.0, 49.0, 1000.0, 1000.0, 205.0, 1000.0, 54.0, 1000.0]
2025-05-11 02:36:19,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 47 minutes, 35 seconds)
2025-05-11 02:39:03,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:39:16,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2139.20923 ± 1160.894
2025-05-11 02:39:16,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3330.0427, 1225.1481, 3580.0261, 1741.948, 352.1515, 3290.0688, 2436.722, 548.30615, 1531.0844, 3356.5942]
2025-05-11 02:39:16,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 392.0, 1000.0, 559.0, 104.0, 1000.0, 1000.0, 203.0, 440.0, 1000.0]
2025-05-11 02:39:16,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 44 minutes, 9 seconds)
2025-05-11 02:42:17,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:42:25,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1527.70105 ± 1226.105
2025-05-11 02:42:25,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1518.6445, 3508.8247, 101.80046, 3146.1533, 1295.0277, 522.8296, 1056.1957, 3146.754, 935.49835, 45.282578]
2025-05-11 02:42:25,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [471.0, 1000.0, 42.0, 1000.0, 406.0, 188.0, 327.0, 1000.0, 280.0, 26.0]
2025-05-11 02:42:25,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 41 minutes, 6 seconds)
2025-05-11 02:45:15,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:45:30,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2577.34253 ± 1217.105
2025-05-11 02:45:30,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3225.2039, 3502.614, 3155.312, 420.13284, 2015.2755, 3195.3267, 3521.1484, 3158.1353, 149.30064, 3430.977]
2025-05-11 02:45:30,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 178.0, 550.0, 1000.0, 1000.0, 1000.0, 48.0, 1000.0]
2025-05-11 02:45:30,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2577.34) for latency MM1Queue_a033_s075
2025-05-11 02:45:30,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-11 02:45:30,366 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 02:45:30,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 37 minutes, 17 seconds)
2025-05-11 02:48:40,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:48:52,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1907.94788 ± 1341.979
2025-05-11 02:48:52,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3068.2188, 3138.4565, 408.19818, 675.4174, 3210.9932, 2951.3672, 159.85524, 1823.1501, 161.21243, 3482.6094]
2025-05-11 02:48:52,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 118.0, 219.0, 1000.0, 1000.0, 78.0, 1000.0, 76.0, 1000.0]
2025-05-11 02:48:52,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 34 minutes, 53 seconds)
2025-05-11 02:51:49,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:52:02,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2165.24390 ± 1136.635
2025-05-11 02:52:02,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [957.06525, 2605.8865, 3402.8486, 2076.225, 3491.3523, 853.9725, 1413.9374, 3260.4133, 3246.315, 344.42456]
2025-05-11 02:52:02,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [297.0, 771.0, 1000.0, 1000.0, 1000.0, 274.0, 395.0, 1000.0, 1000.0, 108.0]
2025-05-11 02:52:02,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 31 minutes, 26 seconds)
2025-05-11 02:54:46,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:54:57,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1870.41931 ± 1307.182
2025-05-11 02:54:57,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3337.815, 1550.623, 1967.1249, 3188.6191, 335.63382, 337.993, 3396.517, 3472.5635, 411.13437, 706.16925]
2025-05-11 02:54:57,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 648.0, 1000.0, 144.0, 157.0, 1000.0, 1000.0, 146.0, 229.0]
2025-05-11 02:54:57,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 28 minutes, 14 seconds)
2025-05-11 02:57:59,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:58:13,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2223.15503 ± 1139.885
2025-05-11 02:58:13,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1497.6887, 2850.0151, 2706.8975, 3254.035, 3378.1982, 338.65802, 975.9681, 3204.0183, 3294.8455, 731.2254]
2025-05-11 02:58:13,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 866.0, 816.0, 1000.0, 1000.0, 119.0, 372.0, 1000.0, 1000.0, 231.0]
2025-05-11 02:58:13,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 25 minutes, 16 seconds)
2025-05-11 03:01:15,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:01:25,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1545.20044 ± 1083.184
2025-05-11 03:01:25,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3489.5002, 1743.3877, 948.9413, 1005.99927, 226.8448, 3466.9788, 664.9515, 1349.9766, 1940.92, 614.50354]
2025-05-11 03:01:25,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 518.0, 343.0, 280.0, 88.0, 1000.0, 239.0, 442.0, 1000.0, 200.0]
2025-05-11 03:01:25,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 16 seconds)
2025-05-11 03:04:09,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:04:21,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1872.64258 ± 1023.682
2025-05-11 03:04:21,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1607.5355, 1957.4718, 104.86296, 1380.769, 3214.177, 1445.085, 3241.8909, 1366.9791, 3366.2317, 1041.4224]
2025-05-11 03:04:21,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [472.0, 658.0, 47.0, 368.0, 1000.0, 394.0, 1000.0, 402.0, 1000.0, 1000.0]
2025-05-11 03:04:21,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 35 seconds)
2025-05-11 03:07:33,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:07:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2010.86877 ± 992.139
2025-05-11 03:07:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1743.1499, 2691.534, 1756.3525, 1206.1608, 3436.5752, 3083.8655, 1312.9868, 334.12955, 1277.6598, 3266.2744]
2025-05-11 03:07:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [610.0, 815.0, 538.0, 376.0, 1000.0, 1000.0, 391.0, 113.0, 1000.0, 1000.0]
2025-05-11 03:07:46,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 43 seconds)
2025-05-11 03:10:37,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:10:47,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1710.81250 ± 1357.565
2025-05-11 03:10:47,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3359.597, 2962.3306, 237.43614, 931.2611, 36.405903, 996.48206, 1697.0184, 3401.7468, 131.85413, 3353.9932]
2025-05-11 03:10:47,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [973.0, 1000.0, 79.0, 289.0, 26.0, 305.0, 561.0, 1000.0, 54.0, 1000.0]
2025-05-11 03:10:47,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 39 seconds)
2025-05-11 03:13:39,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:13:50,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1403.73364 ± 908.086
2025-05-11 03:13:50,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1169.7128, 270.95157, 3541.8982, 1875.0563, 1638.686, 1525.5671, 1257.3501, 1012.3423, 64.77703, 1680.9951]
2025-05-11 03:13:50,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 90.0, 1000.0, 1000.0, 524.0, 462.0, 1000.0, 293.0, 27.0, 501.0]
2025-05-11 03:13:50,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 22 seconds)
2025-05-11 03:16:42,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:16:57,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2448.66748 ± 921.926
2025-05-11 03:16:57,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3165.9834, 3303.4177, 2222.5403, 3113.0664, 2111.6245, 3583.9907, 3059.2812, 1395.3574, 531.50134, 1999.91]
2025-05-11 03:16:57,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 875.0, 624.0, 1000.0, 1000.0, 408.0, 174.0, 574.0]
2025-05-11 03:16:57,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 12 seconds)
2025-05-11 03:20:04,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:20:18,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2058.91162 ± 1119.663
2025-05-11 03:20:18,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3205.5256, 1072.1168, 787.65125, 3516.7026, 1458.9308, 3650.4124, 3110.0693, 626.7536, 1576.2264, 1584.7268]
2025-05-11 03:20:18,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 363.0, 290.0, 1000.0, 1000.0, 1000.0, 1000.0, 233.0, 1000.0, 477.0]
2025-05-11 03:20:18,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 11 seconds)
2025-05-11 03:23:06,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:23:23,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2489.37549 ± 930.875
2025-05-11 03:23:23,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3620.0815, 1008.12225, 2197.967, 3192.8374, 1078.231, 1578.3824, 2915.917, 2558.2046, 3218.8542, 3525.16]
2025-05-11 03:23:23,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 630.0, 1000.0, 1000.0, 455.0, 1000.0, 777.0, 1000.0, 1000.0]
2025-05-11 03:23:23,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1251 [DEBUG]: Training session finished
