2025-05-09 02:41:56,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16
2025-05-09 02:41:56,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16
2025-05-09 02:41:56,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7da26f1cc160>}
2025-05-09 02:41:56,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1111 [DEBUG]: using device: cpu
2025-05-09 02:41:56,838 baseline-bpql-noisy-ant:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-09 02:41:56,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-09 02:41:56,843 baseline-bpql-noisy-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-09 02:41:56,843 baseline-bpql-noisy-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 02:41:57,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-09 02:41:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-09 02:44:43,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:44:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -21.23356 ± 35.370
2025-05-09 02:44:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-85.248405, 2.9876049, -10.157698, -8.53716, -0.96565926, -12.0511465, 3.1127908, -11.601041, 6.2370667, -96.112]
2025-05-09 02:44:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 24.0, 25.0, 29.0, 25.0, 31.0, 24.0, 25.0, 23.0, 90.0]
2025-05-09 02:44:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (-21.23) for latency MM1Queue_a033_s075
2025-05-09 02:44:43,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 02:44:43,718 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 02:44:43,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 34 minutes, 58 seconds)
2025-05-09 02:47:33,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:47:41,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -253.68979 ± 360.955
2025-05-09 02:47:41,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-18.100386, -729.70996, -17.210398, -2.699791, -2.5113144, -807.03485, -15.2631, -871.2921, -36.023655, -37.05257]
2025-05-09 02:47:41,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 1000.0, 43.0, 39.0, 54.0, 1000.0, 54.0, 1000.0, 68.0, 58.0]
2025-05-09 02:47:41,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 40 minutes, 55 seconds)
2025-05-09 02:50:11,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:50:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -44.17733 ± 36.223
2025-05-09 02:50:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-54.400467, -92.10603, -10.123194, -56.519093, -15.838343, -87.04557, -63.358147, -77.0887, 5.7564645, 8.949743]
2025-05-09 02:50:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [114.0, 106.0, 123.0, 127.0, 64.0, 129.0, 159.0, 178.0, 49.0, 45.0]
2025-05-09 02:50:13,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 27 minutes, 41 seconds)
2025-05-09 02:52:59,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:53:03,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: -81.68908 ± 197.092
2025-05-09 02:53:03,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-653.06537, 26.877617, -3.7231889, 3.948272, 12.602214, -87.61391, -22.745739, 16.29763, 26.904394, -136.37273]
2025-05-09 02:53:03,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 109.0, 75.0, 94.0, 60.0, 131.0, 90.0, 78.0, 48.0, 188.0]
2025-05-09 02:53:03,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 26 minutes, 39 seconds)
2025-05-09 02:56:00,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:56:09,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 31.55555 ± 46.083
2025-05-09 02:56:09,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [-3.7208502, 55.34148, 38.543182, 58.29648, -42.33615, 30.699417, 125.06729, -30.06638, 26.908113, 56.822845]
2025-05-09 02:56:09,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [154.0, 1000.0, 239.0, 127.0, 1000.0, 119.0, 406.0, 1000.0, 102.0, 234.0]
2025-05-09 02:56:09,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (31.56) for latency MM1Queue_a033_s075
2025-05-09 02:56:09,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 02:56:09,750 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 02:56:09,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 30 minutes)
2025-05-09 02:58:53,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:59:16,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 530.00061 ± 57.076
2025-05-09 02:59:16,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [520.5953, 498.12726, 462.5395, 627.8149, 523.82874, 501.79037, 637.4653, 473.46094, 556.8494, 497.5342]
2025-05-09 02:59:16,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 02:59:16,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (530.00) for latency MM1Queue_a033_s075
2025-05-09 02:59:16,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 02:59:16,687 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 02:59:16,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 33 minutes, 31 seconds)
2025-05-09 03:02:11,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:02:35,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 737.01355 ± 49.846
2025-05-09 03:02:35,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [786.3079, 734.454, 686.86005, 790.4391, 663.9429, 792.2564, 714.9425, 661.4044, 757.5186, 782.0094]
2025-05-09 03:02:35,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 03:02:35,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (737.01) for latency MM1Queue_a033_s075
2025-05-09 03:02:35,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:02:35,231 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:02:35,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 37 minutes, 11 seconds)
2025-05-09 03:05:30,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:05:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 594.96063 ± 256.461
2025-05-09 03:05:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [825.2702, 34.381393, 313.6167, 783.5005, 776.5275, 352.95325, 700.8368, 807.3226, 593.24194, 761.95514]
2025-05-09 03:05:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 30.0, 383.0, 1000.0, 1000.0, 445.0, 1000.0, 1000.0, 737.0, 1000.0]
2025-05-09 03:05:47,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 46 minutes, 23 seconds)
2025-05-09 03:08:36,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:08:48,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 396.80453 ± 330.580
2025-05-09 03:08:48,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [807.758, 780.51044, 222.65253, 771.4951, 60.298965, 797.21, 29.648891, 323.01697, 111.53502, 63.91961]
2025-05-09 03:08:48,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 313.0, 1000.0, 89.0, 1000.0, 31.0, 498.0, 81.0, 53.0]
2025-05-09 03:08:48,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 46 minutes, 31 seconds)
2025-05-09 03:11:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:11:34,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 313.61459 ± 327.462
2025-05-09 03:11:34,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [68.62764, 28.658117, 391.01172, 829.09515, 62.179436, 25.947659, 146.72473, 46.633934, 758.6737, 778.5937]
2025-05-09 03:11:34,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 34.0, 446.0, 1000.0, 48.0, 31.0, 114.0, 43.0, 1000.0, 1000.0]
2025-05-09 03:11:34,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 37 minutes, 23 seconds)
2025-05-09 03:14:26,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:14:40,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 596.97876 ± 348.597
2025-05-09 03:14:40,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [57.186264, 839.2348, 71.79967, 909.34216, 836.29663, 996.65936, 933.49457, 382.02533, 676.79047, 266.95798]
2025-05-09 03:14:40,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [50.0, 1000.0, 56.0, 1000.0, 1000.0, 1000.0, 1000.0, 320.0, 567.0, 272.0]
2025-05-09 03:14:40,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 34 minutes)
2025-05-09 03:17:37,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:17:47,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 456.52448 ± 348.366
2025-05-09 03:17:47,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [77.184074, 807.1098, 181.99155, 582.9004, 952.76886, 334.36777, 308.04672, 85.383644, 1055.0586, 180.4331]
2025-05-09 03:17:47,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 1000.0, 133.0, 537.0, 883.0, 265.0, 248.0, 64.0, 988.0, 159.0]
2025-05-09 03:17:47,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 27 minutes, 30 seconds)
2025-05-09 03:20:36,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:20:42,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 280.63928 ± 266.045
2025-05-09 03:20:42,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [39.90029, 699.9915, 46.90501, 819.8036, 299.84494, 26.400639, 378.44272, 112.82083, 261.5193, 120.76417]
2025-05-09 03:20:42,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 697.0, 42.0, 1000.0, 280.0, 40.0, 321.0, 74.0, 179.0, 108.0]
2025-05-09 03:20:42,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 19 minutes, 31 seconds)
2025-05-09 03:23:38,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:23:49,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 472.14667 ± 346.749
2025-05-09 03:23:49,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1114.4402, 203.78595, 734.68744, 40.788883, 401.21237, 777.02075, 306.66864, 41.968224, 815.1289, 285.76538]
2025-05-09 03:23:49,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [867.0, 168.0, 688.0, 43.0, 314.0, 1000.0, 269.0, 44.0, 1000.0, 185.0]
2025-05-09 03:23:49,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 18 minutes, 13 seconds)
2025-05-09 03:26:33,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:26:47,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 677.57953 ± 357.026
2025-05-09 03:26:47,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [743.1634, 184.42485, 1340.7174, 821.0312, 1057.5789, 285.72476, 270.9581, 890.43036, 737.87354, 443.89285]
2025-05-09 03:26:47,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [590.0, 132.0, 1000.0, 1000.0, 1000.0, 212.0, 273.0, 1000.0, 606.0, 377.0]
2025-05-09 03:26:47,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 18 minutes, 37 seconds)
2025-05-09 03:29:41,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:29:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 860.51678 ± 458.617
2025-05-09 03:29:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1349.4271, 679.8774, 1021.92114, 183.4234, 960.39703, 1376.8497, 409.68756, 1379.3999, 1114.215, 129.97]
2025-05-09 03:29:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 496.0, 751.0, 134.0, 1000.0, 1000.0, 323.0, 1000.0, 1000.0, 113.0]
2025-05-09 03:29:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (860.52) for latency MM1Queue_a033_s075
2025-05-09 03:29:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:29:56,657 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:29:56,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 16 minutes, 34 seconds)
2025-05-09 03:32:36,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:32:53,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1043.82410 ± 458.616
2025-05-09 03:32:53,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [653.6447, 54.300106, 1506.7302, 1341.8004, 1475.8448, 1127.3932, 1116.9978, 1479.8032, 526.7162, 1155.0111]
2025-05-09 03:32:53,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [426.0, 47.0, 1000.0, 953.0, 1000.0, 1000.0, 1000.0, 1000.0, 290.0, 805.0]
2025-05-09 03:32:53,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1043.82) for latency MM1Queue_a033_s075
2025-05-09 03:32:53,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:32:53,289 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:32:53,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 10 minutes, 41 seconds)
2025-05-09 03:35:42,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:35:54,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 752.91150 ± 538.877
2025-05-09 03:35:54,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [154.97888, 753.4475, 1555.1599, 550.3497, 176.22908, 1277.5441, 335.68375, 91.09844, 1366.0527, 1268.5714]
2025-05-09 03:35:54,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [99.0, 1000.0, 1000.0, 399.0, 106.0, 798.0, 250.0, 70.0, 1000.0, 748.0]
2025-05-09 03:35:54,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 9 minutes, 16 seconds)
2025-05-09 03:38:49,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:39:05,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 925.01349 ± 462.364
2025-05-09 03:39:05,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [553.66376, 1231.4069, 1320.5172, 1379.6167, 1146.8241, 1443.5251, 1022.5735, 857.43616, 123.99984, 170.572]
2025-05-09 03:39:05,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [370.0, 1000.0, 831.0, 1000.0, 1000.0, 1000.0, 627.0, 1000.0, 90.0, 120.0]
2025-05-09 03:39:05,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 7 minutes, 18 seconds)
2025-05-09 03:41:52,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:42:09,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 656.05328 ± 287.830
2025-05-09 03:42:09,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [688.16974, 259.60242, 955.81775, 904.2496, 35.456448, 724.14136, 524.0655, 704.4799, 823.54205, 941.0076]
2025-05-09 03:42:09,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 166.0, 1000.0, 1000.0, 32.0, 1000.0, 401.0, 1000.0, 563.0, 1000.0]
2025-05-09 03:42:09,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 5 minutes, 49 seconds)
2025-05-09 03:44:57,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:45:06,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 584.57007 ± 513.846
2025-05-09 03:45:06,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [329.83734, 1124.624, 78.86314, 832.97437, 1693.153, 329.15793, 953.92523, 90.87884, 153.12683, 259.1596]
2025-05-09 03:45:06,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [191.0, 767.0, 74.0, 1000.0, 1000.0, 216.0, 577.0, 85.0, 91.0, 180.0]
2025-05-09 03:45:06,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 59 minutes, 39 seconds)
2025-05-09 03:48:09,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:48:20,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 643.40765 ± 403.499
2025-05-09 03:48:20,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [902.01575, 312.54343, 742.70605, 396.75104, 726.2132, 954.5259, 183.42203, 1560.3394, 214.14645, 441.41342]
2025-05-09 03:48:20,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 176.0, 427.0, 295.0, 466.0, 1000.0, 149.0, 905.0, 145.0, 345.0]
2025-05-09 03:48:20,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 1 minute, 6 seconds)
2025-05-09 03:50:59,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:51:14,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1167.27869 ± 575.726
2025-05-09 03:51:14,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [267.57864, 1503.4072, 1785.3188, 506.36627, 519.1444, 1544.9813, 1840.5487, 703.82697, 1255.7715, 1745.8438]
2025-05-09 03:51:14,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [138.0, 799.0, 1000.0, 303.0, 266.0, 764.0, 1000.0, 390.0, 1000.0, 1000.0]
2025-05-09 03:51:14,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1167.28) for latency MM1Queue_a033_s075
2025-05-09 03:51:14,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:51:14,522 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:51:14,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 56 minutes, 7 seconds)
2025-05-09 03:54:03,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:54:24,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1643.48950 ± 409.005
2025-05-09 03:54:24,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1462.4763, 1692.1083, 1834.4581, 1953.91, 1938.4877, 1898.906, 1759.6637, 503.61554, 1564.5145, 1826.7556]
2025-05-09 03:54:24,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [823.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 257.0, 1000.0, 1000.0]
2025-05-09 03:54:24,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1643.49) for latency MM1Queue_a033_s075
2025-05-09 03:54:24,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:54:24,253 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:54:24,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 52 minutes, 52 seconds)
2025-05-09 03:57:25,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:57:33,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 520.00934 ± 455.966
2025-05-09 03:57:33,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [480.16757, 720.63995, 258.82687, 67.866776, 190.17361, 648.9606, 303.9199, 1626.8986, 874.2881, 28.350893]
2025-05-09 03:57:33,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [266.0, 479.0, 118.0, 38.0, 103.0, 351.0, 178.0, 1000.0, 1000.0, 31.0]
2025-05-09 03:57:33,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 51 minutes, 1 second)
2025-05-09 04:00:22,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:00:39,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1198.84900 ± 674.555
2025-05-09 04:00:39,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1913.2268, 1361.2036, 1017.8654, 31.660206, 1856.4594, 1699.4808, 51.429848, 872.9589, 1273.2642, 1910.9409]
2025-05-09 04:00:39,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [948.0, 857.0, 542.0, 31.0, 1000.0, 1000.0, 38.0, 1000.0, 1000.0, 1000.0]
2025-05-09 04:00:39,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 49 minutes, 59 seconds)
2025-05-09 04:03:11,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:03:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 939.22852 ± 669.138
2025-05-09 04:03:23,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1698.1539, 82.01285, 150.0108, 1791.297, 588.7716, 2035.9199, 991.0183, 794.8961, 257.54364, 1002.66125]
2025-05-09 04:03:23,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 56.0, 91.0, 862.0, 300.0, 1000.0, 444.0, 490.0, 151.0, 1000.0]
2025-05-09 04:03:23,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 39 minutes, 44 seconds)
2025-05-09 04:06:13,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:06:28,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1382.56616 ± 651.654
2025-05-09 04:06:28,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1880.8075, 626.1962, 1941.0126, 113.68367, 2017.7678, 1276.1749, 1840.68, 1904.2327, 1592.9927, 632.1137]
2025-05-09 04:06:28,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 311.0, 1000.0, 67.0, 1000.0, 615.0, 1000.0, 1000.0, 848.0, 328.0]
2025-05-09 04:06:28,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 39 minutes, 25 seconds)
2025-05-09 04:09:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:09:34,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1290.99243 ± 706.943
2025-05-09 04:09:34,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [445.7719, 1885.3965, 228.44696, 1737.1438, 746.3896, 1877.097, 2137.3845, 412.27875, 1490.6754, 1949.3395]
2025-05-09 04:09:34,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [195.0, 1000.0, 133.0, 922.0, 365.0, 1000.0, 1000.0, 189.0, 825.0, 1000.0]
2025-05-09 04:09:34,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 35 minutes, 19 seconds)
2025-05-09 04:12:22,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:12:43,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1962.13733 ± 214.036
2025-05-09 04:12:43,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1979.9866, 2195.2036, 1503.0646, 2085.6545, 1910.8524, 2175.0984, 1994.9996, 2075.9019, 1637.6417, 2062.9688]
2025-05-09 04:12:43,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 722.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 734.0, 1000.0]
2025-05-09 04:12:43,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (1962.14) for latency MM1Queue_a033_s075
2025-05-09 04:12:43,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 04:12:43,246 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 04:12:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 32 minutes, 22 seconds)
2025-05-09 04:15:33,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:15:53,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1906.94849 ± 540.384
2025-05-09 04:15:53,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2078.3306, 2224.8806, 1223.7994, 2356.973, 2040.5472, 2241.2888, 578.0001, 2217.692, 1847.9023, 2260.0696]
2025-05-09 04:15:53,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 571.0, 1000.0, 1000.0, 1000.0, 278.0, 1000.0, 919.0, 1000.0]
2025-05-09 04:15:53,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 30 minutes, 14 seconds)
2025-05-09 04:18:41,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:18:52,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 960.98651 ± 599.084
2025-05-09 04:18:52,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1224.9062, 813.78107, 582.3282, 171.17708, 143.34488, 2029.7292, 846.9308, 623.7158, 1518.86, 1655.0925]
2025-05-09 04:18:52,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [530.0, 1000.0, 254.0, 71.0, 108.0, 886.0, 407.0, 247.0, 1000.0, 752.0]
2025-05-09 04:18:52,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 30 minutes, 34 seconds)
2025-05-09 04:21:49,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:22:05,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1589.32373 ± 747.532
2025-05-09 04:22:05,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2101.7664, 2279.0193, 2442.3547, 1414.2146, 666.6565, 2052.565, 634.5763, 2281.7908, 1704.2574, 316.03476]
2025-05-09 04:22:05,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 991.0, 626.0, 298.0, 901.0, 233.0, 1000.0, 1000.0, 146.0]
2025-05-09 04:22:05,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 29 minutes, 11 seconds)
2025-05-09 04:24:49,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:25:09,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2096.03760 ± 426.695
2025-05-09 04:25:09,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2334.983, 2243.8228, 2321.1587, 2320.558, 2248.7927, 943.82965, 2295.3274, 1676.5652, 2269.9795, 2305.3608]
2025-05-09 04:25:09,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 388.0, 1000.0, 777.0, 1000.0, 1000.0]
2025-05-09 04:25:09,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2096.04) for latency MM1Queue_a033_s075
2025-05-09 04:25:09,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 04:25:09,547 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 04:25:09,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 25 minutes, 48 seconds)
2025-05-09 04:28:04,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:28:17,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1411.22534 ± 966.532
2025-05-09 04:28:17,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2648.778, 2187.9192, 41.28408, 27.01028, 196.61276, 2310.574, 1131.2014, 2056.8196, 2201.655, 1310.3998]
2025-05-09 04:28:17,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 39.0, 26.0, 94.0, 969.0, 499.0, 771.0, 1000.0, 476.0]
2025-05-09 04:28:17,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 22 minutes, 28 seconds)
2025-05-09 04:30:59,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:31:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1493.20093 ± 883.293
2025-05-09 04:31:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2300.588, 98.407196, 25.360092, 1033.1328, 2338.4785, 2362.0835, 1177.5934, 2232.141, 1111.1241, 2253.1013]
2025-05-09 04:31:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 57.0, 25.0, 1000.0, 1000.0, 1000.0, 479.0, 1000.0, 439.0, 1000.0]
2025-05-09 04:31:15,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 16 minutes, 45 seconds)
2025-05-09 04:34:06,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:34:23,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1820.23438 ± 829.526
2025-05-09 04:34:23,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2471.6885, 1361.8684, 620.93304, 2110.0178, 2626.796, 2719.5667, 2313.1606, 422.81577, 1047.801, 2507.6958]
2025-05-09 04:34:23,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 560.0, 306.0, 1000.0, 1000.0, 1000.0, 1000.0, 196.0, 413.0, 1000.0]
2025-05-09 04:34:23,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 15 minutes, 24 seconds)
2025-05-09 04:37:16,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:37:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2077.50635 ± 177.607
2025-05-09 04:37:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2074.2964, 2230.8096, 2139.18, 2100.5732, 2193.084, 1568.16, 2167.5974, 2135.151, 2042.4877, 2123.7246]
2025-05-09 04:37:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 735.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 04:37:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 12 minutes, 47 seconds)
2025-05-09 04:40:30,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:40:47,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1975.16968 ± 875.442
2025-05-09 04:40:47,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2693.5896, 2270.348, 2613.5215, 2771.681, 2567.1165, 1534.7301, 146.37125, 2444.9182, 599.9547, 2109.466]
2025-05-09 04:40:47,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 920.0, 1000.0, 1000.0, 1000.0, 611.0, 79.0, 1000.0, 253.0, 838.0]
2025-05-09 04:40:47,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 10 minutes, 46 seconds)
2025-05-09 04:43:39,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:43:59,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2154.92456 ± 697.575
2025-05-09 04:43:59,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1073.8792, 2435.787, 2513.2397, 1947.9526, 2514.7598, 2749.5598, 2533.9568, 2535.1365, 2649.3914, 595.5861]
2025-05-09 04:43:59,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 767.0, 1000.0, 1000.0, 1000.0, 931.0, 1000.0, 241.0]
2025-05-09 04:43:59,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2154.92) for latency MM1Queue_a033_s075
2025-05-09 04:43:59,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 04:43:59,942 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 04:43:59,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 8 minutes, 26 seconds)
2025-05-09 04:46:48,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:47:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2293.46338 ± 362.743
2025-05-09 04:47:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2577.348, 2333.837, 1564.7468, 2492.9978, 2348.4358, 2559.9263, 1610.1095, 2395.1616, 2485.7512, 2566.321]
2025-05-09 04:47:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 861.0, 1000.0, 594.0, 1000.0, 1000.0, 1000.0]
2025-05-09 04:47:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2293.46) for latency MM1Queue_a033_s075
2025-05-09 04:47:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 04:47:09,765 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 04:47:09,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 7 minutes, 39 seconds)
2025-05-09 04:49:58,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:50:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1518.84900 ± 912.165
2025-05-09 04:50:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2445.6948, 701.716, 2510.303, 2445.6409, 2489.0083, 148.82117, 1160.7157, 1917.449, 1161.227, 207.91364]
2025-05-09 04:50:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 312.0, 1000.0, 1000.0, 1000.0, 72.0, 505.0, 1000.0, 498.0, 129.0]
2025-05-09 04:50:12,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 3 minutes, 35 seconds)
2025-05-09 04:52:59,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:53:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1839.72852 ± 858.674
2025-05-09 04:53:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [465.78088, 1496.0701, 2737.6948, 2691.0706, 2216.1223, 477.21326, 1232.2094, 1700.7571, 2650.2456, 2730.1216]
2025-05-09 04:53:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [189.0, 581.0, 1000.0, 1000.0, 811.0, 281.0, 468.0, 612.0, 1000.0, 1000.0]
2025-05-09 04:53:14,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 57 minutes, 49 seconds)
2025-05-09 04:56:00,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:56:21,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2339.38428 ± 437.266
2025-05-09 04:56:21,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2379.2397, 2645.6917, 2653.5034, 2354.4426, 2247.0767, 2370.5083, 2597.6333, 2477.394, 2579.8342, 1088.5193]
2025-05-09 04:56:21,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 465.0]
2025-05-09 04:56:21,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2339.38) for latency MM1Queue_a033_s075
2025-05-09 04:56:21,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 04:56:21,974 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 04:56:22,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 54 minutes, 23 seconds)
2025-05-09 04:59:10,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:59:22,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1421.47949 ± 768.962
2025-05-09 04:59:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [99.95215, 2082.8875, 1108.5676, 535.344, 618.44904, 2294.0798, 1482.1322, 2386.0833, 1476.9696, 2130.3303]
2025-05-09 04:59:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [64.0, 834.0, 399.0, 211.0, 247.0, 1000.0, 582.0, 1000.0, 537.0, 858.0]
2025-05-09 04:59:22,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 49 minutes, 7 seconds)
2025-05-09 05:02:13,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:02:23,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1238.23608 ± 1000.171
2025-05-09 05:02:23,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2356.7637, 171.99245, 321.90518, 848.4163, 433.74612, 90.07196, 1080.4882, 2547.5, 2901.2495, 1630.2262]
2025-05-09 05:02:23,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [862.0, 76.0, 182.0, 302.0, 206.0, 66.0, 406.0, 1000.0, 1000.0, 634.0]
2025-05-09 05:02:23,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 44 minutes, 33 seconds)
2025-05-09 05:05:21,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:05:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2343.16040 ± 705.706
2025-05-09 05:05:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1016.4626, 2752.233, 2719.0615, 2669.3582, 2732.711, 857.7923, 2562.2112, 2679.687, 2712.4675, 2729.6216]
2025-05-09 05:05:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [413.0, 1000.0, 1000.0, 1000.0, 1000.0, 343.0, 913.0, 1000.0, 1000.0, 1000.0]
2025-05-09 05:05:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2343.16) for latency MM1Queue_a033_s075
2025-05-09 05:05:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 05:05:40,505 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 05:05:40,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 43 minutes, 53 seconds)
2025-05-09 05:08:24,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:08:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2128.36572 ± 816.790
2025-05-09 05:08:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2377.6907, 2316.1135, 188.96538, 2626.2847, 2518.5337, 2765.0188, 2470.2583, 897.6171, 2549.5295, 2573.646]
2025-05-09 05:08:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 91.0, 1000.0, 1000.0, 1000.0, 1000.0, 431.0, 1000.0, 1000.0]
2025-05-09 05:08:43,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 41 minutes, 1 second)
2025-05-09 05:11:37,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:11:53,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1909.08862 ± 944.380
2025-05-09 05:11:53,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2872.6887, 2775.5042, 127.33731, 929.75195, 771.18585, 1468.9857, 2447.1562, 2425.7437, 2597.1643, 2675.3687]
2025-05-09 05:11:53,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [985.0, 1000.0, 64.0, 317.0, 290.0, 554.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 05:11:53,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 38 minutes, 17 seconds)
2025-05-09 05:14:47,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:15:03,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1775.83008 ± 892.810
2025-05-09 05:15:03,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [486.6953, 464.9594, 886.72986, 2411.8757, 2645.9292, 2652.5176, 1568.6791, 1303.9883, 2742.674, 2594.2512]
2025-05-09 05:15:03,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [224.0, 180.0, 1000.0, 1000.0, 1000.0, 1000.0, 548.0, 494.0, 1000.0, 1000.0]
2025-05-09 05:15:03,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 36 minutes, 53 seconds)
2025-05-09 05:17:57,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:18:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1987.89478 ± 834.342
2025-05-09 05:18:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [784.23315, 2673.803, 540.5527, 2606.2664, 2662.815, 2823.2458, 2294.4175, 1857.1141, 2606.558, 1029.9423]
2025-05-09 05:18:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [311.0, 1000.0, 240.0, 1000.0, 1000.0, 1000.0, 750.0, 701.0, 1000.0, 406.0]
2025-05-09 05:18:13,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 35 minutes, 3 seconds)
2025-05-09 05:20:54,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:21:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1938.13220 ± 816.701
2025-05-09 05:21:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2406.0535, 2800.23, 1229.6044, 2667.9768, 742.81494, 2635.3022, 1025.3427, 2068.1094, 926.783, 2879.1047]
2025-05-09 05:21:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 476.0, 1000.0, 249.0, 1000.0, 409.0, 691.0, 375.0, 1000.0]
2025-05-09 05:21:09,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 28 minutes, 38 seconds)
2025-05-09 05:24:00,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:24:12,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1424.61792 ± 1033.916
2025-05-09 05:24:12,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2609.5444, 560.9509, 224.95514, 309.81985, 2613.8145, 358.2976, 2839.5552, 1126.2601, 2483.9578, 1119.0223]
2025-05-09 05:24:12,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 271.0, 99.0, 122.0, 1000.0, 147.0, 1000.0, 419.0, 947.0, 425.0]
2025-05-09 05:24:12,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 25 minutes, 35 seconds)
2025-05-09 05:27:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:27:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2317.05713 ± 797.678
2025-05-09 05:27:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2787.2227, 2694.5496, 370.61618, 2816.0073, 2965.35, 2601.021, 2122.8567, 2726.962, 2786.259, 1299.7267]
2025-05-09 05:27:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 160.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 05:27:28,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 23 minutes, 27 seconds)
2025-05-09 05:30:20,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:30:39,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2310.53955 ± 725.877
2025-05-09 05:30:39,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2954.7136, 2759.3198, 2872.722, 781.6096, 2890.0422, 1332.409, 2727.6003, 2214.0312, 2797.3362, 1775.6113]
2025-05-09 05:30:39,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 317.0, 1000.0, 528.0, 1000.0, 1000.0, 1000.0, 603.0]
2025-05-09 05:30:39,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 20 minutes, 18 seconds)
2025-05-09 05:33:32,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:33:50,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2007.30664 ± 1014.599
2025-05-09 05:33:50,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2795.084, 806.371, 2295.5576, 2460.5603, 2705.3162, 449.58527, 215.54135, 2782.2756, 2880.448, 2682.3284]
2025-05-09 05:33:50,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 795.0, 897.0, 1000.0, 171.0, 87.0, 1000.0, 1000.0, 1000.0]
2025-05-09 05:33:50,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 17 minutes, 27 seconds)
2025-05-09 05:36:43,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:37:00,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2165.28467 ± 797.440
2025-05-09 05:37:00,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [749.45654, 2993.8188, 2518.054, 2786.94, 1352.9714, 1680.252, 2946.67, 2958.1956, 2466.5989, 1199.8875]
2025-05-09 05:37:00,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [262.0, 1000.0, 1000.0, 1000.0, 496.0, 586.0, 1000.0, 1000.0, 883.0, 460.0]
2025-05-09 05:37:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 16 minutes, 19 seconds)
2025-05-09 05:39:44,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:39:59,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1971.16187 ± 1041.026
2025-05-09 05:39:59,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2895.8286, 1991.8485, 745.795, 2923.8765, 2967.0884, 2929.0151, 1381.016, 2905.562, 606.935, 364.65256]
2025-05-09 05:39:59,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 637.0, 287.0, 1000.0, 1000.0, 1000.0, 538.0, 1000.0, 218.0, 147.0]
2025-05-09 05:39:59,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 12 minutes, 33 seconds)
2025-05-09 05:42:47,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:43:01,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 1801.80505 ± 1113.539
2025-05-09 05:43:01,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2793.1914, 1238.2422, 193.79593, 1357.738, 3116.9277, 2829.7263, 2676.5488, 508.11624, 2886.064, 417.69922]
2025-05-09 05:43:01,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 481.0, 96.0, 496.0, 1000.0, 1000.0, 1000.0, 174.0, 1000.0, 201.0]
2025-05-09 05:43:01,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 7 minutes, 30 seconds)
2025-05-09 05:46:03,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:46:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2847.02393 ± 101.676
2025-05-09 05:46:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2811.7444, 2828.3005, 2834.5447, 3133.8364, 2812.5374, 2798.768, 2804.1985, 2757.7917, 2789.3164, 2899.202]
2025-05-09 05:46:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 05:46:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2847.02) for latency MM1Queue_a033_s075
2025-05-09 05:46:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 05:46:26,158 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 05:46:26,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 6 minutes, 15 seconds)
2025-05-09 05:49:14,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:49:32,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2313.07471 ± 828.935
2025-05-09 05:49:32,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2993.589, 3037.9424, 746.8407, 3011.6875, 1144.2687, 2561.8586, 2735.2566, 1369.6371, 2628.498, 2901.168]
2025-05-09 05:49:32,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 272.0, 1000.0, 475.0, 1000.0, 1000.0, 504.0, 1000.0, 1000.0]
2025-05-09 05:49:32,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 2 minutes, 27 seconds)
2025-05-09 05:52:12,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:52:31,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2058.56519 ± 895.934
2025-05-09 05:52:31,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1839.6953, 1543.3081, 2771.1748, 1014.3359, 1480.1757, 2971.8113, 347.6842, 2813.4805, 2785.706, 3018.2808]
2025-05-09 05:52:31,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 599.0, 1000.0, 1000.0, 506.0, 1000.0, 148.0, 1000.0, 1000.0, 1000.0]
2025-05-09 05:52:31,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 57 minutes, 54 seconds)
2025-05-09 05:55:23,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:55:40,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2077.54150 ± 715.439
2025-05-09 05:55:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1033.8612, 1540.7135, 1087.827, 2665.1838, 2263.188, 2769.1392, 2576.1064, 2882.7092, 1275.8824, 2680.802]
2025-05-09 05:55:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [359.0, 533.0, 453.0, 873.0, 1000.0, 1000.0, 1000.0, 1000.0, 515.0, 1000.0]
2025-05-09 05:55:40,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 56 minutes, 6 seconds)
2025-05-09 05:58:34,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:58:57,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2853.26147 ± 98.073
2025-05-09 05:58:57,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2836.8457, 2826.5906, 3009.047, 2735.8643, 2838.1023, 2955.3052, 3009.3674, 2742.0938, 2818.8557, 2760.545]
2025-05-09 05:58:57,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 05:58:57,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2853.26) for latency MM1Queue_a033_s075
2025-05-09 05:58:57,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 05:58:57,350 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 05:58:57,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 54 minutes, 40 seconds)
2025-05-09 06:01:46,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:02:02,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2121.51318 ± 1053.810
2025-05-09 06:02:02,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1837.2683, 1921.5308, 2884.0605, 2450.2485, 2944.942, 3091.1892, 313.10883, 46.759712, 2894.1748, 2831.851]
2025-05-09 06:02:02,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [620.0, 681.0, 1000.0, 1000.0, 1000.0, 1000.0, 138.0, 37.0, 1000.0, 1000.0]
2025-05-09 06:02:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 49 minutes, 16 seconds)
2025-05-09 06:04:59,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:05:17,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2017.66211 ± 980.079
2025-05-09 06:05:17,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2454.1187, 939.80786, 1566.0553, 426.28687, 2977.9497, 2882.9033, 2706.4678, 2844.8477, 2800.7253, 577.458]
2025-05-09 06:05:17,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 614.0, 210.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 290.0]
2025-05-09 06:05:17,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 47 minutes, 4 seconds)
2025-05-09 06:07:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:08:15,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2661.27856 ± 708.265
2025-05-09 06:08:15,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3033.2473, 3240.771, 2801.9458, 2887.9219, 3028.844, 2570.3389, 2775.575, 2786.3303, 600.262, 2887.5496]
2025-05-09 06:08:15,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 860.0, 1000.0, 1000.0, 252.0, 1000.0]
2025-05-09 06:08:15,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 43 minutes, 48 seconds)
2025-05-09 06:11:17,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:11:37,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2705.69775 ± 787.116
2025-05-09 06:11:37,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2812.4414, 2920.0955, 3085.186, 2760.878, 3130.7964, 3087.4854, 3016.5144, 3072.2314, 376.38232, 2794.9675]
2025-05-09 06:11:37,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 141.0, 1000.0]
2025-05-09 06:11:37,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 42 minutes, 3 seconds)
2025-05-09 06:14:39,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:14:57,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2157.68604 ± 1067.184
2025-05-09 06:14:57,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2873.2651, 69.92411, 3000.7505, 2603.5796, 2988.7068, 2077.668, 1006.70667, 3140.1606, 790.7343, 3025.363]
2025-05-09 06:14:57,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 45.0, 1000.0, 1000.0, 1000.0, 753.0, 296.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:14:57,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 39 minutes, 14 seconds)
2025-05-09 06:17:34,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:17:54,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2383.87109 ± 755.605
2025-05-09 06:17:54,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3222.468, 2023.2078, 1896.0009, 772.93646, 2880.3552, 2597.9905, 2708.897, 1571.7881, 3150.3972, 3014.6704]
2025-05-09 06:17:54,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 686.0, 1000.0, 1000.0, 1000.0, 1000.0, 499.0, 1000.0, 1000.0]
2025-05-09 06:17:54,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 35 minutes, 12 seconds)
2025-05-09 06:20:42,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:21:01,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2290.62744 ± 758.874
2025-05-09 06:21:01,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2971.6987, 1239.494, 2854.8462, 1776.3778, 2967.7056, 2642.027, 3149.7866, 1028.7162, 1568.5278, 2707.0947]
2025-05-09 06:21:01,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 558.0, 1000.0, 1000.0, 1000.0, 445.0, 532.0, 1000.0]
2025-05-09 06:21:01,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 31 minutes, 18 seconds)
2025-05-09 06:24:06,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:24:25,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2321.57568 ± 859.833
2025-05-09 06:24:25,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2445.4268, 2636.478, 2848.2878, 1733.356, 2925.75, 103.47378, 1781.9097, 2780.7766, 2998.1428, 2962.1536]
2025-05-09 06:24:25,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 657.0, 1000.0, 79.0, 616.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:24:25,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 30 minutes, 33 seconds)
2025-05-09 06:27:03,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:27:21,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2334.43799 ± 1090.391
2025-05-09 06:27:21,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2254.0845, 2973.93, 2996.1501, 2753.2832, 96.98207, 340.22427, 2976.779, 3186.3525, 3144.8938, 2621.7]
2025-05-09 06:27:21,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [827.0, 1000.0, 1000.0, 1000.0, 53.0, 151.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:27:21,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 24 minutes, 58 seconds)
2025-05-09 06:30:14,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:30:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2326.19092 ± 915.519
2025-05-09 06:30:35,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3244.0012, 3046.8699, 2876.1184, 1799.0815, 3222.7124, 787.35895, 1317.1866, 1109.2991, 3072.3845, 2786.8958]
2025-05-09 06:30:35,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 364.0, 1000.0, 1000.0]
2025-05-09 06:30:35,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 21 minutes, 18 seconds)
2025-05-09 06:33:22,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:33:41,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2270.14746 ± 867.458
2025-05-09 06:33:41,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2759.3547, 2813.122, 1222.4653, 2955.0994, 2971.8901, 1467.4509, 3142.7266, 1458.4116, 800.31757, 3110.6367]
2025-05-09 06:33:41,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 474.0, 1000.0, 1000.0, 572.0, 1000.0, 1000.0, 263.0, 1000.0]
2025-05-09 06:33:41,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 18 minutes, 54 seconds)
2025-05-09 06:36:32,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:36:51,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2652.64502 ± 911.361
2025-05-09 06:36:51,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2661.2617, 1609.6527, 3102.173, 3253.9446, 3298.2756, 3117.055, 3021.9802, 3044.4536, 303.4414, 3114.2124]
2025-05-09 06:36:51,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [834.0, 561.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 149.0, 1000.0]
2025-05-09 06:36:51,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 15 minutes, 59 seconds)
2025-05-09 06:39:41,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:40:04,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2796.35400 ± 586.878
2025-05-09 06:40:04,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2943.1602, 3105.0547, 2879.0803, 1055.5393, 3055.1218, 2957.2017, 3104.7573, 2824.6665, 3049.5703, 2989.389]
2025-05-09 06:40:04,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:40:04,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 12 minutes)
2025-05-09 06:42:58,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:43:19,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2759.56860 ± 494.693
2025-05-09 06:43:19,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3272.5598, 2249.509, 1860.1295, 2928.9373, 3082.3228, 2004.187, 3232.4397, 2848.109, 3036.5818, 3080.9082]
2025-05-09 06:43:19,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 752.0, 591.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:43:19,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 10 minutes, 13 seconds)
2025-05-09 06:46:10,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:46:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2266.25146 ± 844.819
2025-05-09 06:46:28,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2888.6826, 701.951, 3112.539, 2926.2598, 2016.3416, 808.11383, 2530.3916, 3026.8804, 1933.945, 2717.4082]
2025-05-09 06:46:28,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 238.0, 1000.0, 1000.0, 1000.0, 308.0, 875.0, 1000.0, 617.0, 1000.0]
2025-05-09 06:46:28,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 6 minutes, 38 seconds)
2025-05-09 06:49:23,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:49:40,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2077.84985 ± 1019.078
2025-05-09 06:49:40,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2377.216, 495.6798, 2823.5513, 528.6326, 2846.4163, 2900.5583, 589.37146, 2781.5034, 2578.3296, 2857.238]
2025-05-09 06:49:40,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [880.0, 166.0, 1000.0, 199.0, 1000.0, 1000.0, 263.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:49:40,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 3 minutes, 54 seconds)
2025-05-09 06:52:30,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:52:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2622.21240 ± 855.467
2025-05-09 06:52:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2996.6233, 3170.756, 1022.5075, 3103.1775, 2992.7942, 837.11395, 2802.0647, 3127.526, 3238.463, 2931.097]
2025-05-09 06:52:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 333.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:52:51,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 48 seconds)
2025-05-09 06:55:32,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:55:55,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2926.26929 ± 450.299
2025-05-09 06:55:55,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1599.2296, 2984.2583, 2980.6567, 3202.9934, 3090.2234, 3004.35, 3064.212, 3218.24, 3133.8691, 2984.6606]
2025-05-09 06:55:55,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:55:55,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2926.27) for latency MM1Queue_a033_s075
2025-05-09 06:55:55,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 06:55:55,617 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 06:55:55,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 57 minutes, 2 seconds)
2025-05-09 06:58:46,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:59:07,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2793.47021 ± 807.934
2025-05-09 06:59:07,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3196.989, 3083.953, 2998.076, 2923.414, 394.41022, 3227.4536, 2954.2837, 3186.3484, 2874.672, 3095.1045]
2025-05-09 06:59:07,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 182.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 06:59:07,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 53 minutes, 41 seconds)
2025-05-09 07:01:58,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:02:21,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2985.98682 ± 135.295
2025-05-09 07:02:21,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3195.4956, 3074.0647, 3111.9868, 2821.9714, 2726.4873, 2905.092, 2932.9788, 3077.8594, 2970.8848, 3043.047]
2025-05-09 07:02:21,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:02:21,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (2985.99) for latency MM1Queue_a033_s075
2025-05-09 07:02:21,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 07:02:21,288 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:02:21,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 50 minutes, 50 seconds)
2025-05-09 07:05:20,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:05:38,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2208.09033 ± 972.504
2025-05-09 07:05:38,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1845.1571, 614.11304, 3181.4731, 448.25385, 1590.3972, 3000.4937, 2646.1484, 2878.498, 2829.1128, 3047.2566]
2025-05-09 07:05:38,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [641.0, 223.0, 1000.0, 155.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:05:38,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 47 minutes, 55 seconds)
2025-05-09 07:08:30,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:08:50,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2633.65430 ± 811.994
2025-05-09 07:08:50,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [1565.0719, 594.5639, 3002.6863, 3218.5444, 3069.924, 3045.5493, 2941.4133, 2831.3286, 3060.2559, 3007.205]
2025-05-09 07:08:50,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [546.0, 226.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:08:50,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 44 minutes, 42 seconds)
2025-05-09 07:11:50,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:12:11,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2964.23828 ± 407.809
2025-05-09 07:12:11,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3245.9148, 1797.3392, 3106.5176, 3375.8843, 2970.4875, 3023.6606, 3092.317, 3071.93, 2965.5308, 2992.8018]
2025-05-09 07:12:11,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 705.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:12:11,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 42 minutes, 18 seconds)
2025-05-09 07:14:56,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:15:15,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2407.98926 ± 1059.286
2025-05-09 07:15:15,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2971.3696, 3211.05, 1335.5471, 989.54236, 2859.558, 3218.0356, 2944.333, 3105.9238, 218.7461, 3225.7864]
2025-05-09 07:15:15,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 468.0, 1000.0, 1000.0, 1000.0, 957.0, 1000.0, 85.0, 1000.0]
2025-05-09 07:15:15,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 38 minutes, 43 seconds)
2025-05-09 07:18:19,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:18:39,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2662.62476 ± 628.772
2025-05-09 07:18:39,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3067.1682, 1530.7151, 3034.5986, 2004.5686, 1662.5538, 3027.336, 2922.5188, 3180.7415, 2887.0972, 3308.949]
2025-05-09 07:18:39,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 477.0, 1000.0, 660.0, 563.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:18:39,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 35 minutes, 51 seconds)
2025-05-09 07:21:34,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:21:55,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2573.67090 ± 715.976
2025-05-09 07:21:55,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3054.902, 3051.836, 2857.7678, 2983.9985, 3163.9636, 1861.0305, 2887.277, 1306.6349, 1365.463, 3203.8354]
2025-05-09 07:21:55,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 607.0, 1000.0, 1000.0, 466.0, 1000.0]
2025-05-09 07:21:55,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 33 seconds)
2025-05-09 07:24:40,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:25:02,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 3047.49121 ± 218.732
2025-05-09 07:25:02,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3275.2568, 3190.6284, 3225.827, 2487.849, 3089.6838, 3139.7402, 3033.5017, 2848.1775, 3033.5017, 3150.7468]
2025-05-09 07:25:02,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 827.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:25:02,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (3047.49) for latency MM1Queue_a033_s075
2025-05-09 07:25:02,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 07:25:02,671 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:25:02,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 10 seconds)
2025-05-09 07:28:00,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:28:22,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2951.38501 ± 254.827
2025-05-09 07:28:22,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3041.9058, 3113.934, 3041.3384, 3008.6772, 2944.4226, 2316.6926, 2854.8362, 2848.0447, 2968.094, 3375.904]
2025-05-09 07:28:22,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 853.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:28:22,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 25 minutes, 52 seconds)
2025-05-09 07:31:18,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:31:37,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2738.85376 ± 878.252
2025-05-09 07:31:37,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3274.1426, 3353.3428, 2837.267, 2926.4602, 3114.6206, 3112.2778, 3279.853, 275.3277, 3012.2983, 2202.9456]
2025-05-09 07:31:37,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 100.0, 1000.0, 679.0]
2025-05-09 07:31:37,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 55 seconds)
2025-05-09 07:34:27,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:34:45,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2554.90796 ± 1022.152
2025-05-09 07:34:45,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [2735.0288, 3069.236, 2820.9272, 3480.823, 424.90668, 684.45636, 3052.716, 3044.503, 3298.7783, 2937.7048]
2025-05-09 07:34:45,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [923.0, 1000.0, 1000.0, 1000.0, 140.0, 283.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:34:45,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 20 seconds)
2025-05-09 07:37:48,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:38:07,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2529.89404 ± 1107.957
2025-05-09 07:38:07,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [339.0099, 2982.1228, 3140.4075, 3261.9553, 2835.5605, 3116.659, 310.99173, 3020.2185, 3175.136, 3116.879]
2025-05-09 07:38:07,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [143.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 132.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:38:07,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 11 seconds)
2025-05-09 07:41:02,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:41:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 3074.16748 ± 106.776
2025-05-09 07:41:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3064.1262, 3093.8027, 3057.7095, 2875.089, 2960.4285, 3075.1738, 3124.4844, 3027.0876, 3176.2708, 3287.5005]
2025-05-09 07:41:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:41:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1226 [INFO]: New best (3074.17) for latency MM1Queue_a033_s075
2025-05-09 07:41:25,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1229 [INFO]: saving network
2025-05-09 07:41:25,536 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:41:25,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 6 seconds)
2025-05-09 07:44:15,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:44:38,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2833.56665 ± 408.433
2025-05-09 07:44:38,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3206.914, 2780.7588, 3016.693, 3059.0007, 2916.0125, 2896.123, 3150.4302, 2670.5164, 1696.8569, 2942.3596]
2025-05-09 07:44:38,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 07:44:38,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 45 seconds)
2025-05-09 07:47:30,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:47:51,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2938.31885 ± 514.539
2025-05-09 07:47:51,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3308.888, 3160.186, 3167.7014, 3275.8745, 3174.982, 2705.721, 2864.8113, 3145.9238, 3093.9036, 1485.193]
2025-05-09 07:47:51,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 511.0]
2025-05-09 07:47:51,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 29 seconds)
2025-05-09 07:50:46,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:51:05,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2371.06641 ± 994.480
2025-05-09 07:51:05,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [447.3399, 2852.1519, 2975.7314, 2800.5337, 2642.5542, 2294.7683, 2970.4998, 436.70786, 3264.6855, 3025.6929]
2025-05-09 07:51:05,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [153.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 222.0, 1000.0, 1000.0]
2025-05-09 07:51:05,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 15 seconds)
2025-05-09 07:54:00,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:54:20,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1221 [DEBUG]: Total Reward: 2617.63379 ± 865.859
2025-05-09 07:54:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1222 [DEBUG]: All rewards: [3235.1316, 3045.1775, 784.2233, 3129.6338, 3276.8076, 2955.5671, 2069.472, 3036.483, 3349.0447, 1294.7966]
2025-05-09 07:54:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 712.0, 1000.0, 1000.0, 386.0]
2025-05-09 07:54:20,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1251 [DEBUG]: Training session finished
