2026-01-22 23:14:12,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mem1
2026-01-22 23:14:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mem1
2026-01-22 23:14:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14596512ead0>}
2026-01-22 23:14:12,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:13,061 baseline-bpql-noisy-ant:77 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-22 23:14:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-22 23:14:13,066 baseline-bpql-noisy-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=35, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:13,066 baseline-bpql-noisy-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:13,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:13,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:45,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:46,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: -21.08954 ± 10.386
2026-01-22 23:15:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [-10.666579, -15.3701935, -9.880097, -32.718872, -27.874258, -11.673137, -34.203842, -26.728184, -7.892177, -33.888103]
2026-01-22 23:15:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [13.0, 14.0, 13.0, 18.0, 16.0, 12.0, 22.0, 17.0, 16.0, 19.0]
2026-01-22 23:15:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (-21.09) for latency DatasetOffice
2026-01-22 23:15:46,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 32 minutes, 8 seconds)
2026-01-22 23:17:19,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 22.62202 ± 68.728
2026-01-22 23:17:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [-63.589417, -59.665882, -12.833549, 142.62877, 71.05174, 57.41533, -52.82676, -12.859453, 96.23361, 60.665783]
2026-01-22 23:17:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 184.0, 1000.0, 805.0, 302.0, 709.0, 257.0, 1000.0, 764.0, 774.0]
2026-01-22 23:17:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (22.62) for latency DatasetOffice
2026-01-22 23:17:26,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 37 minutes, 44 seconds)
2026-01-22 23:19:08,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:17,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 294.63760 ± 198.413
2026-01-22 23:19:17,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [203.52538, 513.75665, 522.6896, 0.6438894, 174.63573, 478.31058, 51.023956, 189.5709, 572.76654, 239.4526]
2026-01-22 23:19:17,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 14.0, 1000.0, 1000.0, 84.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:19:17,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (294.64) for latency DatasetOffice
2026-01-22 23:19:17,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 43 minutes, 30 seconds)
2026-01-22 23:20:54,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:59,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 221.86909 ± 189.087
2026-01-22 23:20:59,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [254.92183, 90.948235, 624.0749, 305.92938, 365.19223, 125.41853, -2.8329685, 68.94382, 12.647106, 373.4478]
2026-01-22 23:20:59,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 133.0, 1000.0, 1000.0, 795.0, 183.0, 119.0, 79.0, 37.0, 494.0]
2026-01-22 23:20:59,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 42 minutes, 13 seconds)
2026-01-22 23:22:31,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:37,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 335.72418 ± 296.820
2026-01-22 23:22:37,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [666.5267, 49.644474, 73.29357, 120.776215, 226.5088, 684.79364, 727.1864, 30.089666, 82.19238, 696.23016]
2026-01-22 23:22:37,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 280.0, 133.0, 159.0, 1000.0, 1000.0, 1000.0, 131.0, 119.0, 1000.0]
2026-01-22 23:22:37,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (335.72) for latency DatasetOffice
2026-01-22 23:22:37,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 39 minutes, 33 seconds)
2026-01-22 23:24:13,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:24:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 427.43561 ± 314.389
2026-01-22 23:24:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [211.91066, 821.15216, 768.90894, 46.78347, 847.9234, 783.6915, 198.41402, 277.3105, 122.076645, 196.18465]
2026-01-22 23:24:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 129.0, 1000.0, 1000.0, 958.0, 432.0, 275.0, 744.0]
2026-01-22 23:24:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (427.44) for latency DatasetOffice
2026-01-22 23:24:22,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 41 minutes, 42 seconds)
2026-01-22 23:25:56,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:04,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 588.34509 ± 310.804
2026-01-22 23:26:04,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [261.70358, 183.45819, 275.35263, 128.31366, 806.7789, 815.13153, 872.2326, 799.33344, 837.3058, 903.841]
2026-01-22 23:26:04,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [365.0, 227.0, 279.0, 214.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:26:04,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (588.35) for latency DatasetOffice
2026-01-22 23:26:04,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 40 minutes, 25 seconds)
2026-01-22 23:27:42,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:27:49,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 542.47485 ± 385.055
2026-01-22 23:27:49,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [846.0123, 848.1225, 29.726448, 115.083206, 855.6736, 42.084084, 882.58374, 100.828835, 865.12897, 839.505]
2026-01-22 23:27:49,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 22.0, 100.0, 1000.0, 79.0, 1000.0, 123.0, 1000.0, 1000.0]
2026-01-22 23:27:49,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 37 minutes, 3 seconds)
2026-01-22 23:29:21,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:25,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 353.52484 ± 358.292
2026-01-22 23:29:25,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [195.08723, 877.0119, 336.84592, 930.4056, 49.30806, 838.3751, 46.701305, 47.71405, 192.30243, 21.496473]
2026-01-22 23:29:25,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [224.0, 1000.0, 291.0, 1000.0, 41.0, 1000.0, 81.0, 47.0, 227.0, 37.0]
2026-01-22 23:29:25,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 33 minutes, 34 seconds)
2026-01-22 23:31:08,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:19,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 788.74841 ± 204.189
2026-01-22 23:31:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [888.2877, 937.24774, 1030.9742, 873.3657, 895.7606, 914.29767, 537.79266, 533.38275, 394.8804, 881.4946]
2026-01-22 23:31:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 644.0, 1000.0, 1000.0]
2026-01-22 23:31:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (788.75) for latency DatasetOffice
2026-01-22 23:31:19,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 36 minutes, 37 seconds)
2026-01-22 23:32:52,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:59,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 398.20508 ± 242.652
2026-01-22 23:32:59,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [242.97668, 612.9646, 153.03284, 540.32654, 55.33284, 404.10266, 761.785, 54.626755, 526.19244, 630.7106]
2026-01-22 23:32:59,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [174.0, 1000.0, 159.0, 1000.0, 71.0, 1000.0, 1000.0, 52.0, 1000.0, 1000.0]
2026-01-22 23:32:59,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 33 minutes, 27 seconds)
2026-01-22 23:34:31,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 491.83856 ± 273.265
2026-01-22 23:34:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [75.35137, 360.45117, 463.04776, 760.85333, 312.3065, 689.4047, 582.77625, 50.287956, 770.90594, 853.0006]
2026-01-22 23:34:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [80.0, 268.0, 1000.0, 460.0, 260.0, 1000.0, 542.0, 58.0, 1000.0, 1000.0]
2026-01-22 23:34:37,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 30 minutes, 32 seconds)
2026-01-22 23:36:16,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:23,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 752.31885 ± 483.668
2026-01-22 23:36:23,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [947.58453, 640.8902, 1467.8865, 1088.423, 76.98263, 519.7011, 168.90276, 1240.8516, 154.11302, 1217.8535]
2026-01-22 23:36:23,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 437.0, 973.0, 1000.0, 65.0, 1000.0, 124.0, 1000.0, 148.0, 820.0]
2026-01-22 23:36:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 29 minutes, 5 seconds)
2026-01-22 23:37:57,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:06,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 774.46552 ± 425.013
2026-01-22 23:38:06,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [578.51495, 257.35678, 329.5868, 861.84784, 971.1051, 1446.9277, 748.1084, 1266.2456, 1155.6836, 129.27882]
2026-01-22 23:38:06,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 210.0, 258.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 119.0]
2026-01-22 23:38:06,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 29 minutes, 11 seconds)
2026-01-22 23:39:44,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:39:50,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 518.13879 ± 350.167
2026-01-22 23:39:50,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [147.30365, 996.60516, 743.00085, 1074.2925, 60.567886, 417.51877, 213.32274, 157.44006, 710.25024, 661.086]
2026-01-22 23:39:50,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [95.0, 614.0, 1000.0, 1000.0, 40.0, 264.0, 147.0, 113.0, 1000.0, 1000.0]
2026-01-22 23:39:50,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 24 minutes, 34 seconds)
2026-01-22 23:41:24,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:41:32,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 748.90613 ± 353.729
2026-01-22 23:41:32,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [786.167, 1425.8884, 292.0236, 71.734505, 936.69855, 733.3975, 631.1925, 878.7058, 743.5096, 989.7443]
2026-01-22 23:41:32,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 855.0, 185.0, 44.0, 546.0, 1000.0, 378.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:41:32,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 23 minutes, 40 seconds)
2026-01-22 23:43:14,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:22,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1183.75769 ± 528.870
2026-01-22 23:43:22,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1534.3195, 1928.3489, 731.7973, 1328.1921, 1033.5286, 1690.1605, 799.3146, 1126.5823, 1613.0474, 52.286304]
2026-01-22 23:43:22,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 460.0, 1000.0, 537.0, 1000.0, 424.0, 589.0, 1000.0, 70.0]
2026-01-22 23:43:22,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1183.76) for latency DatasetOffice
2026-01-22 23:43:22,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 25 minutes, 13 seconds)
2026-01-22 23:44:55,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:02,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 962.20056 ± 622.597
2026-01-22 23:45:02,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [831.1682, 2052.172, 2047.8273, 660.9397, 296.9211, 888.08466, 556.2043, 202.04932, 1357.8044, 728.8345]
2026-01-22 23:45:02,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 293.0, 169.0, 426.0, 326.0, 94.0, 1000.0, 1000.0]
2026-01-22 23:45:02,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 21 minutes, 46 seconds)
2026-01-22 23:46:32,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:40,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1008.60486 ± 616.256
2026-01-22 23:46:40,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1036.8988, 976.94434, 1703.7031, 769.3107, 713.46814, 828.005, 1753.4377, 120.22158, 2034.8558, 149.20435]
2026-01-22 23:46:40,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [563.0, 1000.0, 1000.0, 1000.0, 1000.0, 405.0, 1000.0, 70.0, 914.0, 79.0]
2026-01-22 23:46:40,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 18 minutes, 52 seconds)
2026-01-22 23:48:21,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:29,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1333.38745 ± 615.246
2026-01-22 23:48:29,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [772.4913, 2006.634, 827.1511, 199.9068, 1518.962, 762.19995, 1637.2023, 2125.449, 1918.2479, 1565.6312]
2026-01-22 23:48:29,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [346.0, 863.0, 390.0, 108.0, 722.0, 1000.0, 1000.0, 1000.0, 1000.0, 798.0]
2026-01-22 23:48:29,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1333.39) for latency DatasetOffice
2026-01-22 23:48:29,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 18 minutes, 28 seconds)
2026-01-22 23:49:59,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:50:07,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1305.78784 ± 813.628
2026-01-22 23:50:07,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [896.64056, 295.29803, 819.7921, 457.68207, 2460.9226, 2267.4026, 1902.3743, 2417.8945, 611.9255, 927.9467]
2026-01-22 23:50:07,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 137.0, 1000.0, 204.0, 1000.0, 1000.0, 836.0, 1000.0, 283.0, 1000.0]
2026-01-22 23:50:07,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 43 seconds)
2026-01-22 23:51:43,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1764.07849 ± 690.470
2026-01-22 23:51:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2405.034, 1259.2183, 2194.74, 2327.3762, 935.02325, 1644.6097, 2401.8677, 1894.0642, 283.65808, 2295.195]
2026-01-22 23:51:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 425.0, 1000.0, 1000.0, 1000.0, 132.0, 1000.0]
2026-01-22 23:51:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1764.08) for latency DatasetOffice
2026-01-22 23:51:52,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 12 minutes, 33 seconds)
2026-01-22 23:53:33,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:41,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1003.72400 ± 634.567
2026-01-22 23:53:41,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1670.3207, 1190.3015, 1226.6543, 870.7485, 1263.0021, 575.8248, 293.14612, 81.466225, 565.1192, 2300.657]
2026-01-22 23:53:41,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0, 69.0, 305.0, 1000.0]
2026-01-22 23:53:41,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 13 minutes, 19 seconds)
2026-01-22 23:55:13,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:23,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2055.20947 ± 806.908
2026-01-22 23:55:23,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2541.8364, 1928.534, 2751.626, 2638.2478, 2711.645, 2548.4932, 851.2536, 1842.3903, 2446.6514, 291.4167]
2026-01-22 23:55:23,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 120.0]
2026-01-22 23:55:23,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2055.21) for latency DatasetOffice
2026-01-22 23:55:23,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 12 minutes, 24 seconds)
2026-01-22 23:57:03,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:11,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1258.54565 ± 971.170
2026-01-22 23:57:11,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2410.6113, 2508.7969, 786.8654, 40.56616, 971.3187, 68.04086, 2001.2898, 2444.1362, 1269.2362, 84.59559]
2026-01-22 23:57:11,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 33.0, 362.0, 39.0, 766.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:57:11,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 10 minutes, 32 seconds)
2026-01-22 23:58:41,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:48,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1269.20410 ± 899.761
2026-01-22 23:58:48,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [881.13165, 355.14844, 6.3149166, 328.15488, 2606.6602, 2486.593, 1406.285, 770.54663, 1587.1223, 2264.0845]
2026-01-22 23:58:48,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 141.0, 13.0, 164.0, 1000.0, 1000.0, 570.0, 1000.0, 577.0, 1000.0]
2026-01-22 23:58:48,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 8 minutes, 27 seconds)
2026-01-23 00:00:24,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1431.43677 ± 787.415
2026-01-23 00:00:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [892.3123, 2523.3306, 1055.3849, 1001.83673, 2541.0908, 896.43933, 1651.019, 772.0376, 398.7105, 2582.2056]
2026-01-23 00:00:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 408.0, 953.0, 1000.0, 1000.0, 305.0, 193.0, 1000.0]
2026-01-23 00:00:33,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 6 minutes, 45 seconds)
2026-01-23 00:02:12,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:19,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1725.71606 ± 1021.570
2026-01-23 00:02:19,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2431.6873, 565.8687, 3029.2969, 391.21332, 3068.9084, 1559.3971, 717.50214, 669.7019, 2733.746, 2089.8372]
2026-01-23 00:02:19,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 200.0, 1000.0, 141.0, 1000.0, 1000.0, 248.0, 258.0, 1000.0, 775.0]
2026-01-23 00:02:19,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 4 minutes, 18 seconds)
2026-01-23 00:03:53,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:05,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2095.57935 ± 912.361
2026-01-23 00:04:05,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2846.6912, 702.7954, 2776.6162, 695.53973, 1233.0974, 2735.2454, 2919.9907, 2701.9204, 1402.5085, 2941.3896]
2026-01-23 00:04:05,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:04:05,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2095.58) for latency DatasetOffice
2026-01-23 00:04:05,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 3 minutes, 31 seconds)
2026-01-23 00:05:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:51,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1959.82642 ± 898.751
2026-01-23 00:05:51,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [875.5561, 979.9301, 347.57532, 2985.9624, 2931.12, 2743.7952, 2640.486, 1855.8878, 2440.39, 1797.5597]
2026-01-23 00:05:51,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 139.0, 1000.0, 1000.0, 1000.0, 1000.0, 657.0, 812.0, 916.0]
2026-01-23 00:05:51,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 1 minute, 20 seconds)
2026-01-23 00:07:29,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:36,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1416.28320 ± 958.462
2026-01-23 00:07:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2323.3706, 2796.1863, 1254.7922, 784.94507, 848.02374, 640.4258, 2503.6191, 2513.0745, 232.13246, 266.2628]
2026-01-23 00:07:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 339.0, 370.0, 262.0, 1000.0, 1000.0, 110.0, 118.0]
2026-01-23 00:07:36,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 1 minute, 17 seconds)
2026-01-23 00:09:13,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:23,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2198.44458 ± 1174.583
2026-01-23 00:09:23,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [176.34389, 3034.502, 2334.1843, 2992.7307, 3048.1924, 2988.3018, 804.0894, 364.38156, 3163.236, 3078.485]
2026-01-23 00:09:23,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [101.0, 1000.0, 747.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:09:23,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2198.44) for latency DatasetOffice
2026-01-23 00:09:23,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 seconds)
2026-01-23 00:10:57,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:05,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2266.76318 ± 1133.707
2026-01-23 00:11:05,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [114.69578, 3029.5461, 1755.2682, 3056.4373, 147.44565, 3016.9014, 3162.2625, 2649.2383, 2988.9033, 2746.9346]
2026-01-23 00:11:05,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [88.0, 1000.0, 580.0, 1000.0, 80.0, 1000.0, 1000.0, 810.0, 1000.0, 1000.0]
2026-01-23 00:11:05,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2266.76) for latency DatasetOffice
2026-01-23 00:11:05,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 57 minutes, 26 seconds)
2026-01-23 00:12:38,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1862.14258 ± 1040.340
2026-01-23 00:12:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2321.4727, 528.4127, 937.6032, 2937.0557, 3052.4656, 257.3591, 2878.914, 952.4518, 2845.5967, 1910.0947]
2026-01-23 00:12:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [878.0, 186.0, 331.0, 1000.0, 1000.0, 97.0, 1000.0, 333.0, 1000.0, 1000.0]
2026-01-23 00:12:45,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 54 minutes, 29 seconds)
2026-01-23 00:14:18,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:27,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1976.65820 ± 964.104
2026-01-23 00:14:27,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3211.0876, 3065.706, 3111.2546, 481.98227, 2041.4985, 2253.5198, 374.04364, 1503.8673, 1574.917, 2148.703]
2026-01-23 00:14:27,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 781.0, 846.0, 151.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:14:27,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 52 seconds)
2026-01-23 00:16:09,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:19,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2511.97412 ± 800.848
2026-01-23 00:16:19,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2713.2444, 424.2045, 2133.9963, 3140.632, 3280.0342, 3011.939, 3177.3477, 2145.941, 2715.4243, 2376.978]
2026-01-23 00:16:19,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 345.0, 692.0, 1000.0, 1000.0, 1000.0, 1000.0, 709.0, 1000.0, 1000.0]
2026-01-23 00:16:19,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2511.97) for latency DatasetOffice
2026-01-23 00:16:19,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 51 minutes, 38 seconds)
2026-01-23 00:17:57,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:07,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2824.64282 ± 659.481
2026-01-23 00:18:07,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1321.2472, 3262.1667, 3239.708, 3307.167, 3229.643, 2077.9236, 3149.3674, 2240.708, 3328.6807, 3089.818]
2026-01-23 00:18:07,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 754.0, 1000.0, 1000.0]
2026-01-23 00:18:07,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2824.64) for latency DatasetOffice
2026-01-23 00:18:07,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 50 minutes, 3 seconds)
2026-01-23 00:19:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:46,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2227.81494 ± 1195.582
2026-01-23 00:19:46,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1904.3217, 3269.6267, 3038.3828, 3127.0222, 3191.697, 2844.682, 3187.7727, 1452.0271, 217.66776, 44.94917]
2026-01-23 00:19:46,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [618.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 479.0, 85.0, 41.0]
2026-01-23 00:19:46,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 47 minutes, 44 seconds)
2026-01-23 00:21:24,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:32,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2142.98486 ± 912.242
2026-01-23 00:21:32,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3044.9265, 3107.0125, 1434.1426, 3116.096, 3266.613, 1015.5786, 1466.2837, 914.5578, 1482.5168, 2582.1218]
2026-01-23 00:21:32,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 455.0, 1000.0, 1000.0, 1000.0, 479.0, 307.0, 474.0, 873.0]
2026-01-23 00:21:32,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 47 minutes, 12 seconds)
2026-01-23 00:23:08,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2554.59131 ± 1065.328
2026-01-23 00:23:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [847.95087, 3431.046, 872.70447, 2853.4805, 3306.8137, 2557.5308, 3436.1392, 1266.6176, 3414.5938, 3559.036]
2026-01-23 00:23:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [249.0, 1000.0, 255.0, 897.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:23:17,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 46 minutes)
2026-01-23 00:24:47,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:55,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2141.21411 ± 1393.545
2026-01-23 00:24:55,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3375.8862, 3407.744, 69.19053, 3046.6592, 3167.6033, 1086.7137, 3364.0422, 637.24994, 90.71883, 3166.3315]
2026-01-23 00:24:55,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 35.0, 1000.0, 1000.0, 1000.0, 1000.0, 215.0, 44.0, 1000.0]
2026-01-23 00:24:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 41 minutes, 25 seconds)
2026-01-23 00:26:35,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:42,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2114.50146 ± 1424.758
2026-01-23 00:26:42,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3377.8306, 20.039885, 3365.6863, 2912.0974, 633.79584, 6.775915, 3468.5706, 969.8185, 3208.1792, 3182.2207]
2026-01-23 00:26:42,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 23.0, 1000.0, 1000.0, 213.0, 15.0, 1000.0, 304.0, 977.0, 1000.0]
2026-01-23 00:26:42,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 39 minutes, 35 seconds)
2026-01-23 00:28:15,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:23,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2477.32642 ± 1023.787
2026-01-23 00:28:23,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [400.94858, 3546.862, 1560.8987, 2782.49, 3376.003, 1350.2993, 3374.531, 3524.548, 2194.9617, 2661.723]
2026-01-23 00:28:23,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [128.0, 1000.0, 443.0, 751.0, 1000.0, 393.0, 1000.0, 1000.0, 1000.0, 854.0]
2026-01-23 00:28:23,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 12 seconds)
2026-01-23 00:29:57,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:03,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2039.02344 ± 1503.245
2026-01-23 00:30:03,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3349.899, 314.43167, 3556.9856, 3485.3826, 3487.213, 9.934045, 1771.5161, 944.7239, 3443.932, 26.216602]
2026-01-23 00:30:03,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 115.0, 1000.0, 1000.0, 1000.0, 13.0, 531.0, 273.0, 1000.0, 21.0]
2026-01-23 00:30:03,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 21 seconds)
2026-01-23 00:31:46,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:53,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1479.37024 ± 1329.286
2026-01-23 00:31:53,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1902.2001, 3722.5464, 930.475, 3521.228, 97.59087, 779.6293, 2710.511, -74.76941, 357.32162, 846.9694]
2026-01-23 00:31:53,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 272.0, 1000.0, 48.0, 1000.0, 738.0, 1000.0, 123.0, 248.0]
2026-01-23 00:31:53,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 34 minutes, 35 seconds)
2026-01-23 00:33:30,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:40,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3432.51416 ± 304.002
2026-01-23 00:33:40,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3544.042, 3533.8835, 2563.418, 3726.584, 3397.0542, 3502.371, 3465.4343, 3530.9143, 3422.408, 3639.031]
2026-01-23 00:33:40,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 737.0, 1000.0, 1000.0, 975.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:33:40,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3432.51) for latency DatasetOffice
2026-01-23 00:33:40,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 34 minutes, 38 seconds)
2026-01-23 00:35:08,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:17,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2888.68628 ± 973.968
2026-01-23 00:35:17,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4019.9805, 3767.684, 3638.7168, 3492.4006, 1468.7421, 3670.035, 2515.7195, 1682.2316, 1413.2941, 3218.0596]
2026-01-23 00:35:17,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 457.0, 372.0, 855.0]
2026-01-23 00:35:17,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 59 seconds)
2026-01-23 00:37:00,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:08,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2595.27393 ± 1202.481
2026-01-23 00:37:08,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3825.6323, 773.5607, 1493.7025, 3467.7239, 3743.8052, 1982.0477, 3533.5637, 3542.35, 591.07446, 2999.278]
2026-01-23 00:37:08,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 201.0, 439.0, 1000.0, 1000.0, 579.0, 1000.0, 1000.0, 184.0, 1000.0]
2026-01-23 00:37:08,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 56 seconds)
2026-01-23 00:38:39,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:47,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1904.43030 ± 1561.121
2026-01-23 00:38:47,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [213.9517, 2.4314826, 3491.841, 2868.758, 658.5008, -213.9852, 3538.1895, 1390.5677, 3742.4817, 3351.5662]
2026-01-23 00:38:47,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [67.0, 13.0, 1000.0, 787.0, 211.0, 1000.0, 1000.0, 386.0, 1000.0, 1000.0]
2026-01-23 00:38:47,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes, 58 seconds)
2026-01-23 00:40:20,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:29,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2824.90088 ± 1228.883
2026-01-23 00:40:29,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3768.79, 3638.4788, 3779.5356, 3768.1855, 1472.3724, 3856.5952, 3817.9275, 487.55957, 2290.2576, 1369.3062]
2026-01-23 00:40:29,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 416.0, 1000.0, 1000.0, 147.0, 1000.0, 370.0]
2026-01-23 00:40:29,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes, 55 seconds)
2026-01-23 00:42:03,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:10,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2344.38525 ± 1354.805
2026-01-23 00:42:10,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1215.0906, 950.0092, 917.4945, 2945.9695, 10.690385, 3738.9167, 3851.6597, 2940.6462, 3028.676, 3844.6995]
2026-01-23 00:42:10,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [335.0, 264.0, 271.0, 758.0, 15.0, 1000.0, 1000.0, 813.0, 768.0, 1000.0]
2026-01-23 00:42:10,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 13 seconds)
2026-01-23 00:43:43,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:53,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3457.20508 ± 1133.787
2026-01-23 00:43:53,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3781.3213, 3965.3909, 3830.3457, 3964.53, 3879.4666, 3742.3044, 3594.713, 3799.4026, 3943.096, 71.48181]
2026-01-23 00:43:53,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 34.0]
2026-01-23 00:43:53,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3457.21) for latency DatasetOffice
2026-01-23 00:43:53,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 31 seconds)
2026-01-23 00:45:36,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:43,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2156.99463 ± 1461.483
2026-01-23 00:45:43,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [-456.0638, 2667.5374, 3890.8313, 1735.9165, 4014.2524, 22.754759, 1213.075, 2086.0024, 3097.7083, 3297.932]
2026-01-23 00:45:43,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 716.0, 1000.0, 462.0, 1000.0, 22.0, 324.0, 532.0, 951.0, 1000.0]
2026-01-23 00:45:43,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes, 42 seconds)
2026-01-23 00:47:15,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:20,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1999.39294 ± 1611.880
2026-01-23 00:47:20,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [666.0915, 3771.553, 557.52893, 2270.468, 3900.2246, 3991.453, 364.17355, 285.26636, 3781.7065, 405.46475]
2026-01-23 00:47:20,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [194.0, 1000.0, 156.0, 606.0, 1000.0, 1000.0, 122.0, 101.0, 1000.0, 114.0]
2026-01-23 00:47:20,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 48 seconds)
2026-01-23 00:48:55,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:02,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2506.53247 ± 1551.274
2026-01-23 00:49:02,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4032.6174, 2862.1672, 3981.361, 1006.9222, 54.85416, 4180.271, 1317.8778, 3787.866, 398.8703, 3442.5159]
2026-01-23 00:49:02,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 764.0, 1000.0, 268.0, 27.0, 1000.0, 388.0, 1000.0, 125.0, 861.0]
2026-01-23 00:49:02,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 54 seconds)
2026-01-23 00:50:38,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:46,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2437.26123 ± 1421.298
2026-01-23 00:50:46,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1072.8315, 1234.3258, 1996.417, 3990.329, 1285.3656, 92.707146, 4040.6294, 2679.3896, 4056.1597, 3924.455]
2026-01-23 00:50:46,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [282.0, 333.0, 1000.0, 1000.0, 340.0, 39.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:50:46,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 15 minutes, 40 seconds)
2026-01-23 00:52:12,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:21,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3362.64648 ± 1363.482
2026-01-23 00:52:21,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3899.8064, 3986.0718, 58.93765, 3828.6406, 1358.5542, 4013.0562, 3942.0947, 4210.1924, 4130.1562, 4198.955]
2026-01-23 00:52:21,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 34.0, 1000.0, 342.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:52:21,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 12 minutes, 50 seconds)
2026-01-23 00:53:59,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:08,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3263.74268 ± 915.564
2026-01-23 00:54:08,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3592.2976, 3453.007, 3590.2488, 4192.437, 2473.6252, 4133.2173, 2308.0603, 3904.4905, 3795.7334, 1194.3096]
2026-01-23 00:54:08,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [938.0, 914.0, 1000.0, 1000.0, 624.0, 1000.0, 1000.0, 1000.0, 925.0, 378.0]
2026-01-23 00:54:08,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 10 minutes, 42 seconds)
2026-01-23 00:55:42,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:50,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2897.65894 ± 1544.324
2026-01-23 00:55:50,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [540.2567, 4254.0474, 3954.2183, 141.21347, 4007.42, 3384.9873, 1091.9468, 3556.0264, 3880.094, 4166.3784]
2026-01-23 00:55:50,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [131.0, 1000.0, 1000.0, 60.0, 1000.0, 1000.0, 363.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:55:50,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 40 seconds)
2026-01-23 00:57:20,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:28,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2163.85864 ± 1104.281
2026-01-23 00:57:28,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4102.962, 2351.9255, 1607.12, 2431.3103, 994.66736, 2477.1665, 1089.2677, 3693.9001, 2450.941, 439.3241]
2026-01-23 00:57:28,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 446.0, 1000.0, 280.0, 718.0, 1000.0, 1000.0, 649.0, 130.0]
2026-01-23 00:57:28,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 29 seconds)
2026-01-23 00:59:04,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:12,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2637.62744 ± 1543.507
2026-01-23 00:59:12,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3967.8506, 3608.4258, 588.83826, 3748.612, 3966.2615, 29.90147, 3686.4526, 3702.7654, 2648.6553, 428.51303]
2026-01-23 00:59:12,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 186.0, 1000.0, 1000.0, 15.0, 1000.0, 1000.0, 1000.0, 128.0]
2026-01-23 00:59:12,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 44 seconds)
2026-01-23 01:00:46,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:53,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2594.39111 ± 1532.070
2026-01-23 01:00:53,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4076.4004, 4020.5234, 1487.4857, 4230.8086, 4273.6704, 566.91907, 1241.0079, 3932.5186, 882.8461, 1231.7317]
2026-01-23 01:00:53,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 376.0, 1000.0, 1000.0, 149.0, 377.0, 965.0, 240.0, 299.0]
2026-01-23 01:00:53,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 47 seconds)
2026-01-23 01:02:26,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3547.59717 ± 1376.483
2026-01-23 01:02:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4441.9966, 4258.091, 4306.8906, 4258.154, 4140.685, 4313.4404, 4082.2043, 4067.8542, 758.077, 848.5767]
2026-01-23 01:02:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 210.0, 222.0]
2026-01-23 01:02:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3547.60) for latency DatasetOffice
2026-01-23 01:02:35,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 31 seconds)
2026-01-23 01:04:08,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:17,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3179.77271 ± 1403.251
2026-01-23 01:04:17,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2934.138, 4365.277, 4390.9033, 3640.1365, 4081.9255, 64.31035, 1074.5416, 3910.0715, 3127.7727, 4208.6475]
2026-01-23 01:04:17,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [691.0, 1000.0, 1000.0, 1000.0, 1000.0, 30.0, 269.0, 941.0, 701.0, 1000.0]
2026-01-23 01:04:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 45 seconds)
2026-01-23 01:05:49,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2934.23584 ± 1406.370
2026-01-23 01:05:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3310.595, 426.1836, 4098.7666, 4216.403, 3877.4143, 4137.0967, 2460.4507, 4144.273, 2113.4868, 557.68915]
2026-01-23 01:05:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 123.0, 1000.0, 1000.0, 1000.0, 1000.0, 582.0, 1000.0, 531.0, 167.0]
2026-01-23 01:05:57,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 25 seconds)
2026-01-23 01:07:26,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:33,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2730.52686 ± 1239.979
2026-01-23 01:07:33,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1805.4685, 4075.3328, 1054.0942, 912.11035, 2065.3608, 2433.4026, 3964.024, 4204.381, 4257.2817, 2533.813]
2026-01-23 01:07:33,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [491.0, 1000.0, 260.0, 227.0, 530.0, 604.0, 942.0, 1000.0, 1000.0, 621.0]
2026-01-23 01:07:33,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 50 seconds)
2026-01-23 01:09:12,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:21,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3644.34229 ± 710.914
2026-01-23 01:09:21,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3942.1357, 4097.1245, 4102.655, 2714.518, 4070.3184, 4161.636, 1908.5452, 3547.5847, 4067.9634, 3830.9387]
2026-01-23 01:09:21,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 701.0, 1000.0, 1000.0, 471.0, 842.0, 1000.0, 902.0]
2026-01-23 01:09:21,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3644.34) for latency DatasetOffice
2026-01-23 01:09:21,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 55 seconds)
2026-01-23 01:10:50,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:58,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2723.07739 ± 1323.116
2026-01-23 01:10:58,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2924.2817, 1094.3522, 4056.4978, 3862.7996, 3249.7122, 2716.5925, 1354.348, 132.83829, 3998.1409, 3841.211]
2026-01-23 01:10:58,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [695.0, 322.0, 1000.0, 1000.0, 773.0, 680.0, 1000.0, 49.0, 1000.0, 903.0]
2026-01-23 01:10:58,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 53 minutes, 40 seconds)
2026-01-23 01:12:37,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:44,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2712.37622 ± 1489.755
2026-01-23 01:12:44,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2779.241, 4235.7124, 4103.304, 4086.923, 99.07249, 3248.9355, 2711.031, 3842.1326, 47.970695, 1969.4388]
2026-01-23 01:12:44,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [677.0, 1000.0, 1000.0, 1000.0, 40.0, 800.0, 613.0, 919.0, 29.0, 461.0]
2026-01-23 01:12:44,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 26 seconds)
2026-01-23 01:14:12,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:21,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3131.20093 ± 1319.094
2026-01-23 01:14:21,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3948.3228, 1669.4796, 4221.507, 2228.442, 3779.298, 3978.6472, 4075.0881, 39.219368, 3246.1384, 4125.867]
2026-01-23 01:14:21,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 596.0, 1000.0, 1000.0, 1000.0, 123.0, 1000.0, 1000.0]
2026-01-23 01:14:21,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 22 seconds)
2026-01-23 01:16:01,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2112.79810 ± 1474.699
2026-01-23 01:16:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3894.7854, 2035.9614, 153.37508, 1186.2043, 3970.8887, 816.5053, 3110.92, 1699.4164, 4075.1472, 184.77779]
2026-01-23 01:16:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 500.0, 73.0, 317.0, 1000.0, 245.0, 744.0, 446.0, 1000.0, 73.0]
2026-01-23 01:16:06,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 36 seconds)
2026-01-23 01:17:37,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:43,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2411.06885 ± 1623.882
2026-01-23 01:17:43,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1833.2373, 21.172186, 672.7403, 2427.8113, 4237.9194, 4232.5776, 3953.0923, 2342.9163, 4215.8604, 173.3611]
2026-01-23 01:17:43,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [437.0, 31.0, 179.0, 654.0, 1000.0, 1000.0, 969.0, 552.0, 1000.0, 54.0]
2026-01-23 01:17:43,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 48 seconds)
2026-01-23 01:19:16,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2373.57910 ± 1667.218
2026-01-23 01:19:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [971.3284, 1276.1344, 3953.8557, 1957.7487, 29.312061, 50.81375, 2633.0625, 4336.032, 4249.8105, 4277.693]
2026-01-23 01:19:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [313.0, 297.0, 1000.0, 440.0, 44.0, 42.0, 613.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:22,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 21 seconds)
2026-01-23 01:20:51,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:58,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2334.54517 ± 1398.008
2026-01-23 01:20:58,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3748.797, 2434.0461, 696.47046, 3278.4895, 755.5417, 2758.8743, 4194.2627, 431.41113, 3961.5823, 1085.977]
2026-01-23 01:20:58,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [863.0, 651.0, 205.0, 843.0, 217.0, 679.0, 1000.0, 135.0, 948.0, 310.0]
2026-01-23 01:20:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 45 seconds)
2026-01-23 01:22:34,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:41,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2714.95654 ± 1646.367
2026-01-23 01:22:41,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [956.1926, 3446.6104, 4410.9194, 1901.0991, 4240.764, 408.83618, 4264.4424, 12.039106, 3293.6448, 4215.02]
2026-01-23 01:22:41,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [236.0, 1000.0, 1000.0, 470.0, 1000.0, 139.0, 1000.0, 14.0, 768.0, 1000.0]
2026-01-23 01:22:41,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 41 seconds)
2026-01-23 01:24:15,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2983.22534 ± 1273.202
2026-01-23 01:24:23,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1145.1626, 970.2176, 3758.48, 3409.5728, 1197.5219, 4226.992, 4088.933, 3052.293, 4007.7395, 3975.341]
2026-01-23 01:24:23,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [323.0, 320.0, 1000.0, 812.0, 288.0, 1000.0, 1000.0, 751.0, 1000.0, 1000.0]
2026-01-23 01:24:23,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 45 seconds)
2026-01-23 01:25:54,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:02,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2560.40845 ± 1437.004
2026-01-23 01:26:02,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4200.21, 3917.3928, 3140.3955, 4171.9136, 3959.1208, 2117.3516, 1683.6501, 1392.2281, 11.324093, 1010.4966]
2026-01-23 01:26:02,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 761.0, 1000.0, 1000.0, 1000.0, 424.0, 363.0, 22.0, 260.0]
2026-01-23 01:26:02,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 13 seconds)
2026-01-23 01:27:40,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:46,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2655.17065 ± 1510.552
2026-01-23 01:27:46,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3676.4036, 4174.929, 248.01003, 593.897, 600.5355, 4183.9844, 3315.824, 2543.1355, 4155.2925, 3059.6948]
2026-01-23 01:27:46,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [917.0, 1000.0, 80.0, 152.0, 181.0, 1000.0, 737.0, 619.0, 1000.0, 733.0]
2026-01-23 01:27:46,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 58 seconds)
2026-01-23 01:29:22,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:30,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3007.19580 ± 1288.938
2026-01-23 01:29:30,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2598.8894, 2727.4443, 914.5271, 738.91516, 4104.7603, 3007.257, 4337.9795, 4387.8765, 4399.7026, 2854.6057]
2026-01-23 01:29:30,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [597.0, 619.0, 247.0, 202.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 692.0]
2026-01-23 01:29:30,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 52 seconds)
2026-01-23 01:31:01,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:06,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1877.12366 ± 1317.712
2026-01-23 01:31:06,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1993.3058, 3407.4187, 1076.8649, 4145.8086, 821.4012, 823.9959, 276.70636, 1116.7654, 3766.9746, 1341.9939]
2026-01-23 01:31:06,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [530.0, 803.0, 298.0, 1000.0, 207.0, 214.0, 80.0, 291.0, 933.0, 391.0]
2026-01-23 01:31:06,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 37 seconds)
2026-01-23 01:32:39,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:47,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3262.77588 ± 1114.881
2026-01-23 01:32:47,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1245.4663, 4217.7466, 4144.3794, 4271.889, 3210.8914, 3133.743, 4004.494, 1171.372, 4094.7979, 3132.9812]
2026-01-23 01:32:47,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [321.0, 1000.0, 1000.0, 1000.0, 716.0, 733.0, 1000.0, 260.0, 1000.0, 744.0]
2026-01-23 01:32:47,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 53 seconds)
2026-01-23 01:34:15,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:20,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1801.68420 ± 1298.156
2026-01-23 01:34:20,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3632.6838, 448.07776, 2196.6235, 1297.9158, 1798.7563, 541.40607, 2256.4045, 4306.8184, 1495.0911, 43.06426]
2026-01-23 01:34:20,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 128.0, 574.0, 307.0, 447.0, 143.0, 479.0, 1000.0, 368.0, 28.0]
2026-01-23 01:34:20,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 52 seconds)
2026-01-23 01:35:53,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:00,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2793.42969 ± 1526.542
2026-01-23 01:36:00,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2816.798, 4265.948, 4262.6924, 3893.9563, 1247.1926, 756.4503, 897.3613, 4469.2847, 4209.3174, 1115.2965]
2026-01-23 01:36:00,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 891.0, 316.0, 249.0, 236.0, 1000.0, 1000.0, 282.0]
2026-01-23 01:36:00,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 58 seconds)
2026-01-23 01:37:36,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:41,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1944.48267 ± 1632.422
2026-01-23 01:37:41,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [978.4435, 1118.4176, 217.96994, 1637.1443, 2614.902, 60.48373, 4351.198, 3893.5984, 4302.906, 269.76212]
2026-01-23 01:37:41,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [270.0, 319.0, 71.0, 401.0, 623.0, 32.0, 1000.0, 1000.0, 1000.0, 78.0]
2026-01-23 01:37:41,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 11 seconds)
2026-01-23 01:39:20,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:28,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2994.86670 ± 1594.582
2026-01-23 01:39:28,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [115.51646, 4309.783, 4586.6147, 4207.019, 214.592, 4223.1978, 2801.937, 3199.3792, 2135.058, 4155.57]
2026-01-23 01:39:28,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [44.0, 1000.0, 1000.0, 1000.0, 65.0, 1000.0, 639.0, 723.0, 485.0, 1000.0]
2026-01-23 01:39:28,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 5 seconds)
2026-01-23 01:40:58,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:05,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2500.28760 ± 1608.305
2026-01-23 01:41:05,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [684.2163, 113.49912, 3790.8276, 3604.8955, 4434.6016, 4120.1475, 3939.9712, 2461.8499, 1470.5898, 382.27747]
2026-01-23 01:41:05,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [174.0, 53.0, 860.0, 811.0, 1000.0, 1000.0, 983.0, 622.0, 345.0, 144.0]
2026-01-23 01:41:05,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 13 seconds)
2026-01-23 01:42:31,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3660.02686 ± 1310.586
2026-01-23 01:42:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4630.157, 4251.0093, 4486.75, 3858.822, 4402.698, 1290.8258, 4378.8643, 4081.132, 863.70386, 4356.308]
2026-01-23 01:42:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 874.0, 1000.0, 306.0, 1000.0, 1000.0, 203.0, 1000.0]
2026-01-23 01:42:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3660.03) for latency DatasetOffice
2026-01-23 01:42:40,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 42 seconds)
2026-01-23 01:44:17,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:26,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3290.96362 ± 1450.861
2026-01-23 01:44:26,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4345.872, 4139.245, 1661.7501, 4085.3125, 511.37735, 4280.988, 4419.4365, 4204.669, 1173.5709, 4087.417]
2026-01-23 01:44:26,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 461.0, 1000.0, 154.0, 1000.0, 1000.0, 1000.0, 331.0, 1000.0]
2026-01-23 01:44:26,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 13 seconds)
2026-01-23 01:46:00,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:06,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2509.90625 ± 1894.421
2026-01-23 01:46:06,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [89.71714, 365.15222, 4497.551, 4661.9346, 3665.3096, 1163.8595, 4505.0557, 4340.692, 124.59238, 1685.1991]
2026-01-23 01:46:06,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [49.0, 112.0, 1000.0, 1000.0, 834.0, 303.0, 1000.0, 1000.0, 48.0, 447.0]
2026-01-23 01:46:06,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 31 seconds)
2026-01-23 01:47:42,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:47,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2019.10254 ± 1348.280
2026-01-23 01:47:47,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1371.7814, 2358.2183, 836.6221, 37.507626, 1546.6372, 1736.393, 4527.2524, 1382.83, 4325.2515, 2068.532]
2026-01-23 01:47:47,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [332.0, 518.0, 250.0, 31.0, 359.0, 474.0, 1000.0, 331.0, 1000.0, 523.0]
2026-01-23 01:47:47,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 39 seconds)
2026-01-23 01:49:17,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:25,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3298.64648 ± 1623.603
2026-01-23 01:49:25,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4075.5493, 2477.4675, 4249.426, 4662.9316, 4284.942, 4496.4, 4300.0015, 521.1239, 9.271307, 3909.352]
2026-01-23 01:49:25,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 575.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 135.0, 13.0, 1000.0]
2026-01-23 01:49:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 1 second)
2026-01-23 01:51:00,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:04,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1633.44055 ± 1645.634
2026-01-23 01:51:04,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [920.5259, 186.53455, 925.76385, 305.09134, 114.26049, 2459.2856, 2441.2366, 4578.114, 4352.8887, 50.704334]
2026-01-23 01:51:04,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [218.0, 56.0, 207.0, 87.0, 50.0, 560.0, 543.0, 1000.0, 1000.0, 41.0]
2026-01-23 01:51:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 25 seconds)
2026-01-23 01:52:32,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:40,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3404.95508 ± 1476.329
2026-01-23 01:52:40,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4544.4263, 4426.283, 4389.688, 4350.127, 1615.5159, 271.17743, 4208.9614, 4175.4297, 1798.4635, 4269.479]
2026-01-23 01:52:40,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 960.0, 1000.0, 372.0, 79.0, 1000.0, 1000.0, 416.0, 1000.0]
2026-01-23 01:52:40,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 32 seconds)
2026-01-23 01:54:19,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:26,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2672.95068 ± 1593.905
2026-01-23 01:54:26,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4321.858, 4250.5317, 4445.9263, 1804.7852, 4231.985, 1254.5065, 1353.671, 615.6646, 569.3591, 3881.217]
2026-01-23 01:54:26,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 421.0, 1000.0, 353.0, 330.0, 198.0, 153.0, 913.0]
2026-01-23 01:54:26,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 59 seconds)
2026-01-23 01:56:03,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:10,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2640.43408 ± 1393.980
2026-01-23 01:56:10,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2462.3738, 1630.9617, 4063.003, 2109.8472, 1079.2267, 428.14908, 1826.7198, 4481.586, 4343.7295, 3978.7434]
2026-01-23 01:56:10,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [549.0, 370.0, 1000.0, 681.0, 427.0, 108.0, 438.0, 1000.0, 987.0, 872.0]
2026-01-23 01:56:10,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 22 seconds)
2026-01-23 01:57:40,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:47,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2748.21997 ± 1589.885
2026-01-23 01:57:47,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2731.1309, 387.14987, 141.38258, 4273.334, 4333.87, 3904.4429, 4077.7283, 2032.979, 1309.2635, 4290.9175]
2026-01-23 01:57:47,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [614.0, 106.0, 63.0, 976.0, 971.0, 893.0, 1000.0, 509.0, 337.0, 1000.0]
2026-01-23 01:57:47,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 41 seconds)
2026-01-23 01:59:18,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:27,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3139.87158 ± 1438.575
2026-01-23 01:59:27,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4387.395, 4478.7188, 305.73184, 2096.1716, 4324.6387, 4337.0957, 3745.6914, 1595.6011, 4209.2876, 1918.3856]
2026-01-23 01:59:27,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 100.0, 551.0, 1000.0, 1000.0, 855.0, 1000.0, 1000.0, 460.0]
2026-01-23 01:59:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 1 second)
2026-01-23 02:01:08,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:16,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2710.07178 ± 1337.347
2026-01-23 02:01:16,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1513.6698, 3678.145, 4296.988, 4347.7124, 4361.651, 2941.681, 814.8663, 2477.9307, 1107.9131, 1560.1586]
2026-01-23 02:01:16,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [347.0, 850.0, 1000.0, 1000.0, 1000.0, 736.0, 1000.0, 570.0, 298.0, 381.0]
2026-01-23 02:01:16,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 26 seconds)
2026-01-23 02:02:43,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:52,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3699.70459 ± 1431.477
2026-01-23 02:02:52,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4528.651, 4474.1045, 75.89396, 4362.774, 4336.244, 4349.991, 4143.016, 4477.041, 4417.45, 1831.8779]
2026-01-23 02:02:52,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 36.0, 1000.0, 1000.0, 1000.0, 917.0, 1000.0, 1000.0, 455.0]
2026-01-23 02:02:52,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3699.70) for latency DatasetOffice
2026-01-23 02:02:52,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2026-01-23 02:04:30,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:38,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3261.32178 ± 1601.040
2026-01-23 02:04:38,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4300.5234, 672.44086, 1431.3403, 3833.1924, 509.86325, 4451.456, 4504.7046, 4665.1206, 4429.738, 3814.8403]
2026-01-23 02:04:38,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 190.0, 342.0, 1000.0, 141.0, 1000.0, 1000.0, 1000.0, 1000.0, 893.0]
2026-01-23 02:04:38,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1299 [DEBUG]: Training session finished
