2026-01-22 23:14:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mem5
2026-01-22 23:14:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mem5
2026-01-22 23:14:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x146f61abb110>}
2026-01-22 23:14:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:08,732 baseline-bpql-noisy-ant:77 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-22 23:14:08,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-22 23:14:08,752 baseline-bpql-noisy-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=67, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:08,752 baseline-bpql-noisy-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:39,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:40,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: -76.22178 ± 186.211
2026-01-22 23:15:40,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [-27.593561, -4.587684, 2.478363, -6.45876, -46.97581, -14.475003, -633.3004, -7.5263486, -21.594938, -2.1836686]
2026-01-22 23:15:40,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [22.0, 14.0, 24.0, 14.0, 40.0, 22.0, 377.0, 14.0, 21.0, 20.0]
2026-01-22 23:15:40,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (-76.22) for latency DatasetOffice
2026-01-22 23:15:40,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 29 minutes, 49 seconds)
2026-01-22 23:17:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:19,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: -50.37539 ± 76.712
2026-01-22 23:17:19,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [8.320115, -125.44895, -246.07303, -49.330242, 12.804634, -37.359455, -51.292023, 11.240395, 6.1992083, -32.81456]
2026-01-22 23:17:19,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [57.0, 527.0, 1000.0, 360.0, 42.0, 65.0, 264.0, 17.0, 16.0, 193.0]
2026-01-22 23:17:19,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (-50.38) for latency DatasetOffice
2026-01-22 23:17:19,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 34 minutes, 58 seconds)
2026-01-22 23:19:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 6.82679 ± 70.693
2026-01-22 23:19:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [25.785847, 34.819126, 33.202374, 87.37405, 75.56813, -128.20468, -120.05348, 57.09039, 2.8271267, -0.14102569]
2026-01-22 23:19:04,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [107.0, 112.0, 215.0, 332.0, 310.0, 373.0, 459.0, 1000.0, 162.0, 355.0]
2026-01-22 23:19:04,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (6.83) for latency DatasetOffice
2026-01-22 23:19:04,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 38 minutes, 50 seconds)
2026-01-22 23:20:33,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:39,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 205.70544 ± 118.332
2026-01-22 23:20:39,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [418.6859, 300.0508, 291.02206, 251.36423, 77.01035, 49.45952, 212.36174, 283.92758, 107.5893, 65.582924]
2026-01-22 23:20:39,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 583.0, 596.0, 1000.0, 203.0, 95.0, 619.0, 1000.0, 496.0, 171.0]
2026-01-22 23:20:39,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (205.71) for latency DatasetOffice
2026-01-22 23:20:39,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 35 minutes, 55 seconds)
2026-01-22 23:22:15,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 317.53381 ± 200.942
2026-01-22 23:22:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [308.65884, 73.46327, 557.1719, 31.810505, 399.61612, 418.91098, 632.59125, 467.58313, 193.67767, 91.85446]
2026-01-22 23:22:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [958.0, 116.0, 1000.0, 41.0, 1000.0, 1000.0, 1000.0, 1000.0, 380.0, 72.0]
2026-01-22 23:22:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (317.53) for latency DatasetOffice
2026-01-22 23:22:22,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 36 minutes, 13 seconds)
2026-01-22 23:23:59,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:24:07,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 290.40259 ± 118.199
2026-01-22 23:24:07,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [204.28497, 528.74524, 440.46188, 290.30112, 223.65846, 194.6339, 192.68092, 147.23492, 386.697, 295.32745]
2026-01-22 23:24:07,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [213.0, 1000.0, 1000.0, 1000.0, 1000.0, 331.0, 392.0, 158.0, 1000.0, 531.0]
2026-01-22 23:24:07,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 38 minutes, 45 seconds)
2026-01-22 23:25:44,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:53,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 426.30191 ± 216.625
2026-01-22 23:25:53,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [283.12183, 540.52026, 559.3519, 603.2935, 620.90784, 501.13177, 578.4031, 45.37611, 513.6856, 17.226816]
2026-01-22 23:25:53,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [580.0, 1000.0, 1000.0, 978.0, 1000.0, 1000.0, 1000.0, 47.0, 1000.0, 17.0]
2026-01-22 23:25:53,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (426.30) for latency DatasetOffice
2026-01-22 23:25:53,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 39 minutes, 18 seconds)
2026-01-22 23:27:34,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:27:38,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 364.17960 ± 311.939
2026-01-22 23:27:38,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [331.93753, 350.60468, 192.38333, 37.103474, 87.35493, 564.271, 729.78253, 1059.8712, 97.98815, 190.49915]
2026-01-22 23:27:38,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [251.0, 411.0, 164.0, 56.0, 66.0, 1000.0, 1000.0, 1000.0, 86.0, 148.0]
2026-01-22 23:27:38,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 37 minutes, 49 seconds)
2026-01-22 23:29:09,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:19,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 642.49597 ± 344.562
2026-01-22 23:29:19,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1053.4604, 538.3377, 906.24536, 358.5526, 190.54614, 717.7056, 955.5196, 1087.0426, 562.7847, 54.76457]
2026-01-22 23:29:19,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [778.0, 1000.0, 1000.0, 1000.0, 154.0, 1000.0, 1000.0, 1000.0, 1000.0, 34.0]
2026-01-22 23:29:19,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (642.50) for latency DatasetOffice
2026-01-22 23:29:19,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 37 minutes, 37 seconds)
2026-01-22 23:30:54,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:03,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 787.97394 ± 444.692
2026-01-22 23:31:03,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [85.314476, 1451.7365, 1108.2552, 607.5039, 90.554924, 686.454, 582.1722, 1272.9691, 1149.1594, 845.6191]
2026-01-22 23:31:03,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [73.0, 1000.0, 1000.0, 1000.0, 62.0, 1000.0, 491.0, 1000.0, 986.0, 1000.0]
2026-01-22 23:31:03,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (787.97) for latency DatasetOffice
2026-01-22 23:31:03,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 36 minutes, 4 seconds)
2026-01-22 23:32:35,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:40,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 667.19080 ± 406.625
2026-01-22 23:32:40,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [171.80507, 1205.8538, 548.2408, 516.37573, 1318.7269, 1150.8445, 379.42075, 488.35587, 770.2102, 122.07468]
2026-01-22 23:32:40,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [114.0, 782.0, 355.0, 349.0, 1000.0, 1000.0, 272.0, 323.0, 462.0, 79.0]
2026-01-22 23:32:40,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 32 minutes, 24 seconds)
2026-01-22 23:34:22,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:30,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 965.29932 ± 531.481
2026-01-22 23:34:30,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [243.2377, 1513.536, 238.95721, 811.769, 1630.7073, 1036.6787, 380.7138, 1437.2284, 763.51697, 1596.6478]
2026-01-22 23:34:30,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [178.0, 1000.0, 164.0, 1000.0, 1000.0, 622.0, 230.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:34:30,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (965.30) for latency DatasetOffice
2026-01-22 23:34:30,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 31 minutes, 44 seconds)
2026-01-22 23:36:03,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:10,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1011.28009 ± 502.123
2026-01-22 23:36:10,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [265.2565, 1584.8007, 464.5441, 558.0905, 1673.8337, 1116.9623, 1165.8905, 1636.3134, 510.264, 1136.8458]
2026-01-22 23:36:10,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [179.0, 1000.0, 297.0, 323.0, 1000.0, 652.0, 748.0, 1000.0, 275.0, 815.0]
2026-01-22 23:36:10,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1011.28) for latency DatasetOffice
2026-01-22 23:36:10,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 28 minutes, 27 seconds)
2026-01-22 23:37:43,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:52,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 917.91290 ± 354.525
2026-01-22 23:37:52,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [437.57596, 623.76605, 1092.7952, 1085.3488, 1645.8566, 734.6462, 641.0238, 1221.0365, 1112.4972, 584.5831]
2026-01-22 23:37:52,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [190.0, 1000.0, 1000.0, 1000.0, 1000.0, 440.0, 322.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:37:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 27 minutes, 19 seconds)
2026-01-22 23:39:36,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:39:44,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1145.93054 ± 653.489
2026-01-22 23:39:44,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [334.75116, 1776.2319, 1216.0327, 68.83219, 1855.9352, 311.10928, 1565.0874, 1960.5613, 1243.1042, 1127.6598]
2026-01-22 23:39:44,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [208.0, 865.0, 632.0, 38.0, 1000.0, 158.0, 797.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:39:44,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1145.93) for latency DatasetOffice
2026-01-22 23:39:44,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 27 minutes, 37 seconds)
2026-01-22 23:41:12,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:41:19,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 764.82727 ± 579.095
2026-01-22 23:41:19,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [887.498, 1863.1794, 1652.2402, 899.9693, 192.58246, 607.9128, 902.48016, 388.09778, 134.72137, 119.59141]
2026-01-22 23:41:19,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 946.0, 434.0, 109.0, 1000.0, 1000.0, 245.0, 73.0, 72.0]
2026-01-22 23:41:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 25 minutes, 5 seconds)
2026-01-22 23:43:00,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:07,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1145.28601 ± 523.006
2026-01-22 23:43:07,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1443.3656, 2145.1174, 1034.3798, 1388.8545, 189.8266, 1430.2966, 615.73126, 635.98553, 1319.3896, 1249.9133]
2026-01-22 23:43:07,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [720.0, 1000.0, 1000.0, 585.0, 77.0, 725.0, 242.0, 337.0, 1000.0, 546.0]
2026-01-22 23:43:07,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 22 minutes, 56 seconds)
2026-01-22 23:44:42,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:52,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1903.22791 ± 517.249
2026-01-22 23:44:52,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1368.8818, 2365.659, 1329.9595, 2388.7761, 2273.4302, 1515.5446, 2523.3137, 1661.3138, 1146.5327, 2458.8665]
2026-01-22 23:44:52,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [581.0, 1000.0, 526.0, 1000.0, 944.0, 1000.0, 1000.0, 692.0, 1000.0, 1000.0]
2026-01-22 23:44:52,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1903.23) for latency DatasetOffice
2026-01-22 23:44:52,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 22 minutes, 36 seconds)
2026-01-22 23:46:24,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:32,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1599.63708 ± 793.246
2026-01-22 23:46:32,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1418.5612, 2611.3784, 2542.401, 1007.50037, 2398.8984, 2062.4548, 1046.2952, 1911.241, 100.91295, 896.72797]
2026-01-22 23:46:32,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 398.0, 1000.0, 883.0, 431.0, 1000.0, 50.0, 363.0]
2026-01-22 23:46:32,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 20 minutes, 18 seconds)
2026-01-22 23:48:15,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:22,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1282.64429 ± 813.636
2026-01-22 23:48:22,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [969.8528, 1039.4557, 468.06204, 2587.3677, 1243.4347, 1278.3685, 237.68895, 2425.6538, 367.08792, 2209.4695]
2026-01-22 23:48:22,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [380.0, 433.0, 187.0, 1000.0, 1000.0, 542.0, 106.0, 972.0, 165.0, 1000.0]
2026-01-22 23:48:22,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 18 minutes, 8 seconds)
2026-01-22 23:49:51,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:50:02,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2314.64600 ± 636.322
2026-01-22 23:50:02,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2777.9973, 1825.5697, 2773.1985, 1240.5181, 2433.6775, 2728.2214, 2595.6057, 2933.9666, 1121.942, 2715.7651]
2026-01-22 23:50:02,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 890.0, 1000.0, 1000.0, 1000.0, 481.0, 1000.0]
2026-01-22 23:50:02,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2314.65) for latency DatasetOffice
2026-01-22 23:50:02,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 17 minutes, 51 seconds)
2026-01-22 23:51:44,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:52,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1571.53101 ± 1038.401
2026-01-22 23:51:52,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1159.629, 2664.7817, 364.21207, 710.1155, 2920.9082, 2548.113, 2647.218, 539.5539, 147.86801, 2012.9102]
2026-01-22 23:51:52,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 158.0, 276.0, 1000.0, 1000.0, 1000.0, 1000.0, 92.0, 1000.0]
2026-01-22 23:51:52,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 16 minutes, 36 seconds)
2026-01-22 23:53:23,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2030.20703 ± 907.084
2026-01-22 23:53:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2053.0688, 2636.1506, 2748.9233, 2530.4785, 645.95386, 2541.6055, 2776.6301, 2803.95, 213.79323, 1351.5164]
2026-01-22 23:53:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [743.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 92.0, 1000.0]
2026-01-22 23:53:33,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 13 minutes, 47 seconds)
2026-01-22 23:55:06,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:17,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2640.26416 ± 721.865
2026-01-22 23:55:17,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2913.9053, 2990.5017, 2889.7576, 2919.7202, 487.40677, 2693.1313, 2946.9353, 2880.4998, 2882.48, 2798.305]
2026-01-22 23:55:17,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 204.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:55:17,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2640.26) for latency DatasetOffice
2026-01-22 23:55:17,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 12 minutes, 51 seconds)
2026-01-22 23:56:55,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:04,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1933.07520 ± 879.211
2026-01-22 23:57:04,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2949.6553, 1178.7732, 2883.902, 2901.3257, 1017.4122, 1207.6837, 2859.1992, 1275.623, 2336.1538, 721.0228]
2026-01-22 23:57:04,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 408.0, 1000.0, 1000.0, 1000.0, 489.0, 1000.0, 1000.0, 792.0, 311.0]
2026-01-22 23:57:04,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 10 minutes, 32 seconds)
2026-01-22 23:58:34,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:42,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2175.31104 ± 894.483
2026-01-22 23:58:42,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3132.576, 1297.5778, 2280.1497, 3223.2517, 1749.2186, 2033.86, 2808.0059, 102.11564, 2517.5046, 2608.8528]
2026-01-22 23:58:42,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 490.0, 814.0, 1000.0, 572.0, 698.0, 884.0, 66.0, 829.0, 1000.0]
2026-01-22 23:58:42,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 8 minutes, 14 seconds)
2026-01-23 00:00:18,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:22,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 842.72510 ± 562.961
2026-01-23 00:00:22,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [928.0619, 1339.2578, 1944.8217, 665.5556, 90.08071, 214.4787, 835.7371, 1062.4972, 124.4935, 1222.2657]
2026-01-23 00:00:22,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [291.0, 1000.0, 592.0, 232.0, 49.0, 91.0, 315.0, 340.0, 47.0, 1000.0]
2026-01-23 00:00:22,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 4 minutes, 4 seconds)
2026-01-23 00:02:00,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:08,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1804.51794 ± 1138.364
2026-01-23 00:02:08,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3292.2507, 3111.0757, 25.82052, 357.26202, 1861.4261, 1921.0951, 1901.678, 411.08527, 2032.2537, 3131.2327]
2026-01-23 00:02:08,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 25.0, 1000.0, 622.0, 634.0, 619.0, 169.0, 1000.0, 1000.0]
2026-01-23 00:02:08,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 24 seconds)
2026-01-23 00:03:43,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:52,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2686.32471 ± 1110.652
2026-01-23 00:03:52,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3380.3345, 3213.6313, 783.0419, 794.7991, 3433.8015, 3462.5752, 1484.433, 3506.487, 3592.2507, 3211.8933]
2026-01-23 00:03:52,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 240.0, 288.0, 1000.0, 1000.0, 430.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:03:52,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2686.32) for latency DatasetOffice
2026-01-23 00:03:52,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 53 seconds)
2026-01-23 00:05:21,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:30,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2478.27222 ± 967.558
2026-01-23 00:05:30,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3357.01, 1358.7981, 2795.859, 3242.457, 3575.8726, 1213.1145, 3165.7017, 1277.3612, 1418.8479, 3377.6992]
2026-01-23 00:05:30,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 413.0, 1000.0, 1000.0, 1000.0, 444.0, 1000.0, 1000.0, 443.0, 1000.0]
2026-01-23 00:05:30,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 10 seconds)
2026-01-23 00:07:02,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:11,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2510.50635 ± 912.292
2026-01-23 00:07:11,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3179.184, 2156.7776, 1657.1196, 2017.252, 3215.4277, 3372.2837, 587.1343, 3706.1338, 3049.5427, 2164.208]
2026-01-23 00:07:11,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 489.0, 736.0, 1000.0, 1000.0, 216.0, 1000.0, 879.0, 665.0]
2026-01-23 00:07:11,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 57 minutes)
2026-01-23 00:08:51,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1952.10510 ± 1015.299
2026-01-23 00:08:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1711.6715, 884.3911, 1454.9401, 534.4103, 1343.7672, 3113.2273, 3390.8972, 3521.561, 1261.2021, 2304.9822]
2026-01-23 00:08:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [540.0, 287.0, 1000.0, 180.0, 1000.0, 821.0, 1000.0, 1000.0, 388.0, 694.0]
2026-01-23 00:08:59,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 57 minutes, 9 seconds)
2026-01-23 00:10:30,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:38,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2423.35693 ± 1053.315
2026-01-23 00:10:38,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1828.6361, 3839.213, 1912.157, 3263.7473, 3549.4373, 1116.47, 3502.803, 2921.352, 889.03406, 1410.7198]
2026-01-23 00:10:38,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [485.0, 1000.0, 573.0, 932.0, 985.0, 372.0, 1000.0, 740.0, 284.0, 395.0]
2026-01-23 00:10:38,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 54 seconds)
2026-01-23 00:12:07,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:16,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3036.24341 ± 1019.148
2026-01-23 00:12:16,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2485.5305, 1116.6666, 3816.3677, 2665.4512, 3537.205, 1366.5051, 3878.846, 3883.4744, 3811.8054, 3800.5818]
2026-01-23 00:12:16,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [678.0, 330.0, 1000.0, 737.0, 1000.0, 376.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:12:16,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3036.24) for latency DatasetOffice
2026-01-23 00:12:16,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 52 seconds)
2026-01-23 00:13:56,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:05,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2533.61279 ± 1047.709
2026-01-23 00:14:05,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3403.5625, 1289.6794, 3408.774, 2406.8179, 3473.8333, 783.8947, 3374.1177, 2519.7107, 1080.4941, 3595.245]
2026-01-23 00:14:05,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 420.0, 1000.0, 1000.0, 1000.0, 233.0, 1000.0, 1000.0, 321.0, 1000.0]
2026-01-23 00:14:05,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 24 seconds)
2026-01-23 00:15:36,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:44,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2917.70020 ± 1193.891
2026-01-23 00:15:44,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3701.3408, 3866.1055, 1567.0835, 3732.7168, 3711.1099, 3228.6372, 303.31384, 3562.4873, 3805.663, 1698.545]
2026-01-23 00:15:44,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 393.0, 1000.0, 1000.0, 901.0, 121.0, 1000.0, 1000.0, 507.0]
2026-01-23 00:15:44,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 49 minutes, 33 seconds)
2026-01-23 00:17:18,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:17:25,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2499.84985 ± 1298.924
2026-01-23 00:17:25,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1038.3217, 1059.8461, 3945.0151, 3859.0295, 1769.351, 1159.6978, 3054.8462, 1178.6014, 4105.9897, 3827.7983]
2026-01-23 00:17:25,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [269.0, 300.0, 1000.0, 974.0, 456.0, 319.0, 765.0, 308.0, 1000.0, 1000.0]
2026-01-23 00:17:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 46 minutes, 18 seconds)
2026-01-23 00:19:00,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:09,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2930.77148 ± 1125.019
2026-01-23 00:19:09,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3899.981, 3751.7292, 1000.61414, 627.7905, 4142.2085, 3467.1238, 3116.3662, 3403.2856, 2964.2693, 2934.3472]
2026-01-23 00:19:09,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 272.0, 181.0, 1000.0, 901.0, 833.0, 839.0, 765.0, 769.0]
2026-01-23 00:19:09,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 45 minutes, 35 seconds)
2026-01-23 00:20:42,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:50,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2935.28174 ± 931.871
2026-01-23 00:20:50,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2475.9783, 3964.3364, 3961.7913, 2183.5864, 3572.5513, 1375.9539, 3811.3445, 3772.6792, 2391.7324, 1842.8651]
2026-01-23 00:20:50,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [618.0, 1000.0, 1000.0, 612.0, 937.0, 375.0, 1000.0, 1000.0, 685.0, 500.0]
2026-01-23 00:20:50,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 44 minutes, 40 seconds)
2026-01-23 00:22:26,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:36,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3375.69995 ± 1114.796
2026-01-23 00:22:36,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3767.591, 4009.4387, 375.0338, 2276.2468, 3846.4177, 3820.9517, 3887.2844, 4115.3374, 3730.5989, 3928.1]
2026-01-23 00:22:36,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 120.0, 594.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:22:36,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3375.70) for latency DatasetOffice
2026-01-23 00:22:36,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 17 seconds)
2026-01-23 00:24:02,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:11,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2749.90381 ± 1053.470
2026-01-23 00:24:11,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [509.48175, 3824.0298, 2191.1865, 3786.6365, 3462.1648, 2618.8794, 2807.9604, 1576.5828, 2695.393, 4026.7239]
2026-01-23 00:24:11,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [140.0, 991.0, 623.0, 1000.0, 1000.0, 676.0, 1000.0, 435.0, 1000.0, 1000.0]
2026-01-23 00:24:11,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 39 minutes, 33 seconds)
2026-01-23 00:25:49,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:56,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2633.63525 ± 1375.141
2026-01-23 00:25:56,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1400.3859, 432.7923, 3235.616, 3072.077, 3872.1436, 3946.9749, 4299.5, 1137.8345, 3823.3481, 1115.682]
2026-01-23 00:25:56,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [359.0, 127.0, 806.0, 1000.0, 1000.0, 1000.0, 1000.0, 270.0, 1000.0, 326.0]
2026-01-23 00:25:56,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 50 seconds)
2026-01-23 00:27:24,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:34,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3384.90698 ± 1014.761
2026-01-23 00:27:34,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [415.7618, 3077.458, 3718.6707, 3852.486, 3837.421, 3911.3484, 3672.4868, 3769.1733, 3747.6606, 3846.6042]
2026-01-23 00:27:34,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [133.0, 764.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:27:34,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3384.91) for latency DatasetOffice
2026-01-23 00:27:34,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 36 minutes, 1 second)
2026-01-23 00:29:15,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:25,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3118.69849 ± 1216.491
2026-01-23 00:29:25,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [760.3178, 4092.0947, 3885.7493, 4042.7295, 4008.5928, 4049.2493, 1092.1208, 2709.0542, 2615.4219, 3931.6567]
2026-01-23 00:29:25,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [217.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 676.0, 649.0, 1000.0]
2026-01-23 00:29:25,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 36 minutes, 1 second)
2026-01-23 00:30:53,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:02,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2993.17603 ± 1491.191
2026-01-23 00:31:02,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4197.743, 181.00806, 3510.6375, 4004.376, 1358.701, 3836.517, 4170.4316, 3971.8455, 763.7776, 3936.7212]
2026-01-23 00:31:02,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 218.0, 1000.0, 1000.0, 356.0, 1000.0, 1000.0, 1000.0, 191.0, 1000.0]
2026-01-23 00:31:02,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 42 seconds)
2026-01-23 00:32:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2767.28003 ± 1491.189
2026-01-23 00:32:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [955.89813, 1932.7379, 2650.0564, 123.12669, 1582.7771, 4243.8877, 3152.5554, 4139.861, 4476.6294, 4415.2705]
2026-01-23 00:32:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [244.0, 456.0, 664.0, 47.0, 402.0, 1000.0, 819.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:32:45,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 32 minutes, 37 seconds)
2026-01-23 00:34:17,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3136.00049 ± 1129.137
2026-01-23 00:34:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4331.3237, 4150.988, 2358.5232, 2807.9375, 4409.5356, 1239.2584, 4024.299, 1488.2445, 3913.7354, 2636.1616]
2026-01-23 00:34:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 575.0, 691.0, 1000.0, 315.0, 1000.0, 368.0, 1000.0, 1000.0]
2026-01-23 00:34:26,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2026-01-23 00:36:02,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:09,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2131.05396 ± 1403.682
2026-01-23 00:36:09,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3093.0906, 2635.8508, 648.3706, 92.413925, 4377.354, 2574.8618, 1894.3671, 1252.8827, 4100.3574, 640.99054]
2026-01-23 00:36:09,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 753.0, 253.0, 48.0, 1000.0, 655.0, 1000.0, 336.0, 1000.0, 204.0]
2026-01-23 00:36:09,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 29 minutes, 15 seconds)
2026-01-23 00:37:36,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:46,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3777.47803 ± 622.642
2026-01-23 00:37:46,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4145.6577, 2018.9426, 4096.15, 4242.3804, 3972.6091, 3913.2942, 3420.196, 3947.9248, 4082.5889, 3935.0366]
2026-01-23 00:37:46,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 546.0, 1000.0, 1000.0, 1000.0, 1000.0, 919.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:37:46,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3777.48) for latency DatasetOffice
2026-01-23 00:37:46,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 25 minutes, 12 seconds)
2026-01-23 00:39:22,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:29,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2811.39819 ± 1527.152
2026-01-23 00:39:29,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [179.90625, 1360.0186, 4455.1753, 2411.469, 3982.097, 4313.482, 4147.3994, 845.3006, 4167.2837, 2251.8516]
2026-01-23 00:39:29,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [60.0, 346.0, 1000.0, 591.0, 1000.0, 1000.0, 1000.0, 237.0, 1000.0, 581.0]
2026-01-23 00:39:29,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 35 seconds)
2026-01-23 00:41:01,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:09,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2595.37183 ± 1190.354
2026-01-23 00:41:09,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3977.7952, 4233.4795, 1808.2096, 2765.2036, 2787.8215, 769.0997, 3020.4333, 527.7733, 2433.0745, 3630.8286]
2026-01-23 00:41:09,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 464.0, 662.0, 1000.0, 226.0, 734.0, 147.0, 548.0, 895.0]
2026-01-23 00:41:09,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 22 minutes, 14 seconds)
2026-01-23 00:42:45,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2554.94873 ± 1601.765
2026-01-23 00:42:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4065.177, 93.577515, 1686.9032, 4160.0527, 3967.3972, 4124.5825, 1440.3633, 189.06378, 4014.6394, 1807.7303]
2026-01-23 00:42:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 39.0, 424.0, 1000.0, 1000.0, 1000.0, 377.0, 62.0, 1000.0, 462.0]
2026-01-23 00:42:52,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 56 seconds)
2026-01-23 00:44:29,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:38,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3144.15967 ± 1105.637
2026-01-23 00:44:38,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4171.7446, 4013.8403, 350.57953, 3439.8467, 3137.992, 2331.113, 4178.5815, 3358.636, 2625.9695, 3833.2954]
2026-01-23 00:44:38,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 110.0, 1000.0, 782.0, 582.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:44:38,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 19 minutes, 49 seconds)
2026-01-23 00:46:04,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:14,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3927.74365 ± 550.607
2026-01-23 00:46:14,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4323.535, 4331.5156, 4157.5654, 3687.7288, 4114.985, 4088.1038, 4147.6577, 3942.9165, 2361.1086, 4122.3193]
2026-01-23 00:46:14,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 913.0, 1000.0, 1000.0, 1000.0, 1000.0, 627.0, 1000.0]
2026-01-23 00:46:14,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3927.74) for latency DatasetOffice
2026-01-23 00:46:14,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 17 minutes, 57 seconds)
2026-01-23 00:47:52,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:02,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3037.02515 ± 1293.185
2026-01-23 00:48:02,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3293.3303, 2260.6443, 1015.6429, 4399.487, 2649.578, 4110.073, 612.48755, 3980.3215, 4240.2, 3808.4863]
2026-01-23 00:48:02,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [798.0, 1000.0, 1000.0, 1000.0, 598.0, 1000.0, 159.0, 1000.0, 1000.0, 921.0]
2026-01-23 00:48:02,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 51 seconds)
2026-01-23 00:49:37,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3920.96436 ± 558.062
2026-01-23 00:49:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4256.3916, 4127.764, 4181.858, 3661.53, 4040.311, 3993.2385, 2348.946, 4006.0398, 4443.843, 4149.724]
2026-01-23 00:49:48,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:49:48,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 7 seconds)
2026-01-23 00:51:19,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:26,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2188.65381 ± 1423.117
2026-01-23 00:51:26,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [623.6668, 3356.3767, 1178.0815, 4482.784, 242.32283, 2718.692, 1439.2078, 1972.5247, 4412.866, 1460.014]
2026-01-23 00:51:26,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 801.0, 273.0, 1000.0, 79.0, 678.0, 329.0, 1000.0, 1000.0, 340.0]
2026-01-23 00:51:26,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 46 seconds)
2026-01-23 00:52:58,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:09,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3799.82666 ± 791.092
2026-01-23 00:53:09,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1461.2258, 4084.0864, 4058.62, 3840.1406, 3879.149, 4234.3477, 4124.663, 4066.1416, 4293.5645, 3956.33]
2026-01-23 00:53:09,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [370.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:53:09,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 25 seconds)
2026-01-23 00:54:40,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:48,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2878.55225 ± 1543.973
2026-01-23 00:54:48,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3649.06, 247.63895, 2568.9736, 4258.4155, 63.27689, 4436.9775, 3801.5146, 3861.82, 1914.4592, 3983.3853]
2026-01-23 00:54:48,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 78.0, 609.0, 1000.0, 31.0, 1000.0, 1000.0, 1000.0, 470.0, 1000.0]
2026-01-23 00:54:48,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 10 minutes, 9 seconds)
2026-01-23 00:56:25,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3689.36450 ± 1094.566
2026-01-23 00:56:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4413.088, 4181.911, 4175.2964, 4374.324, 4070.6902, 3542.1003, 3283.5403, 4331.9897, 3945.6062, 575.09894]
2026-01-23 00:56:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 866.0, 786.0, 1000.0, 1000.0, 167.0]
2026-01-23 00:56:35,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 25 seconds)
2026-01-23 00:58:08,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:15,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2503.48926 ± 1289.834
2026-01-23 00:58:15,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2104.7183, 1350.373, 461.12085, 2575.5, 4151.1914, 696.5347, 2499.7273, 3462.8235, 3444.613, 4288.2905]
2026-01-23 00:58:15,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [474.0, 317.0, 152.0, 642.0, 954.0, 178.0, 594.0, 774.0, 840.0, 1000.0]
2026-01-23 00:58:15,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 54 seconds)
2026-01-23 00:59:45,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:54,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3520.72778 ± 1372.271
2026-01-23 00:59:54,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1271.2804, 3953.3042, 4131.311, 4240.32, 4465.3735, 4297.395, 4192.9937, 364.9934, 4206.4937, 4083.8108]
2026-01-23 00:59:54,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [295.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 112.0, 1000.0, 1000.0]
2026-01-23 00:59:54,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 17 seconds)
2026-01-23 01:01:30,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:38,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3147.12671 ± 1580.739
2026-01-23 01:01:38,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1956.6202, 1817.2631, 4259.1143, 1366.7614, 4316.805, 68.15546, 4593.2026, 4432.0664, 4288.9233, 4372.356]
2026-01-23 01:01:38,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [464.0, 418.0, 1000.0, 311.0, 1000.0, 28.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:01:38,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 50 seconds)
2026-01-23 01:03:12,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:23,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3862.59814 ± 970.455
2026-01-23 01:03:23,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4290.435, 4332.264, 4008.559, 4048.294, 4352.5713, 969.63916, 4162.606, 4124.4126, 4127.9756, 4209.224]
2026-01-23 01:03:23,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:03:23,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 53 seconds)
2026-01-23 01:05:02,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:09,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2578.52710 ± 1638.883
2026-01-23 01:05:09,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1751.7054, 4393.7817, 4024.4084, 4375.5063, 2004.6116, 160.09637, 4558.3276, 198.46802, 3009.681, 1308.6832]
2026-01-23 01:05:09,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [416.0, 1000.0, 1000.0, 987.0, 505.0, 60.0, 1000.0, 63.0, 1000.0, 392.0]
2026-01-23 01:05:09,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 55 seconds)
2026-01-23 01:06:35,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3462.03979 ± 989.627
2026-01-23 01:06:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4045.6064, 4634.4297, 2119.412, 4111.666, 4423.7617, 1835.2186, 3369.5862, 4448.7065, 2333.4722, 3298.5408]
2026-01-23 01:06:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 495.0, 1000.0, 1000.0, 450.0, 830.0, 1000.0, 576.0, 830.0]
2026-01-23 01:06:44,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 40 seconds)
2026-01-23 01:08:26,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:35,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3443.32031 ± 1384.988
2026-01-23 01:08:35,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3799.06, 4586.6714, 4455.109, 4239.766, 4283.463, 4572.4624, 1316.9335, 1133.9584, 1626.2722, 4419.508]
2026-01-23 01:08:35,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 292.0, 270.0, 411.0, 1000.0]
2026-01-23 01:08:35,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 16 seconds)
2026-01-23 01:10:06,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:13,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2870.43408 ± 1420.766
2026-01-23 01:10:13,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1775.1185, 176.88513, 4369.4062, 4237.2554, 4217.5093, 3658.0227, 1812.3401, 3357.7173, 1204.9573, 3895.1262]
2026-01-23 01:10:13,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [421.0, 61.0, 1000.0, 1000.0, 1000.0, 832.0, 430.0, 767.0, 313.0, 882.0]
2026-01-23 01:10:13,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 56 seconds)
2026-01-23 01:11:44,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3061.23462 ± 1257.453
2026-01-23 01:11:52,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2149.3794, 4221.69, 4084.956, 385.57037, 2568.0159, 3941.1782, 1902.9867, 4344.244, 4222.5796, 2791.7434]
2026-01-23 01:11:52,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [523.0, 951.0, 1000.0, 108.0, 587.0, 913.0, 451.0, 1000.0, 1000.0, 646.0]
2026-01-23 01:11:52,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 34 seconds)
2026-01-23 01:13:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:28,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1922.62036 ± 1735.830
2026-01-23 01:13:28,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4471.9556, 972.1778, 455.4801, 4044.4048, 4712.2617, 2173.79, 198.4848, 33.25552, 712.07587, 1452.3192]
2026-01-23 01:13:28,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 254.0, 114.0, 1000.0, 1000.0, 573.0, 144.0, 52.0, 198.0, 354.0]
2026-01-23 01:13:28,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 56 seconds)
2026-01-23 01:15:06,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:15,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2931.52002 ± 1435.307
2026-01-23 01:15:15,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4518.976, 1548.0596, 2286.0208, 852.0415, 1707.7026, 4330.266, 4383.6265, 1299.7611, 4119.819, 4268.93]
2026-01-23 01:15:15,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 395.0, 1000.0, 221.0, 392.0, 1000.0, 1000.0, 324.0, 1000.0, 1000.0]
2026-01-23 01:15:15,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 23 seconds)
2026-01-23 01:16:46,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:54,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3374.32178 ± 1245.997
2026-01-23 01:16:54,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3989.0063, 4607.594, 1190.483, 4591.7573, 4238.957, 1431.9779, 2393.7568, 4516.0728, 3934.4731, 2849.1409]
2026-01-23 01:16:54,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [931.0, 1000.0, 288.0, 1000.0, 1000.0, 341.0, 550.0, 1000.0, 1000.0, 645.0]
2026-01-23 01:16:54,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 37 seconds)
2026-01-23 01:18:29,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:35,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2297.57471 ± 1824.355
2026-01-23 01:18:35,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [885.2325, 4460.6763, 4481.095, 417.38052, 64.16097, 1954.0396, 1052.4783, 4482.2812, 757.9914, 4420.4097]
2026-01-23 01:18:35,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [203.0, 1000.0, 1000.0, 112.0, 32.0, 451.0, 244.0, 1000.0, 199.0, 1000.0]
2026-01-23 01:18:35,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 8 seconds)
2026-01-23 01:20:08,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:16,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3126.05151 ± 1490.510
2026-01-23 01:20:16,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1067.9624, 37.45749, 1739.9225, 3487.4893, 4174.8374, 4163.944, 4032.2224, 4288.211, 4073.921, 4194.546]
2026-01-23 01:20:16,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [254.0, 27.0, 416.0, 817.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:20:16,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 40 seconds)
2026-01-23 01:21:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:58,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3312.28271 ± 1723.714
2026-01-23 01:21:58,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4673.777, 4299.414, 4425.2197, 72.1997, 4303.7524, 4218.8833, 4349.095, 548.966, 4624.727, 1606.7913]
2026-01-23 01:21:58,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 40.0, 1000.0, 1000.0, 1000.0, 132.0, 1000.0, 380.0]
2026-01-23 01:21:58,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 27 seconds)
2026-01-23 01:23:29,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:38,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3378.07300 ± 1138.271
2026-01-23 01:23:38,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3146.1318, 4034.1475, 2282.015, 4160.528, 2691.482, 4598.8237, 3125.1733, 4473.715, 4412.9165, 855.7962]
2026-01-23 01:23:38,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [685.0, 1000.0, 511.0, 1000.0, 1000.0, 1000.0, 695.0, 1000.0, 1000.0, 219.0]
2026-01-23 01:23:38,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 14 seconds)
2026-01-23 01:25:12,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:22,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3927.95776 ± 844.852
2026-01-23 01:25:22,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1929.4282, 4276.0215, 4268.8975, 3225.6, 4514.7256, 4515.984, 3102.4714, 4574.0913, 4667.8745, 4204.4863]
2026-01-23 01:25:22,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [514.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 721.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:25:22,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3927.96) for latency DatasetOffice
2026-01-23 01:25:22,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 56 seconds)
2026-01-23 01:26:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:05,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3687.94775 ± 1119.031
2026-01-23 01:27:05,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4313.1606, 4356.256, 3878.8262, 4119.856, 4399.985, 2659.414, 688.45325, 3788.057, 4488.5264, 4186.9395]
2026-01-23 01:27:05,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 655.0, 147.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:27:05,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 24 seconds)
2026-01-23 01:28:45,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:54,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3664.08789 ± 1222.549
2026-01-23 01:28:54,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3214.321, 4573.002, 553.6703, 4472.8193, 3306.238, 4447.606, 4471.769, 4403.979, 2693.5247, 4503.949]
2026-01-23 01:28:54,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 129.0, 1000.0, 748.0, 1000.0, 1000.0, 1000.0, 639.0, 1000.0]
2026-01-23 01:28:54,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 16 seconds)
2026-01-23 01:30:24,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:34,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2963.64795 ± 1478.580
2026-01-23 01:30:34,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2533.6812, 4410.184, 4338.938, 2072.2407, 4144.4834, 1225.2986, 4205.668, 4390.3794, 66.45939, 2249.1458]
2026-01-23 01:30:34,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [578.0, 1000.0, 1000.0, 483.0, 951.0, 1000.0, 992.0, 1000.0, 1000.0, 540.0]
2026-01-23 01:30:34,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 24 seconds)
2026-01-23 01:32:06,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:16,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3904.11255 ± 1126.589
2026-01-23 01:32:16,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [943.7583, 4205.3535, 4430.0405, 4278.12, 4449.96, 4551.823, 4419.7183, 4700.8223, 2657.7664, 4403.76]
2026-01-23 01:32:16,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [240.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 630.0, 1000.0]
2026-01-23 01:32:16,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 49 seconds)
2026-01-23 01:33:46,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3733.70386 ± 1332.318
2026-01-23 01:33:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4490.6226, 3991.6558, 4385.369, 405.2121, 4238.7437, 1976.2847, 4639.8, 4239.882, 4656.236, 4313.235]
2026-01-23 01:33:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 106.0, 1000.0, 1000.0, 1000.0, 923.0, 1000.0, 1000.0]
2026-01-23 01:33:56,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 49 seconds)
2026-01-23 01:35:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:47,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3547.24146 ± 1271.337
2026-01-23 01:35:47,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4369.1763, 4491.018, 4551.9536, 2260.5664, 4600.032, 1867.7393, 946.7588, 4414.9194, 3757.3562, 4212.8945]
2026-01-23 01:35:47,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 494.0, 1000.0, 1000.0, 216.0, 1000.0, 837.0, 937.0]
2026-01-23 01:35:47,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 34 seconds)
2026-01-23 01:37:17,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:25,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3278.79053 ± 1658.832
2026-01-23 01:37:25,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4306.871, 2680.1365, 4336.478, 4489.882, 4414.042, 4329.4717, 4296.9927, 3688.239, 122.78863, 123.003876]
2026-01-23 01:37:25,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 616.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 829.0, 112.0, 105.0]
2026-01-23 01:37:25,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 14 seconds)
2026-01-23 01:38:59,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:08,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3678.29736 ± 1109.221
2026-01-23 01:39:08,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4488.202, 4261.347, 2239.5083, 4447.9873, 2465.739, 1416.2253, 4032.6992, 4508.1035, 4553.127, 4370.034]
2026-01-23 01:39:08,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 557.0, 1000.0, 560.0, 326.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:39:08,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 43 seconds)
2026-01-23 01:40:43,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:54,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4082.80127 ± 501.487
2026-01-23 01:40:54,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4116.492, 4235.0195, 4343.427, 4393.189, 2596.2969, 4217.668, 4290.5454, 4245.403, 4152.174, 4237.8]
2026-01-23 01:40:54,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:40:54,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4082.80) for latency DatasetOffice
2026-01-23 01:40:54,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 10 seconds)
2026-01-23 01:42:23,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4127.14453 ± 651.321
2026-01-23 01:42:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4794.3486, 4070.581, 4443.4556, 4772.1343, 3050.5186, 3721.6921, 2910.1787, 4675.989, 4367.136, 4465.411]
2026-01-23 01:42:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 767.0, 1000.0, 689.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:42:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4127.14) for latency DatasetOffice
2026-01-23 01:42:34,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 26 seconds)
2026-01-23 01:44:13,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:22,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3348.60010 ± 1361.349
2026-01-23 01:44:22,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4616.315, 4411.4136, 4420.624, 4351.081, 1412.3625, 3563.4214, 3510.2764, 4566.166, 1522.869, 1111.4688]
2026-01-23 01:44:22,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 813.0, 1000.0, 326.0, 269.0]
2026-01-23 01:44:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 37 seconds)
2026-01-23 01:45:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:02,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4179.06787 ± 516.657
2026-01-23 01:46:02,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3730.8386, 4154.975, 4296.805, 4331.7764, 4436.012, 4482.3804, 4323.7935, 4383.5474, 2835.4375, 4815.115]
2026-01-23 01:46:02,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [820.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 644.0, 1000.0]
2026-01-23 01:46:02,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4179.07) for latency DatasetOffice
2026-01-23 01:46:02,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 57 seconds)
2026-01-23 01:47:35,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:44,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3212.65186 ± 1607.634
2026-01-23 01:47:44,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4359.7646, 4035.313, 169.09938, 4224.7217, 2993.8364, 4525.754, 4636.8154, 1896.1366, 4563.6206, 721.45557]
2026-01-23 01:47:44,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 66.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 178.0]
2026-01-23 01:47:44,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 11 seconds)
2026-01-23 01:49:21,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3500.76880 ± 1501.467
2026-01-23 01:49:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4464.788, 4489.8765, 734.7566, 1921.509, 1103.8431, 4133.145, 4627.601, 4496.8643, 4607.835, 4427.4688]
2026-01-23 01:49:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 195.0, 423.0, 280.0, 1000.0, 1000.0, 970.0, 1000.0, 1000.0]
2026-01-23 01:49:30,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 28 seconds)
2026-01-23 01:51:04,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:13,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2689.62769 ± 1459.343
2026-01-23 01:51:13,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3125.9944, 227.7519, 1959.0836, 993.58624, 4449.478, 4435.192, 2104.543, 3407.4163, 4563.316, 1629.9166]
2026-01-23 01:51:13,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [682.0, 68.0, 432.0, 1000.0, 1000.0, 1000.0, 502.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:51:13,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 50 seconds)
2026-01-23 01:52:48,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:58,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3959.91162 ± 1131.792
2026-01-23 01:52:58,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [952.90393, 4576.407, 4420.4004, 4442.0063, 4265.933, 4554.142, 4411.17, 4644.368, 4574.5034, 2757.2847]
2026-01-23 01:52:58,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [222.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 632.0]
2026-01-23 01:52:58,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 1 second)
2026-01-23 01:54:29,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3671.02026 ± 1617.215
2026-01-23 01:54:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4546.3564, 4513.2207, 4375.3926, 4564.312, 4393.632, 4582.6997, 901.7863, 35.16474, 4603.0356, 4194.6035]
2026-01-23 01:54:38,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 219.0, 26.0, 1000.0, 1000.0]
2026-01-23 01:55:20,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 8 seconds)
2026-01-23 01:56:56,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2871.32080 ± 1316.720
2026-01-23 01:57:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2588.031, 4425.649, 4737.9663, 3500.5354, 1214.6902, 1416.2825, 925.2196, 2724.8818, 4383.0054, 2796.9443]
2026-01-23 01:57:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [622.0, 1000.0, 1000.0, 939.0, 391.0, 337.0, 1000.0, 578.0, 1000.0, 599.0]
2026-01-23 01:57:04,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 20 seconds)
2026-01-23 01:58:40,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:47,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2574.36768 ± 1767.413
2026-01-23 01:58:47,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4334.764, 4728.487, 2559.8958, 4523.365, 170.09589, 469.55994, 901.08075, 1741.7421, 1527.0295, 4787.658]
2026-01-23 01:58:47,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 54.0, 110.0, 190.0, 404.0, 1000.0, 1000.0]
2026-01-23 01:58:47,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 26 seconds)
2026-01-23 02:00:20,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3854.13550 ± 1186.149
2026-01-23 02:00:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4528.6836, 4490.976, 4279.4233, 4538.9116, 4483.7993, 4308.147, 4433.441, 4462.7505, 1852.6996, 1162.5244]
2026-01-23 02:00:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 461.0, 256.0]
2026-01-23 02:00:30,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 34 seconds)
2026-01-23 02:02:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:12,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2586.14551 ± 1627.103
2026-01-23 02:02:12,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4352.8877, 1078.8826, 2597.5337, 4388.818, 2673.6062, 520.43164, 4169.0425, 4567.6934, 1218.8427, 293.71588]
2026-01-23 02:02:12,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 316.0, 575.0, 1000.0, 666.0, 198.0, 1000.0, 1000.0, 283.0, 77.0]
2026-01-23 02:02:12,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 41 seconds)
2026-01-23 02:03:46,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:53,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2850.87451 ± 1381.126
2026-01-23 02:03:53,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2763.6428, 4783.443, 1423.812, 1982.7822, 556.41473, 3963.942, 3205.4236, 3769.3003, 1447.9482, 4612.0396]
2026-01-23 02:03:53,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [604.0, 1000.0, 307.0, 429.0, 147.0, 819.0, 659.0, 863.0, 315.0, 988.0]
2026-01-23 02:03:53,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2026-01-23 02:05:28,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:34,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2631.22217 ± 1697.551
2026-01-23 02:05:34,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [645.5177, 165.78082, 4335.3364, 4404.304, 2233.6682, 4646.878, 2715.0327, 4703.296, 1622.3508, 840.0542]
2026-01-23 02:05:34,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [178.0, 50.0, 1000.0, 956.0, 518.0, 1000.0, 626.0, 1000.0, 399.0, 200.0]
2026-01-23 02:05:35,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1299 [DEBUG]: Training session finished
