2026-01-22 23:14:13,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mem2
2026-01-22 23:14:13,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mem2
2026-01-22 23:14:13,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1540cae15690>}
2026-01-22 23:14:13,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:13,665 baseline-bpql-noisy-ant:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-22 23:14:13,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-22 23:14:13,682 baseline-bpql-noisy-ant:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=43, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:13,682 baseline-bpql-noisy-ant:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:14,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:14,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:50,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:16:02,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: -2273.92212 ± 489.614
2026-01-22 23:16:02,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [-2529.121, -2513.2441, -2529.345, -2543.5813, -2553.3728, -1284.0028, -2439.2974, -1309.6617, -2548.3918, -2489.2031]
2026-01-22 23:16:02,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:16:02,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (-2273.92) for latency DatasetOffice
2026-01-22 23:16:02,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 58 minutes, 14 seconds)
2026-01-22 23:17:38,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:43,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: -40.06763 ± 40.491
2026-01-22 23:17:43,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [-16.304874, -99.384186, -15.832253, -26.015432, -17.436867, -3.4231777, -41.890366, -4.6971, -43.956615, -131.73547]
2026-01-22 23:17:43,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [94.0, 1000.0, 72.0, 1000.0, 107.0, 219.0, 245.0, 189.0, 147.0, 1000.0]
2026-01-22 23:17:43,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (-40.07) for latency DatasetOffice
2026-01-22 23:17:43,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 50 minutes, 36 seconds)
2026-01-22 23:19:22,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:32,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 282.98920 ± 51.201
2026-01-22 23:19:32,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [362.30032, 240.50287, 266.71817, 255.21152, 312.13422, 214.2491, 255.92203, 252.19054, 288.70456, 381.95883]
2026-01-22 23:19:32,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 565.0, 639.0, 839.0, 1000.0, 1000.0, 1000.0, 716.0, 1000.0, 1000.0]
2026-01-22 23:19:32,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (282.99) for latency DatasetOffice
2026-01-22 23:19:32,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 51 minutes, 26 seconds)
2026-01-22 23:21:10,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:21,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 410.39630 ± 100.848
2026-01-22 23:21:21,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [320.73434, 322.97986, 531.6745, 535.9952, 324.18958, 394.2313, 259.8084, 375.04544, 524.4399, 514.86426]
2026-01-22 23:21:21,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 419.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:21:21,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (410.40) for latency DatasetOffice
2026-01-22 23:21:21,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 50 minutes, 41 seconds)
2026-01-22 23:23:01,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 475.88199 ± 228.053
2026-01-22 23:23:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [706.356, 46.32677, 390.8144, 454.87482, 660.07935, 564.87854, 70.81796, 680.288, 590.8619, 593.522]
2026-01-22 23:23:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 92.0, 621.0, 596.0, 1000.0, 1000.0, 73.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:23:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (475.88) for latency DatasetOffice
2026-01-22 23:23:09,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 49 minutes, 25 seconds)
2026-01-22 23:24:44,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:24:54,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 406.02756 ± 301.767
2026-01-22 23:24:54,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [484.81262, 599.34796, -291.33878, 760.96027, 34.14631, 578.0818, 256.5014, 510.80615, 519.1003, 607.85736]
2026-01-22 23:24:54,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 32.0, 1000.0, 214.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:24:54,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 46 minutes, 39 seconds)
2026-01-22 23:26:34,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:41,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 415.29004 ± 198.764
2026-01-22 23:26:41,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [119.5614, 492.9001, 764.81866, 539.26434, 103.97356, 519.0451, 437.81573, 240.19063, 368.37756, 566.9535]
2026-01-22 23:26:41,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [116.0, 1000.0, 1000.0, 1000.0, 112.0, 631.0, 488.0, 290.0, 353.0, 1000.0]
2026-01-22 23:26:41,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 46 minutes, 47 seconds)
2026-01-22 23:28:24,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:32,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 644.49280 ± 278.837
2026-01-22 23:28:32,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [610.81116, 1018.42377, 919.2133, 407.99576, 901.50134, 224.82085, 330.63806, 942.43005, 711.445, 377.64963]
2026-01-22 23:28:32,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 378.0, 1000.0, 226.0, 276.0, 1000.0, 1000.0, 346.0]
2026-01-22 23:28:32,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (644.49) for latency DatasetOffice
2026-01-22 23:28:32,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 45 minutes, 37 seconds)
2026-01-22 23:30:05,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:12,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 527.81580 ± 288.946
2026-01-22 23:30:12,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [204.30022, 88.74636, 534.2066, 810.5706, 72.76624, 741.26965, 818.16345, 472.36646, 734.79816, 800.96967]
2026-01-22 23:30:12,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [169.0, 68.0, 604.0, 1000.0, 44.0, 1000.0, 1000.0, 378.0, 1000.0, 1000.0]
2026-01-22 23:30:12,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 41 minutes, 14 seconds)
2026-01-22 23:31:51,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:56,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 344.71274 ± 303.562
2026-01-22 23:31:56,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [742.3221, 77.42961, 87.05575, 768.7058, 641.37274, 113.92652, 94.6992, 701.32056, 159.23976, 61.05552]
2026-01-22 23:31:56,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 38.0, 100.0, 1000.0, 1000.0, 132.0, 58.0, 1000.0, 121.0, 38.0]
2026-01-22 23:31:56,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 38 minutes, 9 seconds)
2026-01-22 23:33:42,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:48,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 439.54791 ± 313.784
2026-01-22 23:33:48,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [892.9253, 575.8583, 14.919054, 672.27893, 402.15033, 771.0242, 82.04603, 222.47052, 39.64658, 722.1599]
2026-01-22 23:33:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [561.0, 1000.0, 15.0, 1000.0, 231.0, 1000.0, 53.0, 127.0, 34.0, 1000.0]
2026-01-22 23:33:48,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 38 minutes, 24 seconds)
2026-01-22 23:35:21,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:29,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 651.43231 ± 301.837
2026-01-22 23:35:29,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [326.36954, 763.1067, 818.6505, 965.11694, 774.84143, 961.4689, 455.6637, 313.29388, 1008.6941, 127.11698]
2026-01-22 23:35:29,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [200.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 265.0, 231.0, 1000.0, 85.0]
2026-01-22 23:35:29,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (651.43) for latency DatasetOffice
2026-01-22 23:35:29,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 34 minutes, 56 seconds)
2026-01-22 23:37:11,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:17,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 549.52039 ± 380.779
2026-01-22 23:37:17,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1026.3126, 830.73364, 1113.5187, 68.304405, 655.83795, 35.310753, 564.0607, 789.07806, 95.21181, 316.8357]
2026-01-22 23:37:17,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 675.0, 52.0, 1000.0, 45.0, 359.0, 1000.0, 82.0, 214.0]
2026-01-22 23:37:17,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 32 minutes, 17 seconds)
2026-01-22 23:38:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:59,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 812.94769 ± 463.724
2026-01-22 23:38:59,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1100.6278, 1638.0863, 917.19635, 659.7093, 219.60854, 1316.0127, 895.3152, 954.49023, 256.22546, 172.20494]
2026-01-22 23:38:59,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 388.0, 135.0, 1000.0, 544.0, 1000.0, 163.0, 105.0]
2026-01-22 23:38:59,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (812.95) for latency DatasetOffice
2026-01-22 23:39:00,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 31 minutes, 8 seconds)
2026-01-22 23:40:44,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:55,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 992.86511 ± 327.203
2026-01-22 23:40:55,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1049.5715, 254.2238, 1091.9933, 910.1807, 1132.1376, 916.49994, 802.5958, 1143.0483, 1641.8057, 986.5946]
2026-01-22 23:40:55,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 149.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:40:55,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (992.87) for latency DatasetOffice
2026-01-22 23:40:55,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 32 minutes, 41 seconds)
2026-01-22 23:42:33,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:41,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 808.68878 ± 404.689
2026-01-22 23:42:41,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [898.29626, 960.74207, 1200.2733, 1163.8069, 609.2381, 58.996986, 1025.065, 1134.6562, 966.8687, 68.94476]
2026-01-22 23:42:41,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 711.0, 1000.0, 303.0, 29.0, 1000.0, 1000.0, 1000.0, 47.0]
2026-01-22 23:42:41,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 29 minutes, 18 seconds)
2026-01-22 23:44:18,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:27,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1068.99243 ± 491.804
2026-01-22 23:44:27,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1181.0425, 1866.0009, 1041.5258, 1108.4966, 1647.1992, 1089.4961, 366.35794, 246.95824, 707.80963, 1435.0378]
2026-01-22 23:44:27,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 965.0, 1000.0, 1000.0, 908.0, 1000.0, 207.0, 144.0, 1000.0, 1000.0]
2026-01-22 23:44:27,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1068.99) for latency DatasetOffice
2026-01-22 23:44:27,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 28 minutes, 51 seconds)
2026-01-22 23:46:03,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 950.21124 ± 449.447
2026-01-22 23:46:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [153.3606, 1642.585, 875.3122, 1329.293, 993.7838, 307.15744, 1478.4745, 905.1733, 1076.3699, 740.60254]
2026-01-22 23:46:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [95.0, 1000.0, 1000.0, 1000.0, 1000.0, 188.0, 1000.0, 1000.0, 1000.0, 454.0]
2026-01-22 23:46:12,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 26 minutes, 9 seconds)
2026-01-22 23:47:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:05,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 861.68634 ± 380.363
2026-01-22 23:48:05,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1071.4308, 825.8673, 566.81494, 785.7985, 670.9626, 830.89813, 949.24634, 1768.231, 202.18036, 945.43365]
2026-01-22 23:48:05,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 284.0, 454.0, 342.0, 1000.0, 549.0, 1000.0, 122.0, 1000.0]
2026-01-22 23:48:05,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 27 minutes, 12 seconds)
2026-01-22 23:49:37,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1102.37952 ± 652.240
2026-01-22 23:49:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1337.8958, 2320.1035, 1131.116, 970.3657, 355.66354, 1996.0962, 287.58362, 1417.9199, 403.3828, 803.66833]
2026-01-22 23:49:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 156.0, 968.0, 176.0, 1000.0, 195.0, 1000.0]
2026-01-22 23:49:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1102.38) for latency DatasetOffice
2026-01-22 23:49:45,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 21 minutes, 20 seconds)
2026-01-22 23:51:24,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:33,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1300.16956 ± 628.199
2026-01-22 23:51:33,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [409.3149, 2057.0112, 1734.0848, 2381.4583, 1184.1879, 1676.9061, 850.07776, 959.1961, 1314.1661, 435.29224]
2026-01-22 23:51:33,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [182.0, 950.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 570.0, 233.0]
2026-01-22 23:51:33,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1300.17) for latency DatasetOffice
2026-01-22 23:51:33,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 20 minutes, 7 seconds)
2026-01-22 23:53:16,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:25,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1740.47656 ± 673.288
2026-01-22 23:53:25,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2343.9192, 2000.0254, 2040.2852, 1271.8296, 74.753456, 2363.4136, 1204.3619, 2263.0989, 1962.4258, 1880.6532]
2026-01-22 23:53:25,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 569.0, 53.0, 1000.0, 467.0, 1000.0, 937.0, 1000.0]
2026-01-22 23:53:25,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (1740.48) for latency DatasetOffice
2026-01-22 23:53:25,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 19 minutes, 53 seconds)
2026-01-22 23:54:58,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:07,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1684.23169 ± 913.311
2026-01-22 23:55:07,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1009.6633, 2554.6018, 1106.876, 396.01703, 1511.6674, 2643.2354, 186.06134, 2654.8623, 2341.859, 2437.4734]
2026-01-22 23:55:07,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 147.0, 615.0, 1000.0, 72.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:55:07,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 17 minutes, 23 seconds)
2026-01-22 23:56:45,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2367.50464 ± 483.771
2026-01-22 23:56:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2748.7656, 1591.3746, 1609.1812, 2529.4407, 2582.957, 2471.2212, 2850.4775, 1767.9943, 2638.3057, 2885.3271]
2026-01-22 23:56:56,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 658.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:56:56,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2367.50) for latency DatasetOffice
2026-01-22 23:56:56,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 14 minutes, 38 seconds)
2026-01-22 23:58:36,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:44,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1160.15320 ± 821.155
2026-01-22 23:58:44,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [176.1831, 853.58606, 410.98926, 2643.8433, 1223.414, 895.04474, 964.701, 1365.7365, 419.59158, 2648.4424]
2026-01-22 23:58:44,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [73.0, 1000.0, 159.0, 1000.0, 1000.0, 1000.0, 1000.0, 535.0, 160.0, 1000.0]
2026-01-22 23:58:44,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 14 minutes, 42 seconds)
2026-01-23 00:00:27,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:36,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2454.88916 ± 674.354
2026-01-23 00:00:36,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2855.0854, 3084.0552, 2925.4512, 1651.1117, 980.3712, 1998.9086, 2265.133, 2863.0388, 3070.667, 2855.0713]
2026-01-23 00:00:36,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 557.0, 1000.0, 691.0, 822.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:00:36,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2454.89) for latency DatasetOffice
2026-01-23 00:00:36,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 13 minutes, 59 seconds)
2026-01-23 00:02:12,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:21,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2133.08838 ± 1142.808
2026-01-23 00:02:21,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2996.147, 2942.122, 171.58713, 2610.2085, 3080.5566, 1055.5059, 2595.8652, 2959.9832, 95.41514, 2823.4902]
2026-01-23 00:02:21,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 87.0, 1000.0, 1000.0, 359.0, 1000.0, 1000.0, 50.0, 1000.0]
2026-01-23 00:02:21,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 10 minutes, 17 seconds)
2026-01-23 00:04:01,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:10,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2260.63916 ± 911.033
2026-01-23 00:04:10,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2982.0132, 2653.678, 3116.326, 2977.3403, 1457.5684, 2950.9763, 1592.3159, 1878.696, 2822.387, 175.09187]
2026-01-23 00:04:10,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 867.0, 1000.0, 1000.0, 493.0, 1000.0, 561.0, 1000.0, 1000.0, 66.0]
2026-01-23 00:04:10,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 10 minutes, 15 seconds)
2026-01-23 00:05:48,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1967.35669 ± 963.983
2026-01-23 00:05:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3059.3245, 903.56244, 3199.8672, 2488.2205, 2347.783, 358.87427, 3055.3486, 1273.1389, 1086.3257, 1901.1206]
2026-01-23 00:05:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 748.0, 1000.0, 134.0, 1000.0, 1000.0, 324.0, 649.0]
2026-01-23 00:05:57,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 8 minutes)
2026-01-23 00:07:36,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:44,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2099.46362 ± 846.468
2026-01-23 00:07:44,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2057.3735, 1901.3595, 2386.1853, 2826.9934, 1388.6715, 903.1027, 597.4957, 2934.9866, 3037.0889, 2961.379]
2026-01-23 00:07:44,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [598.0, 713.0, 1000.0, 1000.0, 477.0, 317.0, 196.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:07:44,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 6 minutes)
2026-01-23 00:09:28,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:36,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2076.58423 ± 1084.881
2026-01-23 00:09:36,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1439.9567, 750.34625, 3262.779, 3023.203, 636.78656, 1715.6147, 3116.5818, 3008.2554, 662.9168, 3149.4016]
2026-01-23 00:09:36,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [486.0, 1000.0, 1000.0, 1000.0, 191.0, 555.0, 1000.0, 914.0, 259.0, 1000.0]
2026-01-23 00:09:36,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 4 minutes, 3 seconds)
2026-01-23 00:11:06,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:15,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2493.18506 ± 1239.941
2026-01-23 00:11:15,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3402.9854, 2968.54, 584.27295, 3302.7756, 3492.4084, 3512.9836, 411.54236, 3309.1204, 3079.08, 868.14026]
2026-01-23 00:11:15,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 157.0, 1000.0, 1000.0, 1000.0, 139.0, 1000.0, 1000.0, 259.0]
2026-01-23 00:11:15,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2493.19) for latency DatasetOffice
2026-01-23 00:11:15,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 1 minute, 5 seconds)
2026-01-23 00:12:59,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2685.52026 ± 960.875
2026-01-23 00:13:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [717.31506, 970.9371, 3220.3997, 3267.5742, 2452.812, 3423.7336, 3299.4592, 3375.531, 2896.2644, 3231.177]
2026-01-23 00:13:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [255.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 982.0, 1000.0]
2026-01-23 00:13:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (2685.52) for latency DatasetOffice
2026-01-23 00:13:10,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 29 seconds)
2026-01-23 00:14:42,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3175.27759 ± 480.919
2026-01-23 00:14:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3337.825, 3139.042, 3309.0503, 3406.3362, 3481.989, 3253.4316, 1768.1866, 3492.9856, 3211.1738, 3352.754]
2026-01-23 00:14:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 553.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:14:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3175.28) for latency DatasetOffice
2026-01-23 00:14:53,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 57 minutes, 49 seconds)
2026-01-23 00:16:30,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:39,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2452.56128 ± 1136.070
2026-01-23 00:16:39,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1451.9537, 1014.77124, 187.78255, 3727.5276, 3588.3494, 3639.9536, 2871.398, 2565.6475, 2863.5723, 2614.6572]
2026-01-23 00:16:39,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [488.0, 1000.0, 64.0, 1000.0, 1000.0, 1000.0, 1000.0, 712.0, 1000.0, 726.0]
2026-01-23 00:16:39,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 55 minutes, 51 seconds)
2026-01-23 00:18:20,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:27,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 1812.26794 ± 1006.831
2026-01-23 00:18:27,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2852.4375, 1283.3475, 3613.0586, 3228.4749, 1098.952, 1691.2157, 1015.1318, 709.55133, 1888.8193, 741.6905]
2026-01-23 00:18:27,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [764.0, 1000.0, 1000.0, 931.0, 316.0, 493.0, 279.0, 239.0, 515.0, 219.0]
2026-01-23 00:18:27,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 53 minutes, 13 seconds)
2026-01-23 00:20:04,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:14,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3440.60669 ± 814.722
2026-01-23 00:20:14,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3487.6458, 3860.3394, 3822.4043, 3558.9685, 3719.8997, 3795.381, 1037.9292, 3777.0571, 3898.3894, 3448.0525]
2026-01-23 00:20:14,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 283.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:20:14,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3440.61) for latency DatasetOffice
2026-01-23 00:20:14,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 53 minutes, 13 seconds)
2026-01-23 00:21:55,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:03,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2624.75220 ± 1131.799
2026-01-23 00:22:03,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3594.282, 2552.1992, 2819.2378, 3666.8684, 2703.7773, 850.9106, 2960.1833, 186.9314, 3729.09, 3184.0403]
2026-01-23 00:22:03,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 667.0, 1000.0, 1000.0, 713.0, 245.0, 1000.0, 57.0, 1000.0, 850.0]
2026-01-23 00:22:03,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 50 minutes, 19 seconds)
2026-01-23 00:23:40,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:49,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2660.75317 ± 1086.518
2026-01-23 00:23:49,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1815.7106, 3341.3838, 2511.3462, 3558.5486, 104.30209, 3556.556, 3496.509, 1803.7687, 3671.2188, 2748.1882]
2026-01-23 00:23:49,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [533.0, 1000.0, 1000.0, 1000.0, 53.0, 1000.0, 1000.0, 534.0, 1000.0, 1000.0]
2026-01-23 00:23:49,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 48 minutes, 59 seconds)
2026-01-23 00:25:26,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:36,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3264.94336 ± 951.755
2026-01-23 00:25:36,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1919.7228, 3055.1262, 3780.4893, 3832.975, 2601.108, 4049.9956, 1317.854, 4088.499, 4045.9001, 3957.7632]
2026-01-23 00:25:36,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [532.0, 1000.0, 1000.0, 1000.0, 720.0, 1000.0, 328.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:25:36,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 47 minutes, 23 seconds)
2026-01-23 00:27:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:23,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3052.69678 ± 1283.500
2026-01-23 00:27:23,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3881.2615, 3742.6064, 3917.8608, 3993.5889, 1628.6367, 3864.6106, 1191.3655, 3702.76, 573.48474, 4030.7935]
2026-01-23 00:27:23,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 468.0, 1000.0, 316.0, 1000.0, 170.0, 1000.0]
2026-01-23 00:27:23,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 45 minutes, 32 seconds)
2026-01-23 00:29:05,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:13,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2625.33667 ± 1338.194
2026-01-23 00:29:13,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [806.8084, 1819.4319, 3703.3274, 3461.2234, 4100.831, 2277.1038, 3808.264, 17.237825, 2394.0483, 3865.0923]
2026-01-23 00:29:13,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [224.0, 468.0, 1000.0, 890.0, 1000.0, 1000.0, 1000.0, 17.0, 599.0, 1000.0]
2026-01-23 00:29:13,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 44 minutes, 7 seconds)
2026-01-23 00:30:54,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:05,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3874.55737 ± 141.186
2026-01-23 00:31:05,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4177.7734, 3918.1147, 3771.3708, 3734.4246, 4015.9656, 3849.7456, 3964.9895, 3852.9824, 3776.0637, 3684.1448]
2026-01-23 00:31:05,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:31:05,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (3874.56) for latency DatasetOffice
2026-01-23 00:31:05,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 42 minutes, 53 seconds)
2026-01-23 00:32:46,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:55,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2901.96875 ± 1383.564
2026-01-23 00:32:55,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [123.13319, 3808.1936, 738.9368, 3605.2139, 1763.0299, 3533.5466, 3959.5562, 3970.5957, 3732.191, 3785.2893]
2026-01-23 00:32:55,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [46.0, 1000.0, 224.0, 1000.0, 644.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:32:55,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 41 minutes, 55 seconds)
2026-01-23 00:34:30,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:41,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3341.32495 ± 984.794
2026-01-23 00:34:41,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1965.9448, 2937.6619, 1085.4463, 4064.5159, 3944.948, 4130.101, 3516.8337, 3936.0203, 3929.1125, 3902.667]
2026-01-23 00:34:41,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [505.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 871.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:34:41,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 39 minutes, 55 seconds)
2026-01-23 00:36:21,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:30,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3272.19336 ± 1504.270
2026-01-23 00:36:30,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [388.26346, 3949.4824, 4040.6936, 3867.3047, 4340.618, 3739.7478, 4021.1382, 172.62392, 4048.8738, 4153.188]
2026-01-23 00:36:30,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [128.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 60.0, 960.0, 1000.0]
2026-01-23 00:36:30,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 38 minutes, 30 seconds)
2026-01-23 00:38:12,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:21,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2977.33398 ± 1434.145
2026-01-23 00:38:21,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2552.7998, 4019.033, 4032.0042, 3843.8247, 4311.7783, 10.21982, 3324.696, 3032.3992, 4037.2751, 609.3076]
2026-01-23 00:38:21,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 18.0, 861.0, 1000.0, 1000.0, 200.0]
2026-01-23 00:38:21,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 36 minutes, 50 seconds)
2026-01-23 00:39:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:08,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3498.30469 ± 1206.080
2026-01-23 00:40:08,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3890.4766, 4129.743, 4287.976, 3833.715, 4091.6953, 4110.478, 3994.8503, 304.9308, 2202.5022, 4136.6787]
2026-01-23 00:40:08,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:40:08,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 34 minutes, 12 seconds)
2026-01-23 00:41:41,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:52,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3642.62427 ± 1045.850
2026-01-23 00:41:52,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3964.3755, 4123.9756, 3712.305, 4033.4487, 4131.0625, 4106.73, 4209.743, 3770.846, 541.4644, 3832.288]
2026-01-23 00:41:52,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 182.0, 1000.0]
2026-01-23 00:41:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 31 minutes, 17 seconds)
2026-01-23 00:43:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:38,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2964.17920 ± 1403.489
2026-01-23 00:43:38,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4000.923, 2971.7107, 3824.1929, 4109.5386, 4099.798, 1291.3582, 4126.665, 3837.5503, 803.6249, 576.4322]
2026-01-23 00:43:38,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 685.0, 1000.0, 1000.0, 1000.0, 341.0, 1000.0, 1000.0, 257.0, 138.0]
2026-01-23 00:43:38,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 29 minutes, 28 seconds)
2026-01-23 00:45:17,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:28,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3723.09912 ± 1016.643
2026-01-23 00:45:28,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [688.2032, 3987.7942, 3841.242, 4067.0193, 4154.5303, 4114.8696, 3966.1104, 4210.1743, 4131.572, 4069.478]
2026-01-23 00:45:28,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [185.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:45:28,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 27 minutes, 43 seconds)
2026-01-23 00:47:13,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:20,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2822.02417 ± 1692.512
2026-01-23 00:47:20,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1000.58826, 4332.402, 4056.4243, 4156.853, 314.14954, 4161.891, 4191.803, 4017.2446, 1908.7603, 80.12689]
2026-01-23 00:47:20,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [242.0, 1000.0, 1000.0, 1000.0, 97.0, 1000.0, 1000.0, 1000.0, 481.0, 166.0]
2026-01-23 00:47:20,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 26 minutes, 19 seconds)
2026-01-23 00:48:58,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:07,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3302.59570 ± 1401.968
2026-01-23 00:49:07,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4013.1724, 4070.0022, 4150.5093, 2739.9905, 4209.38, 19.14301, 4100.5117, 1373.1614, 4032.2144, 4317.871]
2026-01-23 00:49:07,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 18.0, 1000.0, 322.0, 1000.0, 1000.0]
2026-01-23 00:49:07,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 24 minutes, 25 seconds)
2026-01-23 00:50:46,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:54,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3276.29370 ± 1458.133
2026-01-23 00:50:54,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1258.4587, 3894.1133, 4352.7373, 4140.6997, 4303.734, 2048.3423, 4117.797, 4326.993, 4164.3433, 155.71791]
2026-01-23 00:50:54,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [310.0, 1000.0, 1000.0, 1000.0, 1000.0, 559.0, 1000.0, 1000.0, 1000.0, 59.0]
2026-01-23 00:50:55,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2026-01-23 00:52:33,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:43,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3088.62061 ± 1015.990
2026-01-23 00:52:43,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2501.3071, 4226.2734, 3324.6985, 4189.929, 2051.2974, 4110.399, 3859.54, 1466.6362, 1698.2899, 3457.8337]
2026-01-23 00:52:43,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [906.0, 1000.0, 1000.0, 1000.0, 526.0, 1000.0, 1000.0, 380.0, 567.0, 1000.0]
2026-01-23 00:52:43,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 21 minutes, 45 seconds)
2026-01-23 00:54:21,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:31,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3418.10229 ± 1515.020
2026-01-23 00:54:31,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [734.16876, 4046.5562, 4469.883, 4181.4336, 4093.147, 4232.4683, 4099.392, 4066.492, 87.03457, 4170.449]
2026-01-23 00:54:31,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [215.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 35.0, 1000.0]
2026-01-23 00:54:31,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 19 minutes, 40 seconds)
2026-01-23 00:56:04,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:12,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2956.51904 ± 1280.136
2026-01-23 00:56:12,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4131.085, 2332.621, 4305.7295, 1659.7991, 1737.4249, 4096.047, 2627.362, 4111.96, 552.7691, 4010.394]
2026-01-23 00:56:12,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 401.0, 435.0, 1000.0, 673.0, 1000.0, 157.0, 1000.0]
2026-01-23 00:56:12,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 16 minutes, 15 seconds)
2026-01-23 00:57:55,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:06,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4093.28125 ± 495.897
2026-01-23 00:58:06,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4363.1216, 4247.1147, 4175.896, 4331.5796, 4223.8306, 4090.5964, 4269.9766, 4372.014, 4232.9155, 2625.77]
2026-01-23 00:58:06,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:58:06,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4093.28) for latency DatasetOffice
2026-01-23 00:58:06,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 15 minutes, 23 seconds)
2026-01-23 00:59:44,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:55,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3767.14453 ± 837.233
2026-01-23 00:59:55,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4066.668, 3957.9443, 4275.483, 1344.5377, 4082.2327, 4298.7314, 4220.0586, 3483.2368, 3920.7827, 4021.7705]
2026-01-23 00:59:55,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 339.0, 1000.0, 1000.0, 1000.0, 824.0, 977.0, 1000.0]
2026-01-23 00:59:55,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 13 minutes, 48 seconds)
2026-01-23 01:01:27,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:36,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3207.81519 ± 1329.808
2026-01-23 01:01:36,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2949.653, 4405.98, 4429.586, 1292.4762, 4392.2227, 3977.3499, 4284.369, 2813.7756, 3099.7212, 433.02026]
2026-01-23 01:01:36,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 299.0, 1000.0, 1000.0, 1000.0, 696.0, 910.0, 242.0]
2026-01-23 01:01:36,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 4 seconds)
2026-01-23 01:03:17,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:26,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3355.62744 ± 1253.042
2026-01-23 01:03:26,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4093.4177, 4382.079, 2777.8792, 4344.459, 4198.6455, 3786.6777, 753.9338, 1381.063, 4463.361, 3374.7546]
2026-01-23 01:03:26,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 673.0, 1000.0, 1000.0, 1000.0, 194.0, 377.0, 1000.0, 772.0]
2026-01-23 01:03:26,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 9 minutes, 32 seconds)
2026-01-23 01:05:01,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:10,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3319.92578 ± 1442.503
2026-01-23 01:05:10,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4317.6416, 1848.4321, 4085.1445, 523.02026, 4340.053, 1131.5015, 4264.29, 4179.68, 4378.3115, 4131.1816]
2026-01-23 01:05:10,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 935.0, 130.0, 1000.0, 269.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:05:10,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 8 minutes, 6 seconds)
2026-01-23 01:06:50,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:01,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4098.51904 ± 621.073
2026-01-23 01:07:01,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4404.0396, 4241.961, 4465.9585, 4546.05, 4348.7095, 4430.854, 2930.2117, 2813.139, 4531.649, 4272.621]
2026-01-23 01:07:01,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 628.0, 1000.0, 1000.0]
2026-01-23 01:07:01,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4098.52) for latency DatasetOffice
2026-01-23 01:07:01,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 minutes)
2026-01-23 01:08:41,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3607.27271 ± 1119.529
2026-01-23 01:08:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4305.0767, 4077.1443, 4015.17, 595.5079, 4393.311, 3341.9917, 4324.1807, 4071.0623, 2720.9316, 4228.3516]
2026-01-23 01:08:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 154.0, 1000.0, 752.0, 1000.0, 1000.0, 655.0, 1000.0]
2026-01-23 01:08:51,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 4 minutes, 20 seconds)
2026-01-23 01:10:29,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:38,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3567.52808 ± 1118.456
2026-01-23 01:10:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4199.9214, 4424.7026, 4208.2837, 4440.43, 3887.8098, 1320.0135, 4387.7466, 1765.6353, 2770.5986, 4270.1387]
2026-01-23 01:10:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 900.0, 310.0, 1000.0, 457.0, 682.0, 1000.0]
2026-01-23 01:10:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 3 minutes, 13 seconds)
2026-01-23 01:12:07,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3655.44653 ± 1048.710
2026-01-23 01:12:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4250.0283, 4236.0225, 4062.854, 1205.0865, 4445.3516, 4281.394, 3156.2905, 2278.4612, 4536.489, 4102.4883]
2026-01-23 01:12:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 711.0, 562.0, 1000.0, 1000.0]
2026-01-23 01:12:17,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 11 seconds)
2026-01-23 01:14:00,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:10,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3974.23242 ± 586.919
2026-01-23 01:14:10,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4231.0405, 4018.1467, 4513.058, 2406.7544, 4147.251, 4037.494, 4292.188, 4390.3535, 3470.9465, 4235.0923]
2026-01-23 01:14:10,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 540.0, 1000.0, 1000.0, 1000.0, 1000.0, 784.0, 1000.0]
2026-01-23 01:14:10,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 59 minutes, 26 seconds)
2026-01-23 01:15:43,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:51,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3129.27393 ± 1506.275
2026-01-23 01:15:51,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4080.513, 4441.878, 1836.9542, 4484.541, 827.2156, 4398.4336, 1444.5499, 4360.723, 1138.0264, 4279.9033]
2026-01-23 01:15:51,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 432.0, 1000.0, 244.0, 1000.0, 387.0, 1000.0, 270.0, 1000.0]
2026-01-23 01:15:51,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 31 seconds)
2026-01-23 01:17:25,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:34,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3794.82080 ± 1144.294
2026-01-23 01:17:34,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4124.105, 4443.4937, 4611.277, 4535.9663, 4310.5977, 1440.6437, 1621.6101, 4045.42, 4361.1997, 4453.893]
2026-01-23 01:17:34,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 360.0, 368.0, 933.0, 1000.0, 1000.0]
2026-01-23 01:17:34,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 5 seconds)
2026-01-23 01:19:19,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:29,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4073.47656 ± 991.397
2026-01-23 01:19:29,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4409.578, 4371.525, 1106.9385, 4349.6006, 4482.9614, 4298.468, 4507.6875, 4379.319, 4320.7173, 4507.967]
2026-01-23 01:19:29,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 252.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 53 minutes, 6 seconds)
2026-01-23 01:21:05,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:13,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3139.00854 ± 1499.433
2026-01-23 01:21:13,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4254.2515, 3174.419, 2134.8113, 42.7101, 3826.8599, 4217.218, 4548.719, 3870.1174, 924.53735, 4396.444]
2026-01-23 01:21:13,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 726.0, 624.0, 25.0, 1000.0, 1000.0, 1000.0, 922.0, 239.0, 1000.0]
2026-01-23 01:21:13,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 49 seconds)
2026-01-23 01:22:42,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:50,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3311.22583 ± 1310.608
2026-01-23 01:22:50,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4378.8667, 4300.112, 3098.5642, 522.4404, 4378.8247, 2351.966, 1762.3082, 3348.3054, 4388.914, 4581.9575]
2026-01-23 01:22:50,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 735.0, 136.0, 1000.0, 511.0, 378.0, 771.0, 1000.0, 1000.0]
2026-01-23 01:22:50,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 32 seconds)
2026-01-23 01:24:30,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3156.34131 ± 1571.891
2026-01-23 01:24:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4408.4, 1001.4305, 4538.1772, 4195.1016, 2546.5728, 956.1799, 4429.767, 4510.2583, 763.9326, 4213.5933]
2026-01-23 01:24:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 238.0, 1000.0, 1000.0, 684.0, 235.0, 1000.0, 1000.0, 197.0, 1000.0]
2026-01-23 01:24:38,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 27 seconds)
2026-01-23 01:26:16,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:26,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3305.92896 ± 1597.050
2026-01-23 01:26:26,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4507.002, 2023.6508, 927.39526, 4336.609, 78.436264, 3482.1372, 4657.725, 4020.7346, 4510.675, 4514.9243]
2026-01-23 01:26:26,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 429.0, 1000.0, 1000.0, 101.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:26,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 46 minutes, 2 seconds)
2026-01-23 01:28:02,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:12,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3808.89526 ± 1140.986
2026-01-23 01:28:12,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4410.89, 2386.2493, 4486.6255, 4670.2188, 3237.085, 4274.8975, 4514.7056, 4603.761, 1090.3477, 4414.173]
2026-01-23 01:28:12,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 706.0, 1000.0, 1000.0, 1000.0, 259.0, 1000.0]
2026-01-23 01:28:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 34 seconds)
2026-01-23 01:29:49,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:59,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3631.28198 ± 1121.964
2026-01-23 01:29:59,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4233.633, 867.73486, 4379.0303, 4012.631, 4725.514, 2884.683, 3867.0154, 4403.1045, 2625.5693, 4313.9033]
2026-01-23 01:29:59,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 208.0, 1000.0, 857.0, 985.0, 716.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:59,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 5 seconds)
2026-01-23 01:31:38,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:47,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3009.61621 ± 1376.321
2026-01-23 01:31:47,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3132.1792, 4910.9, 4563.6304, 4316.615, 4497.8853, 1584.1606, 1573.9053, 2191.5095, 2202.8835, 1122.4935]
2026-01-23 01:31:47,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 339.0, 506.0, 1000.0, 245.0]
2026-01-23 01:31:47,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 9 seconds)
2026-01-23 01:33:23,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:32,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3694.37427 ± 1517.021
2026-01-23 01:33:32,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4431.8135, 4310.851, 1379.6046, 4352.2134, 4455.314, 4467.862, 4538.79, 61.679264, 4469.6196, 4475.9976]
2026-01-23 01:33:32,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 314.0, 1000.0, 1000.0, 1000.0, 1000.0, 33.0, 1000.0, 1000.0]
2026-01-23 01:33:32,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 8 seconds)
2026-01-23 01:35:02,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:09,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2455.01099 ± 1714.540
2026-01-23 01:35:09,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4215.1587, 4389.815, 1190.5172, 2712.2769, 4154.4814, 2668.79, 314.24863, 44.67119, 485.59424, 4374.5566]
2026-01-23 01:35:09,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 265.0, 1000.0, 1000.0, 628.0, 84.0, 30.0, 122.0, 1000.0]
2026-01-23 01:35:09,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 39 seconds)
2026-01-23 01:36:46,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:54,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3523.81885 ± 1411.054
2026-01-23 01:36:54,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [685.8815, 2070.3765, 4454.202, 4376.168, 4510.0757, 1536.3347, 4713.1514, 4407.2114, 4272.6953, 4212.092]
2026-01-23 01:36:54,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [195.0, 495.0, 1000.0, 1000.0, 1000.0, 355.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:36:54,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 50 seconds)
2026-01-23 01:38:30,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4461.11719 ± 142.024
2026-01-23 01:38:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4675.9355, 4663.618, 4334.432, 4360.2593, 4283.227, 4310.905, 4580.197, 4392.636, 4429.4595, 4580.498]
2026-01-23 01:38:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:38:41,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4461.12) for latency DatasetOffice
2026-01-23 01:38:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 3 seconds)
2026-01-23 01:40:16,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:27,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3681.34448 ± 1070.252
2026-01-23 01:40:27,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4602.3535, 4407.9688, 4570.197, 3476.4639, 3904.0337, 3042.5479, 3530.0178, 831.49335, 4058.9995, 4389.3706]
2026-01-23 01:40:27,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 762.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:40:27,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 9 seconds)
2026-01-23 01:42:13,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3121.87842 ± 1816.251
2026-01-23 01:42:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [3365.2458, 114.29049, 4661.59, 4280.513, 1089.2605, 4431.876, 60.280434, 4468.649, 4402.029, 4345.049]
2026-01-23 01:42:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 46.0, 1000.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:42:22,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes)
2026-01-23 01:43:55,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:04,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3432.13623 ± 1420.523
2026-01-23 01:44:04,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4267.635, 4340.2524, 4321.7314, 4482.1636, 4355.676, 4497.12, 2039.7991, 457.2691, 4032.0007, 1527.7184]
2026-01-23 01:44:04,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 108.0, 1000.0, 388.0]
2026-01-23 01:44:04,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 32 seconds)
2026-01-23 01:45:37,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:47,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3709.00513 ± 1267.373
2026-01-23 01:45:47,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2907.485, 4611.637, 3750.1638, 4308.9536, 4610.562, 4682.8267, 4397.7295, 2793.7122, 474.88943, 4552.093]
2026-01-23 01:45:47,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 830.0, 1000.0, 1000.0, 1000.0, 1000.0, 624.0, 119.0, 1000.0]
2026-01-23 01:45:47,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 36 seconds)
2026-01-23 01:47:21,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:32,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4297.73145 ± 109.880
2026-01-23 01:47:32,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4399.231, 4263.731, 4092.9775, 4485.5405, 4341.0117, 4214.2534, 4318.254, 4400.5635, 4271.526, 4190.225]
2026-01-23 01:47:32,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:47:32,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 46 seconds)
2026-01-23 01:49:14,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:22,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2829.38843 ± 1694.637
2026-01-23 01:49:22,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4300.0854, 633.5362, 1484.428, 4532.027, 4410.8647, 4736.2144, 2277.9534, 841.76666, 4336.3296, 740.6802]
2026-01-23 01:49:22,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 154.0, 365.0, 1000.0, 1000.0, 1000.0, 523.0, 214.0, 1000.0, 1000.0]
2026-01-23 01:49:22,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 12 seconds)
2026-01-23 01:51:02,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:11,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3916.33911 ± 962.549
2026-01-23 01:51:11,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4646.4355, 4713.9473, 3391.6348, 4543.6313, 4274.0947, 4521.1704, 2822.0095, 4136.8193, 1606.0552, 4507.5938]
2026-01-23 01:51:11,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 625.0, 1000.0, 366.0, 1000.0]
2026-01-23 01:51:11,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 11 seconds)
2026-01-23 01:52:42,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:50,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 2862.91064 ± 1858.778
2026-01-23 01:52:50,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [1150.3291, 4516.494, 4711.3086, 4627.5723, 105.1374, 2944.5464, 4552.221, 370.74783, 4509.6025, 1141.1447]
2026-01-23 01:52:50,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [266.0, 1000.0, 1000.0, 1000.0, 48.0, 676.0, 1000.0, 95.0, 980.0, 1000.0]
2026-01-23 01:52:50,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 16 seconds)
2026-01-23 01:54:33,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3217.45557 ± 1837.675
2026-01-23 01:54:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [2319.1213, 47.10631, 367.01413, 4502.4976, 4351.525, 1634.1967, 4844.048, 4892.084, 4716.0303, 4500.9287]
2026-01-23 01:54:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [528.0, 31.0, 91.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:55:24,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 14 seconds)
2026-01-23 01:56:55,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:03,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3385.03833 ± 1454.906
2026-01-23 01:57:03,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4131.839, 4288.2183, 4313.76, 4271.9175, 844.79987, 4493.301, 761.52966, 4305.126, 4377.546, 2062.3494]
2026-01-23 01:57:03,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 190.0, 1000.0, 205.0, 1000.0, 1000.0, 470.0]
2026-01-23 01:57:03,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 8 seconds)
2026-01-23 01:58:39,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:49,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3667.46606 ± 1331.472
2026-01-23 01:58:49,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4432.024, 2965.9443, 4384.096, 2083.9146, 4564.4204, 4333.239, 4533.62, 467.48328, 4179.178, 4730.739]
2026-01-23 01:58:49,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 437.0, 984.0, 1000.0, 1000.0, 115.0, 1000.0, 1000.0]
2026-01-23 01:58:49,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 6 seconds)
2026-01-23 02:00:31,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3801.44141 ± 1268.311
2026-01-23 02:00:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4630.668, 117.92479, 3800.734, 4447.4453, 4472.71, 4268.9414, 4652.5044, 3835.7847, 3869.612, 3918.088]
2026-01-23 02:00:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 217.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 879.0, 1000.0, 1000.0]
2026-01-23 02:00:41,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 18 seconds)
2026-01-23 02:02:12,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:23,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4480.32031 ± 141.970
2026-01-23 02:02:23,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4588.672, 4371.3286, 4667.558, 4726.8867, 4413.762, 4403.8447, 4442.5713, 4574.425, 4334.444, 4279.711]
2026-01-23 02:02:23,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:23,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4480.32) for latency DatasetOffice
2026-01-23 02:02:23,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 28 seconds)
2026-01-23 02:04:06,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:14,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3326.23877 ± 1726.346
2026-01-23 02:04:14,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4573.9785, 4358.5806, 1558.7552, 4610.168, 4562.6724, 467.59756, 277.06702, 4553.9097, 3627.416, 4672.2407]
2026-01-23 02:04:14,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 330.0, 1000.0, 1000.0, 116.0, 76.0, 1000.0, 797.0, 1000.0]
2026-01-23 02:04:14,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 50 seconds)
2026-01-23 02:05:47,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:56,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3506.51245 ± 1405.091
2026-01-23 02:05:56,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4570.2124, 1195.2462, 1253.5293, 4462.8716, 4765.434, 1978.9486, 3194.758, 4619.262, 4486.4565, 4538.407]
2026-01-23 02:05:56,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 278.0, 278.0, 1000.0, 1000.0, 469.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:56,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 6 seconds)
2026-01-23 02:07:35,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:45,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4587.31201 ± 203.395
2026-01-23 02:07:45,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4806.6763, 4699.042, 4683.263, 4776.9014, 4666.1807, 4472.7114, 4577.869, 4084.9543, 4682.6577, 4422.8677]
2026-01-23 02:07:45,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 889.0, 1000.0, 1000.0]
2026-01-23 02:07:45,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1274 [INFO]: New best (4587.31) for latency DatasetOffice
2026-01-23 02:07:45,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 21 seconds)
2026-01-23 02:09:22,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:33,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4305.26221 ± 316.420
2026-01-23 02:09:33,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4434.8457, 4254.0757, 4544.954, 4377.3477, 4236.958, 4514.5684, 4076.117, 3484.223, 4496.275, 4633.259]
2026-01-23 02:09:33,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:09:33,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 32 seconds)
2026-01-23 02:11:08,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:17,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 4235.49512 ± 817.961
2026-01-23 02:11:17,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [4623.1235, 3396.6384, 4600.6206, 4803.741, 4404.4946, 2066.9272, 4451.5063, 4617.353, 4825.545, 4564.995]
2026-01-23 02:11:17,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 746.0, 1000.0, 1000.0, 1000.0, 453.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:11:17,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 46 seconds)
2026-01-23 02:12:51,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:59,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1269 [DEBUG]: Total Reward: 3329.75781 ± 1555.984
2026-01-23 02:12:59,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1270 [DEBUG]: All rewards: [667.0576, 4503.0986, 2208.7512, 4707.904, 4722.7827, 4240.1387, 1091.328, 4504.8765, 4634.9062, 2016.7357]
2026-01-23 02:12:59,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1271 [DEBUG]: All trajectory lengths: [158.0, 1000.0, 517.0, 1000.0, 1000.0, 1000.0, 260.0, 1000.0, 1000.0, 497.0]
2026-01-23 02:12:59,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1299 [DEBUG]: Training session finished
