2026-01-23 01:49:46,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mda-highdim-mem1
2026-01-23 01:49:46,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mda-highdim-mem1
2026-01-23 01:49:46,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14c87c35ecd0>}
2026-01-23 01:49:46,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-23 01:49:46,987 baseline-bpql-mda-noisy-ant:91 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-23 01:49:46,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-23 01:49:47,004 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:49:47,004 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:49:47,012 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2026-01-23 01:49:47,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-23 01:49:47,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-23 01:53:59,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:14,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 747.65833 ± 13.757
2026-01-23 01:54:14,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [752.10956, 757.5436, 755.48785, 749.33136, 747.28284, 729.10706, 753.2325, 760.73773, 714.81946, 756.93225]
2026-01-23 01:54:14,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:54:14,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (747.66) for latency DatasetOffice
2026-01-23 01:54:14,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 19 minutes, 57 seconds)
2026-01-23 01:58:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:38,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 801.75525 ± 8.964
2026-01-23 01:58:38,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [810.9394, 811.9731, 799.6204, 802.38544, 807.80975, 788.766, 811.9908, 786.70667, 793.47284, 803.8889]
2026-01-23 01:58:38,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:58:38,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (801.76) for latency DatasetOffice
2026-01-23 01:58:38,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 13 minutes, 37 seconds)
2026-01-23 02:02:48,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:03,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 567.91724 ± 382.329
2026-01-23 02:03:03,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [734.84894, 578.26324, 699.8155, 743.3583, -561.2446, 722.18915, 747.49774, 728.6861, 549.44257, 736.3154]
2026-01-23 02:03:03,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:03,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 8 minutes, 46 seconds)
2026-01-23 02:07:17,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:32,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 807.91687 ± 4.435
2026-01-23 02:07:32,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [811.2869, 811.1005, 810.9886, 812.1661, 799.81903, 806.1869, 799.4838, 808.36237, 809.99664, 809.7781]
2026-01-23 02:07:32,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:07:32,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (807.92) for latency DatasetOffice
2026-01-23 02:07:32,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 5 minutes, 49 seconds)
2026-01-23 02:11:24,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 360.60455 ± 232.416
2026-01-23 02:11:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [505.41043, 601.133, 519.8106, 519.8679, 44.38619, 434.35294, -3.532025, 463.4122, -5.8923316, 527.0966]
2026-01-23 02:11:34,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 107.0, 1000.0, 10.0, 1000.0, 11.0, 1000.0]
2026-01-23 02:11:34,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 53 minutes, 50 seconds)
2026-01-23 02:15:41,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:56,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 847.19366 ± 3.126
2026-01-23 02:15:56,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [843.72723, 847.3034, 848.37946, 852.0998, 846.38495, 841.92804, 850.32874, 851.061, 844.9562, 845.7682]
2026-01-23 02:15:56,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:56,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (847.19) for latency DatasetOffice
2026-01-23 02:15:56,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 48 minutes, 2 seconds)
2026-01-23 02:20:03,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:18,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 828.91602 ± 10.060
2026-01-23 02:20:18,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [827.2829, 826.7648, 829.52277, 856.82916, 822.32245, 826.8757, 828.2892, 826.8893, 828.6248, 815.75946]
2026-01-23 02:20:18,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:20:18,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 42 minutes, 59 seconds)
2026-01-23 02:24:26,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:41,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 847.23761 ± 18.230
2026-01-23 02:24:41,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [858.0751, 800.1726, 854.0064, 857.7201, 825.5482, 860.32404, 854.72687, 855.13824, 852.81555, 853.84937]
2026-01-23 02:24:41,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:24:41,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (847.24) for latency DatasetOffice
2026-01-23 02:24:41,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 37 minutes, 58 seconds)
2026-01-23 02:28:48,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:03,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 829.38269 ± 4.980
2026-01-23 02:29:03,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [833.6874, 834.8383, 835.1887, 818.62335, 825.0176, 825.3797, 833.12494, 829.92285, 828.4586, 829.58594]
2026-01-23 02:29:03,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:29:03,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 31 minutes, 41 seconds)
2026-01-23 02:33:11,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:26,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 830.48126 ± 3.120
2026-01-23 02:33:26,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [831.6282, 827.7869, 828.5296, 829.0198, 836.4744, 825.2338, 829.8639, 833.3273, 833.58746, 829.3609]
2026-01-23 02:33:26,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:26,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 33 minutes, 24 seconds)
2026-01-23 02:37:33,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:48,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 852.65479 ± 3.451
2026-01-23 02:37:48,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [849.2036, 852.112, 858.5166, 856.986, 847.36035, 851.18176, 849.49805, 856.04333, 853.8383, 851.8072]
2026-01-23 02:37:48,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:37:48,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (852.65) for latency DatasetOffice
2026-01-23 02:37:48,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 29 minutes, 10 seconds)
2026-01-23 02:41:55,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:10,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 854.16907 ± 3.471
2026-01-23 02:42:10,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [847.4668, 853.7274, 857.5853, 860.0329, 852.7011, 850.0554, 853.43896, 853.94885, 856.4359, 856.2974]
2026-01-23 02:42:10,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:42:10,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (854.17) for latency DatasetOffice
2026-01-23 02:42:10,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 24 minutes, 52 seconds)
2026-01-23 02:46:17,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:32,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 852.63416 ± 6.131
2026-01-23 02:46:32,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [857.0393, 852.64075, 852.25464, 852.375, 844.3517, 858.9192, 848.45807, 841.2436, 857.8174, 861.2426]
2026-01-23 02:46:32,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:46:32,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 20 minutes, 22 seconds)
2026-01-23 02:50:40,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:55,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 817.60321 ± 10.305
2026-01-23 02:50:55,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [829.388, 798.6445, 821.92865, 818.23236, 818.94464, 804.8843, 830.233, 828.91595, 816.89386, 807.9673]
2026-01-23 02:50:55,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:50:55,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 16 minutes)
2026-01-23 02:55:02,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 855.11768 ± 7.791
2026-01-23 02:55:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [840.6276, 856.71027, 860.9133, 840.76447, 866.18896, 857.4801, 858.7011, 854.7162, 856.52637, 858.5482]
2026-01-23 02:55:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:55:17,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (855.12) for latency DatasetOffice
2026-01-23 02:55:17,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 11 minutes, 34 seconds)
2026-01-23 02:59:24,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:39,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 840.10468 ± 8.689
2026-01-23 02:59:39,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [837.39105, 840.83246, 846.0899, 838.22284, 848.41595, 846.64777, 840.2223, 820.9881, 851.8042, 830.43243]
2026-01-23 02:59:39,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:59:39,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 7 minutes, 6 seconds)
2026-01-23 03:03:47,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:02,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 879.00830 ± 7.471
2026-01-23 03:04:02,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [875.9909, 879.4827, 867.6425, 891.2986, 868.1389, 887.46387, 874.4067, 886.4758, 878.7736, 880.4089]
2026-01-23 03:04:02,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:04:02,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (879.01) for latency DatasetOffice
2026-01-23 03:04:02,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 2 minutes, 58 seconds)
2026-01-23 03:08:09,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:24,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 895.06219 ± 8.576
2026-01-23 03:08:24,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [897.3664, 890.31085, 877.263, 887.4486, 890.5308, 905.5187, 898.4848, 907.86035, 900.2701, 895.56793]
2026-01-23 03:08:24,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:08:24,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (895.06) for latency DatasetOffice
2026-01-23 03:08:24,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 58 minutes, 36 seconds)
2026-01-23 03:12:31,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:47,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 894.25372 ± 14.238
2026-01-23 03:12:47,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [892.66187, 919.2918, 914.029, 873.0866, 896.7781, 897.9876, 882.0275, 887.24255, 877.4968, 901.936]
2026-01-23 03:12:47,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:12:47,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 54 minutes, 9 seconds)
2026-01-23 03:16:54,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 989.28973 ± 29.347
2026-01-23 03:17:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1006.913, 1012.0612, 1020.1312, 973.70306, 991.60547, 1025.3029, 973.6202, 991.0621, 918.98737, 979.51074]
2026-01-23 03:17:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:17:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (989.29) for latency DatasetOffice
2026-01-23 03:17:09,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 49 minutes, 46 seconds)
2026-01-23 03:21:16,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:30,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 993.06866 ± 104.353
2026-01-23 03:21:30,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [841.12787, 1075.4948, 1038.3687, 895.41656, 1084.6863, 1015.93634, 786.7347, 1068.9667, 1043.5422, 1080.4127]
2026-01-23 03:21:30,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:21:30,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (993.07) for latency DatasetOffice
2026-01-23 03:21:30,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 45 minutes, 16 seconds)
2026-01-23 03:25:37,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1132.35730 ± 141.802
2026-01-23 03:25:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1152.1849, 1095.9712, 1084.3217, 758.80426, 1217.7605, 1249.495, 1066.1755, 1202.0009, 1235.778, 1261.0807]
2026-01-23 03:25:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:25:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1132.36) for latency DatasetOffice
2026-01-23 03:25:52,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 40 minutes, 34 seconds)
2026-01-23 03:29:59,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:14,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1053.41431 ± 51.562
2026-01-23 03:30:14,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1160.8196, 1085.3044, 1084.9808, 1058.8387, 1059.0051, 1000.21796, 1058.282, 972.91, 993.4596, 1060.3256]
2026-01-23 03:30:14,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:30:14,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 36 minutes, 11 seconds)
2026-01-23 03:34:21,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:36,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1115.79871 ± 106.928
2026-01-23 03:34:36,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1242.7449, 1169.3357, 1070.925, 1190.2001, 1156.1771, 1183.8401, 844.86414, 1168.6904, 1077.9277, 1053.2823]
2026-01-23 03:34:36,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:34:36,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 31 minutes, 42 seconds)
2026-01-23 03:38:43,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:58,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1076.75610 ± 109.635
2026-01-23 03:38:58,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1085.3525, 1310.6151, 1012.08044, 1073.9249, 940.3742, 1023.139, 1096.5718, 1048.9847, 949.80426, 1226.7134]
2026-01-23 03:38:58,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:38:58,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 27 minutes, 13 seconds)
2026-01-23 03:43:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:43:19,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1328.46118 ± 37.617
2026-01-23 03:43:19,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1338.9985, 1330.5437, 1262.4877, 1360.2888, 1349.2731, 1351.664, 1324.0645, 1340.9156, 1253.7882, 1372.5878]
2026-01-23 03:43:19,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:43:19,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1328.46) for latency DatasetOffice
2026-01-23 03:43:19,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 22 minutes, 54 seconds)
2026-01-23 03:47:26,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:47:41,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1353.81152 ± 47.474
2026-01-23 03:47:41,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1378.203, 1353.988, 1232.2456, 1343.4648, 1392.2827, 1364.6263, 1363.1257, 1316.3878, 1385.4763, 1408.3148]
2026-01-23 03:47:41,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:47:41,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1353.81) for latency DatasetOffice
2026-01-23 03:47:41,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 18 minutes, 30 seconds)
2026-01-23 03:51:48,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:52:03,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1309.47375 ± 63.165
2026-01-23 03:52:03,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1328.3959, 1302.854, 1295.7858, 1337.5381, 1315.19, 1366.6674, 1310.7029, 1385.0251, 1137.7766, 1314.802]
2026-01-23 03:52:03,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:52:03,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 14 minutes, 8 seconds)
2026-01-23 03:56:10,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:25,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1387.79053 ± 95.342
2026-01-23 03:56:25,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1406.281, 1266.238, 1426.9977, 1471.4288, 1423.097, 1508.5936, 1316.5933, 1439.3478, 1435.1157, 1184.212]
2026-01-23 03:56:25,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:56:25,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1387.79) for latency DatasetOffice
2026-01-23 03:56:25,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 9 minutes, 51 seconds)
2026-01-23 04:00:32,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:47,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1275.97546 ± 48.121
2026-01-23 04:00:47,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1233.8018, 1285.1022, 1280.291, 1344.2017, 1224.1576, 1180.8599, 1308.1555, 1272.9376, 1295.3887, 1334.8582]
2026-01-23 04:00:47,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:00:47,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 5 minutes, 28 seconds)
2026-01-23 04:04:53,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:05:08,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1353.77869 ± 43.086
2026-01-23 04:05:08,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1383.94, 1381.0643, 1326.8693, 1375.129, 1383.6348, 1347.5569, 1376.734, 1390.74, 1242.8473, 1329.2706]
2026-01-23 04:05:08,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:05:08,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 55 seconds)
2026-01-23 04:09:30,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:09:44,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1408.43042 ± 66.936
2026-01-23 04:09:44,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1484.1516, 1330.6823, 1247.6417, 1457.207, 1442.0807, 1450.3586, 1411.6449, 1451.1191, 1400.1014, 1409.3163]
2026-01-23 04:09:44,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:09:44,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1408.43) for latency DatasetOffice
2026-01-23 04:09:44,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 59 minutes, 56 seconds)
2026-01-23 04:13:26,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:13:41,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1422.57703 ± 28.718
2026-01-23 04:13:41,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1410.7675, 1394.555, 1419.3732, 1387.0817, 1422.0533, 1455.0342, 1473.9655, 1460.3807, 1398.0101, 1404.5498]
2026-01-23 04:13:41,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:13:41,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1422.58) for latency DatasetOffice
2026-01-23 04:13:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 49 minutes, 52 seconds)
2026-01-23 04:17:46,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1444.59375 ± 66.216
2026-01-23 04:18:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1409.6993, 1462.6827, 1444.231, 1451.2568, 1420.5957, 1476.5409, 1483.5265, 1524.7441, 1500.1954, 1272.4648]
2026-01-23 04:18:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:18:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1444.59) for latency DatasetOffice
2026-01-23 04:18:01,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 45 minutes, 7 seconds)
2026-01-23 04:22:06,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:22:20,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1506.06201 ± 30.490
2026-01-23 04:22:20,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1518.0007, 1455.609, 1507.2754, 1455.6774, 1475.7015, 1543.6897, 1525.4688, 1521.5574, 1535.0624, 1522.5778]
2026-01-23 04:22:20,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:22:20,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1506.06) for latency DatasetOffice
2026-01-23 04:22:20,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 40 minutes, 11 seconds)
2026-01-23 04:26:33,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:26:47,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1568.17346 ± 44.251
2026-01-23 04:26:47,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1595.2429, 1550.7122, 1537.6025, 1598.1676, 1563.694, 1592.66, 1606.7157, 1603.5618, 1453.2443, 1580.1343]
2026-01-23 04:26:47,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:26:47,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1568.17) for latency DatasetOffice
2026-01-23 04:26:48,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 37 minutes, 15 seconds)
2026-01-23 04:30:52,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:31:06,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1530.44592 ± 75.732
2026-01-23 04:31:06,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1549.2611, 1493.9908, 1430.6622, 1369.6493, 1561.4303, 1630.4539, 1517.9359, 1585.743, 1593.3611, 1571.9702]
2026-01-23 04:31:06,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:31:06,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 29 minutes, 9 seconds)
2026-01-23 04:35:10,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:35:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1544.62585 ± 118.602
2026-01-23 04:35:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1556.4208, 1547.4563, 1199.6667, 1588.4398, 1602.9636, 1627.046, 1528.5013, 1603.1019, 1582.5923, 1610.0696]
2026-01-23 04:35:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:35:24,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 29 minutes, 20 seconds)
2026-01-23 04:39:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:39:42,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1570.32788 ± 79.631
2026-01-23 04:39:42,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1589.1038, 1557.254, 1643.1945, 1572.1453, 1613.1506, 1594.9058, 1635.3757, 1584.5879, 1568.3978, 1345.1643]
2026-01-23 04:39:42,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:39:42,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1570.33) for latency DatasetOffice
2026-01-23 04:39:42,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 24 minutes, 32 seconds)
2026-01-23 04:43:37,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:43:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1605.33081 ± 32.355
2026-01-23 04:43:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1581.127, 1537.4178, 1596.8776, 1613.0508, 1579.4192, 1618.5232, 1662.0276, 1608.0629, 1630.3394, 1626.4618]
2026-01-23 04:43:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:43:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1605.33) for latency DatasetOffice
2026-01-23 04:43:52,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 18 minutes, 18 seconds)
2026-01-23 04:48:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:48:31,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1641.11938 ± 52.991
2026-01-23 04:48:31,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1633.7445, 1530.3436, 1714.6582, 1672.3014, 1678.2803, 1687.9148, 1642.8131, 1587.3904, 1594.842, 1668.9061]
2026-01-23 04:48:31,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:48:31,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1641.12) for latency DatasetOffice
2026-01-23 04:48:31,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 16 minutes, 24 seconds)
2026-01-23 04:52:35,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:52:49,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1573.34888 ± 78.552
2026-01-23 04:52:49,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1580.6885, 1488.8987, 1376.9417, 1602.0237, 1613.2384, 1647.1145, 1560.2615, 1633.3123, 1595.9305, 1635.079]
2026-01-23 04:52:49,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:52:49,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 11 minutes, 58 seconds)
2026-01-23 04:56:45,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:57:00,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1604.25537 ± 33.419
2026-01-23 04:57:00,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1609.7206, 1609.4215, 1577.1521, 1641.1718, 1604.3969, 1656.5286, 1561.4674, 1629.8556, 1542.91, 1609.9294]
2026-01-23 04:57:00,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:57:00,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 6 minutes, 8 seconds)
2026-01-23 05:00:49,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:01:04,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1632.25964 ± 68.567
2026-01-23 05:01:04,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1622.1487, 1643.7588, 1632.8562, 1629.7782, 1697.0183, 1658.0054, 1663.0012, 1685.2317, 1652.8147, 1437.9838]
2026-01-23 05:01:04,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:01:04,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 59 minutes, 12 seconds)
2026-01-23 05:05:23,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:05:38,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1630.88501 ± 22.522
2026-01-23 05:05:38,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1625.2412, 1601.8113, 1599.0853, 1638.4465, 1605.1257, 1657.5149, 1643.7263, 1622.3684, 1663.7567, 1651.7743]
2026-01-23 05:05:38,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:05:38,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 59 minutes, 30 seconds)
2026-01-23 05:09:42,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:09:56,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1623.37622 ± 35.140
2026-01-23 05:09:56,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1662.1904, 1571.6025, 1590.0598, 1651.4415, 1653.0122, 1657.4025, 1640.5215, 1600.1625, 1567.4958, 1639.8744]
2026-01-23 05:09:56,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:09:56,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 51 minutes, 17 seconds)
2026-01-23 05:14:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:14:14,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1617.29504 ± 109.690
2026-01-23 05:14:14,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1624.2692, 1588.6896, 1313.071, 1698.5631, 1665.9767, 1677.4818, 1666.0386, 1677.9686, 1693.2964, 1567.5946]
2026-01-23 05:14:14,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:14:14,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 46 minutes, 59 seconds)
2026-01-23 05:18:18,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:18:33,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1656.24182 ± 29.474
2026-01-23 05:18:33,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1653.2579, 1586.3633, 1631.1351, 1686.8207, 1668.179, 1688.4707, 1682.6853, 1641.1133, 1657.2207, 1667.1715]
2026-01-23 05:18:33,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:18:33,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1656.24) for latency DatasetOffice
2026-01-23 05:18:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 44 minutes, 4 seconds)
2026-01-23 05:22:36,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:22:51,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1602.34937 ± 102.740
2026-01-23 05:22:51,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1647.6487, 1588.6427, 1622.3341, 1629.1937, 1662.9851, 1626.002, 1633.7701, 1649.4532, 1662.917, 1300.5482]
2026-01-23 05:22:51,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:22:51,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 42 minutes, 14 seconds)
2026-01-23 05:26:55,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:27:09,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1643.34888 ± 23.148
2026-01-23 05:27:09,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1614.0872, 1629.8821, 1612.004, 1658.3087, 1632.8384, 1670.2753, 1622.9211, 1676.3289, 1645.78, 1671.0609]
2026-01-23 05:27:09,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:27:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 35 minutes, 10 seconds)
2026-01-23 05:31:12,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:31:26,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1687.84058 ± 35.714
2026-01-23 05:31:26,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1697.6777, 1681.3824, 1656.1626, 1719.4777, 1699.6014, 1714.0303, 1751.7601, 1630.3931, 1638.8491, 1689.0717]
2026-01-23 05:31:26,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:31:26,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1687.84) for latency DatasetOffice
2026-01-23 05:31:26,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 30 minutes, 44 seconds)
2026-01-23 05:35:13,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:35:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1555.88013 ± 149.711
2026-01-23 05:35:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1603.2843, 1701.9342, 1462.7566, 1432.1848, 1649.1171, 1583.5026, 1655.9236, 1668.3169, 1181.308, 1620.4724]
2026-01-23 05:35:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:35:27,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 23 minutes, 42 seconds)
2026-01-23 05:39:28,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:39:43,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1661.61584 ± 29.333
2026-01-23 05:39:43,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1664.4563, 1669.5194, 1666.0355, 1613.4159, 1649.7919, 1709.8746, 1706.099, 1621.1902, 1651.5023, 1664.2728]
2026-01-23 05:39:43,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:39:43,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 19 minutes)
2026-01-23 05:43:44,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:43:58,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1677.21387 ± 111.570
2026-01-23 05:43:58,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1703.6932, 1730.3586, 1659.9498, 1663.8997, 1709.028, 1765.5656, 1680.9425, 1743.2704, 1756.112, 1359.3181]
2026-01-23 05:43:58,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:43:58,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 14 minutes, 18 seconds)
2026-01-23 05:47:59,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:48:13,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1704.72205 ± 22.994
2026-01-23 05:48:13,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1705.5525, 1670.8906, 1685.7711, 1679.7834, 1705.8325, 1710.494, 1730.4481, 1699.713, 1754.112, 1704.6216]
2026-01-23 05:48:13,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:48:13,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1704.72) for latency DatasetOffice
2026-01-23 05:48:13,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 9 minutes, 37 seconds)
2026-01-23 05:52:14,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:52:28,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1689.31250 ± 46.658
2026-01-23 05:52:28,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1626.9102, 1662.5823, 1673.9836, 1727.1716, 1730.3584, 1720.5374, 1743.9423, 1744.0498, 1642.667, 1620.923]
2026-01-23 05:52:28,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:52:28,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2026-01-23 05:56:29,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:44,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1654.64392 ± 102.009
2026-01-23 05:56:44,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1646.4885, 1683.987, 1508.8724, 1419.7954, 1681.7953, 1741.6956, 1724.2919, 1689.4252, 1759.9623, 1690.1268]
2026-01-23 05:56:44,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:56:44,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 2 minutes, 55 seconds)
2026-01-23 06:00:44,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:00:59,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1688.31970 ± 34.508
2026-01-23 06:00:59,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1644.5441, 1657.8062, 1697.7145, 1622.3417, 1695.1754, 1725.3511, 1713.5846, 1698.4888, 1737.8682, 1690.3236]
2026-01-23 06:00:59,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:00:59,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 58 minutes, 39 seconds)
2026-01-23 06:05:00,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:05:14,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1693.58911 ± 95.775
2026-01-23 06:05:14,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1720.4194, 1737.043, 1701.7343, 1685.8395, 1729.2126, 1711.6128, 1729.0681, 1763.628, 1744.2294, 1413.1031]
2026-01-23 06:05:14,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:06:27,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 4 minutes, 21 seconds)
2026-01-23 06:10:27,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:10:41,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1729.34119 ± 27.805
2026-01-23 06:10:41,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1709.9663, 1684.2534, 1713.9812, 1730.44, 1707.7959, 1739.421, 1756.1825, 1783.1025, 1754.9117, 1713.3568]
2026-01-23 06:10:41,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:10:41,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1729.34) for latency DatasetOffice
2026-01-23 06:10:41,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 59 minutes, 43 seconds)
2026-01-23 06:14:42,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:14:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1705.36096 ± 47.252
2026-01-23 06:14:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1727.6128, 1713.753, 1620.5967, 1752.6512, 1769.309, 1723.5792, 1708.9349, 1740.4725, 1639.0543, 1657.6439]
2026-01-23 06:14:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:14:56,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 55 minutes, 11 seconds)
2026-01-23 06:19:12,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:19:26,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1664.85645 ± 118.042
2026-01-23 06:19:26,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1718.3398, 1635.0245, 1322.642, 1682.7258, 1697.6871, 1758.3461, 1698.1748, 1709.1477, 1730.367, 1696.1094]
2026-01-23 06:19:26,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:19:26,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 52 minutes, 37 seconds)
2026-01-23 06:23:27,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:23:42,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1726.23401 ± 50.488
2026-01-23 06:23:42,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1747.0042, 1773.6847, 1670.8063, 1777.3579, 1741.6427, 1762.1171, 1641.6969, 1725.3662, 1648.0665, 1774.597]
2026-01-23 06:23:42,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:23:42,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 48 minutes, 4 seconds)
2026-01-23 06:27:42,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:27:57,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1438.27649 ± 846.741
2026-01-23 06:27:57,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1699.0314, 1721.5415, -1074.1456, 1733.6686, 1825.4448, 1794.5886, 1780.6716, 1746.8158, 1794.1926, 1360.956]
2026-01-23 06:27:57,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:27:57,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 34 minutes, 44 seconds)
2026-01-23 06:31:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:32:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1730.49829 ± 41.456
2026-01-23 06:32:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1753.4116, 1681.2483, 1645.316, 1764.3641, 1716.3776, 1696.914, 1768.1656, 1778.4014, 1750.4084, 1750.3767]
2026-01-23 06:32:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:32:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1730.50) for latency DatasetOffice
2026-01-23 06:32:12,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 30 minutes, 34 seconds)
2026-01-23 06:36:13,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:36:27,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1753.60718 ± 50.545
2026-01-23 06:36:27,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1792.792, 1736.6158, 1714.3516, 1831.1835, 1775.6774, 1729.3566, 1758.7563, 1716.8328, 1658.5387, 1821.9674]
2026-01-23 06:36:27,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:36:27,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1753.61) for latency DatasetOffice
2026-01-23 06:36:27,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 26 minutes, 17 seconds)
2026-01-23 06:40:04,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:40:18,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1757.51001 ± 111.051
2026-01-23 06:40:18,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1779.4763, 1792.3289, 1765.2236, 1432.8832, 1801.3717, 1795.4053, 1841.1693, 1753.0547, 1828.1859, 1786.0009]
2026-01-23 06:40:18,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:40:18,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1757.51) for latency DatasetOffice
2026-01-23 06:40:18,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 17 minutes, 40 seconds)
2026-01-23 06:44:19,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:44:33,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1754.32495 ± 38.868
2026-01-23 06:44:33,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1690.5199, 1761.1714, 1740.176, 1709.8027, 1797.3405, 1728.9669, 1771.4193, 1831.806, 1753.5797, 1758.467]
2026-01-23 06:44:33,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:44:33,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 13 minutes, 30 seconds)
2026-01-23 06:48:34,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:48:48,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1771.39844 ± 57.711
2026-01-23 06:48:48,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1784.3976, 1798.3473, 1615.455, 1776.9088, 1743.9186, 1835.0355, 1777.6392, 1771.6079, 1826.7137, 1783.9608]
2026-01-23 06:48:48,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:48:48,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1771.40) for latency DatasetOffice
2026-01-23 06:48:48,735 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 9 minutes, 20 seconds)
2026-01-23 06:53:05,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:53:19,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1813.29553 ± 28.958
2026-01-23 06:53:19,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1808.1566, 1764.856, 1772.0214, 1806.7104, 1818.5105, 1835.4136, 1868.5148, 1810.3351, 1839.2708, 1809.1656]
2026-01-23 06:53:19,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:53:19,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1813.30) for latency DatasetOffice
2026-01-23 06:53:19,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 6 minutes, 43 seconds)
2026-01-23 06:57:20,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:57:34,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1831.72070 ± 98.221
2026-01-23 06:57:34,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1844.1805, 1867.1982, 1783.8582, 1889.0532, 1837.3013, 1916.9072, 1912.0422, 1877.1566, 1560.2946, 1829.2144]
2026-01-23 06:57:34,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:57:34,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1831.72) for latency DatasetOffice
2026-01-23 06:57:34,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2026-01-23 07:01:35,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:01:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1799.86255 ± 89.022
2026-01-23 07:01:49,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1837.8369, 1767.8528, 1677.5319, 1602.8195, 1907.271, 1804.6188, 1836.481, 1853.9772, 1878.0988, 1832.1373]
2026-01-23 07:01:49,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:01:49,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 31 seconds)
2026-01-23 07:05:46,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:06:01,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1730.34314 ± 33.576
2026-01-23 07:06:01,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1728.3757, 1718.0471, 1748.0682, 1712.2184, 1700.2745, 1810.2158, 1740.4067, 1692.841, 1697.7301, 1755.2543]
2026-01-23 07:06:01,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:06:01,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 55 minutes, 52 seconds)
2026-01-23 07:09:48,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:10:02,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1820.94592 ± 34.182
2026-01-23 07:10:02,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1815.7396, 1810.3264, 1739.4795, 1838.3947, 1828.1519, 1830.505, 1840.0752, 1789.6698, 1870.521, 1846.596]
2026-01-23 07:10:02,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:10:02,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 50 minutes, 24 seconds)
2026-01-23 07:14:03,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:14:17,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1831.64062 ± 45.939
2026-01-23 07:14:17,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1735.1263, 1847.4429, 1846.4558, 1751.7197, 1842.0181, 1882.9657, 1851.647, 1840.3348, 1868.8065, 1849.8893]
2026-01-23 07:14:17,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:14:17,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 44 minutes, 50 seconds)
2026-01-23 07:18:18,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:18:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1825.09631 ± 54.460
2026-01-23 07:18:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1902.8223, 1867.6868, 1813.9078, 1734.9781, 1872.3053, 1839.5005, 1854.0739, 1847.1438, 1783.0343, 1735.5101]
2026-01-23 07:18:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:18:32,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 40 minutes, 38 seconds)
2026-01-23 07:22:33,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:22:47,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1800.21851 ± 147.091
2026-01-23 07:22:47,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1834.5181, 1876.524, 1718.8691, 1394.2727, 1885.3953, 1887.7758, 1877.6699, 1756.8585, 1904.1534, 1866.1487]
2026-01-23 07:22:47,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:22:47,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 36 minutes, 26 seconds)
2026-01-23 07:26:48,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:27:02,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1841.53943 ± 47.252
2026-01-23 07:27:02,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1856.1049, 1824.629, 1801.2042, 1763.572, 1898.646, 1870.7228, 1930.9496, 1832.6613, 1839.3245, 1797.5798]
2026-01-23 07:27:02,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:27:02,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1841.54) for latency DatasetOffice
2026-01-23 07:27:02,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 32 minutes, 30 seconds)
2026-01-23 07:31:03,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:31:17,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1882.45239 ± 30.387
2026-01-23 07:31:17,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1899.5167, 1877.8385, 1819.091, 1867.6017, 1872.3708, 1872.7926, 1871.2396, 1887.0171, 1921.4971, 1935.5582]
2026-01-23 07:31:17,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:31:17,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1882.45) for latency DatasetOffice
2026-01-23 07:31:17,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 29 minutes, 16 seconds)
2026-01-23 07:35:39,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:35:52,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1765.46228 ± 439.372
2026-01-23 07:35:52,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1927.6411, 1844.258, 1906.884, 1938.4543, 450.4824, 1888.3145, 1953.467, 1926.7476, 1932.6791, 1885.6959]
2026-01-23 07:35:52,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 250.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:35:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 26 minutes, 18 seconds)
2026-01-23 07:39:53,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:40:07,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1830.99023 ± 72.139
2026-01-23 07:40:07,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1820.1241, 1860.0425, 1864.3324, 1779.417, 1898.0026, 1932.5131, 1894.5105, 1828.4653, 1748.9321, 1683.5619]
2026-01-23 07:40:07,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:40:07,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 21 minutes, 59 seconds)
2026-01-23 07:44:07,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:44:21,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1778.37793 ± 135.731
2026-01-23 07:44:21,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1786.6676, 1812.4852, 1770.4324, 1393.5314, 1884.661, 1843.0934, 1871.2118, 1733.9554, 1834.6042, 1853.1365]
2026-01-23 07:44:21,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:44:21,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 17 minutes, 38 seconds)
2026-01-23 07:48:22,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:48:36,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1842.80542 ± 50.831
2026-01-23 07:48:36,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1750.599, 1908.3949, 1800.9518, 1804.204, 1837.3907, 1896.6249, 1898.9476, 1799.8976, 1886.0272, 1845.0177]
2026-01-23 07:48:36,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:48:36,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 13 minutes, 20 seconds)
2026-01-23 07:52:31,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:52:45,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1733.89941 ± 399.469
2026-01-23 07:52:45,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1846.7472, 1827.5468, 1793.5508, 1934.2625, 1884.8231, 1862.1532, 1819.0266, 542.9248, 1894.0269, 1933.9313]
2026-01-23 07:52:45,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:52:45,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 8 minutes, 40 seconds)
2026-01-23 07:56:46,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:57:00,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1949.62085 ± 46.207
2026-01-23 07:57:00,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1923.5543, 1858.5459, 1899.6853, 1955.442, 1972.0631, 1942.9003, 1972.1776, 1975.0042, 2039.5577, 1957.2765]
2026-01-23 07:57:00,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:57:00,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1949.62) for latency DatasetOffice
2026-01-23 07:57:00,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 3 minutes, 24 seconds)
2026-01-23 08:01:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:01:15,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1886.37268 ± 77.172
2026-01-23 08:01:15,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1888.35, 1880.9445, 1873.2648, 1787.7163, 1939.4646, 1908.5669, 1918.0884, 1984.063, 1967.6366, 1715.6318]
2026-01-23 08:01:15,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:01:15,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 59 minutes, 12 seconds)
2026-01-23 08:05:16,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:05:30,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1885.17163 ± 114.296
2026-01-23 08:05:30,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1981.944, 1922.2166, 1892.9161, 1565.3221, 1835.6289, 1874.4135, 1964.4271, 1926.263, 1945.8596, 1942.7257]
2026-01-23 08:05:30,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:05:30,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 54 minutes, 59 seconds)
2026-01-23 08:09:26,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:09:40,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1919.62622 ± 46.467
2026-01-23 08:09:40,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1871.5574, 1875.7979, 1924.951, 1946.9366, 1836.7272, 1963.9144, 1998.5222, 1956.2164, 1914.9109, 1906.7278]
2026-01-23 08:09:40,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:09:40,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 50 minutes, 33 seconds)
2026-01-23 08:13:41,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:13:56,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1932.50317 ± 37.348
2026-01-23 08:13:56,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1979.7405, 1907.69, 1857.6505, 1901.5603, 1952.9431, 1973.7485, 1904.495, 1926.497, 1965.3823, 1955.3243]
2026-01-23 08:13:56,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:13:56,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 46 minutes, 34 seconds)
2026-01-23 08:17:44,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:17:58,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1948.05212 ± 28.360
2026-01-23 08:17:58,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1955.604, 1876.3862, 1953.1593, 1931.4114, 1977.3767, 1930.9174, 1960.3259, 1976.5249, 1968.7323, 1950.0846]
2026-01-23 08:17:58,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:17:58,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 41 minutes, 56 seconds)
2026-01-23 08:22:00,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:22:14,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1979.02148 ± 49.767
2026-01-23 08:22:14,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1981.457, 2024.5442, 1951.5725, 1962.1309, 1931.0403, 2010.4487, 2014.0957, 2023.4794, 2026.1483, 1865.2965]
2026-01-23 08:22:14,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:22:14,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1979.02) for latency DatasetOffice
2026-01-23 08:22:14,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 37 minutes, 45 seconds)
2026-01-23 08:26:17,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:26:32,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1889.69824 ± 146.416
2026-01-23 08:26:32,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [2002.4526, 1865.3323, 2004.1748, 1859.715, 1480.9872, 1937.7825, 1893.8442, 1977.9169, 1885.6162, 1989.162]
2026-01-23 08:26:32,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:26:32,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 33 minutes, 38 seconds)
2026-01-23 08:30:33,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:30:47,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1929.30212 ± 167.930
2026-01-23 08:30:47,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1994.7052, 1956.9395, 1935.4824, 1911.9657, 1442.6887, 2048.0989, 1952.6633, 2041.0243, 2030.3679, 1979.086]
2026-01-23 08:30:47,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 771.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:30:47,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 29 minutes, 33 seconds)
2026-01-23 08:34:48,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:35:02,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1886.00232 ± 313.261
2026-01-23 08:35:02,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1966.1095, 2067.4778, 1987.7563, 1858.2745, 959.9722, 1955.228, 2029.9037, 1996.473, 2016.1947, 2022.6337]
2026-01-23 08:35:02,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:35:02,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 25 minutes, 20 seconds)
2026-01-23 08:39:15,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:39:28,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1767.47791 ± 365.094
2026-01-23 08:39:28,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1936.4271, 1989.8328, 1234.2876, 1022.8327, 1442.5189, 1944.3171, 1942.0604, 2045.8711, 2025.6241, 2091.0073]
2026-01-23 08:39:28,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 667.0, 541.0, 760.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:39:28,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 29 seconds)
2026-01-23 08:43:29,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:43:42,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1997.51831 ± 66.029
2026-01-23 08:43:42,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [2029.58, 2054.2715, 2054.9146, 1892.2869, 1865.288, 1991.8181, 1965.7566, 2061.5593, 2041.3063, 2018.4015]
2026-01-23 08:43:42,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:43:42,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1997.52) for latency DatasetOffice
2026-01-23 08:43:42,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 10 seconds)
2026-01-23 08:47:40,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:47:54,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1963.39087 ± 115.356
2026-01-23 08:47:54,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [2048.4258, 2072.6707, 1930.1683, 2039.3912, 1756.6394, 1742.3329, 2044.7666, 2031.4427, 2024.6489, 1943.4218]
2026-01-23 08:47:54,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:47:54,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 49 seconds)
2026-01-23 08:51:52,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:52:06,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1980.62830 ± 55.250
2026-01-23 08:52:06,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [2047.915, 1981.8749, 1952.5477, 1931.1725, 1976.101, 1885.231, 1999.3229, 2044.2267, 2060.6228, 1927.2677]
2026-01-23 08:52:06,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:52:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 31 seconds)
2026-01-23 08:55:59,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:56:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1949.10449 ± 53.356
2026-01-23 08:56:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [2057.9448, 1862.143, 1941.5188, 1982.2354, 1915.3419, 1898.3884, 1986.8477, 1922.3691, 1932.8503, 1991.4066]
2026-01-23 08:56:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:56:12,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 14 seconds)
2026-01-23 09:00:10,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:00:22,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1648.02893 ± 594.882
2026-01-23 09:00:22,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [318.93576, 687.5077, 1986.6929, 1909.8447, 1545.205, 1950.9312, 1992.7174, 2023.5148, 1966.1445, 2098.795]
2026-01-23 09:00:22,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [172.0, 370.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:00:22,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1299 [DEBUG]: Training session finished
