2026-01-23 01:53:25,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mda-highdim-mem2
2026-01-23 01:53:25,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mda-highdim-mem2
2026-01-23 01:53:25,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14ff41bea990>}
2026-01-23 01:53:25,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-23 01:53:25,725 baseline-bpql-mda-noisy-ant:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-23 01:53:25,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-23 01:53:25,741 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:53:25,741 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:53:25,749 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2026-01-23 01:53:26,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-23 01:53:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-23 01:57:34,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:50,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 795.55731 ± 10.710
2026-01-23 01:57:50,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [805.6598, 804.29987, 784.17163, 808.62604, 782.75073, 804.0738, 789.57745, 798.41425, 776.7191, 801.28064]
2026-01-23 01:57:50,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:50,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (795.56) for latency DatasetOffice
2026-01-23 01:57:50,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 15 minutes, 29 seconds)
2026-01-23 02:02:07,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:23,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 817.92944 ± 5.337
2026-01-23 02:02:23,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [819.0256, 821.17285, 810.00836, 816.04614, 808.2742, 814.95807, 818.8731, 821.7543, 824.0805, 825.10156]
2026-01-23 02:02:23,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:23,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (817.93) for latency DatasetOffice
2026-01-23 02:02:23,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 18 minutes, 27 seconds)
2026-01-23 02:06:40,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 774.94373 ± 24.310
2026-01-23 02:06:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [789.214, 728.2392, 787.8554, 799.2998, 795.8675, 761.5037, 749.78033, 756.68536, 809.15485, 771.837]
2026-01-23 02:06:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:55,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 16 minutes, 1 second)
2026-01-23 02:11:08,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:24,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 776.95471 ± 118.019
2026-01-23 02:11:24,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [804.9443, 820.80853, 814.9379, 423.91135, 822.124, 819.021, 819.17535, 796.73615, 830.90594, 816.9825]
2026-01-23 02:11:24,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:11:24,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 10 minutes, 57 seconds)
2026-01-23 02:15:36,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 785.32166 ± 126.838
2026-01-23 02:15:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [847.92065, 850.4498, 555.9358, 509.38885, 849.4983, 854.0803, 848.55707, 848.1588, 852.8644, 836.3628]
2026-01-23 02:15:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:52,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 6 minutes, 8 seconds)
2026-01-23 02:20:04,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:20,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 836.14880 ± 9.294
2026-01-23 02:20:20,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [835.24774, 841.2992, 844.16425, 846.7764, 837.17834, 829.41046, 830.0203, 826.04767, 851.045, 820.2987]
2026-01-23 02:20:20,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:20:20,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (836.15) for latency DatasetOffice
2026-01-23 02:20:20,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 2 minutes, 48 seconds)
2026-01-23 02:24:32,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:47,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 814.16016 ± 171.017
2026-01-23 02:24:47,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [866.0083, 888.4652, 866.4788, 871.88873, 879.52264, 871.13715, 869.77795, 864.7421, 301.58856, 861.99243]
2026-01-23 02:24:47,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:24:47,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 56 minutes, 45 seconds)
2026-01-23 02:29:15,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:30,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 785.52112 ± 27.965
2026-01-23 02:29:30,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [809.00586, 788.2326, 809.6274, 787.7837, 750.94507, 720.99817, 818.46967, 785.77216, 800.62225, 783.7537]
2026-01-23 02:29:30,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:29:30,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 55 minutes, 31 seconds)
2026-01-23 02:33:38,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:53,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 682.62134 ± 222.204
2026-01-23 02:33:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [242.37833, 833.6169, 716.0672, 777.1831, 768.03503, 243.43823, 809.288, 803.8561, 800.7172, 831.6331]
2026-01-23 02:33:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:53,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 49 minutes, 13 seconds)
2026-01-23 02:37:58,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:11,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 248.43179 ± 689.224
2026-01-23 02:38:11,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-474.89374, 766.234, 789.1852, 798.0005, 718.9408, 795.93567, -11.015334, -627.57935, 790.57245, -1061.0624]
2026-01-23 02:38:11,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 23.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:38:11,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 41 minutes, 51 seconds)
2026-01-23 02:42:14,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:28,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 632.67053 ± 233.239
2026-01-23 02:42:28,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [755.3503, 24.073414, 767.2272, 728.02844, 759.3262, 688.6026, 723.372, 705.8393, 370.04425, 804.8415]
2026-01-23 02:42:28,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 65.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:42:28,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 33 minutes, 59 seconds)
2026-01-23 02:46:33,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:47,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 674.91101 ± 16.678
2026-01-23 02:46:47,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [704.3105, 677.4281, 679.2144, 655.9556, 679.7339, 645.15686, 655.9324, 686.2918, 685.55585, 679.5311]
2026-01-23 02:46:47,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:46:47,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 27 minutes, 6 seconds)
2026-01-23 02:50:50,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:01,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -134.81908 ± 197.035
2026-01-23 02:51:01,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [29.220974, -465.51907, -175.25957, 222.43271, -123.70543, -172.27304, 3.1916747, -367.40973, -303.69598, 4.8267336]
2026-01-23 02:51:01,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 240.0, 1000.0, 1000.0, 1000.0, 13.0, 1000.0, 1000.0, 29.0]
2026-01-23 02:51:01,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 14 minutes, 10 seconds)
2026-01-23 02:54:47,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:58,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 382.81787 ± 464.615
2026-01-23 02:54:58,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [694.93805, 789.43475, -704.12604, 3.6169016, 776.9675, 702.7097, -16.246832, 724.75977, 300.1593, 555.96594]
2026-01-23 02:54:58,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 37.0, 1000.0, 1000.0, 32.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:54:58,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 2 minutes, 44 seconds)
2026-01-23 02:59:03,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:16,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 588.04962 ± 201.533
2026-01-23 02:59:16,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [674.2005, 709.78186, 593.56885, 563.9913, 683.86084, 686.4176, 705.5569, 584.9498, 2.4351673, 675.73315]
2026-01-23 02:59:16,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 56.0, 1000.0]
2026-01-23 02:59:16,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 58 minutes, 17 seconds)
2026-01-23 03:03:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:44,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 500.25000 ± 203.166
2026-01-23 03:03:44,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [676.2657, 525.476, 483.5893, -25.123144, 514.9259, 700.70435, 675.52484, 521.5381, 588.8295, 340.76917]
2026-01-23 03:03:44,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 55.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:03:44,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 57 minutes, 24 seconds)
2026-01-23 03:07:40,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:53,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 78.04961 ± 197.363
2026-01-23 03:07:53,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [214.43282, 160.98364, -33.737045, 154.59329, 158.13136, 212.94292, -283.90396, 284.28488, -286.44412, 199.21233]
2026-01-23 03:07:53,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 59.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:07:53,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 50 minutes, 7 seconds)
2026-01-23 03:11:43,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -172.02945 ± 512.571
2026-01-23 03:11:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-4.3318815, -26.546204, -1226.7163, -60.1761, -968.9407, -30.325624, 686.4338, -29.078928, -15.923438, -44.689194]
2026-01-23 03:11:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [18.0, 23.0, 1000.0, 90.0, 1000.0, 39.0, 1000.0, 30.0, 40.0, 85.0]
2026-01-23 03:11:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 40 minutes, 55 seconds)
2026-01-23 03:16:04,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:14,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 215.96199 ± 478.579
2026-01-23 03:16:14,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-564.67346, 714.20935, -12.246266, 627.9439, 642.3189, 617.6988, -528.2879, 4.6236625, 678.0273, -19.994333]
2026-01-23 03:16:14,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 15.0, 1000.0, 1000.0, 1000.0, 1000.0, 21.0, 1000.0, 29.0]
2026-01-23 03:16:14,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 44 minutes, 34 seconds)
2026-01-23 03:20:03,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:20:10,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -45.41543 ± 392.866
2026-01-23 03:20:10,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-8.680356, 316.84155, 295.49185, -18.093382, -1097.7083, -35.059654, -57.099525, -12.057699, 357.8704, -195.65918]
2026-01-23 03:20:10,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [16.0, 1000.0, 1000.0, 18.0, 1000.0, 34.0, 76.0, 15.0, 1000.0, 1000.0]
2026-01-23 03:20:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 34 minutes, 20 seconds)
2026-01-23 03:24:28,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:41,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 523.25793 ± 195.352
2026-01-23 03:24:41,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [596.0937, 557.01556, 619.1866, 619.844, 632.0835, -0.2449154, 632.21796, 327.39087, 627.4006, 621.5919]
2026-01-23 03:24:41,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 7.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:24:41,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 30 minutes, 59 seconds)
2026-01-23 03:28:44,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 432.07959 ± 281.738
2026-01-23 03:28:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [598.8276, 726.7553, -168.04767, 639.2948, 368.04163, -26.727383, 488.22836, 585.7824, 505.50653, 603.13403]
2026-01-23 03:28:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 27.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:28:57,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 28 minutes, 38 seconds)
2026-01-23 03:32:59,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:11,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 594.03680 ± 327.481
2026-01-23 03:33:11,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [793.12885, 760.1102, 761.8101, 739.02203, 752.3059, -68.84985, 758.7651, 733.2447, -51.519005, 762.3501]
2026-01-23 03:33:11,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 99.0, 1000.0, 1000.0, 126.0, 1000.0]
2026-01-23 03:33:11,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 29 minutes, 21 seconds)
2026-01-23 03:37:14,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:28,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 791.42120 ± 17.174
2026-01-23 03:37:28,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [776.465, 810.26416, 763.37463, 825.86395, 790.05975, 790.74774, 789.42786, 773.4876, 798.14154, 796.3795]
2026-01-23 03:37:28,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:37:28,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 22 minutes, 45 seconds)
2026-01-23 03:41:31,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 697.03931 ± 229.547
2026-01-23 03:41:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [531.1125, 818.47253, 791.89185, 795.24, 781.3961, 809.10693, 788.79126, 792.7648, 809.6123, 52.00509]
2026-01-23 03:41:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:41:46,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 23 minutes, 57 seconds)
2026-01-23 03:45:48,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:03,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 631.00690 ± 441.149
2026-01-23 03:46:03,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [784.90283, 772.10034, 775.2614, 774.45685, 783.2585, 791.35345, 750.2345, 783.60223, 786.9343, -692.03564]
2026-01-23 03:46:03,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:46:03,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 16 minutes, 9 seconds)
2026-01-23 03:50:06,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:50:20,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 713.53674 ± 317.492
2026-01-23 03:50:20,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [813.2397, 813.516, 827.48145, 827.3388, 822.675, 817.8588, 819.5273, 811.6825, -238.8081, 820.8564]
2026-01-23 03:50:20,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:50:20,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 12 minutes, 14 seconds)
2026-01-23 03:54:23,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:54:37,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 754.19995 ± 316.047
2026-01-23 03:54:37,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [851.81635, 869.1602, 857.96875, 860.29285, -193.45705, 858.6438, 879.1993, 865.0491, 839.1513, 854.175]
2026-01-23 03:54:37,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:54:37,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 8 minutes, 38 seconds)
2026-01-23 03:58:40,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:58:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 607.71857 ± 473.722
2026-01-23 03:58:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [830.0468, 821.9112, 840.10657, 835.9308, 819.43097, 1.8274684, 850.24677, 839.9774, -602.27795, 839.9861]
2026-01-23 03:58:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:58:54,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 4 minutes, 22 seconds)
2026-01-23 04:02:57,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:03:09,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 653.40533 ± 341.908
2026-01-23 04:03:09,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [864.84625, 855.3901, 536.1003, 861.17694, -4.143904, 853.0928, 860.7945, 858.6851, 850.65857, -2.5476933]
2026-01-23 04:03:09,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 14.0, 1000.0, 1000.0, 1000.0, 1000.0, 13.0]
2026-01-23 04:03:09,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 59 minutes, 25 seconds)
2026-01-23 04:07:12,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:07:26,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 829.07849 ± 81.201
2026-01-23 04:07:26,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [863.9549, 866.992, 866.09076, 858.4162, 839.4923, 866.12, 868.86237, 794.94794, 594.4925, 871.4165]
2026-01-23 04:07:26,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:07:26,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 55 minutes, 8 seconds)
2026-01-23 04:11:29,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:11:43,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 697.09137 ± 261.557
2026-01-23 04:11:43,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [871.7642, 874.65735, 323.93536, 153.67654, 868.2324, 875.5635, 459.46536, 858.89966, 814.62103, 870.0983]
2026-01-23 04:11:43,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:11:43,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 50 minutes, 50 seconds)
2026-01-23 04:15:46,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:16:00,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 841.99988 ± 11.514
2026-01-23 04:16:00,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [852.0629, 843.04553, 841.57477, 847.685, 849.5605, 840.4547, 858.5634, 837.301, 813.73346, 836.0176]
2026-01-23 04:16:00,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:16:00,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (842.00) for latency DatasetOffice
2026-01-23 04:16:00,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 46 minutes, 35 seconds)
2026-01-23 04:20:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:20:17,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 876.05798 ± 9.830
2026-01-23 04:20:17,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [886.8148, 876.2588, 859.241, 874.1474, 885.1613, 883.2832, 889.7776, 864.7406, 865.44684, 875.70844]
2026-01-23 04:20:17,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:20:17,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (876.06) for latency DatasetOffice
2026-01-23 04:20:17,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 42 minutes, 16 seconds)
2026-01-23 04:24:20,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:24:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 891.93719 ± 11.145
2026-01-23 04:24:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [880.62604, 872.68353, 893.97705, 896.68835, 909.7035, 892.01764, 878.68835, 904.85455, 890.64966, 899.4831]
2026-01-23 04:24:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:24:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (891.94) for latency DatasetOffice
2026-01-23 04:24:35,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 38 minutes, 37 seconds)
2026-01-23 04:28:38,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:28:52,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 883.66034 ± 9.710
2026-01-23 04:28:52,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [886.82574, 860.5057, 890.39685, 895.4799, 874.1888, 882.3615, 880.3214, 893.19775, 886.621, 886.7047]
2026-01-23 04:28:52,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:28:52,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 34 minutes, 21 seconds)
2026-01-23 04:32:55,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:33:09,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 898.17615 ± 14.924
2026-01-23 04:33:09,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [857.8401, 900.9223, 886.13574, 900.04376, 902.34827, 903.0729, 904.0914, 909.2163, 911.2414, 906.8498]
2026-01-23 04:33:09,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:33:09,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (898.18) for latency DatasetOffice
2026-01-23 04:33:09,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 30 minutes, 6 seconds)
2026-01-23 04:37:12,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:37:26,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 896.84747 ± 8.958
2026-01-23 04:37:26,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [904.2923, 888.80475, 905.4255, 911.7582, 888.86127, 880.8701, 895.74066, 901.34875, 891.0352, 900.3374]
2026-01-23 04:37:26,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:37:26,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 25 minutes, 44 seconds)
2026-01-23 04:41:29,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:41:44,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 879.67413 ± 16.523
2026-01-23 04:41:44,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [899.0008, 895.4858, 857.3376, 879.6825, 877.8546, 888.6133, 902.1401, 869.5534, 849.45856, 877.61475]
2026-01-23 04:41:44,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:41:44,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 21 minutes, 30 seconds)
2026-01-23 04:45:46,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:46:00,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 836.85388 ± 17.545
2026-01-23 04:46:00,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [837.3446, 821.60394, 813.3631, 844.14734, 828.5592, 814.1703, 838.1128, 840.267, 864.042, 866.9292]
2026-01-23 04:46:00,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:46:00,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 17 minutes, 8 seconds)
2026-01-23 04:50:03,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:50:18,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 911.21790 ± 7.538
2026-01-23 04:50:18,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [918.9005, 902.4281, 917.0229, 906.8216, 914.2704, 908.33575, 911.12836, 922.08966, 915.07007, 896.1115]
2026-01-23 04:50:18,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:50:18,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (911.22) for latency DatasetOffice
2026-01-23 04:50:18,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 12 minutes, 49 seconds)
2026-01-23 04:54:20,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:54:35,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 929.74316 ± 8.044
2026-01-23 04:54:35,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [932.89636, 947.6395, 914.3192, 927.696, 933.4903, 923.81885, 927.5755, 929.28143, 927.1994, 933.51544]
2026-01-23 04:54:35,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:54:35,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (929.74) for latency DatasetOffice
2026-01-23 04:54:35,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 8 minutes, 30 seconds)
2026-01-23 04:58:37,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:58:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 679.65723 ± 707.400
2026-01-23 04:58:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [918.52295, 917.9005, 910.2787, 909.8425, 918.82666, 919.18164, 907.60425, 916.3124, 920.60474, -1442.5021]
2026-01-23 04:58:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:58:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 4 minutes, 13 seconds)
2026-01-23 05:02:34,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:02:49,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 910.62762 ± 4.147
2026-01-23 05:02:49,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [907.78174, 912.1332, 914.0973, 906.6939, 907.45184, 915.60254, 908.9698, 906.88745, 919.1559, 907.5029]
2026-01-23 05:02:49,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:02:49,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 56 minutes, 7 seconds)
2026-01-23 05:06:52,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:07:06,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 883.54639 ± 13.264
2026-01-23 05:07:06,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [891.0132, 864.1933, 898.338, 882.35736, 875.5682, 881.74725, 888.0611, 907.4857, 884.31934, 862.38043]
2026-01-23 05:07:06,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:07:06,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 52 minutes, 2 seconds)
2026-01-23 05:11:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:11:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 908.65039 ± 8.141
2026-01-23 05:11:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [905.427, 910.0729, 902.37427, 906.95197, 905.835, 900.6852, 911.10144, 928.57043, 915.87616, 899.6094]
2026-01-23 05:11:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:11:24,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 47 minutes, 52 seconds)
2026-01-23 05:15:26,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:15:41,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 926.92383 ± 11.196
2026-01-23 05:15:41,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [937.0689, 907.64276, 933.75433, 927.4289, 932.27637, 915.0119, 937.3137, 934.5268, 908.8917, 935.3221]
2026-01-23 05:15:41,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:15:41,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 43 minutes, 41 seconds)
2026-01-23 05:19:44,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:19:58,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 938.09436 ± 40.483
2026-01-23 05:19:58,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [968.83325, 954.7417, 956.13995, 932.30585, 957.2714, 952.23987, 936.3397, 958.22644, 944.19916, 820.64667]
2026-01-23 05:19:58,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 977.0]
2026-01-23 05:19:58,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (938.09) for latency DatasetOffice
2026-01-23 05:19:58,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 39 minutes, 30 seconds)
2026-01-23 05:24:01,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:24:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 950.44073 ± 10.895
2026-01-23 05:24:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [950.3003, 926.7561, 971.69226, 942.9934, 947.9711, 952.466, 958.7336, 947.5675, 956.0449, 949.8816]
2026-01-23 05:24:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:24:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (950.44) for latency DatasetOffice
2026-01-23 05:24:15,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 38 minutes, 39 seconds)
2026-01-23 05:28:18,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:28:32,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 932.49512 ± 7.033
2026-01-23 05:28:32,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [942.13336, 929.2722, 929.4638, 925.5721, 933.5223, 930.7252, 943.45557, 921.76465, 941.0998, 927.9424]
2026-01-23 05:28:32,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:28:32,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 34 minutes, 19 seconds)
2026-01-23 05:32:35,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:32:49,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 965.96777 ± 5.389
2026-01-23 05:32:49,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [960.6079, 962.8337, 965.22107, 972.5746, 965.12164, 964.4762, 964.11035, 972.2577, 957.2491, 975.2261]
2026-01-23 05:32:49,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:32:49,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (965.97) for latency DatasetOffice
2026-01-23 05:32:49,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 29 minutes, 59 seconds)
2026-01-23 05:36:52,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:37:06,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 924.16327 ± 4.239
2026-01-23 05:37:06,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [921.8557, 923.3246, 923.31714, 926.0679, 926.91064, 926.69135, 923.66046, 913.94214, 924.5919, 931.2702]
2026-01-23 05:37:06,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:37:06,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 25 minutes, 42 seconds)
2026-01-23 05:41:09,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:41:24,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 977.37189 ± 7.850
2026-01-23 05:41:24,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [984.8016, 980.521, 977.8835, 962.63995, 973.46, 977.6069, 974.4108, 973.73126, 994.3207, 974.34375]
2026-01-23 05:41:24,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:41:24,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (977.37) for latency DatasetOffice
2026-01-23 05:41:24,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 21 minutes, 24 seconds)
2026-01-23 05:45:26,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:45:41,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 992.18103 ± 11.150
2026-01-23 05:45:41,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [981.2766, 979.1423, 1006.1113, 973.9803, 987.537, 1002.29297, 1007.9102, 995.05035, 998.777, 989.7321]
2026-01-23 05:45:41,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:45:41,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (992.18) for latency DatasetOffice
2026-01-23 05:45:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 17 minutes, 11 seconds)
2026-01-23 05:49:44,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:49:58,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1016.40265 ± 7.356
2026-01-23 05:49:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1020.91974, 1011.08057, 1021.69763, 999.3159, 1027.5947, 1021.66504, 1018.6079, 1014.5143, 1015.38934, 1013.2412]
2026-01-23 05:49:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:49:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1016.40) for latency DatasetOffice
2026-01-23 05:49:58,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 12 minutes, 52 seconds)
2026-01-23 05:54:18,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:54:32,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1051.01733 ± 99.439
2026-01-23 05:54:32,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1096.4254, 1110.1337, 1153.4421, 1036.6519, 1045.6243, 941.97034, 807.3285, 1098.0867, 1142.5737, 1077.9379]
2026-01-23 05:54:32,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:54:32,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1051.02) for latency DatasetOffice
2026-01-23 05:54:32,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 11 minutes, 1 second)
2026-01-23 05:58:34,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:58:49,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1173.31421 ± 40.898
2026-01-23 05:58:49,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1189.4697, 1224.144, 1175.6332, 1082.5975, 1174.7374, 1188.5942, 1155.6244, 1178.4413, 1231.5635, 1132.3369]
2026-01-23 05:58:49,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:58:49,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1173.31) for latency DatasetOffice
2026-01-23 05:58:49,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 6 minutes, 38 seconds)
2026-01-23 06:02:51,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:03:06,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1169.03101 ± 23.420
2026-01-23 06:03:06,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1195.6355, 1175.8163, 1160.4545, 1155.2786, 1156.8478, 1148.0168, 1186.1542, 1198.3596, 1122.0203, 1191.7263]
2026-01-23 06:03:06,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:03:06,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 2 minutes, 16 seconds)
2026-01-23 06:07:08,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:07:23,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1303.70557 ± 98.723
2026-01-23 06:07:23,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1030.0961, 1283.5353, 1326.661, 1306.659, 1381.3915, 1399.4438, 1370.845, 1334.2799, 1279.6165, 1324.527]
2026-01-23 06:07:23,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:07:23,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1303.71) for latency DatasetOffice
2026-01-23 06:07:23,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 57 minutes, 54 seconds)
2026-01-23 06:11:25,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:39,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1257.23267 ± 45.092
2026-01-23 06:11:39,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1285.7006, 1252.1107, 1217.8826, 1285.6531, 1264.2565, 1337.0079, 1274.0737, 1257.3875, 1242.0825, 1156.1716]
2026-01-23 06:11:39,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:11:39,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 53 minutes, 32 seconds)
2026-01-23 06:15:42,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:15:56,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1337.66077 ± 27.804
2026-01-23 06:15:56,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1362.2335, 1364.2081, 1326.2308, 1348.5646, 1324.7281, 1321.6821, 1310.8214, 1393.2861, 1330.6145, 1294.2385]
2026-01-23 06:15:56,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:15:56,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1337.66) for latency DatasetOffice
2026-01-23 06:15:56,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 46 minutes, 59 seconds)
2026-01-23 06:19:59,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:20:13,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1513.53699 ± 21.003
2026-01-23 06:20:13,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1560.7805, 1490.0743, 1504.0635, 1509.4244, 1524.216, 1498.8564, 1535.5294, 1521.2411, 1497.2627, 1493.9218]
2026-01-23 06:20:13,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:20:13,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1513.54) for latency DatasetOffice
2026-01-23 06:20:13,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 42 minutes, 40 seconds)
2026-01-23 06:24:16,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:24:30,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1473.25415 ± 41.187
2026-01-23 06:24:30,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1472.6786, 1435.1082, 1519.0804, 1462.2334, 1521.8665, 1465.6835, 1453.1783, 1517.0873, 1384.9811, 1500.6431]
2026-01-23 06:24:30,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:24:30,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 38 minutes, 21 seconds)
2026-01-23 06:28:08,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:28:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1536.65894 ± 25.017
2026-01-23 06:28:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1492.685, 1499.2865, 1545.0593, 1550.3049, 1547.7415, 1550.0599, 1508.4292, 1569.9749, 1548.3435, 1554.705]
2026-01-23 06:28:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:28:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1536.66) for latency DatasetOffice
2026-01-23 06:28:22,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 31 minutes, 9 seconds)
2026-01-23 06:32:25,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:32:39,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1550.16211 ± 73.569
2026-01-23 06:32:39,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1591.1664, 1547.8313, 1540.8319, 1354.2206, 1604.5391, 1610.357, 1596.836, 1523.4728, 1610.7507, 1521.6152]
2026-01-23 06:32:39,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:32:39,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1550.16) for latency DatasetOffice
2026-01-23 06:32:39,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 26 minutes, 56 seconds)
2026-01-23 06:36:42,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:36:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1633.82141 ± 22.191
2026-01-23 06:36:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1580.694, 1647.4366, 1641.5415, 1637.0072, 1630.6421, 1658.955, 1654.4012, 1645.9526, 1632.9275, 1608.6549]
2026-01-23 06:36:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:36:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1633.82) for latency DatasetOffice
2026-01-23 06:36:56,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 22 minutes, 44 seconds)
2026-01-23 06:40:58,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:41:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1648.96558 ± 28.484
2026-01-23 06:41:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1605.7292, 1637.4434, 1645.218, 1634.4286, 1660.5115, 1664.4167, 1658.7828, 1683.8094, 1695.4973, 1603.8185]
2026-01-23 06:41:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:41:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1648.97) for latency DatasetOffice
2026-01-23 06:41:12,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 18 minutes, 33 seconds)
2026-01-23 06:45:15,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:45:28,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1448.03601 ± 479.574
2026-01-23 06:45:28,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1573.0868, 1597.5021, 1610.4038, 1596.1022, 1615.6951, 1607.1923, 10.08747, 1630.6058, 1611.3591, 1628.3256]
2026-01-23 06:45:28,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 52.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:45:28,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 14 minutes, 13 seconds)
2026-01-23 06:49:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:49:45,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1669.33203 ± 26.868
2026-01-23 06:49:45,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1691.5386, 1606.5387, 1676.6676, 1672.5383, 1656.9498, 1648.5131, 1706.9875, 1660.9663, 1692.1243, 1680.4951]
2026-01-23 06:49:45,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:49:45,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1669.33) for latency DatasetOffice
2026-01-23 06:49:45,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 12 minutes, 32 seconds)
2026-01-23 06:53:48,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:54:02,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1657.18127 ± 76.075
2026-01-23 06:54:02,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1665.0333, 1701.5962, 1658.6445, 1645.5215, 1437.0795, 1684.9102, 1674.5814, 1709.5095, 1687.4309, 1707.5056]
2026-01-23 06:54:02,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:54:02,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 8 minutes, 15 seconds)
2026-01-23 06:58:04,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:58:18,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1612.70605 ± 284.490
2026-01-23 06:58:18,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1689.6538, 1663.4567, 1721.6174, 763.0808, 1709.741, 1720.7357, 1734.2815, 1732.0471, 1657.0858, 1735.3616]
2026-01-23 06:58:18,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:58:18,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 4 minutes)
2026-01-23 07:02:30,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:02:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1650.58850 ± 23.134
2026-01-23 07:02:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1666.8163, 1633.2848, 1694.3607, 1616.0122, 1657.7577, 1631.8234, 1633.3788, 1677.1334, 1636.3461, 1658.9725]
2026-01-23 07:02:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:02:44,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 31 seconds)
2026-01-23 07:06:47,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:07:01,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1732.96851 ± 26.573
2026-01-23 07:07:01,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1695.3317, 1752.4032, 1743.6405, 1739.2217, 1722.7412, 1755.2301, 1690.2513, 1778.6747, 1710.1941, 1741.9962]
2026-01-23 07:07:01,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:07:01,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1732.97) for latency DatasetOffice
2026-01-23 07:07:01,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 56 minutes, 19 seconds)
2026-01-23 07:11:07,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:11:22,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1744.28101 ± 19.668
2026-01-23 07:11:22,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1762.0745, 1746.4106, 1742.9332, 1742.7814, 1697.6146, 1764.8314, 1766.4254, 1745.277, 1723.9622, 1750.5004]
2026-01-23 07:11:22,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:11:22,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1744.28) for latency DatasetOffice
2026-01-23 07:11:22,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 52 minutes, 27 seconds)
2026-01-23 07:15:37,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:15:49,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1442.99316 ± 566.008
2026-01-23 07:15:49,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1818.012, 1654.1671, 1823.6539, 1834.7825, 980.61743, 1799.8463, 1840.3384, 1821.9421, 333.06778, 523.5053]
2026-01-23 07:15:49,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 195.0, 328.0]
2026-01-23 07:15:49,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 48 minutes, 59 seconds)
2026-01-23 07:20:04,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:20:19,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1745.80627 ± 38.711
2026-01-23 07:20:19,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1809.3737, 1732.6962, 1654.9388, 1747.0996, 1731.4006, 1789.2777, 1731.4406, 1751.2454, 1754.6537, 1755.9369]
2026-01-23 07:20:19,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:20:19,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1745.81) for latency DatasetOffice
2026-01-23 07:20:19,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 45 minutes, 38 seconds)
2026-01-23 07:24:34,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:24:49,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1583.21973 ± 554.762
2026-01-23 07:24:49,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1818.359, -69.37865, 1749.6506, 1818.0006, 1793.0034, 1754.405, 1777.3823, 1770.143, 1587.5281, 1833.1039]
2026-01-23 07:24:49,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:24:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 41 minutes, 34 seconds)
2026-01-23 07:28:56,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:29:11,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1819.01794 ± 16.977
2026-01-23 07:29:11,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1804.8282, 1792.0005, 1801.9911, 1848.1257, 1813.7084, 1829.7451, 1806.0507, 1837.1873, 1827.6184, 1828.9238]
2026-01-23 07:29:11,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:29:11,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1819.02) for latency DatasetOffice
2026-01-23 07:29:11,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 37 minutes, 36 seconds)
2026-01-23 07:33:24,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:33:40,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1806.61462 ± 14.484
2026-01-23 07:33:40,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1791.7987, 1796.1014, 1825.5721, 1804.8601, 1811.4939, 1834.6815, 1802.53, 1789.416, 1816.5728, 1793.1211]
2026-01-23 07:33:40,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:33:40,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 33 minutes, 35 seconds)
2026-01-23 07:37:54,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:38:09,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1723.60486 ± 102.616
2026-01-23 07:38:09,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1770.209, 1761.2083, 1761.5399, 1729.4744, 1720.0435, 1788.7571, 1422.5585, 1762.2794, 1735.3765, 1784.601]
2026-01-23 07:38:09,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:38:09,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 29 minutes, 19 seconds)
2026-01-23 07:42:24,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:42:39,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1808.61499 ± 27.365
2026-01-23 07:42:39,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1770.609, 1785.2999, 1792.2953, 1833.8955, 1818.8649, 1841.7092, 1778.2839, 1826.1652, 1788.4586, 1850.5688]
2026-01-23 07:42:39,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:42:39,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 24 minutes, 51 seconds)
2026-01-23 07:46:54,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:47:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1854.86694 ± 42.728
2026-01-23 07:47:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1822.7745, 1850.7174, 1844.2881, 1892.5977, 1861.8129, 1811.2245, 1898.9143, 1906.6228, 1767.2029, 1892.5155]
2026-01-23 07:47:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:47:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1854.87) for latency DatasetOffice
2026-01-23 07:47:09,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 20 minutes, 23 seconds)
2026-01-23 07:51:24,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:51:39,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1834.54395 ± 25.548
2026-01-23 07:51:39,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1817.5912, 1876.9729, 1838.3466, 1811.5994, 1786.9216, 1860.8337, 1838.9843, 1816.5217, 1839.171, 1858.4954]
2026-01-23 07:51:39,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:51:39,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 16 minutes, 22 seconds)
2026-01-23 07:55:53,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:56:08,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1881.07458 ± 32.215
2026-01-23 07:56:08,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1891.9116, 1924.0466, 1808.7266, 1917.2584, 1854.7488, 1896.0326, 1881.4822, 1870.2205, 1862.824, 1903.4938]
2026-01-23 07:56:08,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:56:08,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1881.07) for latency DatasetOffice
2026-01-23 07:56:08,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 11 minutes, 54 seconds)
2026-01-23 08:00:41,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:00:56,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1794.67163 ± 110.824
2026-01-23 08:00:56,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1883.711, 1829.9259, 1717.8336, 1863.8452, 1800.2743, 1500.049, 1876.005, 1854.4103, 1862.5352, 1758.1284]
2026-01-23 08:00:56,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:00:56,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 8 minutes, 19 seconds)
2026-01-23 08:04:53,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:05:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1838.34766 ± 34.192
2026-01-23 08:05:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1860.0631, 1817.1644, 1778.4193, 1808.8191, 1840.6555, 1861.6146, 1809.2642, 1832.9719, 1880.088, 1894.4167]
2026-01-23 08:05:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:05:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 2 minutes, 57 seconds)
2026-01-23 08:09:22,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:09:38,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1863.07788 ± 41.496
2026-01-23 08:09:38,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1894.1067, 1927.4498, 1828.8743, 1888.8386, 1909.016, 1801.2491, 1806.1923, 1832.0422, 1873.0809, 1869.9299]
2026-01-23 08:09:38,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:09:38,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 58 minutes, 27 seconds)
2026-01-23 08:14:09,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:14:24,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1898.78870 ± 34.958
2026-01-23 08:14:24,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1848.3536, 1867.3765, 1884.5598, 1886.7268, 1857.4728, 1910.9501, 1942.9569, 1956.0908, 1899.2178, 1934.1807]
2026-01-23 08:14:24,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:14:24,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1898.79) for latency DatasetOffice
2026-01-23 08:14:24,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 54 minutes, 35 seconds)
2026-01-23 08:18:38,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:18:54,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1891.18616 ± 45.274
2026-01-23 08:18:54,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1907.6226, 1881.5803, 1777.6532, 1890.5599, 1927.9094, 1873.5332, 1914.9604, 1958.7958, 1904.507, 1874.738]
2026-01-23 08:18:54,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:18:54,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 50 minutes, 4 seconds)
2026-01-23 08:22:52,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:23:07,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1842.26721 ± 109.285
2026-01-23 08:23:07,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1916.2083, 1931.5095, 1875.271, 1808.0415, 1823.9374, 1813.4469, 1548.0221, 1942.0685, 1919.9065, 1844.2603]
2026-01-23 08:23:07,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:23:07,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 44 minutes, 23 seconds)
2026-01-23 08:27:22,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:27:37,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1938.53540 ± 57.407
2026-01-23 08:27:37,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1899.3955, 1937.982, 1803.747, 1965.6721, 1898.1195, 2000.7892, 1922.404, 1986.6461, 1979.7678, 1990.8309]
2026-01-23 08:27:37,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:27:37,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1938.54) for latency DatasetOffice
2026-01-23 08:27:51,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 40 minutes, 53 seconds)
2026-01-23 08:32:14,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:32:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1852.85510 ± 40.008
2026-01-23 08:32:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1891.1082, 1907.6403, 1874.9691, 1851.0908, 1796.2839, 1777.6667, 1865.3258, 1849.8477, 1825.753, 1888.8658]
2026-01-23 08:32:29,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:32:29,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 36 minutes, 34 seconds)
2026-01-23 08:36:47,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:37:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1940.09888 ± 28.628
2026-01-23 08:37:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1963.9987, 1923.923, 1944.779, 1913.4285, 1969.0411, 1920.7069, 1942.9575, 1882.5372, 1982.6387, 1956.9791]
2026-01-23 08:37:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:37:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1940.10) for latency DatasetOffice
2026-01-23 08:37:02,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 31 minutes, 41 seconds)
2026-01-23 08:41:20,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:41:34,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1858.03479 ± 309.608
2026-01-23 08:41:34,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1952.6782, 1991.2173, 1995.0138, 1987.554, 1947.9956, 951.7456, 1985.677, 2002.3533, 2002.4562, 1763.6569]
2026-01-23 08:41:34,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 502.0, 1000.0, 1000.0, 1000.0, 899.0]
2026-01-23 08:41:34,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 27 minutes, 12 seconds)
2026-01-23 08:45:30,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:45:44,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1806.87622 ± 253.987
2026-01-23 08:45:44,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1946.3956, 2032.4069, 1458.5126, 1909.3926, 1910.6128, 1879.17, 1782.1029, 1206.6992, 2048.3088, 1895.16]
2026-01-23 08:45:44,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 717.0, 1000.0, 1000.0]
2026-01-23 08:45:44,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 36 seconds)
2026-01-23 08:49:58,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:50:13,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1890.89722 ± 32.668
2026-01-23 08:50:13,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1890.0693, 1851.1655, 1911.6417, 1822.8335, 1912.9388, 1902.5037, 1942.4911, 1881.0275, 1880.2493, 1914.0531]
2026-01-23 08:50:13,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:50:13,735 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 53 seconds)
2026-01-23 08:54:28,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:54:43,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1904.82935 ± 36.185
2026-01-23 08:54:43,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1865.2238, 1923.9275, 1872.3551, 1917.9707, 1879.3132, 1957.6162, 1947.6243, 1877.0032, 1947.7122, 1859.5479]
2026-01-23 08:54:43,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:54:43,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 20 seconds)
2026-01-23 08:59:00,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:59:15,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1851.64258 ± 295.732
2026-01-23 08:59:15,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1590.9373, 1967.5894, 2103.9028, 1926.8032, 2034.2374, 1054.6503, 1881.1526, 1970.9131, 2003.4735, 1982.7668]
2026-01-23 08:59:15,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 978.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:59:15,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 53 seconds)
2026-01-23 09:03:23,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:03:37,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 2032.05103 ± 105.946
2026-01-23 09:03:37,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [2139.3726, 2175.3076, 2061.8145, 2081.8809, 1758.2708, 2027.0972, 1998.9167, 2039.6168, 2005.5891, 2032.6447]
2026-01-23 09:03:37,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 890.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:03:37,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (2032.05) for latency DatasetOffice
2026-01-23 09:03:37,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 24 seconds)
2026-01-23 09:08:10,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:08:24,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1833.14026 ± 291.768
2026-01-23 09:08:24,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1993.3564, 1987.9254, 1936.7731, 1930.2551, 1880.1027, 1987.5443, 1680.0231, 1913.1302, 2020.1812, 1002.11035]
2026-01-23 09:08:24,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 513.0]
2026-01-23 09:08:24,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1299 [DEBUG]: Training session finished
