2026-01-23 01:52:23,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mda-highdim-mem5
2026-01-23 01:52:23,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-bpql-mda-highdim-mem5
2026-01-23 01:52:23,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14e7d0cf9890>}
2026-01-23 01:52:23,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-23 01:52:23,779 baseline-bpql-mda-noisy-ant:91 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-23 01:52:23,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-23 01:52:23,796 baseline-bpql-mda-noisy-ant:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:52:23,796 baseline-bpql-mda-noisy-ant:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:52:23,803 baseline-bpql-mda-noisy-ant:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(8, 512, batch_first=True)
)
2026-01-23 01:52:24,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-23 01:52:24,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-23 01:56:39,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:56,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 799.42120 ± 19.455
2026-01-23 01:56:56,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [819.6795, 795.4217, 791.5906, 794.96704, 823.8756, 828.70984, 788.30804, 808.906, 767.1753, 775.5785]
2026-01-23 01:56:56,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:56,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (799.42) for latency DatasetOffice
2026-01-23 01:56:56,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 28 minutes, 14 seconds)
2026-01-23 02:01:19,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:35,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 678.71478 ± 186.275
2026-01-23 02:01:35,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [246.3287, 812.17236, 521.068, 809.4245, 781.13916, 773.13, 791.7626, 756.51196, 474.41455, 821.19635]
2026-01-23 02:01:35,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:01:35,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 29 minutes, 51 seconds)
2026-01-23 02:05:59,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:15,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 339.20462 ± 694.740
2026-01-23 02:06:15,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [783.3984, -825.9954, 729.82996, 810.1758, -496.5064, -818.5754, 801.0571, 808.51276, 801.3399, 798.8096]
2026-01-23 02:06:15,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:15,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 27 minutes, 28 seconds)
2026-01-23 02:10:38,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:54,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 765.53406 ± 11.477
2026-01-23 02:10:54,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [780.055, 770.9432, 771.77386, 747.58563, 750.4776, 777.90607, 752.53613, 758.29034, 774.7288, 771.0446]
2026-01-23 02:10:54,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:10:54,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 23 minutes, 51 seconds)
2026-01-23 02:15:16,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:32,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 782.34998 ± 47.483
2026-01-23 02:15:32,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [783.9261, 801.1551, 797.5408, 641.2188, 807.28754, 808.13586, 794.6018, 796.83307, 796.6674, 796.1326]
2026-01-23 02:15:32,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:32,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 19 minutes, 27 seconds)
2026-01-23 02:19:53,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:09,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 765.97626 ± 29.150
2026-01-23 02:20:09,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [787.0519, 784.0907, 766.00287, 785.04553, 727.714, 794.6485, 777.3365, 763.11523, 778.01514, 696.74274]
2026-01-23 02:20:09,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:20:09,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 16 minutes, 35 seconds)
2026-01-23 02:24:30,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:46,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 299.26270 ± 44.687
2026-01-23 02:24:46,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [322.79678, 327.22937, 223.67076, 337.22076, 222.3114, 338.18845, 328.82977, 251.58937, 319.66492, 321.12537]
2026-01-23 02:24:46,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:24:46,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 11 minutes, 15 seconds)
2026-01-23 02:29:07,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:17,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -865.42542 ± 684.834
2026-01-23 02:29:17,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-1094.1896, -1879.0168, -38.84406, -1063.9838, -59.905415, -143.20023, -1364.9037, -1264.6781, -64.016075, -1681.5166]
2026-01-23 02:29:17,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 66.0, 1000.0, 51.0, 105.0, 1000.0, 1000.0, 72.0, 1000.0]
2026-01-23 02:29:17,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 7 hours, 3 minutes, 54 seconds)
2026-01-23 02:33:38,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:54,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 244.14726 ± 133.116
2026-01-23 02:33:54,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [334.99454, 274.57724, 351.01852, 185.61023, 367.19983, 216.94293, 372.79822, 14.785448, 326.93927, -3.3935065]
2026-01-23 02:33:54,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:54,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 58 minutes, 37 seconds)
2026-01-23 02:38:19,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:35,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -17.45612 ± 642.455
2026-01-23 02:38:35,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-805.83374, -818.21844, -254.46323, 735.2178, -562.2847, 736.0192, -373.6019, 751.36084, -319.26605, 736.5092]
2026-01-23 02:38:35,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:38:35,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 54 minutes, 50 seconds)
2026-01-23 02:42:55,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 531.48669 ± 261.676
2026-01-23 02:43:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [720.3164, 738.6443, -51.54626, 424.97458, 696.89276, 339.38, 741.1718, 260.63135, 738.95776, 705.4437]
2026-01-23 02:43:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:43:11,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 49 minutes, 51 seconds)
2026-01-23 02:47:28,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 760.67786 ± 3.841
2026-01-23 02:47:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [760.58923, 758.9468, 756.3921, 760.7042, 755.1334, 764.65295, 758.6236, 767.5932, 758.63007, 765.5125]
2026-01-23 02:47:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:47:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 44 minutes, 5 seconds)
2026-01-23 02:52:00,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:16,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 743.24719 ± 5.839
2026-01-23 02:52:16,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [752.80493, 735.8238, 750.8409, 735.06213, 745.9843, 741.1079, 741.7948, 743.2905, 748.1458, 737.6165]
2026-01-23 02:52:16,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:52:16,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 39 minutes, 56 seconds)
2026-01-23 02:56:33,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:56:48,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 679.50293 ± 223.249
2026-01-23 02:56:48,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [734.3042, 10.283223, 739.65515, 758.7352, 757.51843, 761.94415, 759.63025, 758.87286, 754.1161, 759.9696]
2026-01-23 02:56:48,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:56:48,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 33 minutes, 59 seconds)
2026-01-23 03:01:05,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:20,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 325.66074 ± 564.312
2026-01-23 03:01:20,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-313.17795, 792.9746, 784.90857, 770.71155, 787.86676, -275.89078, -390.96957, 773.12213, -471.7455, 798.8077]
2026-01-23 03:01:20,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:01:20,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 26 minutes, 54 seconds)
2026-01-23 03:05:27,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:43,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 700.46417 ± 16.858
2026-01-23 03:05:43,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [720.7307, 687.6745, 683.5456, 728.1261, 683.1278, 684.53424, 703.7473, 689.8058, 699.7131, 723.6365]
2026-01-23 03:05:43,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:05:43,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 18 minutes, 30 seconds)
2026-01-23 03:10:00,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:15,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -192.57721 ± 625.721
2026-01-23 03:10:15,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-786.81, -825.91003, -159.6277, 753.8864, 591.67236, -415.02594, 741.7516, -812.6674, -237.22995, -775.8115]
2026-01-23 03:10:15,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:10:15,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 13 minutes, 48 seconds)
2026-01-23 03:14:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 579.41443 ± 185.847
2026-01-23 03:14:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [606.39026, 683.4861, 699.7594, 503.34973, 684.95935, 591.69275, 643.1439, 47.53659, 665.1226, 668.70435]
2026-01-23 03:14:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:14:49,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 9 minutes, 41 seconds)
2026-01-23 03:19:05,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:18,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 390.54562 ± 386.708
2026-01-23 03:19:18,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [683.2866, -3.3798082, 756.12665, -319.05438, 11.672788, 649.91956, 693.51025, 677.72144, 718.3933, 37.259926]
2026-01-23 03:19:18,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 34.0, 1000.0, 1000.0, 35.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:19:18,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 4 minutes, 20 seconds)
2026-01-23 03:23:29,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:45,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 626.25916 ± 50.173
2026-01-23 03:23:45,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [621.54034, 483.55093, 621.52045, 658.3935, 646.037, 615.1528, 661.32135, 646.1419, 654.63763, 654.296]
2026-01-23 03:23:45,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:23:45,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 58 minutes, 27 seconds)
2026-01-23 03:28:20,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:36,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 704.36603 ± 27.797
2026-01-23 03:28:36,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [704.75134, 709.12445, 719.7534, 711.3434, 710.93787, 712.51605, 622.67523, 726.2184, 711.98926, 714.3507]
2026-01-23 03:28:36,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:28:36,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 6 hours, 1 minute, 38 seconds)
2026-01-23 03:32:53,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:09,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 735.41370 ± 4.412
2026-01-23 03:33:09,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [740.7634, 735.41547, 735.3587, 737.8334, 740.50494, 734.5376, 736.4466, 729.06165, 738.0309, 726.1841]
2026-01-23 03:33:09,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:33:09,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 57 minutes, 8 seconds)
2026-01-23 03:37:27,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:42,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 682.43860 ± 35.204
2026-01-23 03:37:42,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [725.5776, 687.59503, 671.6473, 671.8283, 718.70776, 593.95605, 689.71686, 667.7501, 686.001, 711.60596]
2026-01-23 03:37:42,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:37:42,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 52 minutes, 35 seconds)
2026-01-23 03:41:58,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:14,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 791.26282 ± 18.956
2026-01-23 03:42:14,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [798.21313, 799.66925, 801.33704, 792.5639, 805.5662, 790.24603, 739.2413, 806.1722, 800.5937, 779.0256]
2026-01-23 03:42:14,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:42:14,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 48 minutes, 34 seconds)
2026-01-23 03:46:14,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:30,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 800.43982 ± 9.187
2026-01-23 03:46:30,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [803.09546, 796.86884, 799.99805, 820.8048, 794.6815, 796.5008, 797.25195, 785.587, 811.5787, 798.0314]
2026-01-23 03:46:30,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:46:30,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (800.44) for latency DatasetOffice
2026-01-23 03:46:30,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 41 minutes, 19 seconds)
2026-01-23 03:50:59,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:51:15,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 765.92255 ± 11.088
2026-01-23 03:51:15,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [762.88464, 779.2498, 785.678, 770.8993, 753.1792, 749.768, 756.19916, 775.4061, 761.8977, 764.0637]
2026-01-23 03:51:15,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:51:15,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 35 minutes, 10 seconds)
2026-01-23 03:55:32,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:55:48,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 634.94141 ± 39.370
2026-01-23 03:55:48,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [567.2023, 638.72675, 678.6922, 626.81024, 566.2887, 627.53534, 652.3658, 634.732, 683.2796, 673.7807]
2026-01-23 03:55:48,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:55:48,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 30 minutes, 43 seconds)
2026-01-23 04:00:03,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:18,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 650.42151 ± 36.087
2026-01-23 04:00:18,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [648.4259, 550.4266, 674.1861, 646.0155, 664.36206, 658.0136, 637.60443, 681.76416, 679.79675, 663.6192]
2026-01-23 04:00:18,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:00:18,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 25 minutes, 24 seconds)
2026-01-23 04:04:09,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:04:24,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -33.80087 ± 346.514
2026-01-23 04:04:24,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-657.12665, -294.61264, 274.7064, -217.14395, -465.19116, 167.41562, 216.45824, 350.1055, -96.89512, 384.27515]
2026-01-23 04:04:24,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 648.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:04:24,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 14 minutes, 49 seconds)
2026-01-23 04:08:40,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:08:56,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 70.93795 ± 476.489
2026-01-23 04:08:56,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [375.08908, 229.98907, 159.60394, 243.26009, -181.98877, 47.251526, 633.25055, -1213.0128, 52.650597, 363.28632]
2026-01-23 04:08:56,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:08:56,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 14 minutes)
2026-01-23 04:13:25,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:13:38,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 380.34100 ± 196.946
2026-01-23 04:13:38,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [438.5999, 460.8517, 469.53452, 517.8445, -5.5039086, 559.605, 444.79578, 470.0719, 457.16385, -9.553065]
2026-01-23 04:13:38,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 17.0, 1000.0, 1000.0, 1000.0, 1000.0, 17.0]
2026-01-23 04:13:38,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 8 minutes, 54 seconds)
2026-01-23 04:17:38,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:17:54,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 686.41296 ± 162.153
2026-01-23 04:17:54,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [738.5553, 756.1679, 743.4886, 760.9129, 613.1491, 755.4866, 768.3512, 750.9696, 759.81964, 217.2292]
2026-01-23 04:17:54,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:17:54,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 40 seconds)
2026-01-23 04:22:26,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:22:41,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 750.35120 ± 39.652
2026-01-23 04:22:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [756.655, 771.5528, 772.2442, 774.7005, 712.99835, 762.23706, 770.4206, 771.47784, 643.27136, 767.95416]
2026-01-23 04:22:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:22:41,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 59 minutes, 54 seconds)
2026-01-23 04:26:56,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:27:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 727.91956 ± 22.102
2026-01-23 04:27:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [688.11725, 758.99133, 698.9158, 702.6786, 738.4355, 731.9849, 732.5791, 743.67975, 736.0597, 747.7538]
2026-01-23 04:27:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:27:12,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 54 seconds)
2026-01-23 04:31:05,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:31:13,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -296.52563 ± 429.948
2026-01-23 04:31:13,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-23.631437, -631.11206, 15.28556, 405.7237, -97.689224, -967.57153, -729.5095, -54.42856, -57.760307, -824.56305]
2026-01-23 04:31:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 44.0, 1000.0, 91.0, 1000.0, 1000.0, 33.0, 113.0, 1000.0]
2026-01-23 04:31:13,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 49 minutes, 47 seconds)
2026-01-23 04:35:47,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 688.51428 ± 47.675
2026-01-23 04:36:03,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [691.71545, 708.7789, 699.10315, 711.1181, 735.64734, 688.2161, 725.6177, 556.45874, 701.7318, 666.7559]
2026-01-23 04:36:03,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:36:03,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 46 minutes, 58 seconds)
2026-01-23 04:40:18,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:40:34,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 815.18695 ± 8.280
2026-01-23 04:40:34,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [815.7206, 821.408, 814.59436, 817.4969, 813.43085, 817.9692, 825.0444, 804.97614, 824.5208, 796.7078]
2026-01-23 04:40:34,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:40:34,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (815.19) for latency DatasetOffice
2026-01-23 04:40:34,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 45 minutes, 31 seconds)
2026-01-23 04:44:50,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:45:03,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: -283.92465 ± 365.332
2026-01-23 04:45:03,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-279.04987, -513.0002, 1.7711992, 129.86986, -1174.913, -64.038414, -8.526515, -64.12397, -338.3686, -528.8668]
2026-01-23 04:45:03,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 46.0, 1000.0, 1000.0, 1000.0, 16.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:45:03,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 37 minutes, 17 seconds)
2026-01-23 04:49:20,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:49:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 823.07800 ± 8.822
2026-01-23 04:49:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [821.2616, 817.79315, 821.09894, 830.3561, 829.20685, 826.16907, 801.8214, 818.50134, 831.67145, 832.9003]
2026-01-23 04:49:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:49:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (823.08) for latency DatasetOffice
2026-01-23 04:49:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 33 minutes, 21 seconds)
2026-01-23 04:53:51,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:54:07,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 854.69952 ± 5.091
2026-01-23 04:54:07,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [850.93207, 858.5045, 848.1424, 860.7555, 857.7903, 862.0838, 851.9935, 854.4569, 845.9496, 856.3862]
2026-01-23 04:54:07,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:54:07,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (854.70) for latency DatasetOffice
2026-01-23 04:54:07,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 34 minutes, 42 seconds)
2026-01-23 04:58:22,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:58:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 867.00800 ± 4.600
2026-01-23 04:58:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [867.8564, 868.5555, 865.45514, 871.40784, 866.4199, 865.1522, 877.05676, 859.77985, 861.5615, 866.83545]
2026-01-23 04:58:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:58:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (867.01) for latency DatasetOffice
2026-01-23 04:58:37,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 26 minutes, 22 seconds)
2026-01-23 05:02:52,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:03:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 874.65198 ± 7.512
2026-01-23 05:03:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [876.7432, 868.3274, 881.52954, 876.39276, 872.4255, 875.37665, 878.0268, 875.9393, 856.3443, 885.41486]
2026-01-23 05:03:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:03:08,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (874.65) for latency DatasetOffice
2026-01-23 05:03:08,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 21 minutes, 51 seconds)
2026-01-23 05:07:23,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:07:39,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 863.54431 ± 26.635
2026-01-23 05:07:39,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [877.17224, 867.9423, 873.2308, 876.9907, 871.02313, 875.0864, 867.2103, 873.968, 868.51337, 784.30615]
2026-01-23 05:07:39,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:07:39,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 17 minutes, 41 seconds)
2026-01-23 05:11:55,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:10,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 808.59454 ± 33.053
2026-01-23 05:12:10,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [842.3678, 829.9628, 819.84534, 760.193, 738.1067, 838.74744, 839.04443, 801.3283, 807.9335, 808.416]
2026-01-23 05:12:10,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:12:10,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 12 minutes, 46 seconds)
2026-01-23 05:16:25,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:16:41,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 834.60419 ± 42.717
2026-01-23 05:16:41,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [812.85077, 830.85394, 843.80634, 848.2645, 856.2358, 876.38385, 717.8481, 862.3217, 833.32367, 864.15344]
2026-01-23 05:16:41,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:16:41,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 8 minutes, 19 seconds)
2026-01-23 05:20:31,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:20:46,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 876.92157 ± 21.662
2026-01-23 05:20:46,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [891.5541, 885.4357, 895.0159, 881.151, 894.2306, 836.5305, 885.6769, 887.05334, 879.72, 832.8478]
2026-01-23 05:20:46,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:20:46,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (876.92) for latency DatasetOffice
2026-01-23 05:20:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 59 minutes, 11 seconds)
2026-01-23 05:25:01,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:25:17,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 882.20264 ± 7.287
2026-01-23 05:25:17,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [880.55414, 870.3931, 884.4065, 891.15875, 887.6104, 869.77026, 883.3212, 892.21674, 884.1943, 878.4013]
2026-01-23 05:25:17,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:25:17,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (882.20) for latency DatasetOffice
2026-01-23 05:25:17,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 54 minutes, 46 seconds)
2026-01-23 05:29:32,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:29:48,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 898.37354 ± 8.029
2026-01-23 05:29:48,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [906.6555, 907.07715, 886.5229, 893.86224, 902.7823, 906.2522, 904.1322, 900.56287, 889.0492, 886.83905]
2026-01-23 05:29:48,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:29:48,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (898.37) for latency DatasetOffice
2026-01-23 05:29:48,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 50 minutes, 19 seconds)
2026-01-23 05:34:03,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:34:19,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 910.69727 ± 8.008
2026-01-23 05:34:19,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [893.1059, 911.7599, 915.3852, 909.6376, 906.5694, 918.38434, 902.2314, 922.0986, 911.74927, 916.0522]
2026-01-23 05:34:19,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:34:19,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (910.70) for latency DatasetOffice
2026-01-23 05:34:19,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 45 minutes, 49 seconds)
2026-01-23 05:38:33,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:38:49,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 895.80432 ± 9.640
2026-01-23 05:38:49,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [895.0452, 903.55914, 880.6282, 892.3786, 907.9759, 896.4065, 888.5287, 907.1363, 881.29535, 905.0891]
2026-01-23 05:38:49,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:38:49,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 41 minutes, 19 seconds)
2026-01-23 05:43:04,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:43:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 172.65817 ± 898.390
2026-01-23 05:43:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-778.46515, 924.39154, 862.4395, -741.5571, 932.53284, 901.40717, -954.3528, -1199.2261, 908.3797, 871.0321]
2026-01-23 05:43:20,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:43:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 41 minutes, 3 seconds)
2026-01-23 05:47:35,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:47:48,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 12.31565 ± 308.216
2026-01-23 05:47:48,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [-241.41106, 17.535032, -130.09848, 30.153046, 8.076924, 877.3019, -8.078417, -252.7743, 24.405157, -201.95322]
2026-01-23 05:47:48,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 33.0, 1000.0, 1000.0, 1000.0, 41.0, 1000.0]
2026-01-23 05:47:48,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 36 minutes, 4 seconds)
2026-01-23 05:52:03,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:52:18,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 899.91516 ± 6.460
2026-01-23 05:52:18,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [904.7414, 895.4887, 898.765, 891.24646, 896.0098, 909.71826, 897.2321, 912.3591, 898.38196, 895.2084]
2026-01-23 05:52:18,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:52:18,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 31 minutes, 34 seconds)
2026-01-23 05:56:33,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 913.08563 ± 14.952
2026-01-23 05:56:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [879.59436, 927.8826, 928.1695, 925.0217, 901.1887, 903.65106, 915.7377, 926.2187, 919.15424, 904.23865]
2026-01-23 05:56:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:56:49,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (913.09) for latency DatasetOffice
2026-01-23 05:56:49,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 27 minutes, 4 seconds)
2026-01-23 06:01:04,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:01:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 919.44989 ± 4.487
2026-01-23 06:01:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [925.49365, 917.1265, 913.6065, 913.7937, 921.1559, 920.0781, 917.0811, 916.4856, 921.8808, 927.7971]
2026-01-23 06:01:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:01:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (919.45) for latency DatasetOffice
2026-01-23 06:01:20,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 22 minutes, 35 seconds)
2026-01-23 06:05:35,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:05:50,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 913.54114 ± 13.573
2026-01-23 06:05:50,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [911.0538, 928.43524, 908.61316, 900.60895, 893.9196, 895.4879, 911.1984, 928.0329, 931.89276, 926.1687]
2026-01-23 06:05:50,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:05:50,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 18 minutes, 5 seconds)
2026-01-23 06:10:05,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:10:21,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 910.91241 ± 10.252
2026-01-23 06:10:21,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [922.07623, 907.10284, 891.5949, 925.61066, 911.64856, 915.9957, 903.69586, 897.9392, 918.75543, 914.704]
2026-01-23 06:10:21,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:10:21,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 14 minutes, 1 second)
2026-01-23 06:14:36,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:14:52,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 931.71594 ± 6.540
2026-01-23 06:14:52,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [940.7279, 933.5195, 932.33417, 936.39856, 926.8088, 927.92535, 929.7966, 936.8261, 916.50244, 936.3189]
2026-01-23 06:14:52,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:14:52,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (931.72) for latency DatasetOffice
2026-01-23 06:14:52,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 9 minutes, 29 seconds)
2026-01-23 06:19:07,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:19:23,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 943.06348 ± 6.291
2026-01-23 06:19:23,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [937.9098, 942.4278, 944.22626, 942.0364, 934.28394, 951.26086, 936.62286, 954.5975, 938.79016, 948.47925]
2026-01-23 06:19:23,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:19:23,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (943.06) for latency DatasetOffice
2026-01-23 06:19:23,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 4 minutes, 59 seconds)
2026-01-23 06:23:39,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:23:54,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 925.82825 ± 10.600
2026-01-23 06:23:54,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [932.37366, 920.5207, 940.7359, 937.9262, 907.45667, 915.9464, 930.4677, 929.8732, 912.1826, 930.799]
2026-01-23 06:23:54,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:23:54,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 37 seconds)
2026-01-23 06:28:10,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:28:25,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 936.47345 ± 2.813
2026-01-23 06:28:25,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [932.33246, 934.4911, 937.34985, 937.13544, 932.9176, 934.80396, 935.80804, 938.51373, 939.9035, 941.4781]
2026-01-23 06:28:25,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:28:25,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 56 minutes, 8 seconds)
2026-01-23 06:32:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:32:57,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 941.43243 ± 10.346
2026-01-23 06:32:57,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [944.4986, 950.69745, 920.32983, 944.379, 944.574, 957.1415, 948.13354, 937.76416, 927.193, 939.61365]
2026-01-23 06:32:57,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:32:57,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 51 minutes, 45 seconds)
2026-01-23 06:37:12,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:37:28,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 930.11865 ± 10.950
2026-01-23 06:37:28,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [941.46515, 942.1129, 930.27905, 929.5381, 932.09827, 938.26605, 905.62787, 916.0461, 928.26074, 937.4924]
2026-01-23 06:37:28,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:37:28,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 47 minutes, 17 seconds)
2026-01-23 06:41:43,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:41:59,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 946.25574 ± 10.019
2026-01-23 06:41:59,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [941.1036, 946.71027, 954.2929, 958.4201, 946.6963, 921.2248, 944.327, 947.44916, 944.67474, 957.65875]
2026-01-23 06:41:59,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:41:59,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (946.26) for latency DatasetOffice
2026-01-23 06:41:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 42 minutes, 44 seconds)
2026-01-23 06:46:14,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:46:29,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 945.14551 ± 8.577
2026-01-23 06:46:29,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [957.0502, 952.0459, 939.46814, 933.43677, 947.5986, 958.9688, 944.53534, 944.99, 932.4324, 940.92834]
2026-01-23 06:46:29,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:46:29,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 38 minutes, 5 seconds)
2026-01-23 06:50:45,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:51:00,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 949.32263 ± 10.770
2026-01-23 06:51:00,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [962.9839, 955.80634, 949.91016, 928.64233, 954.9908, 939.0008, 965.3313, 943.7289, 940.93066, 951.9014]
2026-01-23 06:51:00,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:51:00,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (949.32) for latency DatasetOffice
2026-01-23 06:51:00,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 33 minutes, 34 seconds)
2026-01-23 06:55:16,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:55:31,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 954.86426 ± 9.642
2026-01-23 06:55:31,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [958.3311, 960.43335, 945.79724, 971.1517, 957.50574, 941.08167, 938.40283, 962.947, 957.07227, 955.92]
2026-01-23 06:55:31,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:55:31,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (954.86) for latency DatasetOffice
2026-01-23 06:55:31,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 28 minutes, 56 seconds)
2026-01-23 06:59:46,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:00:02,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 947.80792 ± 6.672
2026-01-23 07:00:02,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [947.2433, 943.9377, 942.5413, 958.8044, 938.5011, 952.4895, 946.54517, 939.10126, 952.61694, 956.29803]
2026-01-23 07:00:02,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:00:02,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 24 minutes, 22 seconds)
2026-01-23 07:04:17,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:04:32,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 950.96277 ± 7.141
2026-01-23 07:04:32,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [954.82983, 962.62714, 952.128, 947.89014, 952.2313, 941.83575, 935.80023, 955.52875, 952.7513, 954.0057]
2026-01-23 07:04:32,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:04:32,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 19 minutes, 51 seconds)
2026-01-23 07:08:47,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:09:02,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 938.63947 ± 8.916
2026-01-23 07:09:02,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [948.62134, 942.4179, 930.97614, 938.02515, 952.5999, 946.5399, 930.0561, 921.6356, 937.3715, 938.15106]
2026-01-23 07:09:02,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:09:02,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 15 minutes, 17 seconds)
2026-01-23 07:13:17,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:13:33,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 945.74304 ± 7.996
2026-01-23 07:13:33,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [942.65027, 942.99084, 954.81775, 943.6728, 953.8385, 955.42505, 931.44366, 952.14404, 934.31134, 946.13696]
2026-01-23 07:13:33,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:13:33,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 10 minutes, 45 seconds)
2026-01-23 07:17:47,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:18:03,522 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 944.97888 ± 6.951
2026-01-23 07:18:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [937.45087, 942.98975, 951.16705, 956.55414, 932.93256, 950.01434, 942.3171, 947.49603, 938.6978, 950.1692]
2026-01-23 07:18:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:18:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 6 minutes, 10 seconds)
2026-01-23 07:22:18,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:22:33,889 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 970.20398 ± 8.994
2026-01-23 07:22:33,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [969.2459, 961.78723, 969.0305, 965.2239, 963.75323, 954.06586, 977.91144, 985.5414, 977.1723, 978.30896]
2026-01-23 07:22:33,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:22:33,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (970.20) for latency DatasetOffice
2026-01-23 07:22:33,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 1 minute, 39 seconds)
2026-01-23 07:26:48,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:27:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 957.33508 ± 8.589
2026-01-23 07:27:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [952.0992, 962.07184, 943.69916, 957.59955, 968.55475, 952.743, 968.0954, 944.13696, 965.18604, 959.1651]
2026-01-23 07:27:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:27:04,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 57 minutes, 6 seconds)
2026-01-23 07:31:18,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:31:34,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 962.04120 ± 8.206
2026-01-23 07:31:34,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [972.85596, 955.1984, 958.9017, 955.24744, 964.8475, 966.14294, 964.41534, 973.8485, 963.682, 945.2721]
2026-01-23 07:31:34,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:31:34,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 52 minutes, 37 seconds)
2026-01-23 07:35:49,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:36:04,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 956.51746 ± 17.699
2026-01-23 07:36:04,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [944.3636, 955.8629, 962.282, 977.0155, 965.99036, 972.1218, 964.88727, 911.3558, 962.22876, 949.06726]
2026-01-23 07:36:04,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:36:04,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 48 minutes, 6 seconds)
2026-01-23 07:40:19,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:40:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1007.78613 ± 18.851
2026-01-23 07:40:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1024.1366, 996.39465, 995.4656, 990.8926, 1025.6284, 997.0609, 988.2694, 1005.9568, 1002.93274, 1051.1241]
2026-01-23 07:40:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:40:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1007.79) for latency DatasetOffice
2026-01-23 07:40:34,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 43 minutes, 36 seconds)
2026-01-23 07:44:49,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:45:05,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 964.51062 ± 24.423
2026-01-23 07:45:05,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1028.6904, 960.323, 932.9293, 961.97, 953.19385, 958.60754, 977.4339, 969.30536, 959.1882, 943.4649]
2026-01-23 07:45:05,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:45:05,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 39 minutes, 5 seconds)
2026-01-23 07:49:19,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:49:35,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1045.06299 ± 47.489
2026-01-23 07:49:35,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [963.8136, 1055.4265, 1049.1042, 1028.5426, 1041.2064, 1017.313, 1070.0212, 1103.0419, 1132.7434, 989.4166]
2026-01-23 07:49:35,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:49:35,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1045.06) for latency DatasetOffice
2026-01-23 07:49:35,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 34 minutes, 34 seconds)
2026-01-23 07:53:49,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:54:05,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 938.46307 ± 23.839
2026-01-23 07:54:05,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [916.3146, 936.2563, 999.18646, 942.77454, 957.54626, 941.42944, 915.4812, 919.2292, 928.0919, 928.32135]
2026-01-23 07:54:05,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:54:05,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 30 minutes, 2 seconds)
2026-01-23 07:58:19,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:58:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1101.17932 ± 62.605
2026-01-23 07:58:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1132.0884, 1133.4137, 1170.1072, 1127.2488, 1036.9232, 1056.7511, 1055.518, 1119.501, 982.88226, 1197.3607]
2026-01-23 07:58:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:58:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1101.18) for latency DatasetOffice
2026-01-23 07:58:35,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 25 minutes, 31 seconds)
2026-01-23 08:02:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:03:04,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1101.92859 ± 64.601
2026-01-23 08:03:04,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1095.512, 1093.204, 1059.3444, 1172.0009, 1076.5958, 1221.6346, 1115.7074, 1138.8629, 968.5557, 1077.8695]
2026-01-23 08:03:04,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:03:04,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1101.93) for latency DatasetOffice
2026-01-23 08:03:04,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 20 minutes, 59 seconds)
2026-01-23 08:07:19,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:07:34,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1142.25366 ± 32.557
2026-01-23 08:07:34,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1154.7814, 1125.4922, 1078.2987, 1147.8457, 1168.3596, 1134.3337, 1102.1382, 1169.1974, 1196.569, 1145.5204]
2026-01-23 08:07:34,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:07:34,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1142.25) for latency DatasetOffice
2026-01-23 08:07:34,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 16 minutes, 28 seconds)
2026-01-23 08:11:49,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:12:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1144.23022 ± 77.086
2026-01-23 08:12:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1263.5121, 1051.5411, 1260.0569, 1185.3691, 1060.2408, 1144.5012, 1205.0597, 1100.475, 1054.217, 1117.3306]
2026-01-23 08:12:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:12:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1144.23) for latency DatasetOffice
2026-01-23 08:12:04,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 11 minutes, 59 seconds)
2026-01-23 08:16:19,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:16:34,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1278.24048 ± 45.696
2026-01-23 08:16:34,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1284.8271, 1163.182, 1300.1096, 1298.6475, 1262.365, 1268.5841, 1328.0198, 1281.4042, 1337.125, 1258.141]
2026-01-23 08:16:34,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:16:34,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1278.24) for latency DatasetOffice
2026-01-23 08:16:34,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 7 minutes, 29 seconds)
2026-01-23 08:20:50,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:21:05,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1292.23071 ± 17.857
2026-01-23 08:21:05,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1297.8634, 1271.1575, 1307.0077, 1278.5634, 1297.52, 1265.7878, 1274.9376, 1317.2589, 1294.4812, 1317.7292]
2026-01-23 08:21:05,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:21:05,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1292.23) for latency DatasetOffice
2026-01-23 08:21:05,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 3 minutes, 1 second)
2026-01-23 08:25:20,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:25:36,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1402.17603 ± 52.699
2026-01-23 08:25:36,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1398.1123, 1422.3602, 1418.0582, 1441.3715, 1425.5283, 1406.6573, 1433.3774, 1388.6484, 1251.5548, 1436.0924]
2026-01-23 08:25:36,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:25:36,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1402.18) for latency DatasetOffice
2026-01-23 08:25:36,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 58 minutes, 33 seconds)
2026-01-23 08:30:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:30:43,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1482.77124 ± 12.896
2026-01-23 08:30:43,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1475.681, 1479.7588, 1489.6207, 1460.0709, 1471.4539, 1473.956, 1496.615, 1490.6196, 1506.9058, 1483.0315]
2026-01-23 08:30:43,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:30:43,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1482.77) for latency DatasetOffice
2026-01-23 08:30:43,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 55 minutes, 31 seconds)
2026-01-23 08:34:58,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:35:13,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1439.16565 ± 13.362
2026-01-23 08:35:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1439.5065, 1437.9758, 1428.827, 1441.1586, 1445.4939, 1407.6293, 1439.1975, 1444.2587, 1463.3936, 1444.2161]
2026-01-23 08:35:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:35:13,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 50 minutes, 55 seconds)
2026-01-23 08:39:28,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:39:44,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1475.58215 ± 46.867
2026-01-23 08:39:44,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1423.8514, 1364.1271, 1492.1947, 1500.527, 1539.3079, 1477.745, 1478.2496, 1478.2095, 1483.2445, 1518.3656]
2026-01-23 08:39:44,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:39:44,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 46 minutes, 18 seconds)
2026-01-23 08:43:59,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:44:15,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1416.21619 ± 26.916
2026-01-23 08:44:15,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1368.7535, 1389.7255, 1450.8228, 1416.157, 1380.8442, 1436.2448, 1415.1835, 1434.2787, 1420.9065, 1449.2466]
2026-01-23 08:44:15,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:44:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 41 minutes, 41 seconds)
2026-01-23 08:48:30,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:48:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1546.51440 ± 71.867
2026-01-23 08:48:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1565.0446, 1578.9459, 1584.9382, 1546.61, 1548.8378, 1567.8035, 1584.5692, 1573.1111, 1334.4224, 1580.8611]
2026-01-23 08:48:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:48:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1546.51) for latency DatasetOffice
2026-01-23 08:48:45,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 37 minutes, 2 seconds)
2026-01-23 08:53:00,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:53:16,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1585.61646 ± 24.433
2026-01-23 08:53:16,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1598.427, 1614.1558, 1560.4277, 1558.0448, 1570.2594, 1543.3907, 1602.4845, 1607.3552, 1587.4131, 1614.2052]
2026-01-23 08:53:16,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:53:16,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1585.62) for latency DatasetOffice
2026-01-23 08:53:16,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 31 minutes, 34 seconds)
2026-01-23 08:57:31,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:57:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1591.57642 ± 19.416
2026-01-23 08:57:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1593.3999, 1565.0953, 1583.3762, 1621.633, 1570.8644, 1608.5652, 1584.054, 1568.5759, 1602.8549, 1617.3456]
2026-01-23 08:57:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:57:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1591.58) for latency DatasetOffice
2026-01-23 08:57:46,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 27 minutes, 3 seconds)
2026-01-23 09:02:02,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:02:17,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1578.96948 ± 56.467
2026-01-23 09:02:17,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1545.2411, 1425.1002, 1621.9513, 1637.0872, 1596.7804, 1595.626, 1582.0034, 1587.4069, 1586.0415, 1612.4574]
2026-01-23 09:02:17,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:02:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 33 seconds)
2026-01-23 09:06:33,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:06:48,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1604.90515 ± 16.417
2026-01-23 09:06:48,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1609.7789, 1575.8894, 1587.7103, 1605.4216, 1616.5956, 1608.2285, 1587.1095, 1612.3192, 1636.4058, 1609.5929]
2026-01-23 09:06:48,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:06:48,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1604.91) for latency DatasetOffice
2026-01-23 09:06:48,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 18 minutes, 2 seconds)
2026-01-23 09:11:04,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:11:20,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1632.63916 ± 90.628
2026-01-23 09:11:20,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1605.2932, 1708.4459, 1679.5107, 1660.1049, 1694.9272, 1644.9354, 1653.782, 1656.1628, 1373.1154, 1650.1139]
2026-01-23 09:11:20,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:11:20,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1632.64) for latency DatasetOffice
2026-01-23 09:11:20,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 32 seconds)
2026-01-23 09:15:35,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:15:51,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1598.24927 ± 20.937
2026-01-23 09:15:51,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1582.6608, 1604.533, 1632.008, 1583.851, 1636.6631, 1590.5334, 1608.9954, 1592.9229, 1573.0531, 1577.2714]
2026-01-23 09:15:51,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:15:51,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 2 seconds)
2026-01-23 09:20:06,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:20:22,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1648.08032 ± 25.211
2026-01-23 09:20:22,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1676.5238, 1620.9138, 1654.6412, 1663.9034, 1681.4185, 1644.1472, 1615.845, 1603.3888, 1665.8102, 1654.2106]
2026-01-23 09:20:22,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:20:22,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1274 [INFO]: New best (1648.08) for latency DatasetOffice
2026-01-23 09:20:22,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 31 seconds)
2026-01-23 09:24:37,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:24:53,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1269 [DEBUG]: Total Reward: 1623.62256 ± 57.186
2026-01-23 09:24:53,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1270 [DEBUG]: All rewards: [1573.849, 1479.3435, 1633.9448, 1638.3201, 1617.0061, 1661.2126, 1618.8557, 1693.1669, 1666.761, 1653.7667]
2026-01-23 09:24:53,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:24:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-ant):1299 [DEBUG]: Training session finished
