2026-01-23 01:59:34,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mda-mem1
2026-01-23 01:59:34,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mda-mem1
2026-01-23 01:59:34,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x15400d8574d0>}
2026-01-23 01:59:34,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-23 01:59:34,599 baseline-bpql-mda-noisy-walker2d:91 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-23 01:59:34,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-23 01:59:34,615 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:59:34,616 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:59:34,621 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2026-01-23 01:59:35,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-23 01:59:35,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-23 02:03:02,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:04,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 261.47079 ± 86.515
2026-01-23 02:03:04,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [78.4949, 108.201256, 335.5176, 315.7869, 276.54086, 316.0837, 300.2519, 317.45108, 301.25504, 265.12463]
2026-01-23 02:03:04,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [76.0, 104.0, 216.0, 199.0, 167.0, 197.0, 182.0, 199.0, 183.0, 160.0]
2026-01-23 02:03:04,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (261.47) for latency DatasetOffice
2026-01-23 02:03:04,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 45 minutes, 23 seconds)
2026-01-23 02:06:45,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:46,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 20.65584 ± 29.904
2026-01-23 02:06:46,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [14.9464245, 16.302223, 16.167969, -26.937511, 11.255961, 16.13438, 17.91963, 101.06955, 16.545757, 23.154034]
2026-01-23 02:06:46,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [64.0, 68.0, 71.0, 180.0, 87.0, 66.0, 63.0, 102.0, 67.0, 91.0]
2026-01-23 02:06:46,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 52 minutes, 11 seconds)
2026-01-23 02:10:29,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:34,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 317.73425 ± 333.404
2026-01-23 02:10:34,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [243.60503, 121.44613, 118.930504, 45.700615, 1017.8168, 16.977377, 835.4427, 481.66492, 250.01564, 45.74292]
2026-01-23 02:10:34,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [412.0, 91.0, 213.0, 126.0, 1000.0, 119.0, 932.0, 472.0, 158.0, 123.0]
2026-01-23 02:10:34,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (317.73) for latency DatasetOffice
2026-01-23 02:10:34,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 55 minutes, 8 seconds)
2026-01-23 02:14:18,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:20,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 75.37630 ± 78.877
2026-01-23 02:14:20,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [110.49085, 73.05761, 19.774353, -7.429212, 11.168818, 209.26047, 32.765358, 12.243335, 65.14865, 227.28276]
2026-01-23 02:14:20,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [197.0, 68.0, 27.0, 20.0, 22.0, 170.0, 64.0, 25.0, 175.0, 402.0]
2026-01-23 02:14:20,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 54 minutes, 1 second)
2026-01-23 02:18:06,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:15,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 637.67950 ± 398.204
2026-01-23 02:18:15,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [112.4279, 410.84378, 1025.1158, 1021.693, 360.05722, 1018.37256, 386.15793, 12.60945, 1000.7959, 1028.721]
2026-01-23 02:18:15,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [78.0, 213.0, 1000.0, 1000.0, 202.0, 1000.0, 165.0, 26.0, 1000.0, 1000.0]
2026-01-23 02:18:15,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (637.68) for latency DatasetOffice
2026-01-23 02:18:15,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 54 minutes, 33 seconds)
2026-01-23 02:21:58,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 281.01355 ± 107.263
2026-01-23 02:22:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [211.31006, 300.38495, 514.07355, 347.5885, 283.7389, 233.71751, 347.83795, 278.4569, 210.44392, 82.58331]
2026-01-23 02:22:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [106.0, 408.0, 394.0, 217.0, 158.0, 124.0, 177.0, 135.0, 102.0, 103.0]
2026-01-23 02:22:01,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 56 minutes, 6 seconds)
2026-01-23 02:25:41,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:52,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 705.37793 ± 425.982
2026-01-23 02:25:52,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [989.34344, 985.7705, 93.471664, 977.4882, 964.7631, 985.09314, 1018.20044, 966.1473, 62.918232, 10.583294]
2026-01-23 02:25:52,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 217.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 99.0, 19.0]
2026-01-23 02:25:52,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (705.38) for latency DatasetOffice
2026-01-23 02:25:52,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 55 minutes, 10 seconds)
2026-01-23 02:29:50,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:52,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 130.74590 ± 158.562
2026-01-23 02:29:52,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [186.13383, 59.278915, -62.63888, 26.2287, 267.35977, 46.806858, 174.88196, 54.162045, 520.7851, 34.46065]
2026-01-23 02:29:52,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [118.0, 145.0, 168.0, 102.0, 205.0, 60.0, 93.0, 66.0, 228.0, 166.0]
2026-01-23 02:29:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 55 minutes, 3 seconds)
2026-01-23 02:33:21,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:25,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 429.69110 ± 223.674
2026-01-23 02:33:25,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [447.88278, 178.98734, 443.90506, 602.5805, 464.7527, 969.7043, 237.0912, 478.9962, 239.1828, 233.82832]
2026-01-23 02:33:25,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [206.0, 316.0, 200.0, 353.0, 233.0, 1000.0, 107.0, 213.0, 134.0, 118.0]
2026-01-23 02:33:25,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 47 minutes, 10 seconds)
2026-01-23 02:37:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:10,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 416.67352 ± 169.984
2026-01-23 02:37:10,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [434.723, 570.74603, 688.78394, 466.53455, 187.36731, 532.7088, 185.62233, 170.50308, 422.00528, 507.74094]
2026-01-23 02:37:10,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [182.0, 247.0, 672.0, 255.0, 133.0, 219.0, 129.0, 114.0, 247.0, 201.0]
2026-01-23 02:37:10,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 40 minutes, 32 seconds)
2026-01-23 02:40:52,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:56,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 648.02307 ± 228.889
2026-01-23 02:40:56,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [388.0505, 660.61395, 588.3775, 508.3996, 385.44772, 1222.7186, 663.3812, 822.51483, 649.28815, 591.4385]
2026-01-23 02:40:56,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [215.0, 278.0, 301.0, 238.0, 213.0, 668.0, 305.0, 413.0, 374.0, 311.0]
2026-01-23 02:40:56,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 36 minutes, 49 seconds)
2026-01-23 02:44:38,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:40,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 300.66605 ± 215.122
2026-01-23 02:44:40,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [312.67014, 733.79736, 320.53488, 320.92722, 282.2243, 541.1167, 65.224304, 383.42142, 22.40797, 24.335867]
2026-01-23 02:44:40,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [193.0, 333.0, 184.0, 171.0, 163.0, 241.0, 109.0, 199.0, 44.0, 35.0]
2026-01-23 02:44:40,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 30 minutes, 58 seconds)
2026-01-23 02:48:19,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:22,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 566.90729 ± 103.103
2026-01-23 02:48:22,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [450.09656, 526.8737, 497.5873, 519.55646, 668.94965, 599.54095, 606.31036, 506.434, 481.7344, 811.9897]
2026-01-23 02:48:22,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [200.0, 223.0, 218.0, 231.0, 271.0, 262.0, 275.0, 236.0, 204.0, 332.0]
2026-01-23 02:48:22,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 21 minutes, 56 seconds)
2026-01-23 02:52:03,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:06,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 529.44305 ± 54.182
2026-01-23 02:52:06,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [436.83365, 599.2474, 450.8165, 550.1744, 568.61597, 576.3886, 465.03583, 547.8971, 560.82513, 538.5964]
2026-01-23 02:52:06,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [161.0, 238.0, 174.0, 220.0, 229.0, 225.0, 170.0, 218.0, 220.0, 195.0]
2026-01-23 02:52:06,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 21 minutes, 28 seconds)
2026-01-23 02:55:49,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:51,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 470.12573 ± 188.644
2026-01-23 02:55:51,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [504.26382, 496.2508, -12.952816, 582.7298, 475.72723, 715.6516, 450.77008, 322.98026, 573.17584, 592.6606]
2026-01-23 02:55:51,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [180.0, 203.0, 15.0, 202.0, 200.0, 261.0, 185.0, 158.0, 223.0, 304.0]
2026-01-23 02:55:51,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 17 minutes, 44 seconds)
2026-01-23 02:59:25,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:26,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 188.29581 ± 231.015
2026-01-23 02:59:26,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [223.59105, 21.494009, 89.50101, 12.415334, 30.905626, 27.07511, 10.573077, 606.5919, 641.7824, 219.02872]
2026-01-23 02:59:26,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [115.0, 28.0, 115.0, 20.0, 44.0, 45.0, 22.0, 227.0, 258.0, 122.0]
2026-01-23 02:59:27,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 10 minutes, 53 seconds)
2026-01-23 03:03:05,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:08,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 454.68091 ± 180.684
2026-01-23 03:03:08,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [529.69775, 472.86093, 470.41208, 493.19193, 468.74332, 756.7821, 515.08514, 199.12596, 73.14473, 567.7647]
2026-01-23 03:03:08,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [202.0, 174.0, 173.0, 187.0, 181.0, 268.0, 200.0, 111.0, 61.0, 211.0]
2026-01-23 03:03:08,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 6 minutes, 21 seconds)
2026-01-23 03:06:51,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:54,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 536.99896 ± 158.275
2026-01-23 03:06:54,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [496.04327, 887.41876, 488.17947, 529.63885, 220.2062, 510.88422, 507.13925, 478.5299, 591.2098, 660.7397]
2026-01-23 03:06:54,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [197.0, 576.0, 206.0, 185.0, 272.0, 197.0, 201.0, 185.0, 236.0, 239.0]
2026-01-23 03:06:54,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 3 minutes, 56 seconds)
2026-01-23 03:10:34,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:38,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 717.06299 ± 169.232
2026-01-23 03:10:38,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [703.96625, 966.5849, 653.6632, 413.44443, 894.41315, 534.4368, 897.54315, 841.69507, 648.65985, 616.22314]
2026-01-23 03:10:38,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [363.0, 345.0, 290.0, 174.0, 363.0, 204.0, 418.0, 473.0, 274.0, 242.0]
2026-01-23 03:10:38,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (717.06) for latency DatasetOffice
2026-01-23 03:10:38,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 18 seconds)
2026-01-23 03:14:18,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:21,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 496.15903 ± 298.875
2026-01-23 03:14:21,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [233.7672, 337.1998, 648.39056, 327.94452, 215.54887, 835.84656, 282.12906, 1026.7706, 861.8716, 192.12164]
2026-01-23 03:14:21,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [125.0, 172.0, 241.0, 142.0, 124.0, 400.0, 139.0, 383.0, 283.0, 120.0]
2026-01-23 03:14:21,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 55 minutes, 50 seconds)
2026-01-23 03:17:58,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:02,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 716.42542 ± 439.945
2026-01-23 03:18:02,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [996.7268, 422.0021, 782.4673, 941.43365, 1626.8572, 676.6407, 896.8829, 686.68805, 1.8292767, 132.72601]
2026-01-23 03:18:02,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [359.0, 211.0, 370.0, 333.0, 660.0, 318.0, 410.0, 299.0, 11.0, 84.0]
2026-01-23 03:18:02,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 53 minutes, 44 seconds)
2026-01-23 03:21:46,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:49,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 597.54089 ± 278.942
2026-01-23 03:21:49,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5.6021166, 975.54553, 984.3598, 586.2829, 498.72903, 538.42694, 871.66943, 582.1735, 543.6737, 388.94644]
2026-01-23 03:21:49,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [18.0, 386.0, 339.0, 194.0, 188.0, 199.0, 307.0, 204.0, 193.0, 148.0]
2026-01-23 03:21:49,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 51 minutes, 27 seconds)
2026-01-23 03:25:29,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:33,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 781.10852 ± 442.777
2026-01-23 03:25:33,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1036.5281, 555.61725, 604.5041, 636.67065, 286.57388, 863.3719, 543.52057, 767.0207, 541.58496, 1975.6932]
2026-01-23 03:25:33,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [392.0, 202.0, 217.0, 248.0, 136.0, 311.0, 190.0, 249.0, 203.0, 700.0]
2026-01-23 03:25:33,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (781.11) for latency DatasetOffice
2026-01-23 03:25:33,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 47 minutes, 8 seconds)
2026-01-23 03:29:12,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:15,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 419.75262 ± 331.576
2026-01-23 03:29:15,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [622.38525, 894.78467, 678.12256, 679.0905, 786.0586, 349.8085, 28.005259, 17.863306, 92.79569, 48.611977]
2026-01-23 03:29:15,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [242.0, 338.0, 272.0, 263.0, 304.0, 149.0, 38.0, 31.0, 140.0, 69.0]
2026-01-23 03:29:15,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 42 minutes, 53 seconds)
2026-01-23 03:32:51,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:56,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1031.14709 ± 462.196
2026-01-23 03:32:56,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [853.2879, 768.6978, 710.65796, 638.67535, 1420.548, 1315.3258, 786.70105, 1008.85333, 2181.462, 627.2612]
2026-01-23 03:32:56,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [296.0, 292.0, 278.0, 246.0, 532.0, 487.0, 269.0, 362.0, 844.0, 246.0]
2026-01-23 03:32:56,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1031.15) for latency DatasetOffice
2026-01-23 03:32:56,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 38 minutes, 54 seconds)
2026-01-23 03:36:42,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:48,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1167.02258 ± 515.108
2026-01-23 03:36:48,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1693.8427, 483.65973, 936.64905, 1795.6306, 1137.6042, 483.3376, 1099.272, 2115.9568, 1002.7653, 921.5068]
2026-01-23 03:36:48,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [584.0, 186.0, 336.0, 638.0, 402.0, 200.0, 405.0, 758.0, 392.0, 338.0]
2026-01-23 03:36:48,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1167.02) for latency DatasetOffice
2026-01-23 03:36:48,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 37 minutes, 42 seconds)
2026-01-23 03:40:24,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 683.31506 ± 76.204
2026-01-23 03:40:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [734.9439, 722.94794, 737.0257, 686.69305, 702.9206, 471.92056, 719.8291, 732.822, 686.2681, 637.77966]
2026-01-23 03:40:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [240.0, 237.0, 251.0, 225.0, 236.0, 150.0, 264.0, 258.0, 227.0, 198.0]
2026-01-23 03:40:27,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 32 minutes, 5 seconds)
2026-01-23 03:44:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:44:10,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 982.91748 ± 309.263
2026-01-23 03:44:10,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1381.7437, 819.50555, 1234.7722, 1384.2338, 973.204, 791.0307, 747.2395, 1324.2167, 694.1482, 479.08096]
2026-01-23 03:44:10,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [385.0, 277.0, 381.0, 400.0, 304.0, 295.0, 235.0, 422.0, 219.0, 206.0]
2026-01-23 03:44:10,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 28 minutes, 13 seconds)
2026-01-23 03:47:47,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:47:53,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1420.51831 ± 464.747
2026-01-23 03:47:53,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1423.7545, 1044.8754, 2444.8467, 1753.5353, 1036.1902, 1081.4705, 1025.6681, 1297.6254, 1105.8085, 1991.4086]
2026-01-23 03:47:53,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [412.0, 304.0, 635.0, 476.0, 295.0, 308.0, 288.0, 353.0, 300.0, 518.0]
2026-01-23 03:47:53,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1420.52) for latency DatasetOffice
2026-01-23 03:47:53,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 24 minutes, 30 seconds)
2026-01-23 03:51:34,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:51:39,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1387.62537 ± 317.102
2026-01-23 03:51:39,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2128.6873, 1084.8485, 1638.6875, 1154.4076, 1251.3667, 1174.2152, 1042.944, 1443.63, 1329.375, 1628.0918]
2026-01-23 03:51:39,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [604.0, 308.0, 423.0, 330.0, 349.0, 320.0, 302.0, 371.0, 372.0, 456.0]
2026-01-23 03:51:39,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 21 minutes, 56 seconds)
2026-01-23 03:55:26,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:55:31,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1314.51831 ± 402.857
2026-01-23 03:55:31,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1190.5242, 1541.9404, 1171.4225, 2365.2756, 1229.0328, 1548.9875, 1021.5035, 880.3928, 1119.9016, 1076.2023]
2026-01-23 03:55:31,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [363.0, 434.0, 361.0, 637.0, 378.0, 455.0, 367.0, 291.0, 348.0, 359.0]
2026-01-23 03:55:31,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 18 minutes, 26 seconds)
2026-01-23 03:59:05,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:59:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1346.93896 ± 518.494
2026-01-23 03:59:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1343.4681, 675.85754, 1251.3442, 1628.1638, 1688.7958, 169.47575, 1320.4637, 1944.8636, 1772.0029, 1674.9545]
2026-01-23 03:59:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [394.0, 225.0, 363.0, 467.0, 481.0, 171.0, 378.0, 557.0, 517.0, 471.0]
2026-01-23 03:59:10,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 14 minutes, 40 seconds)
2026-01-23 04:02:53,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:03:03,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2725.46143 ± 829.228
2026-01-23 04:03:03,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3837.2078, 1974.276, 3278.883, 2671.3213, 3796.1272, 1661.1316, 1944.4358, 1979.2483, 2313.6099, 3798.373]
2026-01-23 04:03:03,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 569.0, 844.0, 740.0, 1000.0, 471.0, 543.0, 563.0, 608.0, 1000.0]
2026-01-23 04:03:03,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2725.46) for latency DatasetOffice
2026-01-23 04:03:03,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 13 minutes, 5 seconds)
2026-01-23 04:06:40,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:06:47,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1664.37671 ± 1082.382
2026-01-23 04:06:47,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [693.8261, 2694.6436, 1393.5125, 1880.4524, 1871.2726, 2361.398, -0.34324425, 100.691574, 2063.6265, 3584.6873]
2026-01-23 04:06:47,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [231.0, 770.0, 377.0, 496.0, 479.0, 643.0, 16.0, 126.0, 627.0, 1000.0]
2026-01-23 04:06:47,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 9 minutes, 30 seconds)
2026-01-23 04:10:41,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:10:48,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1670.27441 ± 1403.591
2026-01-23 04:10:48,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2663.8967, 1809.9391, 151.68425, 27.77126, 1115.894, 589.05444, 54.536915, 3847.372, 3382.3625, 3060.2327]
2026-01-23 04:10:48,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [696.0, 503.0, 94.0, 42.0, 347.0, 230.0, 67.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:10:48,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 8 minutes, 53 seconds)
2026-01-23 04:14:24,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:14:33,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2397.54932 ± 1128.235
2026-01-23 04:14:33,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1795.5726, 1483.5978, 1302.4883, 1055.8817, 3716.694, 3845.2878, 1103.7542, 3730.5886, 2428.611, 3513.0151]
2026-01-23 04:14:33,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [507.0, 408.0, 347.0, 298.0, 1000.0, 1000.0, 328.0, 1000.0, 665.0, 1000.0]
2026-01-23 04:14:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 3 minutes, 30 seconds)
2026-01-23 04:18:20,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:33,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3254.05322 ± 1067.747
2026-01-23 04:18:33,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3885.6638, 3727.3984, 3761.4219, 3769.387, 3823.9258, 2071.4224, 3729.7598, 3803.1575, 3529.934, 438.46045]
2026-01-23 04:18:33,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 592.0, 1000.0, 1000.0, 1000.0, 156.0]
2026-01-23 04:18:33,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3254.05) for latency DatasetOffice
2026-01-23 04:18:33,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 4 minutes, 5 seconds)
2026-01-23 04:22:10,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:22:22,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3308.39722 ± 1044.802
2026-01-23 04:22:22,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3720.341, 3682.8787, 3725.004, 3830.1194, 3781.8489, 3419.956, 3486.3835, 3685.117, 195.73991, 3556.583]
2026-01-23 04:22:22,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0, 1000.0]
2026-01-23 04:22:22,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3308.40) for latency DatasetOffice
2026-01-23 04:22:22,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 59 minutes, 30 seconds)
2026-01-23 04:25:44,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:25:57,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3802.46045 ± 260.829
2026-01-23 04:25:57,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3854.9976, 3654.6062, 3923.0674, 3965.9072, 3940.0042, 4046.922, 3087.7769, 3883.463, 3924.5972, 3743.267]
2026-01-23 04:25:57,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 829.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:25:57,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3802.46) for latency DatasetOffice
2026-01-23 04:25:57,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 53 minutes, 57 seconds)
2026-01-23 04:29:33,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:29:42,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2261.91357 ± 1652.748
2026-01-23 04:29:42,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3909.1511, 3875.8103, 108.3758, 3840.936, 1180.6976, 3748.8706, 181.65039, 3771.5803, 1923.0796, 78.98212]
2026-01-23 04:29:42,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 102.0, 1000.0, 384.0, 1000.0, 109.0, 1000.0, 579.0, 81.0]
2026-01-23 04:29:42,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 46 minutes, 48 seconds)
2026-01-23 04:33:31,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:33:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3522.35938 ± 967.672
2026-01-23 04:33:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [781.8535, 3653.709, 3930.5005, 2937.3267, 4109.7095, 4055.1128, 3886.5645, 3953.8599, 3981.2083, 3933.747]
2026-01-23 04:33:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [273.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:33:44,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 46 minutes, 21 seconds)
2026-01-23 04:37:21,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:37:34,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3743.91870 ± 613.351
2026-01-23 04:37:34,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4127.266, 3891.1545, 3966.4355, 3957.3699, 4157.8057, 3985.5034, 3835.762, 1958.9127, 3968.4368, 3590.5398]
2026-01-23 04:37:34,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 530.0, 1000.0, 894.0]
2026-01-23 04:37:34,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 40 minutes, 41 seconds)
2026-01-23 04:41:11,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:41:17,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1524.96265 ± 1777.612
2026-01-23 04:41:17,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3519.4219, 3763.9902, 3791.5195, 3719.8184, 132.09549, 4.082759, 21.647806, 14.223637, 262.64438, 20.181807]
2026-01-23 04:41:17,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 87.0, 15.0, 39.0, 26.0, 155.0, 31.0]
2026-01-23 04:41:17,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 35 minutes, 35 seconds)
2026-01-23 04:44:49,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:45:03,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3943.47974 ± 124.073
2026-01-23 04:45:03,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4049.1577, 3979.193, 3947.9558, 3990.8303, 3767.9585, 3651.9272, 4039.5984, 3994.7751, 3964.4983, 4048.9016]
2026-01-23 04:45:03,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 939.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:45:03,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3943.48) for latency DatasetOffice
2026-01-23 04:45:03,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 33 minutes, 52 seconds)
2026-01-23 04:48:41,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:48:53,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3492.31177 ± 1113.338
2026-01-23 04:48:53,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4022.8596, 3975.0613, 3945.0024, 384.32568, 3932.7476, 3952.9243, 3992.378, 3961.7705, 4124.1626, 2631.884]
2026-01-23 04:48:53,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 147.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 690.0]
2026-01-23 04:48:53,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 31 minutes, 5 seconds)
2026-01-23 04:52:19,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:52:23,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 519.93774 ± 1055.779
2026-01-23 04:52:23,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1.00167, 49.451645, 45.10439, 26.214922, 37.7361, 3486.4268, 5.358315, 1280.2217, 52.074184, 215.78809]
2026-01-23 04:52:23,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [185.0, 69.0, 82.0, 224.0, 84.0, 1000.0, 270.0, 551.0, 129.0, 250.0]
2026-01-23 04:52:23,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 21 minutes, 26 seconds)
2026-01-23 04:56:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:56:22,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3861.29541 ± 267.684
2026-01-23 04:56:22,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3971.0276, 3072.7317, 3922.3538, 3981.157, 3913.2856, 3997.5576, 3930.0728, 3995.439, 4000.943, 3828.3867]
2026-01-23 04:56:22,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 784.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:56:22,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 19 minutes, 18 seconds)
2026-01-23 05:00:07,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:00:17,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2917.93408 ± 1273.678
2026-01-23 05:00:17,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3660.8782, 935.35223, 3892.3071, 4071.9873, 3874.472, 4002.026, 2406.8528, 1277.3123, 3963.166, 1094.9874]
2026-01-23 05:00:17,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 289.0, 1000.0, 1000.0, 1000.0, 1000.0, 608.0, 354.0, 1000.0, 320.0]
2026-01-23 05:00:17,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 17 minutes, 37 seconds)
2026-01-23 05:03:58,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:04:03,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1337.59570 ± 1628.809
2026-01-23 05:04:03,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2343.6084, 15.356556, 31.34899, 65.321724, 7.1328163, 174.07544, 51.823463, 3899.8784, 3962.5942, 2824.8162]
2026-01-23 05:04:03,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [624.0, 28.0, 59.0, 72.0, 20.0, 102.0, 71.0, 1000.0, 1000.0, 728.0]
2026-01-23 05:04:03,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 13 minutes, 46 seconds)
2026-01-23 05:07:41,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:07:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3967.74072 ± 120.557
2026-01-23 05:07:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3976.9255, 3906.1975, 3857.2231, 4114.537, 3889.706, 3986.8804, 3719.5945, 4113.1714, 4019.6016, 4093.5723]
2026-01-23 05:07:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 959.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:07:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3967.74) for latency DatasetOffice
2026-01-23 05:07:55,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 10 minutes, 21 seconds)
2026-01-23 05:11:32,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:11:45,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3802.69775 ± 341.771
2026-01-23 05:11:45,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3758.4736, 4045.516, 3698.9326, 4087.3823, 4034.6228, 2907.116, 3565.342, 4067.7727, 3928.923, 3932.893]
2026-01-23 05:11:45,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 726.0, 912.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:11:45,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 9 minutes, 51 seconds)
2026-01-23 05:15:01,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:15:05,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1027.39771 ± 1171.854
2026-01-23 05:15:05,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1459.9626, 1302.9534, 2384.8333, 977.871, 12.938042, 373.82343, 24.984627, 17.611322, 21.195312, 3697.8037]
2026-01-23 05:15:05,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [423.0, 373.0, 620.0, 316.0, 22.0, 194.0, 47.0, 29.0, 29.0, 1000.0]
2026-01-23 05:15:05,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 59 minutes, 34 seconds)
2026-01-23 05:18:48,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:18:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3219.98682 ± 1193.321
2026-01-23 05:18:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [34.66658, 3869.7285, 2090.9663, 3873.174, 3819.245, 3882.054, 3851.5537, 3878.0405, 3793.0073, 3107.4316]
2026-01-23 05:18:59,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [35.0, 1000.0, 569.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 794.0]
2026-01-23 05:18:59,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 55 minutes, 47 seconds)
2026-01-23 05:22:43,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:22:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1658.06763 ± 1579.008
2026-01-23 05:22:50,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [703.22284, 1234.5851, 3937.1938, 626.072, 57.2452, 3872.2717, 4067.3464, 207.56165, 1702.4634, 172.71411]
2026-01-23 05:22:50,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [301.0, 397.0, 1000.0, 214.0, 65.0, 1000.0, 1000.0, 147.0, 506.0, 135.0]
2026-01-23 05:22:50,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 52 minutes, 48 seconds)
2026-01-23 05:26:18,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:26:27,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2565.23462 ± 1652.677
2026-01-23 05:26:27,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3491.0317, 2273.0474, 317.52005, 79.00864, 61.238842, 3913.2476, 3572.398, 3874.5957, 4047.1501, 4023.108]
2026-01-23 05:26:27,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 655.0, 150.0, 61.0, 126.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:26:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 46 minutes, 50 seconds)
2026-01-23 05:29:57,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:30:11,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4014.62622 ± 87.263
2026-01-23 05:30:11,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4009.7456, 4061.6235, 4124.2646, 4072.6055, 4042.3528, 4124.875, 3945.522, 3855.38, 4019.5713, 3890.3247]
2026-01-23 05:30:11,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:30:11,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4014.63) for latency DatasetOffice
2026-01-23 05:30:11,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 42 minutes, 7 seconds)
2026-01-23 05:33:50,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:34:01,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3348.55396 ± 1088.553
2026-01-23 05:34:01,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3745.0283, 3929.0388, 3861.9282, 2075.6604, 3661.6304, 3942.3335, 3887.796, 3897.927, 3971.239, 512.9588]
2026-01-23 05:34:01,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 558.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 183.0]
2026-01-23 05:34:01,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 42 minutes, 56 seconds)
2026-01-23 05:37:31,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:37:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2838.10645 ± 1822.258
2026-01-23 05:37:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [34.74001, 49.06979, 3993.1638, 82.46052, 4087.582, 4073.7922, 4105.817, 3994.029, 3999.8003, 3960.6091]
2026-01-23 05:37:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [44.0, 56.0, 1000.0, 86.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:37:41,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 37 minutes, 1 second)
2026-01-23 05:41:16,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:41:27,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3262.36426 ± 1123.982
2026-01-23 05:41:27,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3938.7332, 1077.9988, 3955.1438, 1378.2461, 4000.3257, 2371.4326, 3980.0127, 4009.0732, 3941.321, 3971.357]
2026-01-23 05:41:27,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 325.0, 1000.0, 397.0, 1000.0, 648.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:41:27,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 32 minutes, 41 seconds)
2026-01-23 05:45:03,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:45:16,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3830.84448 ± 397.197
2026-01-23 05:45:16,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3816.8313, 3993.6445, 3998.125, 3945.9346, 2667.1213, 4081.49, 3815.893, 4066.8557, 3998.006, 3924.5425]
2026-01-23 05:45:16,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 690.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:45:16,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 30 minutes, 31 seconds)
2026-01-23 05:48:53,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:49:07,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3995.76440 ± 48.518
2026-01-23 05:49:07,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4000.9827, 3983.1108, 4049.072, 4021.4158, 3871.077, 4028.0234, 4041.1677, 3970.2227, 3979.689, 4012.8816]
2026-01-23 05:49:07,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:49:07,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 27 minutes, 45 seconds)
2026-01-23 05:52:57,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:53:10,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3595.63550 ± 731.134
2026-01-23 05:53:10,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4021.4502, 3967.768, 4019.734, 1785.6643, 4011.7908, 4022.0354, 4037.9185, 3355.862, 4021.7856, 2712.3472]
2026-01-23 05:53:10,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 506.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 701.0]
2026-01-23 05:53:10,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 25 minutes, 27 seconds)
2026-01-23 05:56:48,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:54,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1845.96021 ± 1581.161
2026-01-23 05:56:54,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4059.392, 3375.491, 4050.8625, 2796.2795, 27.219769, 2272.9875, 40.429943, 1205.6871, 592.5282, 38.724663]
2026-01-23 05:56:54,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 844.0, 1000.0, 718.0, 38.0, 590.0, 48.0, 356.0, 220.0, 60.0]
2026-01-23 05:56:54,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 22 minutes, 14 seconds)
2026-01-23 06:00:22,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:00:36,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4035.35425 ± 65.203
2026-01-23 06:00:36,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4112.433, 4058.5393, 3985.8364, 4068.3604, 3982.5046, 4098.82, 4065.1816, 4013.5798, 3886.512, 4081.7742]
2026-01-23 06:00:36,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:00:36,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4035.35) for latency DatasetOffice
2026-01-23 06:00:36,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 17 minutes, 52 seconds)
2026-01-23 06:04:04,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:04:14,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2711.60645 ± 1283.513
2026-01-23 06:04:14,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3934.537, 4054.7263, 895.3114, 886.4842, 1502.0205, 3150.8093, 4056.349, 3236.7983, 1515.438, 3883.5906]
2026-01-23 06:04:14,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 286.0, 284.0, 415.0, 846.0, 1000.0, 843.0, 433.0, 1000.0]
2026-01-23 06:04:14,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 12 minutes, 43 seconds)
2026-01-23 06:07:45,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:07:49,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1120.80237 ± 1632.064
2026-01-23 06:07:49,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [21.655588, 58.184776, 23.669947, 20.276022, 17.15091, 225.64061, 221.62611, 4049.337, 4030.3542, 2540.1294]
2026-01-23 06:07:49,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [28.0, 69.0, 31.0, 26.0, 27.0, 117.0, 154.0, 1000.0, 1000.0, 643.0]
2026-01-23 06:07:49,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 7 minutes, 9 seconds)
2026-01-23 06:11:24,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:38,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4001.34644 ± 26.637
2026-01-23 06:11:38,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3933.1433, 3994.1223, 4002.7893, 4042.0369, 4009.5845, 4012.241, 4004.8606, 4009.9146, 4017.541, 3987.2305]
2026-01-23 06:11:38,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:11:38,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 1 minute, 51 seconds)
2026-01-23 06:15:16,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:15:29,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3695.60229 ± 330.617
2026-01-23 06:15:29,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3828.8916, 3784.393, 3789.405, 3823.1216, 3801.9045, 2711.0784, 3892.8796, 3722.8386, 3802.1213, 3799.392]
2026-01-23 06:15:29,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 735.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:15:29,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 58 minutes, 53 seconds)
2026-01-23 06:19:12,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:19:18,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1781.66833 ± 1847.391
2026-01-23 06:19:18,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1163.8525, 9.739367, 556.95, 28.512741, 17.547915, 18.237064, 3953.0713, 3937.3796, 4155.6035, 3975.7886]
2026-01-23 06:19:18,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [340.0, 18.0, 207.0, 51.0, 27.0, 32.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:19:18,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2026-01-23 06:23:05,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:23:16,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3331.28174 ± 1525.329
2026-01-23 06:23:16,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4096.72, 4120.5054, 4079.1611, 4078.651, 4121.673, 587.0924, -0.8057039, 3976.636, 4128.273, 4124.9097]
2026-01-23 06:23:16,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 215.0, 8.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:23:16,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 54 minutes, 14 seconds)
2026-01-23 06:26:34,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:26:46,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3447.58350 ± 1296.426
2026-01-23 06:26:46,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [859.25165, 4080.2417, 4202.893, 3963.5022, 855.11505, 4129.755, 4082.6558, 4078.8928, 4126.427, 4097.1016]
2026-01-23 06:26:46,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [266.0, 1000.0, 1000.0, 1000.0, 270.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:26:46,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 49 minutes, 49 seconds)
2026-01-23 06:30:24,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:30:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3836.55542 ± 487.144
2026-01-23 06:30:38,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3913.817, 3974.2734, 2379.0598, 3977.46, 4025.7473, 4003.6982, 4042.6033, 3991.7898, 4028.288, 4028.8198]
2026-01-23 06:30:38,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 625.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:30:38,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 46 minutes, 24 seconds)
2026-01-23 06:34:14,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:34:27,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4016.50073 ± 42.562
2026-01-23 06:34:27,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4049.017, 3989.9666, 3994.5688, 4098.339, 4017.3767, 4014.7004, 4025.4492, 3922.705, 4033.3694, 4019.5166]
2026-01-23 06:34:27,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:34:27,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 42 minutes, 26 seconds)
2026-01-23 06:38:03,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:38:12,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2616.96191 ± 1679.613
2026-01-23 06:38:12,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3803.5964, 4001.8572, 2738.025, 308.5092, 33.58454, 45.268276, 4105.2256, 3915.3726, 4047.5444, 3170.6355]
2026-01-23 06:38:12,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 691.0, 176.0, 46.0, 53.0, 1000.0, 1000.0, 1000.0, 838.0]
2026-01-23 06:38:12,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 38 minutes, 16 seconds)
2026-01-23 06:41:49,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:42:03,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4135.58887 ± 50.974
2026-01-23 06:42:03,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4167.0024, 4061.6614, 4167.784, 4032.0635, 4148.8325, 4114.1577, 4148.283, 4132.629, 4173.4634, 4210.009]
2026-01-23 06:42:03,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:42:03,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4135.59) for latency DatasetOffice
2026-01-23 06:42:03,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 33 minutes, 54 seconds)
2026-01-23 06:45:52,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:46:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3991.83545 ± 215.757
2026-01-23 06:46:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4028.1409, 4104.5957, 4042.106, 4116.5034, 4101.901, 4009.598, 4016.1294, 3354.7024, 4099.5977, 4045.0818]
2026-01-23 06:46:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 849.0, 1000.0, 1000.0]
2026-01-23 06:46:06,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 32 minutes, 48 seconds)
2026-01-23 06:49:44,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:49:57,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4052.50830 ± 63.248
2026-01-23 06:49:57,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4115.815, 4093.7568, 4077.4028, 3957.5999, 4081.5251, 3931.8718, 4003.7017, 4063.4597, 4066.3398, 4133.611]
2026-01-23 06:49:57,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:49:57,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 28 minutes, 54 seconds)
2026-01-23 06:53:25,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:53:39,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3488.07373 ± 1016.603
2026-01-23 06:53:39,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4011.4377, 4031.079, 3943.91, 4040.2314, 962.6857, 2074.2307, 4002.8347, 3960.8687, 3884.9573, 3968.5046]
2026-01-23 06:53:39,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 582.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:53:39,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 24 minutes, 27 seconds)
2026-01-23 06:57:20,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:57:28,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2389.31396 ± 1675.059
2026-01-23 06:57:28,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4119.261, 4215.3623, 4201.7275, 618.7223, 1199.158, 2682.6091, 4248.15, 2377.2883, 37.20291, 193.66107]
2026-01-23 06:57:28,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 206.0, 388.0, 659.0, 1000.0, 597.0, 44.0, 113.0]
2026-01-23 06:57:28,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 20 minutes, 54 seconds)
2026-01-23 07:01:05,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:01:18,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3847.26514 ± 578.840
2026-01-23 07:01:18,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4082.3948, 3964.438, 4034.4133, 4117.4277, 2140.4106, 4022.5457, 3758.736, 4096.786, 4095.9795, 4159.518]
2026-01-23 07:01:18,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 640.0, 1000.0, 913.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:01:18,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 16 minutes, 59 seconds)
2026-01-23 07:04:36,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:04:50,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4045.27490 ± 94.880
2026-01-23 07:04:50,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4091.7031, 4099.6367, 4059.2542, 3962.5088, 3787.247, 4087.8318, 4101.3203, 4072.3308, 4078.6223, 4112.2964]
2026-01-23 07:04:50,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:04:50,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 11 minutes, 11 seconds)
2026-01-23 07:08:43,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:08:49,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1530.88232 ± 1849.687
2026-01-23 07:08:49,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4098.5127, 4078.399, 4096.1924, 2719.5637, 12.559862, 2.88338, 127.51638, -0.11756811, 9.192883, 164.11967]
2026-01-23 07:08:49,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 676.0, 24.0, 14.0, 90.0, 13.0, 18.0, 123.0]
2026-01-23 07:08:49,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 7 minutes, 53 seconds)
2026-01-23 07:12:25,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:12:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4094.34326 ± 72.416
2026-01-23 07:12:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4100.794, 4183.3057, 4169.5273, 4089.4634, 4074.4917, 3928.933, 4132.9814, 4017.8206, 4155.125, 4090.9917]
2026-01-23 07:12:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:12:39,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 4 minutes, 35 seconds)
2026-01-23 07:16:20,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:16:32,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3616.24731 ± 1278.045
2026-01-23 07:16:32,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4207.4663, 4193.166, 4197.5195, -0.26458603, 4180.044, 4124.6743, 4153.819, 4187.4727, 4164.147, 2754.4263]
2026-01-23 07:16:32,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 14.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 669.0]
2026-01-23 07:16:32,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 59 seconds)
2026-01-23 07:20:15,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:20:26,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3204.47266 ± 1348.770
2026-01-23 07:20:26,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4182.1914, 4184.787, 4175.4746, 2396.8147, 2329.179, 2169.249, 50.122, 4225.5303, 4138.242, 4193.1357]
2026-01-23 07:20:26,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 621.0, 603.0, 646.0, 68.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:20:26,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 57 minutes, 23 seconds)
2026-01-23 07:23:55,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:24:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3366.59058 ± 1541.847
2026-01-23 07:24:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4050.899, 4086.266, 4170.4214, 4191.16, 4142.904, 4256.5474, 4191.6763, 573.61743, 19.293552, 3983.123]
2026-01-23 07:24:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 237.0, 30.0, 1000.0]
2026-01-23 07:24:06,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 53 minutes, 58 seconds)
2026-01-23 07:27:57,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:28:10,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3783.75317 ± 758.217
2026-01-23 07:28:10,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3489.9258, 4051.4749, 4165.6826, 4084.273, 1578.775, 4029.8972, 4121.0464, 4176.011, 4055.13, 4085.3154]
2026-01-23 07:28:10,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [848.0, 1000.0, 1000.0, 1000.0, 410.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:28:10,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 50 minutes, 18 seconds)
2026-01-23 07:31:55,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:32:07,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3591.35083 ± 1188.697
2026-01-23 07:32:07,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3990.9932, 3912.8828, 4007.1401, 3962.6626, 3996.2776, 4001.0312, 4056.8171, 3955.7817, 26.886757, 4003.0366]
2026-01-23 07:32:07,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 33.0, 1000.0]
2026-01-23 07:32:07,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 46 minutes, 44 seconds)
2026-01-23 07:35:39,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:35:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4147.40234 ± 27.817
2026-01-23 07:35:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4186.68, 4159.1006, 4146.8525, 4127.624, 4203.205, 4119.844, 4145.645, 4124.05, 4112.1475, 4148.874]
2026-01-23 07:35:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:35:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4147.40) for latency DatasetOffice
2026-01-23 07:35:53,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 42 minutes, 35 seconds)
2026-01-23 07:39:29,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:39:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2366.26050 ± 1819.987
2026-01-23 07:39:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4132.287, 4109.1772, 891.3327, 4003.513, 1987.9844, -7.0855308, 270.17508, -6.6850405, 4065.0251, 4216.88]
2026-01-23 07:39:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 273.0, 1000.0, 527.0, 38.0, 134.0, 9.0, 1000.0, 1000.0]
2026-01-23 07:39:37,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 38 minutes, 23 seconds)
2026-01-23 07:43:19,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:43:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3850.14502 ± 497.681
2026-01-23 07:43:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4032.3354, 4061.8435, 4035.9822, 3342.7512, 2513.765, 4111.4277, 4072.7861, 4113.311, 4097.572, 4119.677]
2026-01-23 07:43:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 858.0, 664.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:43:33,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 34 minutes, 59 seconds)
2026-01-23 07:47:18,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:47:31,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3942.05908 ± 785.034
2026-01-23 07:47:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4165.5146, 4177.694, 4208.43, 4248.985, 1589.7473, 4143.8477, 4236.336, 4159.339, 4257.6953, 4232.9995]
2026-01-23 07:47:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 428.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:47:31,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 30 minutes, 58 seconds)
2026-01-23 07:51:02,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:51:06,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1072.55103 ± 1662.728
2026-01-23 07:51:06,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4195.1104, 4180.352, 2008.7874, 27.11189, 23.572222, 4.6451364, 35.503597, 14.0237055, 189.09357, 47.310005]
2026-01-23 07:51:06,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 523.0, 38.0, 38.0, 15.0, 43.0, 24.0, 97.0, 66.0]
2026-01-23 07:51:06,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 26 minutes, 33 seconds)
2026-01-23 07:55:01,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:55:15,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4075.12183 ± 68.949
2026-01-23 07:55:15,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4054.087, 3941.4204, 3985.1833, 4095.73, 4106.9224, 4162.6616, 4037.208, 4128.9463, 4162.8584, 4076.2043]
2026-01-23 07:55:15,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:55:15,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 23 minutes, 13 seconds)
2026-01-23 07:58:56,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:59:10,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4139.60742 ± 205.946
2026-01-23 07:59:10,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4268.856, 4220.9673, 4004.2202, 4262.467, 4278.182, 4244.443, 4231.5044, 3567.215, 4172.462, 4145.759]
2026-01-23 07:59:10,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 848.0, 1000.0, 1000.0]
2026-01-23 07:59:10,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 32 seconds)
2026-01-23 08:02:36,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:02:48,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3326.55615 ± 939.906
2026-01-23 08:02:48,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2473.6326, 4179.6475, 2741.2632, 2716.6692, 1280.7905, 3431.7197, 4146.4634, 4169.279, 4041.0212, 4085.0742]
2026-01-23 08:02:48,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [633.0, 1000.0, 692.0, 686.0, 372.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:02:48,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 24 seconds)
2026-01-23 08:06:48,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:07:01,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3723.28662 ± 951.028
2026-01-23 08:07:01,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4060.792, 4026.3784, 3602.6047, 4077.6992, 908.2229, 4010.355, 4066.714, 4190.8003, 4144.483, 4144.8174]
2026-01-23 08:07:01,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 888.0, 1000.0, 292.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:07:01,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 41 seconds)
2026-01-23 08:10:30,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:10:42,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3386.85303 ± 1471.842
2026-01-23 08:10:42,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4088.43, 568.0608, 4156.2407, 4144.433, 4111.9478, 323.11035, 4090.4211, 4097.0225, 4147.656, 4141.2085]
2026-01-23 08:10:42,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 229.0, 1000.0, 1000.0, 1000.0, 159.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:10:42,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 50 seconds)
2026-01-23 08:14:22,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:14:36,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4237.35938 ± 35.888
2026-01-23 08:14:36,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4286.732, 4205.3726, 4205.018, 4255.65, 4253.147, 4208.8403, 4204.9976, 4269.202, 4192.9956, 4291.6387]
2026-01-23 08:14:36,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:14:36,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4237.36) for latency DatasetOffice
2026-01-23 08:14:36,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 52 seconds)
2026-01-23 08:18:17,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:18:31,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4013.92432 ± 323.096
2026-01-23 08:18:31,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4124.856, 4073.8696, 4181.289, 4154.5234, 4132.8823, 4009.3005, 3055.5562, 4101.6147, 4179.127, 4126.2246]
2026-01-23 08:18:31,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 755.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:18:31,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1299 [DEBUG]: Training session finished
