2026-01-23 01:53:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mda-highdim-mem1
2026-01-23 01:53:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mda-highdim-mem1
2026-01-23 01:53:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x15400d3be710>}
2026-01-23 01:53:54,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-23 01:53:55,092 baseline-bpql-mda-noisy-humanoid:91 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-23 01:53:55,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-23 01:53:55,110 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-23 01:53:55,110 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:53:55,118 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2026-01-23 01:53:56,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-23 01:53:56,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-23 01:59:41,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 78.13754 ± 3.433
2026-01-23 01:59:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [75.90763, 75.11471, 76.33517, 82.799446, 71.16835, 79.42418, 83.15415, 79.46542, 79.23704, 78.769264]
2026-01-23 01:59:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [16.0, 16.0, 16.0, 18.0, 15.0, 17.0, 18.0, 17.0, 17.0, 17.0]
2026-01-23 01:59:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (78.14) for latency DatasetOffice
2026-01-23 01:59:42,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 9 hours, 29 minutes, 40 seconds)
2026-01-23 02:05:44,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:45,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 332.81674 ± 30.091
2026-01-23 02:05:45,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [305.54352, 344.33636, 384.95862, 344.26346, 324.66754, 316.30502, 285.4599, 356.05756, 366.49744, 300.0782]
2026-01-23 02:05:45,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [57.0, 65.0, 72.0, 64.0, 60.0, 58.0, 54.0, 66.0, 68.0, 57.0]
2026-01-23 02:05:45,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (332.82) for latency DatasetOffice
2026-01-23 02:05:45,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 9 hours, 38 minutes, 49 seconds)
2026-01-23 02:11:44,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:45,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 443.10834 ± 133.631
2026-01-23 02:11:45,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [474.77872, 535.2617, 336.08218, 336.83264, 616.2821, 725.48944, 360.5789, 318.33716, 392.13553, 335.30542]
2026-01-23 02:11:45,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [108.0, 103.0, 73.0, 77.0, 122.0, 144.0, 69.0, 70.0, 88.0, 69.0]
2026-01-23 02:11:45,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (443.11) for latency DatasetOffice
2026-01-23 02:11:45,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 9 hours, 36 minutes, 9 seconds)
2026-01-23 02:17:44,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:46,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 407.97864 ± 87.018
2026-01-23 02:17:46,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [362.441, 404.58908, 425.70975, 332.38974, 487.77325, 457.6853, 435.9045, 320.42963, 268.8807, 583.9834]
2026-01-23 02:17:46,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [71.0, 77.0, 86.0, 71.0, 100.0, 95.0, 81.0, 70.0, 58.0, 110.0]
2026-01-23 02:17:46,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 9 hours, 31 minutes, 45 seconds)
2026-01-23 02:23:43,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:44,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 291.02310 ± 37.674
2026-01-23 02:23:44,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [289.011, 286.3906, 298.30927, 391.00217, 259.7253, 264.98828, 316.82047, 256.95258, 270.15936, 276.8717]
2026-01-23 02:23:44,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [54.0, 53.0, 55.0, 72.0, 49.0, 50.0, 57.0, 49.0, 51.0, 52.0]
2026-01-23 02:23:44,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 9 hours, 26 minutes, 11 seconds)
2026-01-23 02:29:43,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:44,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 482.93573 ± 186.225
2026-01-23 02:29:44,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [417.4326, 393.04492, 794.1012, 395.1134, 437.59717, 301.19717, 437.2336, 400.39703, 357.72183, 895.51855]
2026-01-23 02:29:44,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [89.0, 81.0, 166.0, 76.0, 84.0, 56.0, 94.0, 83.0, 78.0, 176.0]
2026-01-23 02:29:44,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (482.94) for latency DatasetOffice
2026-01-23 02:29:44,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 9 hours, 24 minutes, 51 seconds)
2026-01-23 02:35:46,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:47,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 415.48340 ± 109.024
2026-01-23 02:35:47,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [437.51758, 196.27667, 528.76227, 351.91705, 465.43112, 475.42462, 330.89392, 334.9118, 435.70633, 597.9927]
2026-01-23 02:35:47,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [86.0, 39.0, 101.0, 64.0, 93.0, 92.0, 64.0, 72.0, 87.0, 117.0]
2026-01-23 02:35:47,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 9 hours, 18 minutes, 37 seconds)
2026-01-23 02:41:47,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:49,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 475.20532 ± 120.102
2026-01-23 02:41:49,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [661.20337, 478.84534, 502.70416, 454.8901, 683.4861, 355.91193, 384.08395, 269.99008, 466.63187, 494.30594]
2026-01-23 02:41:49,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [126.0, 94.0, 108.0, 95.0, 127.0, 79.0, 83.0, 56.0, 100.0, 106.0]
2026-01-23 02:41:49,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 9 hours, 13 minutes, 1 second)
2026-01-23 02:47:48,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:50,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 535.77930 ± 75.153
2026-01-23 02:47:50,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [508.23056, 501.90646, 489.31833, 613.0778, 694.5574, 519.95557, 433.3778, 468.5097, 610.93677, 517.92303]
2026-01-23 02:47:50,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 95.0, 94.0, 114.0, 148.0, 96.0, 91.0, 91.0, 114.0, 96.0]
2026-01-23 02:47:50,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (535.78) for latency DatasetOffice
2026-01-23 02:47:50,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 9 hours, 7 minutes, 13 seconds)
2026-01-23 02:53:46,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:48,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 454.65479 ± 66.218
2026-01-23 02:53:48,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [434.39944, 464.1794, 593.1492, 434.41364, 325.29462, 497.44937, 500.3318, 456.47476, 400.9582, 439.89752]
2026-01-23 02:53:48,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [82.0, 87.0, 119.0, 80.0, 64.0, 95.0, 94.0, 85.0, 75.0, 82.0]
2026-01-23 02:53:48,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 9 hours, 1 minute, 2 seconds)
2026-01-23 02:59:48,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:49,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 483.20074 ± 103.190
2026-01-23 02:59:49,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [535.2804, 431.38782, 505.72635, 411.24356, 264.53305, 540.09155, 566.8576, 397.2278, 649.9093, 529.7501]
2026-01-23 02:59:49,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [102.0, 80.0, 95.0, 79.0, 50.0, 99.0, 103.0, 73.0, 137.0, 97.0]
2026-01-23 02:59:49,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 8 hours, 55 minutes, 28 seconds)
2026-01-23 03:05:45,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 472.28922 ± 120.864
2026-01-23 03:05:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [372.4059, 386.57025, 679.2809, 496.9119, 487.9021, 415.98154, 428.75964, 336.37323, 711.6055, 407.10086]
2026-01-23 03:05:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [68.0, 71.0, 134.0, 100.0, 94.0, 81.0, 78.0, 64.0, 132.0, 75.0]
2026-01-23 03:05:46,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 8 hours, 47 minutes, 42 seconds)
2026-01-23 03:11:44,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:45,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 463.17349 ± 74.080
2026-01-23 03:11:45,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [514.0298, 378.35342, 659.2488, 458.36624, 466.73618, 446.14377, 430.76724, 445.39346, 399.38992, 433.30615]
2026-01-23 03:11:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [95.0, 70.0, 123.0, 84.0, 88.0, 88.0, 80.0, 83.0, 76.0, 81.0]
2026-01-23 03:11:45,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 8 hours, 40 minutes, 59 seconds)
2026-01-23 03:17:46,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:47,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 484.68427 ± 159.562
2026-01-23 03:17:47,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [511.27258, 347.72287, 436.29767, 285.33884, 878.13983, 568.6143, 569.1108, 337.9064, 434.8702, 477.56955]
2026-01-23 03:17:47,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [95.0, 64.0, 81.0, 54.0, 167.0, 108.0, 111.0, 63.0, 83.0, 99.0]
2026-01-23 03:17:47,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 8 hours, 35 minutes, 20 seconds)
2026-01-23 03:23:44,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:45,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 468.21606 ± 72.051
2026-01-23 03:23:45,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [630.84393, 391.03726, 489.30545, 420.82825, 389.2728, 525.5337, 403.75186, 446.8575, 464.76855, 519.96124]
2026-01-23 03:23:45,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [118.0, 75.0, 88.0, 82.0, 76.0, 96.0, 76.0, 88.0, 84.0, 93.0]
2026-01-23 03:23:45,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 8 hours, 29 minutes, 19 seconds)
2026-01-23 03:29:43,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:44,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 563.94519 ± 192.660
2026-01-23 03:29:44,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1060.5159, 519.10077, 449.98047, 507.55188, 521.7902, 559.4736, 403.0052, 456.33603, 763.28894, 398.40875]
2026-01-23 03:29:44,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [195.0, 102.0, 84.0, 95.0, 98.0, 113.0, 76.0, 88.0, 145.0, 75.0]
2026-01-23 03:29:44,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (563.95) for latency DatasetOffice
2026-01-23 03:29:44,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 8 hours, 22 minutes, 35 seconds)
2026-01-23 03:35:43,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:35:45,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 517.39978 ± 78.107
2026-01-23 03:35:45,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [501.83224, 615.5775, 546.047, 635.2783, 502.8776, 528.6908, 405.13666, 497.41785, 373.92023, 567.22]
2026-01-23 03:35:45,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [94.0, 121.0, 99.0, 121.0, 92.0, 101.0, 88.0, 94.0, 68.0, 107.0]
2026-01-23 03:35:45,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 8 hours, 17 minutes, 36 seconds)
2026-01-23 03:41:39,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:42,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 657.41956 ± 143.105
2026-01-23 03:41:42,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [597.0265, 490.63852, 625.8533, 917.74426, 757.6081, 773.8787, 726.5372, 422.69916, 531.84283, 730.367]
2026-01-23 03:41:42,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [128.0, 95.0, 117.0, 178.0, 153.0, 148.0, 139.0, 79.0, 99.0, 148.0]
2026-01-23 03:41:42,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (657.42) for latency DatasetOffice
2026-01-23 03:41:42,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 8 hours, 11 minutes, 1 second)
2026-01-23 03:47:41,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:47:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 570.59113 ± 88.877
2026-01-23 03:47:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [477.23114, 625.9536, 614.977, 451.27576, 579.3002, 572.6422, 707.58124, 695.1616, 446.5265, 535.2616]
2026-01-23 03:47:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [103.0, 120.0, 121.0, 87.0, 111.0, 105.0, 127.0, 125.0, 79.0, 105.0]
2026-01-23 03:47:42,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 8 hours, 4 minutes, 37 seconds)
2026-01-23 03:53:39,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:53:41,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 639.20795 ± 122.508
2026-01-23 03:53:41,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [903.891, 541.721, 603.94073, 512.80585, 555.5524, 670.5673, 821.09076, 537.2898, 648.09344, 597.12683]
2026-01-23 03:53:41,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [175.0, 98.0, 114.0, 92.0, 105.0, 127.0, 152.0, 102.0, 118.0, 109.0]
2026-01-23 03:53:41,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 7 hours, 58 minutes, 46 seconds)
2026-01-23 03:59:37,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:59:39,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 511.12115 ± 203.615
2026-01-23 03:59:39,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [526.25934, 733.4956, 386.59006, 573.28265, 453.214, 421.2173, 77.2231, 445.88712, 623.96954, 870.0728]
2026-01-23 03:59:39,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [109.0, 145.0, 77.0, 110.0, 100.0, 78.0, 16.0, 99.0, 115.0, 157.0]
2026-01-23 03:59:39,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 7 hours, 52 minutes, 33 seconds)
2026-01-23 04:05:35,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:05:37,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 655.38391 ± 145.561
2026-01-23 04:05:37,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [521.76074, 976.1277, 728.0687, 383.74878, 703.07715, 723.6836, 591.5647, 626.8475, 639.91437, 659.0455]
2026-01-23 04:05:37,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [94.0, 191.0, 143.0, 70.0, 131.0, 142.0, 110.0, 121.0, 132.0, 127.0]
2026-01-23 04:05:37,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 7 hours, 46 minutes, 5 seconds)
2026-01-23 04:11:35,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:11:37,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 646.50403 ± 187.538
2026-01-23 04:11:37,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [598.616, 663.1997, 953.0636, 845.0264, 460.66565, 507.138, 547.7773, 558.87177, 933.3409, 397.34152]
2026-01-23 04:11:37,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [110.0, 123.0, 180.0, 158.0, 87.0, 101.0, 100.0, 112.0, 172.0, 74.0]
2026-01-23 04:11:37,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 7 hours, 40 minutes, 48 seconds)
2026-01-23 04:17:40,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:17:42,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 738.09833 ± 154.380
2026-01-23 04:17:42,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [745.40063, 730.84344, 606.0936, 641.65515, 652.15894, 681.2204, 661.901, 676.57056, 1168.3193, 816.8205]
2026-01-23 04:17:42,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [142.0, 139.0, 111.0, 121.0, 124.0, 131.0, 122.0, 131.0, 242.0, 157.0]
2026-01-23 04:17:42,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (738.10) for latency DatasetOffice
2026-01-23 04:17:42,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 7 hours, 35 minutes, 55 seconds)
2026-01-23 04:23:37,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:23:40,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 792.29889 ± 220.010
2026-01-23 04:23:40,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [898.2903, 668.24426, 615.0912, 670.0335, 1109.8475, 1209.5829, 634.31757, 675.83575, 929.8974, 511.8482]
2026-01-23 04:23:40,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [171.0, 125.0, 117.0, 125.0, 215.0, 228.0, 122.0, 128.0, 185.0, 96.0]
2026-01-23 04:23:40,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (792.30) for latency DatasetOffice
2026-01-23 04:23:40,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 7 hours, 29 minutes, 44 seconds)
2026-01-23 04:29:39,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:29:42,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 897.86798 ± 345.819
2026-01-23 04:29:42,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [869.94006, 720.146, 838.1536, 651.70825, 1750.4232, 655.07294, 1341.3489, 829.0583, 674.65186, 648.17676]
2026-01-23 04:29:42,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [165.0, 138.0, 166.0, 125.0, 333.0, 124.0, 255.0, 168.0, 124.0, 122.0]
2026-01-23 04:29:42,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (897.87) for latency DatasetOffice
2026-01-23 04:29:42,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 7 hours, 24 minutes, 44 seconds)
2026-01-23 04:35:40,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:35:43,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 914.55029 ± 194.812
2026-01-23 04:35:43,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1214.5718, 1186.7075, 635.29926, 636.8457, 773.9503, 1058.2943, 1016.1313, 908.7026, 896.50055, 818.49915]
2026-01-23 04:35:43,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [236.0, 222.0, 118.0, 137.0, 149.0, 206.0, 196.0, 171.0, 167.0, 154.0]
2026-01-23 04:35:43,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (914.55) for latency DatasetOffice
2026-01-23 04:35:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 7 hours, 19 minutes, 24 seconds)
2026-01-23 04:41:40,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:41:42,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 739.71960 ± 168.029
2026-01-23 04:41:42,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [636.68, 1190.5072, 634.5068, 609.4729, 781.8281, 743.165, 741.655, 600.5523, 827.16034, 631.66907]
2026-01-23 04:41:42,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [118.0, 226.0, 115.0, 114.0, 143.0, 138.0, 135.0, 112.0, 151.0, 113.0]
2026-01-23 04:41:42,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 7 hours, 13 minutes, 10 seconds)
2026-01-23 04:47:39,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:47:42,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 861.74591 ± 116.488
2026-01-23 04:47:42,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [963.9036, 902.4997, 844.8338, 570.27563, 809.61255, 919.9531, 823.56396, 1036.4802, 872.7627, 873.5734]
2026-01-23 04:47:42,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [181.0, 171.0, 158.0, 103.0, 161.0, 183.0, 154.0, 204.0, 168.0, 173.0]
2026-01-23 04:47:42,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 7 hours, 6 minutes, 2 seconds)
2026-01-23 04:53:38,044 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:53:41,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1092.03760 ± 407.471
2026-01-23 04:53:41,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1141.3008, 1548.5765, 679.018, 1037.0881, 2024.3099, 930.3589, 1162.4711, 699.2324, 1076.1853, 621.83496]
2026-01-23 04:53:41,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [217.0, 286.0, 126.0, 193.0, 380.0, 186.0, 237.0, 129.0, 200.0, 112.0]
2026-01-23 04:53:41,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1092.04) for latency DatasetOffice
2026-01-23 04:53:41,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 7 hours, 23 seconds)
2026-01-23 04:59:41,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:59:44,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 889.92517 ± 396.375
2026-01-23 04:59:44,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1993.0802, 539.59326, 529.08136, 917.5618, 970.42, 808.7049, 920.50415, 780.7792, 628.5511, 810.976]
2026-01-23 04:59:44,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [379.0, 104.0, 113.0, 170.0, 198.0, 164.0, 187.0, 148.0, 128.0, 160.0]
2026-01-23 04:59:44,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 6 hours, 54 minutes, 28 seconds)
2026-01-23 05:05:38,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:05:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1135.49561 ± 301.289
2026-01-23 05:05:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1451.0887, 1143.5823, 1322.1771, 1656.4652, 701.12915, 1122.7783, 1349.0148, 696.14307, 1027.4928, 885.0843]
2026-01-23 05:05:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [277.0, 213.0, 246.0, 310.0, 135.0, 210.0, 252.0, 147.0, 191.0, 162.0]
2026-01-23 05:05:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1135.50) for latency DatasetOffice
2026-01-23 05:05:42,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 6 hours, 47 minutes, 41 seconds)
2026-01-23 05:11:41,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:11:45,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1079.63330 ± 409.058
2026-01-23 05:11:45,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1231.4312, 734.1058, 1027.6414, 1576.5756, 1392.0002, 1797.2976, 802.7761, 703.19446, 1121.3694, 409.9418]
2026-01-23 05:11:45,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [227.0, 134.0, 188.0, 296.0, 262.0, 338.0, 149.0, 127.0, 207.0, 74.0]
2026-01-23 05:11:45,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 6 hours, 42 minutes, 33 seconds)
2026-01-23 05:17:32,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:17:36,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1257.23169 ± 602.635
2026-01-23 05:17:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [772.07495, 1531.5472, 2757.3853, 1633.7012, 1403.9878, 1071.6044, 1220.3729, 786.6241, 805.8328, 589.1857]
2026-01-23 05:17:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [147.0, 284.0, 515.0, 291.0, 261.0, 199.0, 224.0, 145.0, 147.0, 125.0]
2026-01-23 05:17:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1257.23) for latency DatasetOffice
2026-01-23 05:17:36,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 6 hours, 34 minutes, 40 seconds)
2026-01-23 05:23:29,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:23:34,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1349.57312 ± 423.889
2026-01-23 05:23:34,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1903.6686, 1171.7314, 1010.4607, 1223.529, 1991.0988, 1071.6338, 810.51776, 1201.7662, 1086.0331, 2025.292]
2026-01-23 05:23:34,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [364.0, 243.0, 177.0, 221.0, 392.0, 219.0, 167.0, 234.0, 209.0, 384.0]
2026-01-23 05:23:34,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1349.57) for latency DatasetOffice
2026-01-23 05:23:34,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 6 hours, 28 minutes, 24 seconds)
2026-01-23 05:29:26,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:29:30,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1269.81287 ± 302.401
2026-01-23 05:29:30,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [789.6092, 1393.3743, 752.8763, 1294.8073, 1534.9692, 1643.3501, 1544.5319, 993.99133, 1467.7981, 1282.8209]
2026-01-23 05:29:30,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [145.0, 255.0, 140.0, 245.0, 276.0, 303.0, 291.0, 188.0, 276.0, 237.0]
2026-01-23 05:29:30,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 6 hours, 21 minutes, 8 seconds)
2026-01-23 05:35:21,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:35:25,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1450.23206 ± 752.074
2026-01-23 05:35:25,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [758.58966, 2124.5002, 914.8358, 2689.7402, 2684.7217, 1386.2561, 573.0549, 1047.0339, 1531.6981, 791.8892]
2026-01-23 05:35:25,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [142.0, 406.0, 166.0, 507.0, 524.0, 258.0, 124.0, 207.0, 302.0, 151.0]
2026-01-23 05:35:25,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1450.23) for latency DatasetOffice
2026-01-23 05:35:25,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 6 hours, 14 minutes, 33 seconds)
2026-01-23 05:41:17,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:41:22,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1474.22778 ± 702.976
2026-01-23 05:41:22,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [769.5645, 2180.0107, 1090.3125, 2887.4976, 1472.4933, 2101.014, 725.91077, 689.8569, 1749.5001, 1076.118]
2026-01-23 05:41:22,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [166.0, 426.0, 222.0, 557.0, 298.0, 425.0, 151.0, 149.0, 346.0, 200.0]
2026-01-23 05:41:22,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1474.23) for latency DatasetOffice
2026-01-23 05:41:22,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 6 hours, 7 minutes, 25 seconds)
2026-01-23 05:47:22,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:47:27,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1220.82642 ± 472.555
2026-01-23 05:47:27,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1554.3406, 691.9454, 782.3063, 947.1881, 697.3694, 1976.6982, 1057.6045, 1822.7854, 1734.4077, 943.61835]
2026-01-23 05:47:27,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [301.0, 137.0, 154.0, 182.0, 130.0, 372.0, 208.0, 362.0, 346.0, 197.0]
2026-01-23 05:47:27,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 6 hours, 4 minutes, 8 seconds)
2026-01-23 05:53:15,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:53:24,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2294.28052 ± 1062.125
2026-01-23 05:53:24,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1147.3074, 707.52814, 4118.41, 2490.011, 1696.5869, 1100.0494, 3282.4353, 3390.3525, 2631.3462, 2378.7783]
2026-01-23 05:53:24,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [216.0, 131.0, 812.0, 474.0, 330.0, 204.0, 632.0, 685.0, 487.0, 453.0]
2026-01-23 05:53:24,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (2294.28) for latency DatasetOffice
2026-01-23 05:53:24,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 5 hours, 57 minutes, 55 seconds)
2026-01-23 05:59:16,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:59:19,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1063.91089 ± 368.137
2026-01-23 05:59:19,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [831.33746, 1299.4857, 1511.2395, 1736.8014, 624.24097, 1131.951, 1152.8856, 1036.9642, 823.2432, 490.95953]
2026-01-23 05:59:19,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [151.0, 245.0, 284.0, 323.0, 113.0, 207.0, 224.0, 192.0, 154.0, 92.0]
2026-01-23 05:59:19,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 5 hours, 51 minutes, 46 seconds)
2026-01-23 06:05:14,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:05:18,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1325.88879 ± 590.261
2026-01-23 06:05:18,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [601.8842, 1266.9479, 1188.0685, 739.5993, 2719.5266, 1661.8887, 859.37506, 970.857, 1713.4453, 1537.2958]
2026-01-23 06:05:18,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [126.0, 234.0, 237.0, 159.0, 530.0, 316.0, 169.0, 182.0, 329.0, 291.0]
2026-01-23 06:05:18,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 5 hours, 46 minutes, 35 seconds)
2026-01-23 06:11:20,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:28,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2342.46924 ± 1124.039
2026-01-23 06:11:28,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2242.7903, 2217.7502, 1359.7825, 1767.5779, 5226.4165, 2003.7455, 2030.8032, 3508.0505, 1291.2275, 1776.5482]
2026-01-23 06:11:28,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [423.0, 412.0, 256.0, 350.0, 1000.0, 376.0, 394.0, 680.0, 256.0, 322.0]
2026-01-23 06:11:28,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (2342.47) for latency DatasetOffice
2026-01-23 06:11:28,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 5 hours, 43 minutes, 4 seconds)
2026-01-23 06:17:05,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:17:13,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2200.10986 ± 1093.403
2026-01-23 06:17:13,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1667.926, 2227.1094, 507.92963, 868.14557, 1706.2379, 3536.3389, 3204.2708, 3952.1252, 1485.7896, 2845.2256]
2026-01-23 06:17:13,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [340.0, 448.0, 105.0, 179.0, 329.0, 680.0, 635.0, 773.0, 297.0, 573.0]
2026-01-23 06:17:13,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 5 hours, 33 minutes, 24 seconds)
2026-01-23 06:23:21,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:23:28,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2045.06580 ± 1336.613
2026-01-23 06:23:28,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3205.573, 667.7015, 710.8404, 2263.0862, 782.1116, 2718.5664, 713.88336, 4477.2495, 3622.0312, 1289.6147]
2026-01-23 06:23:28,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [606.0, 146.0, 139.0, 415.0, 169.0, 538.0, 145.0, 883.0, 703.0, 243.0]
2026-01-23 06:23:28,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 5 hours, 30 minutes, 44 seconds)
2026-01-23 06:29:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:29:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 903.00360 ± 1036.793
2026-01-23 06:29:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2066.3582, 3353.1047, 1633.9883, 527.7134, 180.0141, 336.5661, 474.48444, 157.58864, 124.67202, 175.54619]
2026-01-23 06:29:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [380.0, 643.0, 308.0, 101.0, 35.0, 62.0, 85.0, 33.0, 24.0, 34.0]
2026-01-23 06:29:18,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 5 hours, 23 minutes, 46 seconds)
2026-01-23 06:35:09,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:35:20,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3092.54102 ± 1243.631
2026-01-23 06:35:20,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2581.3186, 3173.4321, 1459.2448, 2243.8035, 2369.145, 5359.3145, 1895.8485, 3152.7693, 3458.033, 5232.502]
2026-01-23 06:35:20,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [473.0, 592.0, 272.0, 428.0, 441.0, 1000.0, 357.0, 588.0, 667.0, 977.0]
2026-01-23 06:35:20,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (3092.54) for latency DatasetOffice
2026-01-23 06:35:20,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 5 hours, 18 minutes, 19 seconds)
2026-01-23 06:41:11,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:41:19,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2335.77441 ± 1203.494
2026-01-23 06:41:19,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1479.9454, 1695.3273, 637.2376, 2596.7817, 2287.309, 5333.101, 1688.4323, 2478.1294, 3229.8032, 1931.6776]
2026-01-23 06:41:19,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [282.0, 327.0, 123.0, 481.0, 445.0, 1000.0, 324.0, 479.0, 611.0, 367.0]
2026-01-23 06:41:19,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 5 hours, 10 minutes, 25 seconds)
2026-01-23 06:47:33,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:47:42,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2490.41089 ± 1175.575
2026-01-23 06:47:42,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2821.6382, 3259.502, 3861.9597, 1070.4896, 2032.1166, 1578.4686, 1785.8599, 667.5436, 4307.7783, 3518.7544]
2026-01-23 06:47:42,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [557.0, 647.0, 761.0, 229.0, 394.0, 309.0, 352.0, 134.0, 855.0, 697.0]
2026-01-23 06:47:43,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 5 hours, 10 minutes, 59 seconds)
2026-01-23 06:53:40,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:53:53,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3911.37744 ± 1165.672
2026-01-23 06:53:53,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5227.4014, 3812.9316, 5263.7363, 2200.4873, 3648.3716, 5221.675, 1622.033, 3942.2236, 4078.0361, 4096.878]
2026-01-23 06:53:53,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 725.0, 1000.0, 417.0, 690.0, 1000.0, 311.0, 776.0, 780.0, 779.0]
2026-01-23 06:53:53,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (3911.38) for latency DatasetOffice
2026-01-23 06:53:53,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 5 hours, 4 minutes, 11 seconds)
2026-01-23 06:59:33,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:59:46,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3786.26562 ± 1523.083
2026-01-23 06:59:46,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3809.0505, 4796.7925, 2841.6633, 5208.818, 5155.6416, 5138.982, 5111.3555, 582.8831, 1990.468, 3227.0022]
2026-01-23 06:59:46,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [755.0, 936.0, 561.0, 1000.0, 1000.0, 1000.0, 1000.0, 112.0, 397.0, 623.0]
2026-01-23 06:59:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 4 hours, 58 minutes, 37 seconds)
2026-01-23 07:05:42,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:05:58,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4462.29443 ± 1470.643
2026-01-23 07:05:58,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5060.3384, 5309.5513, 1133.6046, 5193.026, 5217.7407, 5152.309, 5172.251, 5257.594, 5166.1353, 1960.3958]
2026-01-23 07:05:58,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [976.0, 1000.0, 216.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 364.0]
2026-01-23 07:05:58,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4462.29) for latency DatasetOffice
2026-01-23 07:05:58,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 4 hours, 54 minutes)
2026-01-23 07:11:52,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:12:05,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3235.97314 ± 1653.950
2026-01-23 07:12:05,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2962.063, 5064.9995, 4511.0547, 1572.1554, 5132.5464, 427.55713, 2415.5671, 4254.375, 4776.6553, 1242.7574]
2026-01-23 07:12:05,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [599.0, 1000.0, 904.0, 314.0, 1000.0, 89.0, 476.0, 841.0, 961.0, 242.0]
2026-01-23 07:12:05,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 4 hours, 49 minutes, 7 seconds)
2026-01-23 07:18:09,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:18:23,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3591.37964 ± 1623.608
2026-01-23 07:18:23,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5051.8667, 3564.3389, 4085.43, 1046.9475, 478.27078, 3614.7722, 5177.4897, 5099.375, 5089.218, 2706.0889]
2026-01-23 07:18:23,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 717.0, 802.0, 222.0, 100.0, 697.0, 1000.0, 1000.0, 1000.0, 534.0]
2026-01-23 07:18:23,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 4 hours, 42 minutes, 11 seconds)
2026-01-23 07:24:40,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:24:49,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2180.15405 ± 2061.190
2026-01-23 07:24:49,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5066.55, 4252.816, 4091.1829, 873.4154, 1643.7765, 5086.3916, 72.89287, 180.39613, 316.46637, 217.6535]
2026-01-23 07:24:49,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 841.0, 801.0, 180.0, 334.0, 1000.0, 15.0, 35.0, 62.0, 44.0]
2026-01-23 07:24:49,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 4 hours, 38 minutes, 27 seconds)
2026-01-23 07:30:56,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:31:14,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4915.45557 ± 663.754
2026-01-23 07:31:14,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4831.847, 5166.644, 5197.464, 5168.313, 5113.4243, 5176.6143, 2948.6304, 5174.744, 5194.8003, 5182.071]
2026-01-23 07:31:14,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [939.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 572.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:31:14,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4915.46) for latency DatasetOffice
2026-01-23 07:31:14,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 4 hours, 36 minutes, 49 seconds)
2026-01-23 07:37:29,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:37:44,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4355.58105 ± 1556.215
2026-01-23 07:37:44,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5201.2344, 5304.555, 2111.8694, 5146.2837, 5197.909, 5182.7725, 4496.3403, 584.7821, 5164.738, 5165.327]
2026-01-23 07:37:44,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 402.0, 1000.0, 1000.0, 1000.0, 863.0, 108.0, 1000.0, 1000.0]
2026-01-23 07:37:44,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 4 hours, 33 minutes, 18 seconds)
2026-01-23 07:43:23,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:43:34,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2957.86035 ± 2231.692
2026-01-23 07:43:34,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1340.8152, 5140.3164, 5131.4707, 5112.6562, 5126.636, 5113.784, 2011.5812, 186.46922, 179.74458, 235.13107]
2026-01-23 07:43:34,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [263.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 387.0, 36.0, 39.0, 48.0]
2026-01-23 07:43:34,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 4 hours, 24 minutes, 32 seconds)
2026-01-23 07:50:04,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:50:18,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3841.42432 ± 1800.493
2026-01-23 07:50:18,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5269.5596, 5254.5166, 1598.6968, 3021.62, 5223.989, 1488.0969, 5240.7705, 5250.31, 5267.25, 799.43396]
2026-01-23 07:50:18,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 302.0, 577.0, 1000.0, 281.0, 1000.0, 1000.0, 1000.0, 150.0]
2026-01-23 07:50:18,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 4 hours, 21 minutes, 41 seconds)
2026-01-23 07:55:59,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:56:13,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3859.38940 ± 1769.386
2026-01-23 07:56:13,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5033.357, 5002.155, 5036.1523, 1799.9897, 5103.51, 1021.9387, 4802.4443, 734.1903, 5053.0044, 5007.1514]
2026-01-23 07:56:13,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 361.0, 1000.0, 202.0, 952.0, 152.0, 1000.0, 1000.0]
2026-01-23 07:56:13,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 4 hours, 11 minutes, 11 seconds)
2026-01-23 08:02:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:02:16,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2263.15747 ± 2066.733
2026-01-23 08:02:16,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5261.7646, 2877.7473, 239.15308, 382.43576, 650.93384, 540.71356, 1211.8478, 5209.2, 1023.98065, 5233.798]
2026-01-23 08:02:16,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 549.0, 44.0, 71.0, 141.0, 108.0, 228.0, 1000.0, 186.0, 1000.0]
2026-01-23 08:02:16,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 4 hours, 2 minutes, 5 seconds)
2026-01-23 08:08:26,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:08:42,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4419.84277 ± 1307.433
2026-01-23 08:08:42,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5152.1523, 4383.3145, 5170.688, 742.25757, 3886.5925, 5150.639, 5156.7446, 4248.7153, 5159.7383, 5147.5913]
2026-01-23 08:08:42,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 865.0, 1000.0, 140.0, 761.0, 1000.0, 1000.0, 831.0, 1000.0, 1000.0]
2026-01-23 08:08:42,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 3 hours, 55 minutes, 20 seconds)
2026-01-23 08:15:02,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:15:17,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4370.85742 ± 1218.305
2026-01-23 08:15:17,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5241.125, 5161.2163, 5188.0825, 5131.7036, 2134.2197, 5134.4277, 5083.371, 2458.5093, 3014.767, 5161.1514]
2026-01-23 08:15:17,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 406.0, 1000.0, 1000.0, 492.0, 586.0, 1000.0]
2026-01-23 08:15:17,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 3 hours, 54 minutes, 43 seconds)
2026-01-23 08:21:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:21:07,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1942.02600 ± 2176.069
2026-01-23 08:21:07,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5098.453, 5072.062, 5121.3955, 2572.4644, 237.8988, 253.75159, 175.70328, 535.76636, 176.33339, 176.43259]
2026-01-23 08:21:07,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 503.0, 48.0, 54.0, 35.0, 97.0, 34.0, 34.0]
2026-01-23 08:21:07,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 3 hours, 41 minutes, 58 seconds)
2026-01-23 08:27:11,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:27:30,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5208.71240 ± 26.584
2026-01-23 08:27:30,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5187.4004, 5189.7764, 5202.0205, 5220.763, 5154.6216, 5261.4766, 5214.6265, 5221.7876, 5217.1196, 5217.53]
2026-01-23 08:27:30,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:27:30,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (5208.71) for latency DatasetOffice
2026-01-23 08:27:30,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 3 hours, 38 minutes, 54 seconds)
2026-01-23 08:33:29,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:33:46,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4362.19189 ± 1455.576
2026-01-23 08:33:46,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5210.2695, 3348.3276, 5192.783, 678.7916, 5220.1255, 5213.997, 5236.9697, 5205.6074, 5196.8867, 3118.1597]
2026-01-23 08:33:46,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 654.0, 1000.0, 142.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 594.0]
2026-01-23 08:33:46,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 3 hours, 34 minutes, 13 seconds)
2026-01-23 08:39:51,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:40:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2444.96582 ± 1925.339
2026-01-23 08:40:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5234.772, 4425.649, 5175.5073, 1347.831, 2207.3792, 231.08746, 3792.5442, 1240.0902, 181.56839, 613.2297]
2026-01-23 08:40:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 840.0, 1000.0, 262.0, 427.0, 46.0, 744.0, 239.0, 35.0, 114.0]
2026-01-23 08:40:01,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 3 hours, 26 minutes, 37 seconds)
2026-01-23 08:45:48,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:46:04,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4673.44043 ± 1248.372
2026-01-23 08:46:04,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5224.558, 5268.956, 5166.1694, 5209.9565, 5207.6763, 5241.647, 4752.198, 4431.1826, 5222.1045, 1009.95416]
2026-01-23 08:46:04,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 915.0, 846.0, 1000.0, 195.0]
2026-01-23 08:46:04,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 3 hours, 17 minutes)
2026-01-23 08:52:13,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:52:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3499.50342 ± 1782.380
2026-01-23 08:52:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2619.4011, 5215.924, 5222.3022, 3764.4492, 467.35236, 5234.0737, 5237.785, 1108.0293, 4420.2656, 1705.4501]
2026-01-23 08:52:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [495.0, 1000.0, 1000.0, 712.0, 86.0, 1000.0, 1000.0, 217.0, 829.0, 322.0]
2026-01-23 08:52:25,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 3 hours, 14 minutes)
2026-01-23 08:58:39,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:58:44,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1576.53918 ± 1867.856
2026-01-23 08:58:44,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2506.9224, 382.05295, 330.8197, 276.60312, 207.82434, 211.3961, 670.6846, 5116.228, 5035.726, 1027.1334]
2026-01-23 08:58:44,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [490.0, 78.0, 66.0, 55.0, 40.0, 41.0, 144.0, 1000.0, 1000.0, 204.0]
2026-01-23 08:58:44,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 3 hours, 7 minutes, 28 seconds)
2026-01-23 09:04:37,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:04:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4212.68652 ± 1538.469
2026-01-23 09:04:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5202.532, 1916.9143, 5214.3135, 1339.5709, 5254.291, 5177.7324, 5181.614, 5219.8, 2419.5718, 5200.532]
2026-01-23 09:04:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 374.0, 1000.0, 255.0, 1000.0, 1000.0, 1000.0, 1000.0, 465.0, 1000.0]
2026-01-23 09:04:52,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 3 hours, 23 seconds)
2026-01-23 09:10:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:11:12,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4219.56348 ± 1632.578
2026-01-23 09:11:12,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5173.207, 1686.3481, 5166.8994, 5176.6963, 5156.4004, 5162.5117, 5193.2485, 5202.6963, 3746.9885, 530.63666]
2026-01-23 09:11:12,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 343.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 720.0, 108.0]
2026-01-23 09:11:12,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 54 minutes, 40 seconds)
2026-01-23 09:17:06,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:17:16,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2643.60547 ± 1959.175
2026-01-23 09:17:16,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5178.887, 5230.834, 749.8453, 5227.055, 3342.8638, 412.38196, 3195.7458, 1866.5314, 1064.8217, 167.08844]
2026-01-23 09:17:16,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 149.0, 1000.0, 640.0, 89.0, 621.0, 359.0, 202.0, 32.0]
2026-01-23 09:17:16,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 48 minutes, 25 seconds)
2026-01-23 09:23:02,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:23:19,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4546.95947 ± 1211.438
2026-01-23 09:23:19,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4576.941, 5310.084, 5287.955, 5324.922, 5338.007, 5278.9663, 4266.837, 3263.1492, 1478.6376, 5344.0986]
2026-01-23 09:23:19,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [893.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 802.0, 620.0, 288.0, 1000.0]
2026-01-23 09:23:19,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 40 minutes, 42 seconds)
2026-01-23 09:29:22,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:29:38,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4885.76123 ± 1022.777
2026-01-23 09:29:38,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4907.5234, 5339.622, 5363.109, 4666.295, 5295.699, 5344.2393, 5306.3394, 1893.6078, 5369.5146, 5371.6616]
2026-01-23 09:29:38,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [917.0, 1000.0, 1000.0, 890.0, 1000.0, 1000.0, 1000.0, 355.0, 1000.0, 1000.0]
2026-01-23 09:29:38,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 2 hours, 34 minutes, 30 seconds)
2026-01-23 09:35:45,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:35:51,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1541.53186 ± 971.768
2026-01-23 09:35:51,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1083.9463, 982.2169, 1754.192, 4330.7817, 1085.3932, 1758.089, 1277.6727, 1028.2152, 917.5122, 1197.2983]
2026-01-23 09:35:51,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [217.0, 201.0, 368.0, 889.0, 225.0, 374.0, 263.0, 209.0, 193.0, 250.0]
2026-01-23 09:35:51,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 2 hours, 28 minutes, 41 seconds)
2026-01-23 09:41:42,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:42:02,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4915.40527 ± 576.924
2026-01-23 09:42:02,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3185.2998, 5101.1626, 5104.8286, 5109.673, 5131.55, 5067.6235, 5121.444, 5107.2354, 5119.75, 5105.4844]
2026-01-23 09:42:02,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [617.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:42:02,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 2 hours, 21 minutes, 46 seconds)
2026-01-23 09:48:11,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:48:28,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4869.72559 ± 743.847
2026-01-23 09:48:28,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5182.6025, 5242.77, 5258.644, 5242.8887, 3177.558, 3613.5413, 5240.3213, 5269.4688, 5250.4653, 5218.993]
2026-01-23 09:48:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 602.0, 687.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:48:28,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 2 hours, 17 minutes, 17 seconds)
2026-01-23 09:54:16,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:54:23,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1935.56738 ± 2225.538
2026-01-23 09:54:23,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [415.987, 1436.8208, 646.33704, 443.59085, 214.00008, 155.20018, 164.98917, 5307.8325, 5290.4443, 5280.4717]
2026-01-23 09:54:23,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [80.0, 276.0, 123.0, 89.0, 42.0, 30.0, 32.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:54:23,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 2 hours, 10 minutes, 27 seconds)
2026-01-23 10:00:36,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:00:51,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4458.07666 ± 1632.028
2026-01-23 10:00:51,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1541.5756, 5280.912, 5253.687, 5256.291, 5259.2734, 5249.772, 5287.673, 874.17126, 5308.4688, 5268.9424]
2026-01-23 10:00:51,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [300.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 171.0, 1000.0, 1000.0]
2026-01-23 10:00:51,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 2 hours, 4 minutes, 50 seconds)
2026-01-23 10:06:57,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:07:12,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3954.37378 ± 1540.610
2026-01-23 10:07:12,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1030.0463, 5180.715, 5228.169, 4202.175, 2407.9275, 5233.2505, 4073.934, 5205.9956, 1764.8594, 5216.6675]
2026-01-23 10:07:12,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [204.0, 1000.0, 1000.0, 815.0, 461.0, 1000.0, 789.0, 1000.0, 336.0, 1000.0]
2026-01-23 10:07:12,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 59 minutes, 7 seconds)
2026-01-23 10:13:15,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:13:33,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4675.85059 ± 1344.024
2026-01-23 10:13:33,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5194.0693, 5168.0493, 5190.2114, 5142.8467, 5123.0596, 5165.958, 4812.6387, 5171.685, 5133.6177, 656.368]
2026-01-23 10:13:33,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 936.0, 1000.0, 1000.0, 123.0]
2026-01-23 10:13:33,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 53 minutes, 30 seconds)
2026-01-23 10:19:16,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:19:34,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5368.49561 ± 24.131
2026-01-23 10:19:34,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5355.293, 5379.3423, 5374.483, 5353.8145, 5380.9307, 5394.7373, 5404.1934, 5356.8857, 5371.712, 5313.568]
2026-01-23 10:19:34,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:19:34,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (5368.50) for latency DatasetOffice
2026-01-23 10:19:34,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 45 minutes, 46 seconds)
2026-01-23 10:25:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:26:00,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2084.75781 ± 1726.654
2026-01-23 10:26:00,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5086.297, 2336.3562, 548.14624, 114.60563, 2393.5122, 2533.6543, 665.5787, 5055.454, 1847.412, 266.56067]
2026-01-23 10:26:00,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 454.0, 117.0, 23.0, 483.0, 504.0, 140.0, 1000.0, 367.0, 55.0]
2026-01-23 10:26:00,979 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 41 minutes, 12 seconds)
2026-01-23 10:31:31,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:31:49,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5071.37842 ± 505.802
2026-01-23 10:31:49,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5319.742, 3995.4658, 5339.9673, 5305.8486, 5332.9346, 5279.8657, 5314.0776, 4129.1846, 5352.056, 5344.646]
2026-01-23 10:31:49,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 744.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 778.0, 1000.0, 1000.0]
2026-01-23 10:31:49,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 32 minutes, 53 seconds)
2026-01-23 10:37:55,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:38:12,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5074.72510 ± 842.056
2026-01-23 10:38:12,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5347.374, 5323.988, 5358.7104, 5339.693, 5358.5, 5364.2964, 5372.712, 2548.9497, 5355.163, 5377.8574]
2026-01-23 10:38:12,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 482.0, 1000.0, 1000.0]
2026-01-23 10:38:12,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 26 minutes, 48 seconds)
2026-01-23 10:44:30,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:44:40,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2471.34619 ± 2324.144
2026-01-23 10:44:40,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5118.578, 5112.2363, 5135.1064, 5139.206, 3118.3289, 215.72545, 258.55457, 297.0373, 162.90599, 155.78337]
2026-01-23 10:44:40,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 600.0, 39.0, 51.0, 57.0, 29.0, 30.0]
2026-01-23 10:44:40,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 20 minutes, 52 seconds)
2026-01-23 10:50:28,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:50:43,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4261.05322 ± 1479.108
2026-01-23 10:50:43,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5190.8354, 5154.298, 907.05347, 2570.1812, 2888.514, 5154.9966, 5191.774, 5175.8696, 5202.472, 5174.537]
2026-01-23 10:50:43,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 175.0, 505.0, 570.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:50:43,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 1 hour, 14 minutes, 45 seconds)
2026-01-23 10:56:59,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:57:15,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4483.41699 ± 1592.148
2026-01-23 10:57:15,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5244.969, 5219.3643, 5224.943, 5218.628, 2571.9458, 345.8555, 5252.933, 5241.912, 5240.955, 5272.662]
2026-01-23 10:57:15,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 495.0, 62.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:57:15,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 1 hour, 8 minutes, 43 seconds)
2026-01-23 11:02:54,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:03:04,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2897.67993 ± 2476.529
2026-01-23 11:03:04,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [192.23547, 617.97797, 265.86346, 397.71686, 648.92303, 5365.44, 5370.7305, 5360.77, 5383.3237, 5373.819]
2026-01-23 11:03:04,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [37.0, 133.0, 52.0, 76.0, 115.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:03:04,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 2 minutes, 31 seconds)
2026-01-23 11:09:13,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:09:30,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4827.59229 ± 1106.341
2026-01-23 11:09:30,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1834.6516, 3736.227, 5318.913, 5352.099, 5370.5464, 5351.427, 5350.1743, 5355.415, 5328.469, 5277.999]
2026-01-23 11:09:30,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [357.0, 694.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:09:30,260 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 56 minutes, 19 seconds)
2026-01-23 11:15:40,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:15:58,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4551.82666 ± 1423.513
2026-01-23 11:15:58,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5299.445, 1993.0789, 5313.2324, 5278.9297, 5047.286, 5281.878, 5273.0312, 5299.7373, 1445.1222, 5286.5293]
2026-01-23 11:15:58,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 377.0, 1000.0, 1000.0, 965.0, 1000.0, 1000.0, 1000.0, 278.0, 1000.0]
2026-01-23 11:15:58,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 50 minutes, 5 seconds)
2026-01-23 11:21:36,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:21:41,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1507.51709 ± 2017.371
2026-01-23 11:21:41,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2574.1235, 160.95662, 235.73639, 386.99863, 280.3396, 155.76715, 140.44664, 543.8797, 5318.7373, 5278.1855]
2026-01-23 11:21:41,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [482.0, 31.0, 49.0, 74.0, 56.0, 30.0, 27.0, 101.0, 1000.0, 1000.0]
2026-01-23 11:21:41,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 43 minutes, 20 seconds)
2026-01-23 11:27:50,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:28:08,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5299.44629 ± 67.762
2026-01-23 11:28:08,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5207.23, 5244.5015, 5276.735, 5335.9775, 5275.035, 5370.206, 5313.367, 5451.548, 5244.296, 5275.565]
2026-01-23 11:28:08,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:28:08,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 37 minutes, 3 seconds)
2026-01-23 11:34:29,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:34:44,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3932.55981 ± 1805.370
2026-01-23 11:34:44,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [130.13228, 5206.733, 5211.5825, 5205.4824, 5170.265, 5198.2734, 3716.1345, 3181.8538, 1116.1183, 5189.023]
2026-01-23 11:34:44,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [27.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 731.0, 612.0, 220.0, 1000.0]
2026-01-23 11:34:44,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 31 minutes, 39 seconds)
2026-01-23 11:40:18,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:40:25,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2010.93823 ± 1932.034
2026-01-23 11:40:25,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4376.949, 5241.301, 3917.2473, 3196.2583, 180.00586, 146.06001, 2295.975, 451.80484, 173.79509, 129.98558]
2026-01-23 11:40:25,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [826.0, 1000.0, 754.0, 600.0, 35.0, 28.0, 455.0, 85.0, 34.0, 25.0]
2026-01-23 11:40:25,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 24 minutes, 44 seconds)
2026-01-23 11:46:57,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:47:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4675.44043 ± 1285.835
2026-01-23 11:47:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5301.5054, 5313.142, 5279.027, 5226.6636, 5276.989, 5302.5723, 5270.562, 1343.7643, 3114.907, 5325.2695]
2026-01-23 11:47:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 261.0, 587.0, 1000.0]
2026-01-23 11:47:14,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 18 minutes, 45 seconds)
2026-01-23 11:52:58,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:53:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2965.30151 ± 1926.525
2026-01-23 11:53:09,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2164.482, 1573.3743, 5222.934, 5291.299, 1301.2529, 1610.7656, 411.38983, 1505.4568, 5290.7944, 5281.268]
2026-01-23 11:53:09,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [406.0, 327.0, 1000.0, 1000.0, 271.0, 311.0, 75.0, 293.0, 1000.0, 1000.0]
2026-01-23 11:53:09,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 12 minutes, 35 seconds)
2026-01-23 11:59:19,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:59:24,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1571.34717 ± 2033.589
2026-01-23 11:59:24,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5250.566, 3167.573, 192.41248, 181.58359, 146.07812, 377.41077, 390.19977, 196.94182, 545.55994, 5265.1465]
2026-01-23 11:59:24,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 598.0, 41.0, 35.0, 28.0, 72.0, 86.0, 38.0, 106.0, 1000.0]
2026-01-23 11:59:24,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 6 minutes, 15 seconds)
2026-01-23 12:05:43,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:05:58,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4257.92480 ± 1662.405
2026-01-23 12:05:58,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2882.8394, 5290.432, 2229.8337, 5273.292, 481.71637, 5244.0474, 5231.827, 5315.803, 5288.338, 5341.1157]
2026-01-23 12:05:58,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [537.0, 1000.0, 441.0, 1000.0, 98.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:05:58,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1299 [DEBUG]: Training session finished
