2026-01-23 01:54:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mda-highdim-mem5 
2026-01-23 01:54:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mda-highdim-mem5 
2026-01-23 01:54:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x154c2d1a9690>}
2026-01-23 01:54:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-23 01:54:25,186 baseline-bpql-mda-noisy-humanoid:91 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-23 01:54:25,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-23 01:54:25,204 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-23 01:54:25,204 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:54:25,213 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2026-01-23 01:54:26,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-23 01:54:28,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-23 02:00:12,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:13,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 258.54642 ± 11.506
2026-01-23 02:00:13,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [254.5499, 251.25842, 260.4431, 249.07799, 247.89981, 273.84308, 245.51009, 280.02734, 270.54272, 252.31163]
2026-01-23 02:00:13,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [48.0, 47.0, 49.0, 47.0, 47.0, 51.0, 46.0, 53.0, 51.0, 47.0]
2026-01-23 02:00:13,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (258.55) for latency DatasetOffice
2026-01-23 02:00:13,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 9 hours, 28 minutes, 45 seconds)
2026-01-23 02:06:14,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:15,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 330.69922 ± 85.778
2026-01-23 02:06:15,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [290.3701, 398.32098, 278.83508, 322.07278, 332.27588, 278.19077, 286.70526, 563.7065, 286.2501, 270.265]
2026-01-23 02:06:15,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [59.0, 75.0, 58.0, 66.0, 68.0, 56.0, 59.0, 111.0, 57.0, 55.0]
2026-01-23 02:06:15,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (330.70) for latency DatasetOffice
2026-01-23 02:06:15,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 9 hours, 37 minutes, 27 seconds)
2026-01-23 02:12:16,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 378.27167 ± 30.085
2026-01-23 02:12:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [407.2246, 382.50085, 331.7606, 426.07343, 377.61786, 347.47247, 340.62967, 405.29556, 399.7032, 364.4382]
2026-01-23 02:12:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [76.0, 71.0, 62.0, 80.0, 69.0, 64.0, 63.0, 75.0, 73.0, 68.0]
2026-01-23 02:12:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (378.27) for latency DatasetOffice
2026-01-23 02:12:17,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 9 hours, 36 minutes, 2 seconds)
2026-01-23 02:18:15,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:16,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 380.40729 ± 97.520
2026-01-23 02:18:16,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [313.18457, 451.4672, 307.75867, 277.79224, 255.87932, 531.21216, 367.26205, 532.7683, 319.8437, 446.90472]
2026-01-23 02:18:16,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [59.0, 84.0, 57.0, 53.0, 50.0, 100.0, 68.0, 100.0, 62.0, 82.0]
2026-01-23 02:18:16,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (380.41) for latency DatasetOffice
2026-01-23 02:18:16,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 9 hours, 31 minutes, 27 seconds)
2026-01-23 02:24:18,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 533.72516 ± 128.349
2026-01-23 02:24:20,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [317.17, 664.1899, 508.06918, 702.6282, 579.57587, 468.06213, 396.30386, 692.0148, 603.5701, 405.66745]
2026-01-23 02:24:20,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [68.0, 127.0, 96.0, 137.0, 109.0, 94.0, 84.0, 132.0, 113.0, 86.0]
2026-01-23 02:24:20,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (533.73) for latency DatasetOffice
2026-01-23 02:24:20,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 9 hours, 27 minutes, 33 seconds)
2026-01-23 02:30:18,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:20,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 552.57166 ± 143.870
2026-01-23 02:30:20,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [334.2956, 778.6216, 395.36914, 540.75665, 760.317, 497.01892, 694.8287, 490.14734, 444.7025, 589.6588]
2026-01-23 02:30:20,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [73.0, 152.0, 76.0, 102.0, 163.0, 94.0, 148.0, 93.0, 97.0, 126.0]
2026-01-23 02:30:20,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (552.57) for latency DatasetOffice
2026-01-23 02:30:20,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 9 hours, 26 minutes, 13 seconds)
2026-01-23 02:36:19,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:21,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 480.22940 ± 69.775
2026-01-23 02:36:21,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [396.98825, 408.4527, 502.22382, 510.45926, 485.60297, 449.71237, 422.64545, 597.23236, 600.7828, 428.19403]
2026-01-23 02:36:21,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [74.0, 75.0, 95.0, 97.0, 104.0, 84.0, 78.0, 115.0, 114.0, 80.0]
2026-01-23 02:36:21,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 9 hours, 19 minutes, 46 seconds)
2026-01-23 02:42:18,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:19,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 465.96918 ± 55.939
2026-01-23 02:42:19,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [435.79733, 511.51047, 492.4175, 489.9083, 368.06845, 471.2918, 561.0878, 482.7729, 376.13913, 470.69833]
2026-01-23 02:42:19,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [81.0, 97.0, 94.0, 107.0, 69.0, 90.0, 109.0, 94.0, 70.0, 89.0]
2026-01-23 02:42:19,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 9 hours, 12 minutes, 45 seconds)
2026-01-23 02:48:15,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:17,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 442.13867 ± 44.271
2026-01-23 02:48:17,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [385.8841, 480.65164, 470.59103, 397.38385, 420.04276, 391.84436, 413.5469, 525.4337, 471.72385, 464.28442]
2026-01-23 02:48:17,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [72.0, 90.0, 89.0, 74.0, 79.0, 73.0, 77.0, 100.0, 90.0, 87.0]
2026-01-23 02:48:17,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 9 hours, 6 minutes, 5 seconds)
2026-01-23 02:54:15,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:16,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 537.62585 ± 84.805
2026-01-23 02:54:16,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [648.5907, 499.82724, 520.0816, 462.89972, 657.88885, 460.6842, 470.50732, 418.31326, 617.25354, 620.21216]
2026-01-23 02:54:16,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [125.0, 97.0, 104.0, 88.0, 127.0, 87.0, 88.0, 92.0, 118.0, 126.0]
2026-01-23 02:54:16,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 8 hours, 58 minutes, 52 seconds)
2026-01-23 03:00:13,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:14,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 421.95264 ± 33.835
2026-01-23 03:00:14,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [437.8334, 433.6001, 479.56357, 462.63882, 421.48688, 402.8282, 360.7789, 423.542, 419.68585, 377.5685]
2026-01-23 03:00:14,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [81.0, 81.0, 89.0, 86.0, 77.0, 74.0, 69.0, 78.0, 77.0, 69.0]
2026-01-23 03:00:14,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 8 hours, 52 minutes, 16 seconds)
2026-01-23 03:06:18,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 537.09857 ± 72.700
2026-01-23 03:06:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [667.0231, 497.02524, 627.56476, 527.8837, 519.71375, 621.44745, 524.7556, 458.34256, 491.23917, 435.99088]
2026-01-23 03:06:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [129.0, 96.0, 120.0, 102.0, 99.0, 120.0, 115.0, 86.0, 92.0, 80.0]
2026-01-23 03:06:20,260 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 8 hours, 47 minutes, 44 seconds)
2026-01-23 03:12:15,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:18,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 609.22131 ± 174.078
2026-01-23 03:12:18,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [791.01935, 506.10645, 1010.12213, 585.9682, 487.3643, 392.0154, 549.3598, 562.1897, 726.93243, 481.13528]
2026-01-23 03:12:18,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [153.0, 96.0, 201.0, 127.0, 107.0, 87.0, 103.0, 108.0, 149.0, 105.0]
2026-01-23 03:12:18,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (609.22) for latency DatasetOffice
2026-01-23 03:12:18,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 8 hours, 41 minutes, 32 seconds)
2026-01-23 03:18:16,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:18,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 514.01453 ± 102.771
2026-01-23 03:18:18,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [682.0763, 563.5284, 397.15134, 522.52765, 624.07294, 432.9292, 637.78894, 439.85144, 465.10452, 375.11417]
2026-01-23 03:18:18,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [148.0, 123.0, 89.0, 99.0, 135.0, 95.0, 122.0, 99.0, 87.0, 83.0]
2026-01-23 03:18:18,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 8 hours, 36 minutes, 19 seconds)
2026-01-23 03:24:16,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:17,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 430.80786 ± 164.385
2026-01-23 03:24:17,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [325.91022, 581.5974, 188.76587, 618.64996, 602.806, 176.70181, 281.27567, 565.88007, 517.56464, 448.92712]
2026-01-23 03:24:17,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [73.0, 112.0, 38.0, 117.0, 123.0, 34.0, 54.0, 123.0, 100.0, 85.0]
2026-01-23 03:24:17,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 8 hours, 30 minutes, 11 seconds)
2026-01-23 03:30:16,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 525.03210 ± 87.905
2026-01-23 03:30:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [614.03357, 619.99316, 480.68954, 383.91956, 506.40475, 564.3883, 542.7504, 446.67154, 424.80734, 666.663]
2026-01-23 03:30:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [119.0, 118.0, 90.0, 83.0, 94.0, 107.0, 101.0, 83.0, 79.0, 125.0]
2026-01-23 03:30:17,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 8 hours, 25 minutes)
2026-01-23 03:36:17,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:19,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 571.85852 ± 95.436
2026-01-23 03:36:19,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [556.21014, 420.35916, 520.5839, 576.94525, 794.04095, 666.7677, 565.81506, 588.5008, 515.93146, 513.42993]
2026-01-23 03:36:19,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 90.0, 96.0, 111.0, 163.0, 143.0, 119.0, 113.0, 97.0, 99.0]
2026-01-23 03:36:19,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 8 hours, 17 minutes, 46 seconds)
2026-01-23 03:42:13,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 590.14606 ± 186.646
2026-01-23 03:42:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [434.90826, 613.1591, 1005.3025, 864.58356, 375.14304, 577.3838, 543.6607, 485.19244, 518.8399, 483.28726]
2026-01-23 03:42:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [97.0, 118.0, 199.0, 177.0, 69.0, 109.0, 118.0, 91.0, 114.0, 89.0]
2026-01-23 03:42:15,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 8 hours, 11 minutes, 17 seconds)
2026-01-23 03:48:14,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:16,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 428.98798 ± 45.626
2026-01-23 03:48:16,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [371.3769, 462.39633, 367.64905, 394.99515, 506.96146, 482.03378, 398.50073, 432.58234, 409.21933, 464.1648]
2026-01-23 03:48:16,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [69.0, 86.0, 69.0, 75.0, 96.0, 91.0, 76.0, 81.0, 77.0, 88.0]
2026-01-23 03:48:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 8 hours, 5 minutes, 25 seconds)
2026-01-23 03:54:13,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:54:15,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 617.28479 ± 89.748
2026-01-23 03:54:15,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [471.41125, 604.3593, 559.71014, 515.0208, 690.2897, 721.7573, 759.6015, 692.8497, 588.318, 569.5304]
2026-01-23 03:54:15,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [102.0, 116.0, 106.0, 99.0, 133.0, 140.0, 145.0, 133.0, 112.0, 109.0]
2026-01-23 03:54:15,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (617.28) for latency DatasetOffice
2026-01-23 03:54:15,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 7 hours, 59 minutes, 23 seconds)
2026-01-23 04:00:13,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:15,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 541.91046 ± 107.697
2026-01-23 04:00:15,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [601.47626, 321.61484, 598.3005, 424.59344, 604.89374, 673.19574, 568.88947, 425.06805, 648.0163, 553.05646]
2026-01-23 04:00:15,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [133.0, 73.0, 114.0, 96.0, 126.0, 146.0, 108.0, 95.0, 139.0, 122.0]
2026-01-23 04:00:15,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 7 hours, 53 minutes, 26 seconds)
2026-01-23 04:06:18,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:06:20,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 607.33240 ± 110.331
2026-01-23 04:06:20,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [737.30743, 486.5453, 680.3209, 471.84488, 502.85013, 542.9822, 581.0588, 765.33826, 756.8493, 548.227]
2026-01-23 04:06:20,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [153.0, 92.0, 125.0, 89.0, 93.0, 100.0, 109.0, 141.0, 141.0, 101.0]
2026-01-23 04:06:20,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 7 hours, 48 minutes, 9 seconds)
2026-01-23 04:12:16,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:12:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 525.79291 ± 88.144
2026-01-23 04:12:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [650.6167, 428.8743, 457.05057, 680.77594, 625.5909, 481.978, 423.31708, 505.78238, 496.5892, 507.35385]
2026-01-23 04:12:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [127.0, 81.0, 84.0, 131.0, 129.0, 91.0, 82.0, 94.0, 95.0, 94.0]
2026-01-23 04:12:18,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 7 hours, 42 minutes, 37 seconds)
2026-01-23 04:18:16,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:18,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 601.90558 ± 110.526
2026-01-23 04:18:18,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [560.4104, 767.3452, 540.15247, 816.2206, 504.13208, 708.12756, 504.10373, 567.989, 521.82214, 528.7519]
2026-01-23 04:18:18,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 143.0, 100.0, 153.0, 91.0, 136.0, 92.0, 106.0, 99.0, 99.0]
2026-01-23 04:18:18,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 7 hours, 36 minutes, 26 seconds)
2026-01-23 04:24:16,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:24:17,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 536.17987 ± 88.528
2026-01-23 04:24:17,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [726.3187, 522.72687, 489.45175, 387.26114, 471.6171, 520.8942, 549.4751, 512.81024, 533.0623, 648.1808]
2026-01-23 04:24:17,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [135.0, 96.0, 93.0, 85.0, 88.0, 100.0, 105.0, 96.0, 100.0, 142.0]
2026-01-23 04:24:17,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 7 hours, 30 minutes, 38 seconds)
2026-01-23 04:30:15,626 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:30:17,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 559.53558 ± 129.357
2026-01-23 04:30:17,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [690.2294, 490.0347, 477.9694, 443.57953, 480.53906, 455.88354, 576.45166, 805.2075, 435.89572, 739.56573]
2026-01-23 04:30:17,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [128.0, 92.0, 90.0, 83.0, 90.0, 85.0, 110.0, 162.0, 82.0, 141.0]
2026-01-23 04:30:17,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 7 hours, 24 minutes, 22 seconds)
2026-01-23 04:36:12,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:14,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 636.28259 ± 197.695
2026-01-23 04:36:14,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [572.951, 1205.0399, 505.75128, 533.019, 701.22284, 647.65967, 523.5746, 548.4072, 564.2883, 560.91187]
2026-01-23 04:36:14,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [110.0, 229.0, 94.0, 100.0, 132.0, 124.0, 95.0, 103.0, 106.0, 103.0]
2026-01-23 04:36:14,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (636.28) for latency DatasetOffice
2026-01-23 04:36:14,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 7 hours, 16 minutes, 38 seconds)
2026-01-23 04:42:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:42:12,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 658.83618 ± 135.715
2026-01-23 04:42:12,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [646.89496, 667.03796, 567.622, 575.3617, 722.4685, 473.09695, 751.72546, 497.61337, 960.50366, 726.0377]
2026-01-23 04:42:12,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [120.0, 127.0, 105.0, 107.0, 134.0, 87.0, 148.0, 92.0, 177.0, 139.0]
2026-01-23 04:42:12,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (658.84) for latency DatasetOffice
2026-01-23 04:42:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 7 hours, 10 minutes, 47 seconds)
2026-01-23 04:48:13,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:48:15,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 619.07922 ± 39.898
2026-01-23 04:48:15,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [625.79706, 718.8259, 592.1172, 626.0234, 631.18115, 633.6977, 614.60266, 599.79614, 560.2347, 588.51544]
2026-01-23 04:48:15,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [119.0, 136.0, 127.0, 120.0, 120.0, 121.0, 118.0, 116.0, 105.0, 113.0]
2026-01-23 04:48:15,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 7 hours, 5 minutes, 23 seconds)
2026-01-23 04:54:13,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:54:15,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 599.72815 ± 83.359
2026-01-23 04:54:15,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [717.70886, 799.1422, 584.6896, 579.47473, 584.0316, 532.8944, 550.72925, 559.08704, 558.1277, 531.3955]
2026-01-23 04:54:15,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [152.0, 148.0, 114.0, 110.0, 124.0, 107.0, 102.0, 106.0, 108.0, 100.0]
2026-01-23 04:54:15,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 6 hours, 59 minutes, 27 seconds)
2026-01-23 05:00:12,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:00:14,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 609.71814 ± 96.734
2026-01-23 05:00:14,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [544.15686, 563.8826, 846.8357, 661.56555, 575.5287, 622.15717, 620.0614, 453.85587, 639.8275, 569.3095]
2026-01-23 05:00:14,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [103.0, 106.0, 161.0, 129.0, 109.0, 116.0, 116.0, 85.0, 120.0, 107.0]
2026-01-23 05:00:14,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 6 hours, 53 minutes, 14 seconds)
2026-01-23 05:06:15,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:06:18,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 765.61078 ± 101.150
2026-01-23 05:06:18,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [907.2548, 616.9207, 710.459, 725.4804, 747.4492, 626.53314, 904.44324, 736.06537, 793.9671, 887.53467]
2026-01-23 05:06:18,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [174.0, 117.0, 134.0, 137.0, 142.0, 117.0, 176.0, 139.0, 154.0, 168.0]
2026-01-23 05:06:18,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (765.61) for latency DatasetOffice
2026-01-23 05:06:18,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 6 hours, 48 minutes, 51 seconds)
2026-01-23 05:12:11,830 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:14,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 699.45557 ± 141.711
2026-01-23 05:12:14,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [748.0174, 839.1885, 673.81726, 724.09, 737.15173, 484.63574, 985.79266, 511.3202, 703.7918, 586.75]
2026-01-23 05:12:14,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [156.0, 181.0, 140.0, 133.0, 156.0, 105.0, 188.0, 112.0, 142.0, 127.0]
2026-01-23 05:12:14,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 6 hours, 42 minutes, 16 seconds)
2026-01-23 05:18:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:18:13,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 885.12286 ± 209.697
2026-01-23 05:18:13,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [794.40967, 683.57446, 748.19366, 1181.9694, 753.46124, 974.99915, 1007.01904, 623.75995, 1293.1765, 790.66504]
2026-01-23 05:18:13,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [149.0, 132.0, 143.0, 218.0, 156.0, 194.0, 195.0, 116.0, 270.0, 152.0]
2026-01-23 05:18:13,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (885.12) for latency DatasetOffice
2026-01-23 05:18:13,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 6 hours, 35 minutes, 39 seconds)
2026-01-23 05:24:14,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:24:17,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 936.10559 ± 216.908
2026-01-23 05:24:17,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [795.8147, 637.8334, 985.7534, 880.72736, 1308.8295, 985.46826, 820.0262, 1031.31, 1267.0681, 648.2244]
2026-01-23 05:24:17,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [144.0, 116.0, 200.0, 178.0, 252.0, 185.0, 155.0, 194.0, 237.0, 139.0]
2026-01-23 05:24:17,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (936.11) for latency DatasetOffice
2026-01-23 05:24:17,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 6 hours, 30 minutes, 25 seconds)
2026-01-23 05:30:07,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:30:09,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 565.55548 ± 113.636
2026-01-23 05:30:09,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [425.24448, 858.11633, 476.2093, 487.2486, 597.113, 561.8507, 587.9292, 510.20587, 526.05206, 625.5849]
2026-01-23 05:30:09,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [96.0, 180.0, 101.0, 106.0, 128.0, 119.0, 127.0, 112.0, 115.0, 139.0]
2026-01-23 05:30:09,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 6 hours, 23 minutes, 1 second)
2026-01-23 05:36:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:36:06,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 778.93060 ± 201.057
2026-01-23 05:36:06,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [905.577, 669.0863, 498.5718, 1001.79016, 477.20938, 723.4245, 1170.3585, 768.21344, 799.16534, 775.9093]
2026-01-23 05:36:06,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [168.0, 145.0, 109.0, 195.0, 106.0, 132.0, 247.0, 146.0, 149.0, 148.0]
2026-01-23 05:36:06,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 6 hours, 15 minutes, 26 seconds)
2026-01-23 05:41:55,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:41:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 800.48468 ± 182.517
2026-01-23 05:41:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [652.72534, 614.1387, 820.4158, 964.6984, 675.04114, 1177.9044, 855.0521, 657.1124, 977.4582, 610.3002]
2026-01-23 05:41:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [123.0, 116.0, 153.0, 188.0, 126.0, 234.0, 167.0, 122.0, 195.0, 117.0]
2026-01-23 05:41:57,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 6 hours, 8 minutes, 34 seconds)
2026-01-23 05:47:49,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:47:51,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 695.67126 ± 201.297
2026-01-23 05:47:51,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [981.68976, 826.1374, 1000.4646, 602.61743, 491.2546, 487.06723, 371.78305, 759.2364, 649.1111, 787.3513]
2026-01-23 05:47:51,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [211.0, 157.0, 204.0, 122.0, 110.0, 108.0, 83.0, 160.0, 137.0, 166.0]
2026-01-23 05:47:51,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 6 hours, 1 minute, 29 seconds)
2026-01-23 05:53:44,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:53:47,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1008.42792 ± 170.354
2026-01-23 05:53:47,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [977.1678, 693.7358, 1244.119, 1048.2052, 1080.9874, 818.4076, 978.952, 919.1442, 1295.6034, 1027.9567]
2026-01-23 05:53:47,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [193.0, 147.0, 247.0, 200.0, 221.0, 176.0, 203.0, 191.0, 268.0, 218.0]
2026-01-23 05:53:47,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1008.43) for latency DatasetOffice
2026-01-23 05:53:47,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 5 hours, 53 minutes, 59 seconds)
2026-01-23 05:59:34,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:59:37,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 740.73206 ± 219.344
2026-01-23 05:59:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [792.36707, 709.99335, 580.9184, 1218.7638, 781.23706, 1050.9785, 634.3377, 527.7746, 566.58545, 544.3651]
2026-01-23 05:59:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [148.0, 131.0, 126.0, 252.0, 147.0, 207.0, 137.0, 116.0, 124.0, 117.0]
2026-01-23 05:59:37,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 5 hours, 47 minutes, 40 seconds)
2026-01-23 06:05:29,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:05:32,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 821.88409 ± 147.294
2026-01-23 06:05:32,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [892.2047, 664.7102, 866.7354, 787.87213, 983.7232, 706.5134, 1110.1167, 638.2005, 896.19946, 672.56525]
2026-01-23 06:05:32,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [181.0, 147.0, 161.0, 154.0, 189.0, 145.0, 224.0, 125.0, 168.0, 144.0]
2026-01-23 06:05:32,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 5 hours, 41 minutes, 30 seconds)
2026-01-23 06:11:27,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:31,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1152.90845 ± 521.643
2026-01-23 06:11:31,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1390.237, 654.21497, 1574.3722, 466.01923, 1318.9614, 1094.7916, 2247.669, 1423.3657, 769.8496, 589.6034]
2026-01-23 06:11:31,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [264.0, 137.0, 321.0, 104.0, 270.0, 203.0, 454.0, 288.0, 165.0, 129.0]
2026-01-23 06:11:31,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1152.91) for latency DatasetOffice
2026-01-23 06:11:31,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 5 hours, 37 minutes, 5 seconds)
2026-01-23 06:17:19,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:17:22,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 933.71777 ± 231.132
2026-01-23 06:17:22,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1106.2444, 1182.3579, 754.0008, 729.1265, 951.8158, 650.7478, 636.6335, 1343.6477, 879.1725, 1103.4312]
2026-01-23 06:17:22,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [230.0, 246.0, 141.0, 154.0, 182.0, 120.0, 117.0, 269.0, 160.0, 202.0]
2026-01-23 06:17:22,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 5 hours, 30 minutes, 28 seconds)
2026-01-23 06:23:11,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:23:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 687.40186 ± 124.747
2026-01-23 06:23:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [704.21564, 813.20325, 833.181, 509.09656, 715.0584, 589.1502, 768.9363, 583.34015, 508.00128, 849.8352]
2026-01-23 06:23:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [153.0, 155.0, 155.0, 111.0, 133.0, 128.0, 143.0, 117.0, 111.0, 161.0]
2026-01-23 06:23:13,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 5 hours, 23 minutes, 51 seconds)
2026-01-23 06:29:12,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:29:15,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 998.10437 ± 350.372
2026-01-23 06:29:15,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1140.5469, 1106.3835, 1272.7609, 1312.9913, 969.4613, 1585.9814, 488.5652, 671.3608, 443.01263, 989.97925]
2026-01-23 06:29:15,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [221.0, 225.0, 244.0, 270.0, 194.0, 300.0, 108.0, 145.0, 98.0, 208.0]
2026-01-23 06:29:15,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 5 hours, 20 minutes, 2 seconds)
2026-01-23 06:35:01,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:35:04,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 976.89484 ± 341.750
2026-01-23 06:35:04,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1453.8483, 1379.6312, 926.8474, 1104.8077, 504.66153, 705.8855, 691.21826, 1475.411, 619.1413, 907.49634]
2026-01-23 06:35:04,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [296.0, 280.0, 197.0, 232.0, 110.0, 150.0, 148.0, 304.0, 132.0, 191.0]
2026-01-23 06:35:04,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 5 hours, 13 minutes, 3 seconds)
2026-01-23 06:40:58,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:41:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 813.97571 ± 216.055
2026-01-23 06:41:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [702.9166, 1350.8276, 694.4995, 548.66583, 816.0392, 636.90283, 886.29376, 1005.1339, 742.98175, 755.49585]
2026-01-23 06:41:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [130.0, 254.0, 134.0, 103.0, 154.0, 119.0, 164.0, 190.0, 140.0, 140.0]
2026-01-23 06:41:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 5 hours, 6 minutes, 36 seconds)
2026-01-23 06:47:01,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:47:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 997.29578 ± 141.775
2026-01-23 06:47:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1115.3568, 1270.1295, 886.56854, 823.0769, 893.7612, 1073.3251, 976.18164, 1083.7666, 1058.1532, 792.63824]
2026-01-23 06:47:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [206.0, 238.0, 163.0, 151.0, 165.0, 194.0, 176.0, 198.0, 203.0, 147.0]
2026-01-23 06:47:04,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 5 hours, 2 minutes, 55 seconds)
2026-01-23 06:52:48,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:52:51,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1054.24194 ± 457.513
2026-01-23 06:52:51,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1039.5387, 675.69214, 668.7021, 1162.5901, 1262.4487, 951.19135, 634.5444, 737.34644, 2258.0393, 1152.3271]
2026-01-23 06:52:51,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [200.0, 127.0, 124.0, 229.0, 254.0, 181.0, 117.0, 133.0, 436.0, 239.0]
2026-01-23 06:52:51,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 4 hours, 56 minutes, 15 seconds)
2026-01-23 06:58:42,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:58:46,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1222.27478 ± 512.788
2026-01-23 06:58:46,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [815.6915, 968.48425, 1134.7982, 670.6578, 1341.3306, 2436.147, 1569.8785, 560.3527, 1392.6437, 1332.7637]
2026-01-23 06:58:46,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [155.0, 191.0, 216.0, 146.0, 258.0, 481.0, 321.0, 120.0, 270.0, 260.0]
2026-01-23 06:58:46,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1222.27) for latency DatasetOffice
2026-01-23 06:58:46,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 4 hours, 49 minutes, 14 seconds)
2026-01-23 07:04:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:04:47,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 953.17322 ± 323.006
2026-01-23 07:04:47,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1015.0122, 733.01184, 771.2671, 839.69855, 551.5313, 1430.8475, 640.8547, 1261.6113, 1524.8105, 763.0872]
2026-01-23 07:04:47,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [205.0, 155.0, 160.0, 159.0, 121.0, 272.0, 139.0, 265.0, 295.0, 155.0]
2026-01-23 07:04:47,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 4 hours, 45 minutes, 14 seconds)
2026-01-23 07:10:41,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:10:45,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1059.23132 ± 535.432
2026-01-23 07:10:45,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2394.3987, 1045.671, 782.01074, 1254.6721, 571.7374, 900.66815, 582.73596, 506.96686, 1077.3221, 1476.1295]
2026-01-23 07:10:45,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [449.0, 185.0, 144.0, 233.0, 127.0, 190.0, 124.0, 113.0, 198.0, 273.0]
2026-01-23 07:10:45,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 4 hours, 39 minutes, 36 seconds)
2026-01-23 07:16:44,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:16:47,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1124.92810 ± 419.539
2026-01-23 07:16:47,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1093.69, 1454.6044, 1135.0743, 1345.7013, 1953.3031, 1051.9901, 1347.117, 884.30115, 469.40445, 514.0959]
2026-01-23 07:16:47,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [208.0, 284.0, 217.0, 266.0, 374.0, 196.0, 251.0, 178.0, 90.0, 93.0]
2026-01-23 07:16:47,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 4 hours, 33 minutes, 31 seconds)
2026-01-23 07:22:46,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:22:49,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1037.28455 ± 273.146
2026-01-23 07:22:49,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1049.0897, 1366.4423, 1502.2338, 615.65814, 783.2913, 865.5658, 987.8997, 874.6634, 953.45306, 1374.5487]
2026-01-23 07:22:49,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [213.0, 262.0, 281.0, 134.0, 166.0, 184.0, 183.0, 169.0, 179.0, 257.0]
2026-01-23 07:22:49,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 4 hours, 29 minutes, 43 seconds)
2026-01-23 07:28:54,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:28:59,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1489.34802 ± 625.453
2026-01-23 07:28:59,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1072.6874, 1566.6636, 3078.269, 1120.3265, 1244.1814, 1292.6991, 824.58105, 2135.4006, 1167.0826, 1391.5885]
2026-01-23 07:28:59,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [200.0, 323.0, 571.0, 216.0, 240.0, 235.0, 151.0, 404.0, 217.0, 260.0]
2026-01-23 07:28:59,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1489.35) for latency DatasetOffice
2026-01-23 07:28:59,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 4 hours, 25 minutes, 53 seconds)
2026-01-23 07:34:57,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:35:00,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 914.45538 ± 240.302
2026-01-23 07:35:00,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1097.0199, 913.40094, 1130.1686, 871.1603, 573.6446, 910.6736, 881.00476, 1415.8684, 594.8613, 756.751]
2026-01-23 07:35:00,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [235.0, 190.0, 219.0, 168.0, 126.0, 190.0, 175.0, 279.0, 126.0, 166.0]
2026-01-23 07:35:00,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 4 hours, 19 minutes, 53 seconds)
2026-01-23 07:41:05,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:41:09,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 912.79883 ± 270.948
2026-01-23 07:41:09,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [547.08417, 526.9454, 924.43097, 755.90784, 1038.4249, 1339.1482, 730.25885, 1090.2621, 860.6594, 1314.8666]
2026-01-23 07:41:09,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [120.0, 112.0, 202.0, 163.0, 213.0, 278.0, 151.0, 221.0, 185.0, 273.0]
2026-01-23 07:41:09,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 4 hours, 15 minutes, 21 seconds)
2026-01-23 07:47:04,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:47:09,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1427.11096 ± 632.776
2026-01-23 07:47:09,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [858.5336, 1353.4963, 617.2658, 1009.5692, 1768.5605, 1207.1056, 1200.7455, 1651.3064, 1578.7708, 3025.7563]
2026-01-23 07:47:09,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [166.0, 259.0, 114.0, 194.0, 336.0, 227.0, 232.0, 307.0, 300.0, 598.0]
2026-01-23 07:47:09,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 4 hours, 8 minutes, 53 seconds)
2026-01-23 07:53:13,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:53:18,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1610.72229 ± 664.819
2026-01-23 07:53:18,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1279.2015, 1332.6648, 990.5579, 1681.4553, 1882.0284, 2007.1367, 1628.8944, 1380.3335, 675.7369, 3249.2134]
2026-01-23 07:53:18,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [234.0, 275.0, 187.0, 329.0, 357.0, 380.0, 307.0, 255.0, 123.0, 600.0]
2026-01-23 07:53:18,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1610.72) for latency DatasetOffice
2026-01-23 07:53:18,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 4 hours, 3 minutes, 53 seconds)
2026-01-23 07:59:16,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:59:23,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2021.04712 ± 995.585
2026-01-23 07:59:23,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2696.0464, 2308.9592, 718.33484, 1075.1833, 1910.2933, 2563.9944, 690.3439, 3087.812, 1366.9972, 3792.506]
2026-01-23 07:59:23,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [543.0, 436.0, 153.0, 218.0, 373.0, 489.0, 132.0, 622.0, 278.0, 718.0]
2026-01-23 07:59:23,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (2021.05) for latency DatasetOffice
2026-01-23 07:59:23,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 3 hours, 57 minutes, 7 seconds)
2026-01-23 08:05:30,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:05:36,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1958.06189 ± 1155.779
2026-01-23 08:05:36,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1321.836, 2275.448, 1094.4802, 3962.544, 778.15485, 759.9594, 486.87772, 3094.1182, 2807.9504, 2999.2495]
2026-01-23 08:05:36,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [249.0, 444.0, 225.0, 756.0, 155.0, 140.0, 89.0, 587.0, 534.0, 562.0]
2026-01-23 08:05:36,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 3 hours, 52 minutes, 36 seconds)
2026-01-23 08:11:44,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:11:53,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2341.24243 ± 1358.253
2026-01-23 08:11:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2667.7231, 690.3179, 1215.576, 3215.5244, 3310.2385, 1375.2809, 2821.7485, 2455.62, 5174.416, 485.9773]
2026-01-23 08:11:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [528.0, 146.0, 255.0, 642.0, 660.0, 294.0, 552.0, 484.0, 1000.0, 102.0]
2026-01-23 08:11:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (2341.24) for latency DatasetOffice
2026-01-23 08:11:53,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 3 hours, 47 minutes, 24 seconds)
2026-01-23 08:17:48,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:17:53,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1672.04260 ± 664.806
2026-01-23 08:17:53,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1941.4004, 609.2456, 1671.3362, 1129.9357, 1960.9048, 2304.733, 492.88443, 2508.4504, 1873.1417, 2228.3938]
2026-01-23 08:17:53,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [363.0, 119.0, 310.0, 213.0, 364.0, 421.0, 110.0, 471.0, 345.0, 412.0]
2026-01-23 08:17:53,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 3 hours, 41 minutes, 22 seconds)
2026-01-23 08:23:54,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:24:03,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2641.61987 ± 1574.077
2026-01-23 08:24:03,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5042.3706, 2168.0503, 544.03485, 853.3535, 2414.9167, 4528.1694, 643.69745, 2749.8855, 4504.934, 2966.787]
2026-01-23 08:24:03,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [977.0, 398.0, 119.0, 179.0, 446.0, 842.0, 123.0, 509.0, 821.0, 550.0]
2026-01-23 08:24:03,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (2641.62) for latency DatasetOffice
2026-01-23 08:24:03,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 3 hours, 35 minutes, 13 seconds)
2026-01-23 08:30:09,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:30:14,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1416.12769 ± 397.024
2026-01-23 08:30:14,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1661.2106, 1856.7659, 2216.0576, 1106.3788, 952.9796, 1423.87, 963.7386, 1288.5516, 1611.2358, 1080.4888]
2026-01-23 08:30:14,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [307.0, 336.0, 410.0, 199.0, 175.0, 264.0, 180.0, 234.0, 297.0, 195.0]
2026-01-23 08:30:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 3 hours, 29 minutes, 45 seconds)
2026-01-23 08:36:45,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:36:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3794.87695 ± 1532.766
2026-01-23 08:36:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2341.4163, 4654.7476, 5207.925, 5181.324, 970.58325, 5190.479, 4462.031, 2110.6426, 2594.2053, 5235.414]
2026-01-23 08:36:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [468.0, 885.0, 1000.0, 1000.0, 190.0, 1000.0, 882.0, 400.0, 508.0, 1000.0]
2026-01-23 08:36:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (3794.88) for latency DatasetOffice
2026-01-23 08:36:59,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 3 hours, 27 minutes, 3 seconds)
2026-01-23 08:42:40,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:42:50,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2874.94263 ± 1646.990
2026-01-23 08:42:50,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2751.1755, 5282.5083, 1808.0181, 2421.8477, 940.63416, 2085.915, 888.1599, 2114.8474, 5283.018, 5173.303]
2026-01-23 08:42:50,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [505.0, 1000.0, 341.0, 454.0, 198.0, 381.0, 187.0, 412.0, 1000.0, 1000.0]
2026-01-23 08:42:50,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 3 hours, 18 minutes, 9 seconds)
2026-01-23 08:48:47,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:48:58,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3031.61133 ± 2136.152
2026-01-23 08:48:58,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [785.8028, 728.9968, 338.17642, 370.14545, 5036.2173, 4980.2544, 5090.5425, 5136.7476, 5121.78, 2727.451]
2026-01-23 08:48:58,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [154.0, 153.0, 64.0, 71.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 552.0]
2026-01-23 08:48:58,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 3 hours, 12 minutes, 42 seconds)
2026-01-23 08:55:16,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:55:23,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2048.85913 ± 697.184
2026-01-23 08:55:23,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2311.8333, 1953.4304, 780.1466, 3678.6633, 1666.5707, 1851.9944, 2398.9062, 2041.5099, 1614.3911, 2191.1462]
2026-01-23 08:55:23,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [432.0, 352.0, 146.0, 674.0, 298.0, 334.0, 425.0, 376.0, 298.0, 399.0]
2026-01-23 08:55:23,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 3 hours, 7 minutes, 57 seconds)
2026-01-23 09:01:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:01:47,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3355.64966 ± 1576.609
2026-01-23 09:01:47,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5054.8853, 1580.9421, 2741.9778, 5027.0845, 4958.887, 4981.673, 4329.762, 1909.8712, 1036.9174, 1934.4963]
2026-01-23 09:01:47,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 333.0, 567.0, 1000.0, 1000.0, 1000.0, 879.0, 412.0, 210.0, 385.0]
2026-01-23 09:01:47,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 3 hours, 3 minutes)
2026-01-23 09:07:32,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:07:45,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3325.74463 ± 1704.080
2026-01-23 09:07:45,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1582.8044, 1460.7936, 4995.518, 5030.154, 5039.2466, 1828.2659, 4974.199, 1227.998, 2071.8943, 5046.5693]
2026-01-23 09:07:45,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [326.0, 322.0, 1000.0, 1000.0, 1000.0, 387.0, 1000.0, 256.0, 437.0, 1000.0]
2026-01-23 09:07:45,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 52 minutes, 18 seconds)
2026-01-23 09:14:05,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:14:17,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3038.83960 ± 1500.009
2026-01-23 09:14:17,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1563.5957, 4725.0073, 1104.892, 2974.123, 1281.6711, 4934.3364, 5275.982, 2659.8086, 3905.47, 1963.5082]
2026-01-23 09:14:17,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [309.0, 932.0, 227.0, 616.0, 270.0, 1000.0, 1000.0, 516.0, 789.0, 405.0]
2026-01-23 09:14:17,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 49 minutes, 45 seconds)
2026-01-23 09:20:13,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:20:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4746.91357 ± 970.884
2026-01-23 09:20:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5234.1133, 2721.4768, 5224.598, 5258.778, 5222.7285, 2892.7183, 5230.416, 5187.812, 5269.8813, 5226.6133]
2026-01-23 09:20:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 527.0, 1000.0, 1000.0, 1000.0, 573.0, 1000.0, 993.0, 1000.0, 1000.0]
2026-01-23 09:20:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4746.91) for latency DatasetOffice
2026-01-23 09:20:30,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 43 minutes, 56 seconds)
2026-01-23 09:26:32,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:26:43,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2886.10303 ± 1966.619
2026-01-23 09:26:43,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5020.8765, 3018.551, 269.671, 698.87695, 409.73724, 5101.6743, 2685.6482, 5076.4487, 5059.882, 1519.6619]
2026-01-23 09:26:43,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 586.0, 51.0, 146.0, 79.0, 1000.0, 572.0, 1000.0, 1000.0, 305.0]
2026-01-23 09:26:43,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 2 hours, 36 minutes, 41 seconds)
2026-01-23 09:32:52,996 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:33:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4896.41895 ± 1037.909
2026-01-23 09:33:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5265.101, 5174.2017, 1785.7134, 5295.9277, 5289.0913, 5179.319, 5190.47, 5291.945, 5273.8433, 5218.582]
2026-01-23 09:33:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 346.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:33:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4896.42) for latency DatasetOffice
2026-01-23 09:33:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 2 hours, 30 minutes, 40 seconds)
2026-01-23 09:38:59,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:39:10,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3229.94580 ± 1608.789
2026-01-23 09:39:10,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3007.9956, 1327.8291, 5469.9517, 3436.4597, 2839.713, 907.0986, 3060.098, 5086.609, 5567.1514, 1596.5515]
2026-01-23 09:39:10,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [539.0, 285.0, 1000.0, 633.0, 517.0, 169.0, 545.0, 928.0, 1000.0, 294.0]
2026-01-23 09:39:10,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 2 hours, 24 minutes, 32 seconds)
2026-01-23 09:45:47,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:45:57,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2424.49976 ± 1980.391
2026-01-23 09:45:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4991.005, 4946.096, 4971.467, 2339.5889, 75.406204, 511.79727, 502.1638, 150.48784, 3692.2651, 2064.7212]
2026-01-23 09:45:57,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 463.0, 16.0, 95.0, 103.0, 29.0, 758.0, 439.0]
2026-01-23 09:45:57,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 2 hours, 19 minutes, 20 seconds)
2026-01-23 09:51:56,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:52:11,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4730.77344 ± 1585.473
2026-01-23 09:52:11,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5560.452, 5542.748, 2688.3618, 5531.5063, 707.9189, 5088.2305, 5540.3213, 5551.918, 5549.8403, 5546.4365]
2026-01-23 09:52:11,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 486.0, 1000.0, 139.0, 912.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:52:11,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 2 hours, 13 minutes, 6 seconds)
2026-01-23 09:58:07,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:58:21,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4079.88745 ± 1616.511
2026-01-23 09:58:21,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5333.552, 5323.7305, 5391.4985, 2488.1797, 1520.9691, 5332.654, 3285.2314, 5322.2236, 5342.4126, 1458.4215]
2026-01-23 09:58:21,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 463.0, 298.0, 1000.0, 603.0, 1000.0, 1000.0, 278.0]
2026-01-23 09:58:21,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 2 hours, 6 minutes, 30 seconds)
2026-01-23 10:04:10,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:04:20,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3215.68481 ± 1843.640
2026-01-23 10:04:20,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [72.07427, 447.23737, 5442.4443, 4789.643, 3437.1213, 3696.1453, 3555.5923, 5015.025, 4448.9624, 1252.6023]
2026-01-23 10:04:20,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [15.0, 81.0, 1000.0, 870.0, 625.0, 667.0, 657.0, 906.0, 813.0, 225.0]
2026-01-23 10:04:20,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 58 minutes, 27 seconds)
2026-01-23 10:10:25,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:10:43,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4765.36133 ± 1071.328
2026-01-23 10:10:43,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4998.004, 5129.926, 5097.902, 5113.3115, 5174.8384, 5181.894, 1555.6475, 5077.336, 5199.4243, 5125.3286]
2026-01-23 10:10:43,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 334.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:10:43,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 53 minutes, 35 seconds)
2026-01-23 10:16:54,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:17:11,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4953.40234 ± 947.892
2026-01-23 10:17:11,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4072.5405, 5514.641, 5496.858, 4195.964, 5516.446, 5532.248, 5499.721, 2632.464, 5475.081, 5598.058]
2026-01-23 10:17:11,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [748.0, 1000.0, 1000.0, 770.0, 1000.0, 1000.0, 1000.0, 481.0, 1000.0, 1000.0]
2026-01-23 10:17:11,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4953.40) for latency DatasetOffice
2026-01-23 10:17:11,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 46 minutes, 12 seconds)
2026-01-23 10:24:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:25:01,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5236.21582 ± 349.276
2026-01-23 10:25:01,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5300.0215, 4192.077, 5370.606, 5365.865, 5320.9106, 5367.3296, 5364.403, 5409.991, 5341.3145, 5329.646]
2026-01-23 10:25:01,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 803.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:25:01,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (5236.22) for latency DatasetOffice
2026-01-23 10:25:01,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 45 minutes, 3 seconds)
2026-01-23 10:31:00,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:31:18,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5024.98047 ± 568.089
2026-01-23 10:31:18,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5089.964, 5251.6514, 5217.7383, 5249.3667, 5221.329, 5203.2993, 3325.7837, 5232.7876, 5220.458, 5237.4272]
2026-01-23 10:31:18,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 635.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:31:18,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 38 minutes, 53 seconds)
2026-01-23 10:37:20,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:37:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3441.95435 ± 2066.909
2026-01-23 10:37:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5335.305, 5311.424, 5309.6074, 2814.2998, 5282.204, 5311.708, 3357.5513, 411.60562, 549.39276, 736.4397]
2026-01-23 10:37:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 534.0, 1000.0, 1000.0, 626.0, 76.0, 105.0, 138.0]
2026-01-23 10:37:32,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 32 minutes, 56 seconds)
2026-01-23 10:44:03,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:44:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4750.33154 ± 1498.663
2026-01-23 10:44:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5470.17, 5478.832, 5449.543, 674.4202, 5429.75, 5453.9346, 3322.7646, 5302.645, 5474.062, 5447.1963]
2026-01-23 10:44:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 130.0, 1000.0, 1000.0, 610.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:44:19,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 27 minutes, 21 seconds)
2026-01-23 10:49:57,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:50:15,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4950.56836 ± 924.052
2026-01-23 10:50:15,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5217.6436, 5292.1763, 5228.451, 5223.236, 5291.2183, 5275.702, 5309.7017, 2179.9954, 5257.701, 5229.855]
2026-01-23 10:50:15,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 409.0, 1000.0, 1000.0]
2026-01-23 10:50:15,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 1 hour, 19 minutes, 21 seconds)
2026-01-23 10:56:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:56:35,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2563.40503 ± 1608.381
2026-01-23 10:56:35,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3525.3057, 714.57684, 1572.11, 1049.338, 5313.6494, 3859.04, 2563.0908, 4508.0347, 2226.6782, 302.22842]
2026-01-23 10:56:35,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [693.0, 148.0, 336.0, 216.0, 1000.0, 729.0, 482.0, 881.0, 453.0, 61.0]
2026-01-23 10:56:35,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 1 hour, 9 minutes, 25 seconds)
2026-01-23 11:03:07,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:03:24,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4859.49463 ± 1286.711
2026-01-23 11:03:24,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5299.01, 5274.1016, 5359.9233, 5383.6377, 1052.5756, 5466.4106, 5341.037, 4664.719, 5370.16, 5383.371]
2026-01-23 11:03:24,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 216.0, 1000.0, 1000.0, 867.0, 1000.0, 1000.0]
2026-01-23 11:03:24,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 4 minutes, 11 seconds)
2026-01-23 11:09:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:09:18,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4122.70605 ± 1876.180
2026-01-23 11:09:18,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5389.048, 5391.77, 1088.7781, 5381.6196, 5139.398, 5328.076, 1954.1581, 5359.2266, 819.7174, 5375.265]
2026-01-23 11:09:18,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 195.0, 1000.0, 950.0, 1000.0, 406.0, 1000.0, 171.0, 1000.0]
2026-01-23 11:09:18,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 57 minutes, 10 seconds)
2026-01-23 11:15:26,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:15:37,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2774.54443 ± 2025.490
2026-01-23 11:15:37,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [372.87857, 474.38263, 305.62653, 4920.1885, 838.17944, 3410.826, 5054.914, 4938.591, 5030.57, 2399.29]
2026-01-23 11:15:37,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [80.0, 99.0, 58.0, 1000.0, 185.0, 666.0, 1000.0, 976.0, 1000.0, 480.0]
2026-01-23 11:15:37,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 50 minutes, 4 seconds)
2026-01-23 11:21:59,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:22:16,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4661.54785 ± 1309.484
2026-01-23 11:22:16,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3743.903, 5172.7188, 5231.7705, 968.7518, 5265.7524, 5297.694, 5188.537, 5255.344, 5236.7695, 5254.236]
2026-01-23 11:22:16,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [720.0, 1000.0, 1000.0, 195.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:22:16,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 44 minutes, 50 seconds)
2026-01-23 11:28:31,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:28:45,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3990.19214 ± 1652.770
2026-01-23 11:28:45,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5318.511, 1546.0106, 5320.738, 5266.5776, 4187.868, 1564.9647, 4619.727, 5296.2344, 5318.9507, 1462.3392]
2026-01-23 11:28:45,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 297.0, 1000.0, 1000.0, 780.0, 304.0, 893.0, 1000.0, 1000.0, 289.0]
2026-01-23 11:28:45,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 38 minutes, 36 seconds)
2026-01-23 11:35:06,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:35:18,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3560.28003 ± 2144.150
2026-01-23 11:35:18,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2711.3313, 393.20663, 487.4446, 690.17566, 5379.7773, 5402.1753, 4302.1304, 5400.8916, 5401.28, 5434.3896]
2026-01-23 11:35:18,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [506.0, 75.0, 92.0, 140.0, 1000.0, 1000.0, 788.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:35:18,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 31 minutes, 54 seconds)
2026-01-23 11:41:23,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:41:42,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5327.11426 ± 22.266
2026-01-23 11:41:42,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5345.287, 5335.537, 5321.75, 5374.085, 5307.61, 5328.0757, 5324.019, 5310.834, 5287.407, 5336.538]
2026-01-23 11:41:42,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:41:42,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (5327.11) for latency DatasetOffice
2026-01-23 11:41:42,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 25 minutes, 55 seconds)
2026-01-23 11:47:57,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:48:13,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4481.74365 ± 1308.876
2026-01-23 11:48:13,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5230.6533, 5209.8125, 3673.6562, 3676.784, 5210.3037, 1000.2963, 5174.66, 5225.94, 5191.951, 5223.384]
2026-01-23 11:48:13,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 691.0, 703.0, 1000.0, 185.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:48:13,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 19 minutes, 33 seconds)
2026-01-23 11:54:21,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:54:38,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4330.92529 ± 1720.186
2026-01-23 11:54:38,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1306.0686, 5215.0713, 5186.453, 5187.5137, 5157.978, 5167.5205, 5163.6978, 5211.8003, 5200.9824, 512.1674]
2026-01-23 11:54:38,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [281.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 93.0]
2026-01-23 11:54:38,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 12 minutes, 56 seconds)
2026-01-23 12:00:36,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:00:56,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5071.61230 ± 28.619
2026-01-23 12:00:56,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5027.1094, 5050.1675, 5074.583, 5083.274, 5100.943, 5126.4, 5062.3086, 5032.4756, 5080.6333, 5078.225]
2026-01-23 12:00:56,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:00:56,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 6 minutes, 26 seconds)
2026-01-23 12:06:43,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:06:59,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3978.43091 ± 1715.102
2026-01-23 12:06:59,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1465.3135, 2162.832, 5095.155, 5148.519, 5108.252, 5113.7285, 5028.7793, 4962.1646, 5086.228, 613.3412]
2026-01-23 12:06:59,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [293.0, 438.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 113.0]
2026-01-23 12:06:59,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1299 [DEBUG]: Training session finished
