2026-01-23 01:54:25,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mda-highdim-mem2
2026-01-23 01:54:25,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mda-highdim-mem2
2026-01-23 01:54:25,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1516c4d964d0>}
2026-01-23 01:54:25,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-23 01:54:25,956 baseline-bpql-mda-noisy-humanoid:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-23 01:54:25,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-23 01:54:25,974 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-23 01:54:25,974 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:54:25,982 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2026-01-23 01:54:27,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-23 01:54:28,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-23 02:00:10,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:10,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 264.89359 ± 24.749
2026-01-23 02:00:10,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [258.5807, 306.79953, 228.95818, 270.97873, 303.9318, 246.95288, 238.9192, 280.81488, 263.05615, 249.94392]
2026-01-23 02:00:10,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [49.0, 57.0, 43.0, 50.0, 56.0, 47.0, 45.0, 52.0, 49.0, 47.0]
2026-01-23 02:00:10,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (264.89) for latency DatasetOffice
2026-01-23 02:00:10,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 9 hours, 24 minutes, 46 seconds)
2026-01-23 02:06:07,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:09,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 378.41489 ± 37.400
2026-01-23 02:06:09,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [361.29135, 348.60046, 363.88205, 380.35294, 446.65988, 356.3325, 420.43793, 388.67194, 408.03693, 309.88293]
2026-01-23 02:06:09,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [64.0, 61.0, 65.0, 69.0, 82.0, 63.0, 76.0, 69.0, 74.0, 57.0]
2026-01-23 02:06:09,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (378.41) for latency DatasetOffice
2026-01-23 02:06:09,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 9 hours, 32 minutes)
2026-01-23 02:12:10,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:11,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 361.78339 ± 40.414
2026-01-23 02:12:11,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [453.1395, 374.88522, 343.49524, 341.5373, 355.47702, 372.2768, 348.9728, 327.83795, 401.74057, 298.47134]
2026-01-23 02:12:11,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [83.0, 67.0, 63.0, 63.0, 65.0, 68.0, 65.0, 61.0, 74.0, 55.0]
2026-01-23 02:12:11,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 9 hours, 32 minutes, 35 seconds)
2026-01-23 02:18:08,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:09,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 296.19592 ± 123.156
2026-01-23 02:18:09,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [229.90036, 398.72885, 80.27895, 284.64633, 258.2379, 96.67026, 393.0965, 448.80298, 389.3671, 382.23007]
2026-01-23 02:18:09,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [45.0, 75.0, 18.0, 56.0, 50.0, 21.0, 73.0, 94.0, 74.0, 77.0]
2026-01-23 02:18:09,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 9 hours, 28 minutes, 8 seconds)
2026-01-23 02:24:07,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:08,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 421.66425 ± 70.638
2026-01-23 02:24:08,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [585.5265, 466.66098, 456.75055, 310.3434, 447.90298, 396.8807, 420.57755, 367.87555, 376.377, 387.74716]
2026-01-23 02:24:08,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [112.0, 102.0, 87.0, 66.0, 95.0, 81.0, 82.0, 79.0, 76.0, 75.0]
2026-01-23 02:24:08,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (421.66) for latency DatasetOffice
2026-01-23 02:24:08,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 9 hours, 23 minutes, 41 seconds)
2026-01-23 02:30:08,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:09,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 385.81439 ± 60.199
2026-01-23 02:30:09,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [373.30295, 328.55267, 409.4157, 392.74457, 435.5326, 303.0473, 471.1494, 361.23218, 477.9634, 305.20303]
2026-01-23 02:30:09,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [72.0, 65.0, 75.0, 72.0, 82.0, 61.0, 87.0, 70.0, 89.0, 58.0]
2026-01-23 02:30:09,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 9 hours, 23 minutes, 37 seconds)
2026-01-23 02:36:11,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:13,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 364.72052 ± 86.829
2026-01-23 02:36:13,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [332.14478, 459.89612, 265.45862, 387.35333, 368.566, 398.40866, 449.251, 181.73323, 474.02573, 330.36792]
2026-01-23 02:36:13,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [65.0, 92.0, 54.0, 79.0, 75.0, 75.0, 85.0, 37.0, 90.0, 65.0]
2026-01-23 02:36:13,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 9 hours, 19 minutes, 13 seconds)
2026-01-23 02:42:10,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:11,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 416.40015 ± 99.932
2026-01-23 02:42:11,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [402.3556, 385.53934, 351.68808, 380.902, 380.23267, 391.95306, 390.5508, 400.43045, 713.08606, 367.26355]
2026-01-23 02:42:11,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [75.0, 72.0, 65.0, 70.0, 85.0, 71.0, 72.0, 73.0, 133.0, 67.0]
2026-01-23 02:42:11,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 9 hours, 12 minutes, 8 seconds)
2026-01-23 02:48:12,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:14,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 517.43909 ± 134.787
2026-01-23 02:48:14,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [473.30606, 465.83057, 387.9312, 657.2098, 451.53546, 473.95746, 417.23392, 448.9729, 863.08673, 535.32733]
2026-01-23 02:48:14,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [90.0, 87.0, 75.0, 125.0, 97.0, 90.0, 79.0, 84.0, 168.0, 100.0]
2026-01-23 02:48:14,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (517.44) for latency DatasetOffice
2026-01-23 02:48:14,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 9 hours, 7 minutes, 31 seconds)
2026-01-23 02:54:14,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:16,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 397.70428 ± 85.841
2026-01-23 02:54:16,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [484.68915, 333.1969, 308.8914, 377.78186, 378.64786, 417.4857, 340.31723, 610.5181, 393.5368, 331.97787]
2026-01-23 02:54:16,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [104.0, 73.0, 65.0, 78.0, 83.0, 90.0, 70.0, 116.0, 85.0, 70.0]
2026-01-23 02:54:16,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 9 hours, 2 minutes, 17 seconds)
2026-01-23 03:00:18,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:19,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 411.36050 ± 62.759
2026-01-23 03:00:19,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [424.33252, 359.53906, 315.17795, 375.63693, 326.07947, 456.88455, 451.75, 522.99854, 461.28918, 419.91696]
2026-01-23 03:00:19,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [89.0, 79.0, 69.0, 77.0, 74.0, 86.0, 101.0, 116.0, 98.0, 92.0]
2026-01-23 03:00:19,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 8 hours, 56 minutes, 58 seconds)
2026-01-23 03:06:18,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:20,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 458.75674 ± 99.673
2026-01-23 03:06:20,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [387.59302, 604.05023, 500.36908, 453.6112, 418.45444, 441.34116, 581.2411, 356.24332, 278.59125, 566.0723]
2026-01-23 03:06:20,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [73.0, 115.0, 112.0, 97.0, 88.0, 84.0, 125.0, 70.0, 62.0, 105.0]
2026-01-23 03:06:20,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 8 hours, 50 minutes, 11 seconds)
2026-01-23 03:12:21,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:23,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 502.40591 ± 87.310
2026-01-23 03:12:23,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [372.04034, 456.57007, 595.59125, 619.7783, 619.6004, 463.9074, 436.12042, 583.61804, 426.58188, 450.25055]
2026-01-23 03:12:23,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [70.0, 90.0, 110.0, 118.0, 132.0, 87.0, 90.0, 119.0, 100.0, 90.0]
2026-01-23 03:12:23,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 8 hours, 45 minutes, 19 seconds)
2026-01-23 03:18:23,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:24,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 464.35248 ± 48.143
2026-01-23 03:18:24,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [509.84354, 426.80054, 514.57245, 543.9988, 395.1438, 490.795, 477.2912, 407.26562, 456.5367, 421.27737]
2026-01-23 03:18:24,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [101.0, 79.0, 104.0, 102.0, 75.0, 92.0, 89.0, 76.0, 83.0, 78.0]
2026-01-23 03:18:24,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 8 hours, 38 minutes, 58 seconds)
2026-01-23 03:24:24,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:26,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 523.20038 ± 102.456
2026-01-23 03:24:26,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [549.8679, 562.11755, 558.5894, 493.63486, 386.42505, 671.3656, 407.71494, 488.34576, 410.12726, 703.8156]
2026-01-23 03:24:26,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [102.0, 107.0, 104.0, 92.0, 71.0, 131.0, 76.0, 104.0, 75.0, 139.0]
2026-01-23 03:24:26,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (523.20) for latency DatasetOffice
2026-01-23 03:24:26,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 8 hours, 32 minutes, 51 seconds)
2026-01-23 03:30:26,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:28,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 446.20282 ± 95.220
2026-01-23 03:30:28,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [315.86844, 555.1741, 478.68646, 420.13882, 497.34027, 326.40646, 415.00735, 326.41705, 575.0937, 551.8956]
2026-01-23 03:30:28,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [68.0, 113.0, 95.0, 86.0, 103.0, 67.0, 91.0, 70.0, 121.0, 111.0]
2026-01-23 03:30:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 8 hours, 26 minutes, 24 seconds)
2026-01-23 03:36:30,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:32,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 501.38989 ± 138.106
2026-01-23 03:36:32,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [548.8575, 436.48395, 786.56165, 482.82477, 574.28424, 217.59302, 414.44357, 463.2774, 589.32385, 500.24884]
2026-01-23 03:36:32,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [119.0, 83.0, 159.0, 102.0, 117.0, 44.0, 77.0, 86.0, 125.0, 96.0]
2026-01-23 03:36:32,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 8 hours, 21 minutes, 19 seconds)
2026-01-23 03:42:31,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:33,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 516.62451 ± 74.627
2026-01-23 03:42:33,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [502.31607, 622.7995, 435.81738, 526.6818, 438.5177, 644.3436, 554.47394, 415.91766, 470.46707, 554.9102]
2026-01-23 03:42:33,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 134.0, 83.0, 100.0, 91.0, 129.0, 114.0, 79.0, 89.0, 109.0]
2026-01-23 03:42:33,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 8 hours, 14 minutes, 47 seconds)
2026-01-23 03:48:37,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:38,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 521.66321 ± 66.294
2026-01-23 03:48:38,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [453.6038, 446.7507, 567.2381, 678.6526, 516.1558, 549.0311, 489.18503, 548.0079, 512.44446, 455.56226]
2026-01-23 03:48:38,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [84.0, 91.0, 120.0, 133.0, 93.0, 117.0, 103.0, 101.0, 96.0, 85.0]
2026-01-23 03:48:38,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 8 hours, 9 minutes, 52 seconds)
2026-01-23 03:54:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:54:38,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 601.61993 ± 76.038
2026-01-23 03:54:38,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [737.8323, 694.7927, 604.96924, 651.8998, 473.6533, 556.04443, 530.2639, 557.3327, 569.21344, 640.1972]
2026-01-23 03:54:38,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [142.0, 146.0, 113.0, 128.0, 88.0, 106.0, 101.0, 105.0, 107.0, 121.0]
2026-01-23 03:54:38,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (601.62) for latency DatasetOffice
2026-01-23 03:54:38,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 8 hours, 3 minutes, 14 seconds)
2026-01-23 04:00:40,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:42,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 621.91223 ± 113.939
2026-01-23 04:00:42,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [603.53357, 647.5502, 537.5783, 565.24615, 571.62854, 575.0132, 909.3491, 513.4315, 552.9304, 742.862]
2026-01-23 04:00:42,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [133.0, 141.0, 116.0, 122.0, 106.0, 107.0, 186.0, 95.0, 104.0, 142.0]
2026-01-23 04:00:42,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (621.91) for latency DatasetOffice
2026-01-23 04:00:42,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 7 hours, 57 minutes, 38 seconds)
2026-01-23 04:06:43,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:06:45,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 540.99957 ± 47.705
2026-01-23 04:06:45,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [492.3882, 480.33844, 583.37024, 618.93317, 580.1785, 496.7441, 487.3274, 528.65826, 587.0873, 554.9699]
2026-01-23 04:06:45,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [95.0, 93.0, 116.0, 116.0, 123.0, 94.0, 94.0, 112.0, 113.0, 107.0]
2026-01-23 04:06:45,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 7 hours, 51 minutes, 18 seconds)
2026-01-23 04:12:46,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:12:48,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 578.62329 ± 136.254
2026-01-23 04:12:48,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [568.3088, 522.8, 334.9041, 888.335, 530.9374, 481.56598, 546.86847, 585.857, 664.9011, 661.75525]
2026-01-23 04:12:48,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [106.0, 101.0, 74.0, 176.0, 100.0, 93.0, 102.0, 110.0, 128.0, 128.0]
2026-01-23 04:12:48,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 7 hours, 45 minutes, 50 seconds)
2026-01-23 04:18:46,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:48,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 559.34973 ± 207.883
2026-01-23 04:18:48,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [668.3772, 843.8806, 527.69476, 529.755, 858.5547, 87.14231, 549.6653, 432.9164, 482.2672, 613.2438]
2026-01-23 04:18:48,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [127.0, 164.0, 98.0, 98.0, 184.0, 18.0, 102.0, 78.0, 89.0, 114.0]
2026-01-23 04:18:48,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 7 hours, 38 minutes, 29 seconds)
2026-01-23 04:24:51,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:24:53,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 528.79407 ± 92.932
2026-01-23 04:24:53,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [742.6673, 564.13525, 499.9164, 519.20276, 478.8684, 482.16376, 627.0444, 505.8158, 380.90527, 487.22098]
2026-01-23 04:24:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [140.0, 105.0, 96.0, 100.0, 88.0, 90.0, 122.0, 93.0, 76.0, 93.0]
2026-01-23 04:24:53,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 7 hours, 33 minutes, 36 seconds)
2026-01-23 04:30:54,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:30:56,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 502.51010 ± 102.671
2026-01-23 04:30:56,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [499.8886, 346.29697, 434.33472, 695.4086, 615.1449, 389.04565, 510.46124, 591.08075, 515.19885, 428.2409]
2026-01-23 04:30:56,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 66.0, 80.0, 134.0, 116.0, 84.0, 101.0, 127.0, 110.0, 93.0]
2026-01-23 04:30:56,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 7 hours, 27 minutes, 29 seconds)
2026-01-23 04:36:55,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:57,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 673.63037 ± 140.129
2026-01-23 04:36:57,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [791.222, 522.59985, 500.88162, 634.831, 658.3381, 942.11127, 753.6808, 813.72284, 514.9816, 603.93475]
2026-01-23 04:36:57,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [168.0, 108.0, 97.0, 124.0, 120.0, 186.0, 142.0, 153.0, 96.0, 111.0]
2026-01-23 04:36:57,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (673.63) for latency DatasetOffice
2026-01-23 04:36:57,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 7 hours, 21 minutes)
2026-01-23 04:42:53,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:42:55,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 585.28662 ± 116.702
2026-01-23 04:42:55,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [434.8944, 818.2136, 695.2728, 449.68164, 690.7576, 506.04135, 645.41284, 520.7456, 562.51984, 529.3263]
2026-01-23 04:42:55,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 153.0, 132.0, 100.0, 129.0, 99.0, 127.0, 97.0, 109.0, 117.0]
2026-01-23 04:42:55,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 7 hours, 13 minutes, 42 seconds)
2026-01-23 04:48:55,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:48:57,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 726.64618 ± 120.305
2026-01-23 04:48:57,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [740.6329, 573.793, 761.90735, 583.5662, 890.98676, 776.66364, 632.7648, 576.4592, 878.6434, 851.04425]
2026-01-23 04:48:57,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [138.0, 112.0, 144.0, 113.0, 174.0, 152.0, 135.0, 108.0, 174.0, 163.0]
2026-01-23 04:48:57,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (726.65) for latency DatasetOffice
2026-01-23 04:48:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 7 hours, 8 minutes, 9 seconds)
2026-01-23 04:54:55,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:54:57,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 790.61102 ± 165.384
2026-01-23 04:54:57,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [702.7475, 590.60474, 938.2149, 721.9864, 521.0359, 633.2008, 956.91003, 929.13556, 945.88873, 966.386]
2026-01-23 04:54:57,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [152.0, 109.0, 179.0, 136.0, 95.0, 117.0, 191.0, 170.0, 175.0, 179.0]
2026-01-23 04:54:57,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (790.61) for latency DatasetOffice
2026-01-23 04:54:57,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 7 hours, 1 minute, 7 seconds)
2026-01-23 05:00:55,987 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:00:57,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 546.91254 ± 126.927
2026-01-23 05:00:57,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [365.6417, 816.58527, 470.97458, 464.64008, 643.83484, 435.23773, 466.48526, 619.7743, 643.4704, 542.4813]
2026-01-23 05:00:57,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [72.0, 150.0, 89.0, 89.0, 121.0, 82.0, 87.0, 114.0, 116.0, 101.0]
2026-01-23 05:00:57,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 6 hours, 54 minutes, 18 seconds)
2026-01-23 05:07:00,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:07:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 884.49298 ± 313.193
2026-01-23 05:07:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1102.3394, 531.1183, 1127.8291, 502.71698, 772.72015, 1325.0662, 677.95703, 643.4233, 749.89905, 1411.8599]
2026-01-23 05:07:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [210.0, 99.0, 222.0, 112.0, 155.0, 250.0, 131.0, 117.0, 160.0, 268.0]
2026-01-23 05:07:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (884.49) for latency DatasetOffice
2026-01-23 05:07:03,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 6 hours, 49 minutes, 19 seconds)
2026-01-23 05:12:55,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:57,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 733.62653 ± 201.835
2026-01-23 05:12:57,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [509.10965, 868.24915, 778.5952, 437.188, 1160.3788, 817.8498, 670.90607, 515.26624, 758.4832, 820.2389]
2026-01-23 05:12:57,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [95.0, 165.0, 148.0, 86.0, 220.0, 152.0, 143.0, 99.0, 160.0, 150.0]
2026-01-23 05:12:57,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 6 hours, 42 minutes, 33 seconds)
2026-01-23 05:18:58,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:19:01,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 921.89142 ± 286.815
2026-01-23 05:19:01,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1054.9862, 898.71686, 802.86346, 714.5785, 817.95325, 590.13837, 889.12585, 1690.1385, 1005.9299, 754.4837]
2026-01-23 05:19:01,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [196.0, 169.0, 150.0, 147.0, 156.0, 116.0, 167.0, 323.0, 185.0, 142.0]
2026-01-23 05:19:01,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (921.89) for latency DatasetOffice
2026-01-23 05:19:01,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 6 hours, 36 minutes, 45 seconds)
2026-01-23 05:24:58,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:25:01,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 741.46368 ± 232.937
2026-01-23 05:25:01,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1043.0951, 808.37775, 881.2492, 429.47998, 520.4864, 515.0775, 503.80045, 880.8187, 700.1801, 1132.071]
2026-01-23 05:25:01,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [221.0, 174.0, 181.0, 96.0, 114.0, 114.0, 111.0, 166.0, 148.0, 215.0]
2026-01-23 05:25:01,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 6 hours, 30 minutes, 46 seconds)
2026-01-23 05:31:02,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:31:04,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 692.30920 ± 117.070
2026-01-23 05:31:04,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [548.1334, 738.90216, 667.32996, 713.0331, 524.2036, 721.694, 960.9553, 659.81714, 767.6027, 621.4208]
2026-01-23 05:31:04,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [103.0, 135.0, 122.0, 135.0, 118.0, 137.0, 182.0, 123.0, 144.0, 111.0]
2026-01-23 05:31:04,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 6 hours, 25 minutes, 30 seconds)
2026-01-23 05:37:06,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:37:09,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 916.90118 ± 339.883
2026-01-23 05:37:09,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1021.1345, 1603.4083, 1482.3595, 557.73914, 827.6517, 694.1726, 571.67096, 814.7354, 864.667, 731.47284]
2026-01-23 05:37:09,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [209.0, 322.0, 275.0, 105.0, 172.0, 153.0, 111.0, 166.0, 187.0, 162.0]
2026-01-23 05:37:09,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 6 hours, 19 minutes, 14 seconds)
2026-01-23 05:43:14,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:43:16,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 716.58783 ± 347.254
2026-01-23 05:43:16,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [437.91895, 598.7914, 532.12854, 769.61255, 582.96704, 687.6468, 548.49786, 415.18378, 931.2253, 1661.9064]
2026-01-23 05:43:16,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [83.0, 109.0, 106.0, 148.0, 113.0, 131.0, 101.0, 77.0, 177.0, 299.0]
2026-01-23 05:43:16,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 6 hours, 15 minutes, 51 seconds)
2026-01-23 05:49:09,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:49:11,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 608.70691 ± 63.494
2026-01-23 05:49:11,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [770.1892, 632.5932, 565.5626, 598.65173, 593.866, 610.2414, 518.73175, 563.99414, 641.7422, 591.4973]
2026-01-23 05:49:11,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [142.0, 116.0, 106.0, 128.0, 110.0, 121.0, 96.0, 108.0, 119.0, 109.0]
2026-01-23 05:49:11,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 6 hours, 8 minutes, 5 seconds)
2026-01-23 05:55:09,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:55:11,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 607.67908 ± 150.512
2026-01-23 05:55:11,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [531.5091, 658.749, 531.03076, 669.5822, 549.05963, 430.84628, 651.04517, 481.92194, 571.99, 1001.0568]
2026-01-23 05:55:11,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [96.0, 121.0, 100.0, 124.0, 107.0, 87.0, 122.0, 94.0, 110.0, 179.0]
2026-01-23 05:55:11,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 6 hours, 1 minute, 57 seconds)
2026-01-23 06:01:13,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:01:17,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1112.64099 ± 440.243
2026-01-23 06:01:17,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2277.124, 889.70154, 704.61316, 1385.9097, 1144.6677, 1085.8086, 912.32495, 1025.3367, 622.11615, 1078.8076]
2026-01-23 06:01:17,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [421.0, 179.0, 152.0, 273.0, 234.0, 196.0, 189.0, 186.0, 117.0, 221.0]
2026-01-23 06:01:17,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1112.64) for latency DatasetOffice
2026-01-23 06:01:17,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 5 hours, 56 minutes, 25 seconds)
2026-01-23 06:07:13,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:07:15,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 791.83527 ± 156.653
2026-01-23 06:07:15,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1082.2802, 684.1391, 787.2274, 1082.1648, 624.68243, 749.01227, 666.39764, 716.38934, 841.36664, 684.6926]
2026-01-23 06:07:15,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [194.0, 125.0, 146.0, 199.0, 116.0, 138.0, 123.0, 135.0, 149.0, 131.0]
2026-01-23 06:07:15,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 5 hours, 49 minutes, 10 seconds)
2026-01-23 06:13:17,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:13:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1125.90405 ± 425.786
2026-01-23 06:13:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [553.0797, 778.6447, 960.20105, 713.15533, 1194.8954, 788.63495, 1784.814, 1582.0646, 1140.9232, 1762.6293]
2026-01-23 06:13:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [127.0, 159.0, 207.0, 132.0, 214.0, 159.0, 323.0, 296.0, 206.0, 322.0]
2026-01-23 06:13:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1125.90) for latency DatasetOffice
2026-01-23 06:13:21,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 5 hours, 42 minutes, 50 seconds)
2026-01-23 06:19:24,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:19:27,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 862.81805 ± 203.153
2026-01-23 06:19:27,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [894.96704, 826.689, 556.82117, 481.80814, 980.70294, 1062.835, 1196.3309, 935.36847, 864.0631, 828.59534]
2026-01-23 06:19:27,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [161.0, 151.0, 125.0, 110.0, 177.0, 199.0, 211.0, 167.0, 162.0, 147.0]
2026-01-23 06:19:27,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 5 hours, 38 minutes, 59 seconds)
2026-01-23 06:25:18,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:25:21,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1033.44263 ± 414.795
2026-01-23 06:25:21,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [838.86615, 1021.2151, 918.69794, 521.089, 1406.8438, 567.2441, 1919.0364, 1457.0127, 760.76715, 923.65405]
2026-01-23 06:25:21,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [158.0, 190.0, 170.0, 93.0, 252.0, 130.0, 365.0, 274.0, 139.0, 170.0]
2026-01-23 06:25:21,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 5 hours, 31 minutes, 49 seconds)
2026-01-23 06:31:21,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:31:25,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1169.73022 ± 323.800
2026-01-23 06:31:25,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1224.8099, 1481.7601, 898.31824, 1612.146, 621.1582, 903.84644, 1640.625, 1038.8698, 956.015, 1319.7551]
2026-01-23 06:31:25,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [226.0, 268.0, 174.0, 285.0, 115.0, 172.0, 305.0, 186.0, 177.0, 239.0]
2026-01-23 06:31:25,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1169.73) for latency DatasetOffice
2026-01-23 06:31:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 5 hours, 25 minutes, 30 seconds)
2026-01-23 06:37:28,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:37:32,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1274.53259 ± 635.002
2026-01-23 06:37:32,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [407.008, 1228.4536, 981.50916, 642.41187, 1052.4237, 2574.3337, 869.14233, 1858.163, 1130.9149, 2000.9662]
2026-01-23 06:37:32,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [82.0, 229.0, 200.0, 119.0, 203.0, 478.0, 180.0, 366.0, 218.0, 371.0]
2026-01-23 06:37:32,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1274.53) for latency DatasetOffice
2026-01-23 06:37:32,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 5 hours, 21 minutes, 2 seconds)
2026-01-23 06:43:30,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:43:33,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 818.50061 ± 437.275
2026-01-23 06:43:33,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1217.4427, 1472.8219, 830.439, 1574.0278, 817.06104, 433.72342, 365.7059, 352.61752, 401.41666, 719.7506]
2026-01-23 06:43:33,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [268.0, 292.0, 156.0, 286.0, 155.0, 80.0, 71.0, 70.0, 83.0, 165.0]
2026-01-23 06:43:33,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 5 hours, 14 minutes, 10 seconds)
2026-01-23 06:49:29,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:49:34,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1564.31396 ± 999.416
2026-01-23 06:49:34,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1203.0986, 833.34796, 835.19794, 1927.2129, 1945.3313, 4233.1455, 1620.4211, 772.6883, 1581.4684, 691.2277]
2026-01-23 06:49:34,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [253.0, 185.0, 161.0, 350.0, 372.0, 875.0, 295.0, 171.0, 345.0, 145.0]
2026-01-23 06:49:34,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1564.31) for latency DatasetOffice
2026-01-23 06:49:34,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 5 hours, 7 minutes, 16 seconds)
2026-01-23 06:55:35,715 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:55:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1829.91467 ± 809.606
2026-01-23 06:55:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1963.7236, 2751.6206, 1888.6312, 2486.3608, 1039.4668, 823.3485, 1434.3915, 3472.9124, 1137.6365, 1301.0547]
2026-01-23 06:55:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [371.0, 523.0, 351.0, 458.0, 193.0, 155.0, 280.0, 662.0, 222.0, 247.0]
2026-01-23 06:55:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (1829.91) for latency DatasetOffice
2026-01-23 06:55:41,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 5 hours, 3 minutes, 23 seconds)
2026-01-23 07:01:54,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:01:57,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1342.05725 ± 742.800
2026-01-23 07:01:57,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3166.14, 1352.4352, 1892.802, 902.46625, 1516.157, 1477.2803, 793.14465, 470.2113, 1268.2124, 581.7227]
2026-01-23 07:01:57,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [580.0, 245.0, 335.0, 163.0, 271.0, 287.0, 148.0, 89.0, 235.0, 124.0]
2026-01-23 07:01:58,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 4 hours, 59 minutes, 19 seconds)
2026-01-23 07:07:52,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:07:59,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2004.79224 ± 1808.526
2026-01-23 07:07:59,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3644.604, 2369.83, 5032.0005, 1069.312, 4966.2866, 1460.6499, 658.1398, 72.55549, 389.60056, 384.9433]
2026-01-23 07:07:59,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [763.0, 445.0, 1000.0, 239.0, 1000.0, 317.0, 134.0, 15.0, 76.0, 74.0]
2026-01-23 07:07:59,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (2004.79) for latency DatasetOffice
2026-01-23 07:07:59,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 4 hours, 52 minutes, 18 seconds)
2026-01-23 07:14:17,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:14:26,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2789.14600 ± 1737.323
2026-01-23 07:14:26,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1381.6649, 1253.9562, 2356.357, 3342.042, 1677.1268, 5598.2236, 5525.856, 2654.2205, 98.17324, 4003.8416]
2026-01-23 07:14:26,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [247.0, 230.0, 423.0, 608.0, 316.0, 1000.0, 1000.0, 477.0, 21.0, 713.0]
2026-01-23 07:14:26,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (2789.15) for latency DatasetOffice
2026-01-23 07:14:26,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 4 hours, 50 minutes, 17 seconds)
2026-01-23 07:20:25,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:20:40,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4381.46875 ± 1576.420
2026-01-23 07:20:40,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5199.036, 5277.679, 5267.386, 769.7256, 5317.361, 4922.8525, 5204.2803, 1768.9851, 5072.8057, 5014.58]
2026-01-23 07:20:40,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 162.0, 1000.0, 1000.0, 1000.0, 378.0, 973.0, 1000.0]
2026-01-23 07:20:40,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4381.47) for latency DatasetOffice
2026-01-23 07:20:40,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 4 hours, 46 minutes, 6 seconds)
2026-01-23 07:26:20,545 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:26:26,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 1738.79199 ± 1642.466
2026-01-23 07:26:26,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2331.5515, 670.98645, 3990.9397, 1327.6398, 5214.4277, 2427.8037, 303.9134, 276.066, 533.196, 311.3971]
2026-01-23 07:26:26,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [449.0, 123.0, 743.0, 257.0, 1000.0, 447.0, 62.0, 54.0, 101.0, 63.0]
2026-01-23 07:26:26,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 4 hours, 36 minutes, 42 seconds)
2026-01-23 07:32:36,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:32:51,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4238.97803 ± 1530.220
2026-01-23 07:32:51,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5186.455, 5117.459, 5098.105, 5154.03, 1355.6548, 5143.802, 1178.0105, 3911.1006, 5116.332, 5128.8335]
2026-01-23 07:32:51,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 265.0, 1000.0, 214.0, 757.0, 1000.0, 1000.0]
2026-01-23 07:32:51,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 4 hours, 31 minutes, 50 seconds)
2026-01-23 07:38:45,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:39:01,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4724.59082 ± 913.742
2026-01-23 07:39:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5102.659, 4358.4277, 5159.115, 5417.42, 5266.6206, 3647.9133, 5280.5083, 5138.7744, 2482.9414, 5391.526]
2026-01-23 07:39:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [942.0, 796.0, 1000.0, 1000.0, 1000.0, 679.0, 1000.0, 1000.0, 471.0, 1000.0]
2026-01-23 07:39:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4724.59) for latency DatasetOffice
2026-01-23 07:39:01,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 4 hours, 26 minutes, 49 seconds)
2026-01-23 07:45:06,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:45:17,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3164.87402 ± 2044.009
2026-01-23 07:45:17,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3752.832, 5280.1626, 1812.9493, 5305.842, 1027.3633, 5182.2217, 5308.9775, 3510.9966, 166.95265, 300.44244]
2026-01-23 07:45:17,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [733.0, 1000.0, 357.0, 1000.0, 186.0, 1000.0, 1000.0, 673.0, 33.0, 56.0]
2026-01-23 07:45:17,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 4 hours, 19 minutes, 8 seconds)
2026-01-23 07:51:12,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:51:19,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2237.64575 ± 1842.270
2026-01-23 07:51:19,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3907.1316, 988.3012, 772.06177, 1383.9124, 5399.2534, 1424.7997, 5524.341, 1551.2513, 454.004, 971.4003]
2026-01-23 07:51:19,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [717.0, 177.0, 145.0, 246.0, 1000.0, 259.0, 1000.0, 275.0, 103.0, 172.0]
2026-01-23 07:51:19,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 4 hours, 11 minutes, 16 seconds)
2026-01-23 07:57:42,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:57:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3525.01367 ± 1617.740
2026-01-23 07:57:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [979.11053, 5227.4487, 4067.773, 2638.1165, 2539.806, 5203.2603, 1143.1273, 5211.9575, 2958.7437, 5280.7925]
2026-01-23 07:57:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [176.0, 1000.0, 779.0, 482.0, 487.0, 1000.0, 209.0, 1000.0, 569.0, 1000.0]
2026-01-23 07:57:54,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 4 hours, 11 minutes, 47 seconds)
2026-01-23 08:03:59,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:04:08,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2861.67090 ± 2134.396
2026-01-23 08:04:08,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5337.787, 5355.0645, 2699.4233, 860.7254, 4878.853, 5410.495, 2946.0564, 165.29773, 530.0939, 432.9139]
2026-01-23 08:04:08,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 499.0, 168.0, 916.0, 1000.0, 548.0, 34.0, 100.0, 76.0]
2026-01-23 08:04:08,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 4 hours, 4 minutes, 4 seconds)
2026-01-23 08:10:02,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:10:12,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3003.17236 ± 1301.424
2026-01-23 08:10:12,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2928.8403, 1721.3113, 1721.351, 4917.1343, 3800.637, 1694.1608, 1485.9923, 3278.0417, 3196.861, 5287.393]
2026-01-23 08:10:12,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [551.0, 331.0, 315.0, 895.0, 729.0, 313.0, 270.0, 603.0, 614.0, 1000.0]
2026-01-23 08:10:12,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 3 hours, 57 minutes, 1 second)
2026-01-23 08:16:06,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:16:20,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4228.31787 ± 1762.938
2026-01-23 08:16:20,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5388.1035, 1720.5315, 5366.022, 600.7071, 2540.721, 5231.746, 5457.3276, 5249.797, 5400.675, 5327.5503]
2026-01-23 08:16:20,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 322.0, 1000.0, 132.0, 474.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:16:20,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 3 hours, 49 minutes, 47 seconds)
2026-01-23 08:22:18,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:22:35,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4917.07910 ± 1350.062
2026-01-23 08:22:35,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5360.349, 5466.4033, 869.40204, 5265.534, 5387.1245, 5366.3203, 5351.8477, 5383.1997, 5334.1587, 5386.453]
2026-01-23 08:22:35,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 170.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:22:35,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (4917.08) for latency DatasetOffice
2026-01-23 08:22:35,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 3 hours, 45 minutes, 4 seconds)
2026-01-23 08:28:34,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:28:50,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5081.38672 ± 1113.677
2026-01-23 08:28:50,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5468.1167, 5470.958, 5450.7734, 5424.9507, 1741.2926, 5399.2773, 5451.4863, 5487.74, 5486.425, 5432.8457]
2026-01-23 08:28:50,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 995.0, 1000.0, 323.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:28:50,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (5081.39) for latency DatasetOffice
2026-01-23 08:28:50,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 3 hours, 36 minutes, 32 seconds)
2026-01-23 08:34:37,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:34:53,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4596.37354 ± 1230.336
2026-01-23 08:34:53,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5140.2773, 5216.4727, 5168.7485, 5174.1465, 2963.7068, 1488.7281, 5182.4546, 5223.4233, 5204.748, 5201.03]
2026-01-23 08:34:53,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 570.0, 320.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:34:53,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 3 hours, 29 minutes)
2026-01-23 08:40:27,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:40:37,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3039.95923 ± 2314.512
2026-01-23 08:40:37,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [119.85616, 88.49521, 892.4485, 5357.3325, 5353.5186, 5435.6987, 3696.557, 5389.1074, 3941.2957, 125.28286]
2026-01-23 08:40:37,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [23.0, 19.0, 190.0, 1000.0, 1000.0, 1000.0, 684.0, 1000.0, 729.0, 24.0]
2026-01-23 08:40:37,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 3 hours, 20 minutes, 43 seconds)
2026-01-23 08:46:34,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:46:48,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4614.30029 ± 913.448
2026-01-23 08:46:48,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [4323.0474, 5574.925, 4027.1826, 5625.3516, 5571.997, 3870.5615, 5605.8267, 4000.4124, 4699.092, 2844.6057]
2026-01-23 08:46:48,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [790.0, 1000.0, 726.0, 1000.0, 1000.0, 692.0, 1000.0, 736.0, 848.0, 516.0]
2026-01-23 08:46:48,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 3 hours, 15 minutes)
2026-01-23 08:53:12,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:53:26,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4008.02490 ± 1729.231
2026-01-23 08:53:26,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5286.017, 1693.7184, 5308.444, 5315.3706, 3930.7405, 1479.1544, 5293.649, 5261.28, 5348.423, 1163.4525]
2026-01-23 08:53:26,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 316.0, 1000.0, 1000.0, 746.0, 299.0, 1000.0, 1000.0, 1000.0, 222.0]
2026-01-23 08:53:26,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 3 hours, 11 minutes, 16 seconds)
2026-01-23 08:59:00,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:59:03,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 858.62714 ± 1457.516
2026-01-23 08:59:03,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5134.03, 983.0706, 963.5509, 160.29253, 316.27518, 314.178, 252.4842, 135.85031, 125.379875, 201.1601]
2026-01-23 08:59:03,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 203.0, 189.0, 31.0, 59.0, 61.0, 53.0, 26.0, 29.0, 39.0]
2026-01-23 08:59:03,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 3 hours, 1 minute, 14 seconds)
2026-01-23 09:04:51,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:05:07,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4603.68945 ± 1156.561
2026-01-23 09:05:07,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5204.228, 5168.9395, 2270.1025, 5195.503, 5197.6553, 2311.5574, 5170.681, 5151.2627, 5186.188, 5180.7783]
2026-01-23 09:05:07,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 439.0, 1000.0, 1000.0, 447.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:05:07,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 55 minutes, 24 seconds)
2026-01-23 09:11:24,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:11:40,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4695.21045 ± 915.282
2026-01-23 09:11:40,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2691.17, 5350.4077, 5271.1, 5236.0684, 5286.428, 3507.6226, 5133.181, 5306.972, 5276.428, 3892.7258]
2026-01-23 09:11:40,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [501.0, 1000.0, 1000.0, 1000.0, 1000.0, 667.0, 1000.0, 1000.0, 1000.0, 718.0]
2026-01-23 09:11:40,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 53 minutes, 52 seconds)
2026-01-23 09:17:16,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:17:26,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2979.53320 ± 1979.131
2026-01-23 09:17:26,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5152.466, 5140.738, 4528.076, 5158.462, 3601.2542, 3367.1265, 165.04791, 1759.9982, 481.95047, 440.2127]
2026-01-23 09:17:26,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 870.0, 1000.0, 692.0, 644.0, 32.0, 341.0, 89.0, 87.0]
2026-01-23 09:17:26,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 45 minutes, 26 seconds)
2026-01-23 09:23:43,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:24:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4878.89404 ± 771.680
2026-01-23 09:24:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5267.986, 5247.75, 5260.0923, 5279.366, 5229.988, 5300.614, 5255.311, 3629.1348, 5236.7505, 3081.9504]
2026-01-23 09:24:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 674.0, 1000.0, 586.0]
2026-01-23 09:24:00,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 38 minutes, 56 seconds)
2026-01-23 09:29:34,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:29:51,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4861.93408 ± 829.401
2026-01-23 09:29:51,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5177.4224, 5250.8755, 5206.291, 5271.2163, 5254.2, 4226.1704, 2543.0476, 5244.3584, 5217.8667, 5227.892]
2026-01-23 09:29:51,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 804.0, 486.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:29:51,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 2 hours, 34 minutes)
2026-01-23 09:36:02,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:36:20,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5237.89990 ± 16.745
2026-01-23 09:36:20,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5253.2354, 5245.6494, 5236.0063, 5221.0747, 5277.9287, 5222.374, 5226.5645, 5229.3623, 5225.135, 5241.671]
2026-01-23 09:36:20,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:36:20,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1274 [INFO]: New best (5237.90) for latency DatasetOffice
2026-01-23 09:36:20,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 2 hours, 29 minutes, 48 seconds)
2026-01-23 09:42:26,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:42:44,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4933.95752 ± 713.062
2026-01-23 09:42:44,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5143.444, 5117.094, 5204.4814, 5223.6997, 2797.6401, 5153.811, 5173.7124, 5138.0474, 5151.1465, 5236.4956]
2026-01-23 09:42:44,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 536.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:42:44,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 2 hours, 22 minutes, 55 seconds)
2026-01-23 09:48:16,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:48:25,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2606.82031 ± 2212.452
2026-01-23 09:48:25,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5120.382, 5124.3228, 684.87775, 5157.093, 5168.3853, 3132.0146, 223.00954, 720.5393, 379.7858, 357.79306]
2026-01-23 09:48:25,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 150.0, 1000.0, 1000.0, 608.0, 44.0, 133.0, 71.0, 67.0]
2026-01-23 09:48:25,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 2 hours, 16 minutes, 18 seconds)
2026-01-23 09:54:18,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:54:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4426.10254 ± 1581.524
2026-01-23 09:54:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5212.684, 5253.953, 5233.812, 1234.7319, 5191.381, 5197.4727, 5190.134, 1292.0546, 5221.5386, 5233.2627]
2026-01-23 09:54:33,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 245.0, 1000.0, 1000.0, 1000.0, 242.0, 1000.0, 1000.0]
2026-01-23 09:54:33,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 2 hours, 8 minutes, 19 seconds)
2026-01-23 10:00:35,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:00:51,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4994.73682 ± 906.142
2026-01-23 10:00:51,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5334.636, 5301.8755, 5321.5664, 5297.5933, 5273.573, 5289.643, 5267.4263, 2277.1062, 5316.822, 5267.1235]
2026-01-23 10:00:51,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 438.0, 1000.0, 1000.0]
2026-01-23 10:00:51,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 2 hours, 4 minutes)
2026-01-23 10:06:54,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:07:04,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3172.23291 ± 2298.865
2026-01-23 10:07:04,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5302.9243, 5318.5933, 5304.141, 5391.105, 3297.842, 289.96844, 525.1488, 483.09662, 503.93558, 5305.5728]
2026-01-23 10:07:04,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 625.0, 54.0, 97.0, 92.0, 100.0, 1000.0]
2026-01-23 10:07:04,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 56 minutes, 49 seconds)
2026-01-23 10:12:41,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:12:59,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5164.38477 ± 17.003
2026-01-23 10:12:59,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5143.8022, 5163.529, 5153.601, 5180.284, 5163.798, 5154.395, 5185.705, 5135.7, 5173.3584, 5189.6743]
2026-01-23 10:12:59,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:12:59,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 48 minutes, 54 seconds)
2026-01-23 10:18:33,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:18:47,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4052.67578 ± 1686.905
2026-01-23 10:18:47,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5146.376, 5102.5938, 1495.2185, 897.94025, 5166.918, 5159.2666, 2141.1106, 5094.918, 5139.884, 5182.531]
2026-01-23 10:18:47,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 314.0, 180.0, 1000.0, 1000.0, 447.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:18:47,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 43 minutes, 13 seconds)
2026-01-23 10:24:50,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:24:58,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2480.09204 ± 1906.372
2026-01-23 10:24:58,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5180.9146, 4550.299, 5142.6924, 3558.4634, 679.70496, 2555.1062, 179.56691, 1901.5021, 499.18222, 553.48895]
2026-01-23 10:24:58,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 884.0, 1000.0, 687.0, 136.0, 492.0, 35.0, 377.0, 106.0, 110.0]
2026-01-23 10:24:58,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 37 minutes, 20 seconds)
2026-01-23 10:30:35,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:30:51,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4973.86426 ± 605.846
2026-01-23 10:30:51,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5183.049, 5204.1943, 5201.485, 5215.372, 5154.2607, 5200.3467, 5160.971, 3158.7063, 5153.2554, 5107.004]
2026-01-23 10:30:51,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 615.0, 1000.0, 1000.0]
2026-01-23 10:30:51,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2026-01-23 10:36:40,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:36:52,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3693.44141 ± 1837.547
2026-01-23 10:36:52,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5079.748, 1801.1083, 5115.307, 86.435875, 1624.5336, 5123.211, 5145.306, 5091.127, 2765.6575, 5101.9795]
2026-01-23 10:36:52,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 369.0, 1000.0, 19.0, 327.0, 1000.0, 1000.0, 1000.0, 540.0, 1000.0]
2026-01-23 10:36:52,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 23 minutes, 26 seconds)
2026-01-23 10:42:35,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:42:46,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3190.13525 ± 2329.074
2026-01-23 10:42:46,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [95.46746, 314.92825, 710.733, 286.21997, 5183.2314, 5194.8516, 4550.653, 5182.5864, 5196.6743, 5186.008]
2026-01-23 10:42:46,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [19.0, 58.0, 139.0, 57.0, 1000.0, 1000.0, 852.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:42:46,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 17 minutes, 25 seconds)
2026-01-23 10:49:03,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:49:19,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4785.98828 ± 1315.011
2026-01-23 10:49:19,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5240.23, 5234.1016, 5235.3647, 5215.82, 5225.9937, 5218.352, 5248.034, 5222.4106, 5178.248, 841.3271]
2026-01-23 10:49:19,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 150.0]
2026-01-23 10:49:19,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 1 hour, 13 minutes, 16 seconds)
2026-01-23 10:55:06,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:55:20,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4284.02979 ± 1353.952
2026-01-23 10:55:20,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5272.6763, 5290.4053, 3738.301, 3565.592, 5252.4395, 5285.2593, 5302.248, 5221.921, 2599.5525, 1311.9039]
2026-01-23 10:55:20,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 695.0, 680.0, 1000.0, 1000.0, 1000.0, 1000.0, 490.0, 248.0]
2026-01-23 10:55:20,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 1 hour, 6 minutes, 48 seconds)
2026-01-23 11:00:45,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:00:56,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 3803.26636 ± 2207.999
2026-01-23 11:00:56,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [978.462, 273.45596, 262.9689, 5388.797, 5460.8657, 5428.4224, 3978.2942, 5428.6167, 5430.3115, 5402.47]
2026-01-23 11:00:56,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [184.0, 61.0, 53.0, 1000.0, 1000.0, 1000.0, 737.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:00:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 9 seconds)
2026-01-23 11:06:45,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:06:59,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4275.38867 ± 1622.388
2026-01-23 11:06:59,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5265.775, 3516.075, 5259.722, 1998.8999, 534.7343, 5244.9663, 5313.705, 5212.4985, 5196.8315, 5210.6807]
2026-01-23 11:06:59,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 681.0, 1000.0, 375.0, 111.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:06:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 54 minutes, 12 seconds)
2026-01-23 11:12:48,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:13:02,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4500.62256 ± 1419.110
2026-01-23 11:13:02,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [2643.5405, 5218.506, 5199.245, 5201.6694, 902.7675, 5247.222, 5079.4844, 5208.163, 5210.7085, 5094.9233]
2026-01-23 11:13:02,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [507.0, 1000.0, 1000.0, 1000.0, 175.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:13:02,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 48 minutes, 26 seconds)
2026-01-23 11:19:13,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:19:27,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4118.30273 ± 1647.012
2026-01-23 11:19:27,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5193.792, 5178.239, 5155.986, 5202.481, 2427.207, 5174.623, 1474.4481, 5199.083, 1046.55, 5130.617]
2026-01-23 11:19:27,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 466.0, 1000.0, 294.0, 1000.0, 204.0, 1000.0]
2026-01-23 11:19:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 42 minutes, 10 seconds)
2026-01-23 11:25:14,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:25:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4355.87402 ± 1845.884
2026-01-23 11:25:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5289.845, 5206.194, 5285.535, 5297.7427, 5299.5894, 5229.971, 5303.8623, 644.6364, 684.8207, 5316.5435]
2026-01-23 11:25:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 116.0, 144.0, 1000.0]
2026-01-23 11:25:28,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 36 minutes, 10 seconds)
2026-01-23 11:30:49,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:30:58,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2866.76147 ± 2146.203
2026-01-23 11:30:58,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [3144.678, 5188.9805, 5179.6006, 3005.0303, 427.5045, 338.5671, 503.02533, 405.80182, 5247.0244, 5227.405]
2026-01-23 11:30:58,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [591.0, 1000.0, 1000.0, 566.0, 79.0, 73.0, 97.0, 76.0, 1000.0, 1000.0]
2026-01-23 11:30:58,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 30 minutes, 1 second)
2026-01-23 11:37:07,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:37:24,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 5224.40039 ± 325.589
2026-01-23 11:37:24,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5344.229, 5349.322, 5327.5425, 4249.1206, 5361.239, 5307.8076, 5338.5674, 5347.191, 5305.2734, 5313.712]
2026-01-23 11:37:24,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 796.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:37:24,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 24 minutes, 20 seconds)
2026-01-23 11:43:20,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:43:34,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4295.64111 ± 1338.540
2026-01-23 11:43:34,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5167.5415, 5158.891, 5139.6294, 2171.6375, 5183.577, 5102.466, 3773.4954, 1368.3041, 4769.397, 5121.4717]
2026-01-23 11:43:34,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 418.0, 1000.0, 1000.0, 736.0, 263.0, 1000.0, 1000.0]
2026-01-23 11:43:34,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 18 minutes, 18 seconds)
2026-01-23 11:49:03,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:49:09,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 2100.37256 ± 2293.027
2026-01-23 11:49:09,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5324.71, 5308.376, 3249.3572, 405.87418, 211.15463, 278.84177, 101.57999, 217.17464, 522.6069, 5384.0493]
2026-01-23 11:49:09,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 609.0, 76.0, 41.0, 53.0, 22.0, 44.0, 99.0, 1000.0]
2026-01-23 11:49:09,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 11 minutes, 53 seconds)
2026-01-23 11:54:59,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:55:14,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4426.91309 ± 1460.964
2026-01-23 11:55:14,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [1409.5326, 5168.197, 5179.5117, 5137.467, 1603.3534, 5161.2856, 5168.486, 5133.5884, 5133.9277, 5173.777]
2026-01-23 11:55:14,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [305.0, 1000.0, 1000.0, 1000.0, 313.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:55:14,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 5 minutes, 57 seconds)
2026-01-23 12:01:03,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:01:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1269 [DEBUG]: Total Reward: 4306.05811 ± 1859.585
2026-01-23 12:01:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1270 [DEBUG]: All rewards: [5274.2334, 5264.782, 5228.5396, 745.1996, 434.3597, 5216.5215, 5226.185, 5237.5645, 5186.048, 5247.1484]
2026-01-23 12:01:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 143.0, 84.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:01:17,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1299 [DEBUG]: Training session finished
