2026-01-22 23:01:37,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-mbpac-highdim-memdelay
2026-01-22 23:01:37,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-mbpac-highdim-memdelay
2026-01-22 23:01:37,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1475019f8590>}
2026-01-22 23:01:37,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-22 23:01:37,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-22 23:01:37,388 baseline-mbpac-noisy-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-22 23:01:37,388 baseline-mbpac-noisy-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:01:37,399 baseline-mbpac-noisy-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2026-01-22 23:01:39,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-22 23:01:39,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-22 23:16:35,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:16:35,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:16:41,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 84.48461 ± 7.871
2026-01-22 23:16:41,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [78.10777, 96.58915, 101.73629, 78.233955, 84.439445, 83.606026, 82.610374, 77.655556, 84.0558, 77.81177]
2026-01-22 23:16:41,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [16.0, 19.0, 20.0, 16.0, 17.0, 17.0, 17.0, 16.0, 17.0, 16.0]
2026-01-22 23:16:41,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (84.48) for latency DatasetOffice
2026-01-22 23:16:41,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 24 hours, 49 minutes, 9 seconds)
2026-01-22 23:32:16,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:32:16,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:48,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 435.93848 ± 73.129
2026-01-22 23:32:48,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [493.37082, 595.89435, 352.96368, 436.1718, 479.86346, 320.921, 431.14365, 413.142, 391.87253, 444.04163]
2026-01-22 23:32:48,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 114.0, 67.0, 82.0, 95.0, 68.0, 85.0, 83.0, 73.0, 83.0]
2026-01-22 23:32:48,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (435.94) for latency DatasetOffice
2026-01-22 23:32:48,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 25 hours, 26 minutes, 18 seconds)
2026-01-22 23:48:21,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:48:21,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:55,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 455.73233 ± 109.359
2026-01-22 23:48:55,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [617.4723, 423.49384, 644.2069, 519.6059, 363.43137, 352.69803, 367.84387, 405.47037, 536.99634, 326.1047]
2026-01-22 23:48:55,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [117.0, 83.0, 122.0, 115.0, 69.0, 72.0, 67.0, 74.0, 101.0, 60.0]
2026-01-22 23:48:55,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (455.73) for latency DatasetOffice
2026-01-22 23:48:55,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 28 minutes, 15 seconds)
2026-01-23 00:04:21,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:04:21,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:48,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 362.86053 ± 31.255
2026-01-23 00:04:48,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [323.68384, 384.71652, 369.52737, 371.59818, 317.8098, 362.92795, 367.4852, 402.98907, 319.3685, 408.49902]
2026-01-23 00:04:48,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [66.0, 72.0, 68.0, 69.0, 72.0, 66.0, 67.0, 74.0, 65.0, 75.0]
2026-01-23 00:04:48,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 15 minutes, 36 seconds)
2026-01-23 00:20:01,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:20:01,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:34,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 455.80679 ± 94.209
2026-01-23 00:20:34,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [462.83698, 448.7888, 497.87308, 367.50677, 408.53647, 380.78326, 397.2472, 350.94064, 650.9827, 592.572]
2026-01-23 00:20:34,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [102.0, 85.0, 93.0, 67.0, 77.0, 72.0, 72.0, 67.0, 122.0, 116.0]
2026-01-23 00:20:34,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (455.81) for latency DatasetOffice
2026-01-23 00:20:34,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 24 hours, 59 minutes, 27 seconds)
2026-01-23 00:35:56,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:35:56,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:28,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 441.94189 ± 70.862
2026-01-23 00:36:28,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [339.47244, 386.03265, 440.243, 547.2421, 577.21466, 438.3579, 389.7401, 436.48618, 386.2412, 478.38885]
2026-01-23 00:36:28,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [63.0, 70.0, 81.0, 101.0, 107.0, 81.0, 87.0, 81.0, 72.0, 97.0]
2026-01-23 00:36:28,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 59 minutes, 54 seconds)
2026-01-23 00:51:55,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:51:55,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:33,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 529.79468 ± 97.917
2026-01-23 00:52:33,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [585.5437, 563.6238, 563.2614, 379.33368, 489.36746, 488.1628, 405.43222, 498.8182, 581.5465, 742.857]
2026-01-23 00:52:33,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [108.0, 104.0, 112.0, 71.0, 91.0, 93.0, 75.0, 97.0, 108.0, 158.0]
2026-01-23 00:52:33,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (529.79) for latency DatasetOffice
2026-01-23 00:52:33,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 43 minutes, 30 seconds)
2026-01-23 01:07:58,687 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:07:58,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:36,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 527.00391 ± 101.627
2026-01-23 01:08:36,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [381.64893, 524.43896, 508.9102, 520.27057, 486.53873, 481.49722, 457.38428, 479.4365, 702.6644, 727.2489]
2026-01-23 01:08:36,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [83.0, 97.0, 95.0, 96.0, 93.0, 93.0, 86.0, 87.0, 139.0, 153.0]
2026-01-23 01:08:36,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 26 minutes, 26 seconds)
2026-01-23 01:23:51,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:23:51,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:32,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 561.04810 ± 109.966
2026-01-23 01:24:32,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [540.9385, 454.796, 586.8612, 415.9269, 499.9638, 732.6465, 576.8699, 439.33884, 753.0339, 610.10535]
2026-01-23 01:24:32,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [102.0, 93.0, 110.0, 75.0, 94.0, 144.0, 108.0, 78.0, 161.0, 117.0]
2026-01-23 01:24:32,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (561.05) for latency DatasetOffice
2026-01-23 01:24:32,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 11 minutes, 14 seconds)
2026-01-23 01:39:50,501 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:39:50,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:26,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 507.11639 ± 71.219
2026-01-23 01:40:26,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [473.82806, 422.6521, 590.9082, 497.16605, 646.9404, 471.1769, 481.41766, 555.37286, 529.5761, 402.12582]
2026-01-23 01:40:26,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [87.0, 82.0, 109.0, 94.0, 125.0, 86.0, 92.0, 105.0, 97.0, 74.0]
2026-01-23 01:40:26,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 57 minutes, 40 seconds)
2026-01-23 01:55:50,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:55:50,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:31,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 552.43976 ± 171.684
2026-01-23 01:56:31,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [436.58374, 371.1077, 445.94894, 380.18784, 788.23846, 688.68066, 793.96124, 353.37878, 738.71765, 527.59247]
2026-01-23 01:56:31,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [81.0, 80.0, 84.0, 84.0, 155.0, 133.0, 150.0, 75.0, 152.0, 113.0]
2026-01-23 01:56:31,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 44 minutes, 56 seconds)
2026-01-23 02:12:06,050 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:12:06,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:49,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 616.32483 ± 169.028
2026-01-23 02:12:49,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [448.96133, 576.52295, 634.4788, 933.5538, 382.03412, 801.9991, 736.93616, 491.35706, 706.73267, 450.6725]
2026-01-23 02:12:49,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [94.0, 109.0, 118.0, 175.0, 72.0, 147.0, 135.0, 92.0, 134.0, 83.0]
2026-01-23 02:12:49,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (616.32) for latency DatasetOffice
2026-01-23 02:12:49,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 32 minutes, 47 seconds)
2026-01-23 02:28:26,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:28:26,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:13,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 634.62146 ± 122.066
2026-01-23 02:29:13,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [813.3538, 667.8654, 506.963, 772.6113, 475.68384, 672.9692, 794.4438, 496.60153, 601.6681, 544.05505]
2026-01-23 02:29:13,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [151.0, 136.0, 106.0, 147.0, 106.0, 124.0, 148.0, 101.0, 113.0, 120.0]
2026-01-23 02:29:13,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (634.62) for latency DatasetOffice
2026-01-23 02:29:13,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 22 minutes, 32 seconds)
2026-01-23 02:44:31,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:44:31,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:45:11,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 573.75165 ± 117.262
2026-01-23 02:45:11,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [372.09146, 684.48425, 692.2416, 527.1667, 532.1183, 564.02014, 565.804, 789.5309, 577.3724, 432.68723]
2026-01-23 02:45:11,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [76.0, 128.0, 130.0, 97.0, 102.0, 102.0, 102.0, 150.0, 105.0, 78.0]
2026-01-23 02:45:11,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 23 hours, 7 minutes, 18 seconds)
2026-01-23 03:00:43,177 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:00:43,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:28,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 608.64563 ± 195.416
2026-01-23 03:01:28,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1034.7559, 815.7285, 339.55115, 703.6346, 483.98254, 420.07114, 535.2244, 625.5364, 476.75284, 651.2188]
2026-01-23 03:01:28,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [203.0, 173.0, 63.0, 132.0, 90.0, 77.0, 113.0, 133.0, 89.0, 128.0]
2026-01-23 03:01:28,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 57 minutes, 32 seconds)
2026-01-23 03:16:53,878 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:16:53,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:41,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 656.13983 ± 184.313
2026-01-23 03:17:41,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [503.6904, 586.6823, 436.2098, 716.26953, 603.2836, 469.97098, 809.18677, 974.578, 938.42505, 523.1018]
2026-01-23 03:17:41,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [97.0, 127.0, 83.0, 134.0, 116.0, 87.0, 171.0, 179.0, 170.0, 97.0]
2026-01-23 03:17:41,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (656.14) for latency DatasetOffice
2026-01-23 03:17:41,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 43 minutes, 23 seconds)
2026-01-23 03:33:03,023 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:33:03,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:05,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 853.68445 ± 345.158
2026-01-23 03:34:05,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1417.1909, 801.0113, 511.13416, 885.14355, 699.5024, 447.3517, 1567.0652, 799.0756, 766.01495, 643.3545]
2026-01-23 03:34:05,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [292.0, 154.0, 101.0, 187.0, 130.0, 97.0, 294.0, 148.0, 140.0, 122.0]
2026-01-23 03:34:05,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (853.68) for latency DatasetOffice
2026-01-23 03:34:05,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 29 minutes, 1 second)
2026-01-23 03:49:27,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:49:27,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:50:08,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 548.34290 ± 95.985
2026-01-23 03:50:08,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [541.4098, 624.90265, 595.7351, 633.60077, 428.05978, 474.52502, 702.4375, 580.66504, 370.2183, 531.8748]
2026-01-23 03:50:08,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [104.0, 128.0, 112.0, 120.0, 80.0, 94.0, 146.0, 123.0, 70.0, 103.0]
2026-01-23 03:50:08,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 6 minutes, 58 seconds)
2026-01-23 04:05:31,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:05:31,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:06:25,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 768.08966 ± 145.655
2026-01-23 04:06:25,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [639.5137, 839.2191, 548.226, 874.248, 933.2068, 761.24634, 660.392, 1043.1725, 737.9325, 643.73914]
2026-01-23 04:06:25,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [120.0, 179.0, 105.0, 161.0, 173.0, 142.0, 123.0, 192.0, 155.0, 120.0]
2026-01-23 04:06:25,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 55 minutes, 57 seconds)
2026-01-23 04:21:46,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:21:46,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:22:34,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 691.03967 ± 132.174
2026-01-23 04:22:34,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [648.4912, 574.686, 438.0487, 778.0862, 890.88617, 544.35834, 708.3342, 721.5153, 806.1853, 799.80475]
2026-01-23 04:22:34,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [120.0, 105.0, 86.0, 146.0, 163.0, 101.0, 132.0, 136.0, 147.0, 147.0]
2026-01-23 04:22:34,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 37 minutes, 33 seconds)
2026-01-23 04:37:47,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:37:47,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:38:49,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 898.36877 ± 286.179
2026-01-23 04:38:49,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [927.4724, 584.7866, 1157.7869, 691.49677, 973.81647, 1329.8916, 1173.3745, 314.26035, 929.7945, 901.0076]
2026-01-23 04:38:49,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [169.0, 110.0, 214.0, 128.0, 179.0, 247.0, 217.0, 60.0, 171.0, 171.0]
2026-01-23 04:38:49,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (898.37) for latency DatasetOffice
2026-01-23 04:38:49,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 22 minutes, 1 second)
2026-01-23 04:54:08,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:54:08,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:55:34,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1192.31238 ± 596.940
2026-01-23 04:55:34,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [955.0666, 878.43427, 2896.2415, 925.82495, 1043.2167, 1367.2288, 1275.2759, 990.00726, 697.72833, 894.09924]
2026-01-23 04:55:34,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [185.0, 167.0, 568.0, 176.0, 213.0, 256.0, 239.0, 192.0, 145.0, 168.0]
2026-01-23 04:55:34,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (1192.31) for latency DatasetOffice
2026-01-23 04:55:34,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 11 minutes, 4 seconds)
2026-01-23 05:10:44,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:10:44,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:01,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1093.89722 ± 273.085
2026-01-23 05:12:01,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1250.7156, 1540.3583, 788.1039, 703.64325, 1351.0902, 836.7756, 1020.0084, 1000.21844, 1442.7334, 1005.32495]
2026-01-23 05:12:01,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [231.0, 289.0, 153.0, 135.0, 253.0, 160.0, 193.0, 192.0, 268.0, 190.0]
2026-01-23 05:12:01,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 21 hours, 1 minute, 8 seconds)
2026-01-23 05:27:28,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:27:28,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:29:01,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1301.91968 ± 309.620
2026-01-23 05:29:01,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1282.8899, 1747.2753, 1352.579, 907.2536, 744.3214, 1427.5187, 1611.9535, 1013.8442, 1610.8953, 1320.6666]
2026-01-23 05:29:01,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [276.0, 321.0, 250.0, 167.0, 136.0, 274.0, 308.0, 197.0, 301.0, 264.0]
2026-01-23 05:29:01,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (1301.92) for latency DatasetOffice
2026-01-23 05:29:01,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 55 minutes, 22 seconds)
2026-01-23 05:44:20,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:44:20,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:45:44,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1188.21155 ± 305.034
2026-01-23 05:45:44,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1308.2723, 1048.9594, 1299.0819, 680.55725, 1192.5925, 1761.0605, 834.43604, 937.7489, 1488.6561, 1330.7505]
2026-01-23 05:45:44,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [249.0, 198.0, 244.0, 124.0, 229.0, 352.0, 159.0, 176.0, 288.0, 249.0]
2026-01-23 05:45:44,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 47 minutes, 35 seconds)
2026-01-23 06:01:01,430 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:01:01,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:02:26,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1173.55432 ± 224.593
2026-01-23 06:02:26,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1412.3617, 1498.9733, 1344.4307, 1013.05914, 1226.3779, 914.8049, 944.5824, 1393.8389, 1153.3085, 833.80457]
2026-01-23 06:02:26,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [271.0, 300.0, 267.0, 196.0, 230.0, 171.0, 182.0, 273.0, 228.0, 153.0]
2026-01-23 06:02:26,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 37 minutes, 37 seconds)
2026-01-23 06:17:39,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:17:39,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:18:46,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 987.07727 ± 290.394
2026-01-23 06:18:46,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [790.95166, 804.7079, 805.0697, 1398.7628, 711.1597, 1607.9972, 914.0429, 1072.001, 1074.2045, 691.8751]
2026-01-23 06:18:46,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [148.0, 153.0, 152.0, 255.0, 126.0, 304.0, 161.0, 197.0, 205.0, 124.0]
2026-01-23 06:18:46,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 20 hours, 14 minutes, 42 seconds)
2026-01-23 06:33:53,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:33:53,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:34:57,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 907.07458 ± 184.862
2026-01-23 06:34:57,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [996.3926, 1057.6752, 643.15875, 1024.0983, 877.5847, 735.1321, 868.4992, 1124.9941, 601.59827, 1141.6129]
2026-01-23 06:34:57,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [202.0, 194.0, 120.0, 188.0, 182.0, 135.0, 164.0, 210.0, 120.0, 213.0]
2026-01-23 06:34:57,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 54 minutes, 16 seconds)
2026-01-23 06:50:20,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:50:20,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:51:31,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 995.43390 ± 325.894
2026-01-23 06:51:31,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1350.1914, 890.64624, 708.62463, 924.3755, 1731.8206, 829.7552, 850.88403, 989.7563, 524.1594, 1154.1257]
2026-01-23 06:51:31,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [254.0, 176.0, 134.0, 169.0, 320.0, 161.0, 162.0, 189.0, 96.0, 250.0]
2026-01-23 06:51:31,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 31 minutes, 39 seconds)
2026-01-23 07:06:43,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:06:43,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:08:05,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1176.07898 ± 465.273
2026-01-23 07:08:05,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [993.862, 1005.0185, 1885.0911, 1009.4586, 875.09357, 950.8247, 2219.2322, 758.2079, 1283.224, 780.7771]
2026-01-23 07:08:05,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [177.0, 193.0, 370.0, 186.0, 167.0, 170.0, 401.0, 140.0, 243.0, 139.0]
2026-01-23 07:08:05,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 12 minutes, 59 seconds)
2026-01-23 07:23:36,569 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:23:36,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:25:19,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1390.93628 ± 348.110
2026-01-23 07:25:19,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1271.0865, 1197.6056, 952.29126, 2273.2986, 1239.2607, 1428.0155, 1113.2067, 1678.8949, 1331.392, 1424.3118]
2026-01-23 07:25:19,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [236.0, 236.0, 186.0, 436.0, 224.0, 288.0, 215.0, 348.0, 253.0, 303.0]
2026-01-23 07:25:19,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (1390.94) for latency DatasetOffice
2026-01-23 07:25:19,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 19 hours, 3 minutes, 47 seconds)
2026-01-23 07:40:45,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:40:45,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:42:24,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1396.21082 ± 800.369
2026-01-23 07:42:24,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [550.77545, 980.03546, 635.2201, 1596.9958, 2729.4321, 863.1091, 2949.7092, 1570.2465, 1311.7145, 774.8702]
2026-01-23 07:42:24,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [100.0, 199.0, 121.0, 283.0, 511.0, 161.0, 549.0, 298.0, 233.0, 143.0]
2026-01-23 07:42:24,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (1396.21) for latency DatasetOffice
2026-01-23 07:42:24,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 57 minutes, 21 seconds)
2026-01-23 07:58:04,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:58:04,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:59:26,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1126.57324 ± 408.562
2026-01-23 07:59:26,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1553.3135, 971.638, 1473.4867, 1100.1742, 474.69122, 814.6707, 1049.4292, 1022.4691, 1973.1498, 832.71027]
2026-01-23 07:59:26,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [319.0, 172.0, 263.0, 221.0, 96.0, 156.0, 197.0, 204.0, 355.0, 170.0]
2026-01-23 07:59:26,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 52 minutes, 1 second)
2026-01-23 08:14:59,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:14:59,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:17:08,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1785.95239 ± 817.638
2026-01-23 08:17:08,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1145.2037, 2075.6501, 1334.9844, 1371.4652, 2607.637, 3829.66, 1472.7026, 1765.6736, 1119.9287, 1136.6174]
2026-01-23 08:17:08,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [208.0, 386.0, 242.0, 256.0, 517.0, 733.0, 284.0, 320.0, 214.0, 217.0]
2026-01-23 08:17:08,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (1785.95) for latency DatasetOffice
2026-01-23 08:17:08,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 50 minutes, 6 seconds)
2026-01-23 08:32:22,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:32:22,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:34:04,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1473.56226 ± 373.616
2026-01-23 08:34:04,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1734.9174, 981.55634, 1193.3074, 872.4247, 2015.7401, 1647.854, 1623.1614, 1922.59, 1567.801, 1176.271]
2026-01-23 08:34:04,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [313.0, 179.0, 237.0, 155.0, 380.0, 301.0, 301.0, 335.0, 282.0, 214.0]
2026-01-23 08:34:04,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 18 hours, 37 minutes, 46 seconds)
2026-01-23 08:49:39,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:49:39,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:51:31,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1506.53430 ± 865.236
2026-01-23 08:51:31,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [692.0471, 528.30676, 3310.4302, 2536.6458, 1873.0902, 1008.99146, 1960.4573, 578.7383, 1229.7814, 1346.8542]
2026-01-23 08:51:31,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [140.0, 114.0, 662.0, 490.0, 355.0, 223.0, 390.0, 108.0, 244.0, 253.0]
2026-01-23 08:51:31,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 18 hours, 23 minutes, 23 seconds)
2026-01-23 09:07:05,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:07:05,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:09:45,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2265.71997 ± 1369.722
2026-01-23 09:09:45,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [941.8071, 1344.4927, 5469.814, 3883.4363, 756.8134, 2654.9844, 1955.6671, 1842.6847, 2298.2708, 1509.2288]
2026-01-23 09:09:45,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [175.0, 251.0, 1000.0, 726.0, 154.0, 476.0, 371.0, 339.0, 403.0, 272.0]
2026-01-23 09:09:45,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (2265.72) for latency DatasetOffice
2026-01-23 09:09:45,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 18 hours, 20 minutes, 38 seconds)
2026-01-23 09:25:15,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:25:15,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:29:22,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3355.75317 ± 1377.378
2026-01-23 09:29:22,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1976.2076, 2468.7517, 5308.0903, 5253.1543, 5263.5054, 2610.4963, 3243.8752, 1433.8326, 2401.7222, 3597.896]
2026-01-23 09:29:22,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [384.0, 501.0, 1000.0, 1000.0, 1000.0, 526.0, 610.0, 283.0, 449.0, 671.0]
2026-01-23 09:29:22,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (3355.75) for latency DatasetOffice
2026-01-23 09:29:22,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 18 hours, 35 minutes, 11 seconds)
2026-01-23 09:45:24,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:45:24,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:48:46,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2819.55640 ± 1460.870
2026-01-23 09:48:46,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [2580.3862, 2201.6816, 5462.3267, 5361.136, 3025.649, 2094.9182, 1601.272, 2954.9688, 2371.2732, 541.9504]
2026-01-23 09:48:46,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [494.0, 401.0, 1000.0, 1000.0, 554.0, 383.0, 295.0, 549.0, 458.0, 104.0]
2026-01-23 09:48:46,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 18 hours, 37 minutes, 53 seconds)
2026-01-23 10:03:46,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:03:46,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:07:59,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3430.38037 ± 1509.625
2026-01-23 10:07:59,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1795.1307, 1057.6373, 2237.6711, 3887.091, 2466.226, 4391.7383, 2613.2705, 5241.486, 5381.8735, 5231.6797]
2026-01-23 10:07:59,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [353.0, 217.0, 412.0, 717.0, 466.0, 838.0, 518.0, 1000.0, 1000.0, 1000.0]
2026-01-23 10:07:59,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (3430.38) for latency DatasetOffice
2026-01-23 10:07:59,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 18 hours, 46 minutes, 56 seconds)
2026-01-23 10:24:15,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:24:15,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:29:00,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3953.87256 ± 1509.917
2026-01-23 10:29:00,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [2685.8726, 5432.493, 3051.9546, 1973.9502, 5469.9316, 5463.723, 5318.7544, 2566.9846, 5497.9434, 2077.1177]
2026-01-23 10:29:00,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [497.0, 1000.0, 588.0, 374.0, 1000.0, 1000.0, 1000.0, 481.0, 1000.0, 401.0]
2026-01-23 10:29:00,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (3953.87) for latency DatasetOffice
2026-01-23 10:29:00,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 19 hours, 10 minutes, 10 seconds)
2026-01-23 10:45:02,198 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:45:02,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:47:11,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1785.72327 ± 1413.509
2026-01-23 10:47:11,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [675.5476, 2055.366, 661.8341, 1784.3558, 1959.4304, 1039.0757, 5372.2485, 2984.1594, 693.4784, 631.73584]
2026-01-23 10:47:11,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [127.0, 375.0, 138.0, 328.0, 369.0, 193.0, 1000.0, 559.0, 120.0, 124.0]
2026-01-23 10:47:11,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 18 hours, 50 minutes, 12 seconds)
2026-01-23 11:03:42,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:03:42,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:06:14,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2194.23218 ± 813.204
2026-01-23 11:06:14,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [2012.9393, 3514.6348, 2811.1418, 914.0924, 3278.1628, 1340.6995, 1391.89, 1915.851, 2629.0107, 2133.9]
2026-01-23 11:06:14,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [370.0, 618.0, 503.0, 164.0, 571.0, 273.0, 256.0, 338.0, 475.0, 377.0]
2026-01-23 11:06:14,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 18 hours, 24 minutes, 12 seconds)
2026-01-23 11:21:35,771 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:21:35,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:25:58,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3730.27539 ± 1734.769
2026-01-23 11:25:58,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5036.0474, 2311.3982, 1473.2817, 5369.407, 548.42413, 4905.803, 5527.441, 5544.998, 2989.933, 3596.02]
2026-01-23 11:25:58,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [911.0, 424.0, 270.0, 1000.0, 100.0, 897.0, 1000.0, 1000.0, 557.0, 648.0]
2026-01-23 11:25:58,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 18 hours, 8 minutes, 36 seconds)
2026-01-23 11:42:35,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:42:35,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:47:11,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3870.82739 ± 1575.383
2026-01-23 11:47:11,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1722.5938, 5498.971, 5550.4736, 4229.798, 3611.8806, 3029.5374, 5507.2437, 5562.8984, 1201.3414, 2793.5347]
2026-01-23 11:47:11,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [318.0, 1000.0, 1000.0, 778.0, 653.0, 544.0, 1000.0, 1000.0, 229.0, 491.0]
2026-01-23 11:47:11,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 18 hours, 11 minutes, 6 seconds)
2026-01-23 12:02:01,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:02:01,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:07:54,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4923.80469 ± 1062.057
2026-01-23 12:07:54,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5537.947, 2198.5479, 5447.64, 5380.173, 5485.412, 5364.734, 5493.568, 5391.0605, 3595.6433, 5343.3184]
2026-01-23 12:07:54,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 424.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 655.0, 1000.0]
2026-01-23 12:07:54,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (4923.80) for latency DatasetOffice
2026-01-23 12:07:54,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 17 hours, 48 minutes, 10 seconds)
2026-01-23 12:25:01,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:25:01,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:31:00,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4760.25830 ± 895.251
2026-01-23 12:31:00,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5248.6978, 5168.8584, 2748.431, 5159.9937, 5284.5527, 5227.7295, 5225.09, 5196.557, 3220.4192, 5122.254]
2026-01-23 12:31:00,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 531.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 622.0, 1000.0]
2026-01-23 12:31:00,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 18 hours, 20 minutes, 31 seconds)
2026-01-23 12:45:36,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:45:36,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:49:07,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2932.09229 ± 2012.468
2026-01-23 12:49:07,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [2099.0867, 5608.389, 5525.8926, 1037.8586, 2938.3972, 5589.0537, 586.36566, 428.4661, 1443.3297, 4064.0847]
2026-01-23 12:49:07,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [388.0, 1000.0, 1000.0, 203.0, 545.0, 1000.0, 105.0, 95.0, 274.0, 718.0]
2026-01-23 12:49:07,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 17 hours, 49 minutes, 58 seconds)
2026-01-23 13:06:22,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:06:22,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:12:38,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5301.25195 ± 618.621
2026-01-23 13:12:38,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5469.4897, 5603.4136, 5358.237, 5539.503, 3454.7898, 5535.2827, 5510.4917, 5494.72, 5563.845, 5482.746]
2026-01-23 13:12:38,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 634.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 13:12:38,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (5301.25) for latency DatasetOffice
2026-01-23 13:12:38,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 18 hours, 7 minutes, 56 seconds)
2026-01-23 13:28:20,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:28:20,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:33:50,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4767.15381 ± 1356.297
2026-01-23 13:33:50,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5644.1123, 5607.6924, 1465.876, 5582.231, 3837.171, 5316.7236, 5487.478, 5656.986, 3359.085, 5714.181]
2026-01-23 13:33:50,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 259.0, 1000.0, 696.0, 952.0, 1000.0, 1000.0, 602.0, 1000.0]
2026-01-23 13:33:50,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 17 hours, 46 minutes, 32 seconds)
2026-01-23 13:49:22,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:49:22,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:53:10,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3334.09253 ± 2193.156
2026-01-23 13:53:10,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [3151.881, 5732.198, 3977.7393, 5655.5786, 1990.0985, 615.91187, 603.8454, 312.47485, 5648.967, 5652.23]
2026-01-23 13:53:10,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [564.0, 1000.0, 698.0, 1000.0, 359.0, 120.0, 112.0, 58.0, 1000.0, 1000.0]
2026-01-23 13:53:10,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 17 hours, 11 minutes, 37 seconds)
2026-01-23 14:09:05,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:09:05,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:14:32,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4433.10840 ± 1801.726
2026-01-23 14:14:32,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [760.6461, 5360.1177, 5302.758, 904.59454, 5264.836, 5267.507, 5226.954, 5393.7886, 5442.187, 5407.697]
2026-01-23 14:14:32,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [147.0, 1000.0, 1000.0, 182.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:14:32,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 16 hours, 33 minutes, 56 seconds)
2026-01-23 14:30:25,190 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:30:25,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:37:02,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5438.69434 ± 96.830
2026-01-23 14:37:02,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5492.61, 5390.528, 5388.663, 5561.357, 5511.3247, 5489.7275, 5473.4556, 5461.025, 5191.1816, 5427.078]
2026-01-23 14:37:02,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 976.0, 1000.0]
2026-01-23 14:37:02,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (5438.69) for latency DatasetOffice
2026-01-23 14:37:02,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 16 hours, 54 minutes, 23 seconds)
2026-01-23 14:53:07,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:53:07,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:57:16,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3446.48560 ± 1945.808
2026-01-23 14:57:16,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1523.3499, 5463.7856, 3941.0054, 3373.1453, 155.6997, 2025.9574, 1422.1245, 5540.6787, 5524.1445, 5494.9653]
2026-01-23 14:57:16,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [296.0, 1000.0, 714.0, 616.0, 30.0, 368.0, 248.0, 1000.0, 1000.0, 1000.0]
2026-01-23 14:57:16,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 16 hours, 2 minutes, 46 seconds)
2026-01-23 15:13:44,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:13:44,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:20:00,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5140.70215 ± 725.672
2026-01-23 15:20:00,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5359.8887, 5380.8774, 5383.574, 5342.081, 5378.956, 5379.551, 2967.2866, 5449.191, 5456.2446, 5309.367]
2026-01-23 15:20:00,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 541.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:20:00,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 15 hours, 55 minutes, 29 seconds)
2026-01-23 15:34:49,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:34:49,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:40:49,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5126.80811 ± 1031.273
2026-01-23 15:40:49,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5431.7866, 5546.252, 5419.6904, 2035.8, 5505.276, 5488.6094, 5509.1787, 5496.406, 5427.124, 5407.9575]
2026-01-23 15:40:49,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 368.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 15:40:49,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 15 hours, 47 minutes, 19 seconds)
2026-01-23 15:57:07,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:57:07,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:03:29,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5525.09375 ± 192.578
2026-01-23 16:03:29,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5558.106, 5610.402, 5657.9985, 4967.169, 5579.0747, 5685.286, 5532.2656, 5572.6787, 5575.1743, 5512.783]
2026-01-23 16:03:29,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 899.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:03:29,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (5525.09) for latency DatasetOffice
2026-01-23 16:03:29,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 15 hours, 36 minutes, 53 seconds)
2026-01-23 16:20:02,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:20:02,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:26:01,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5120.83301 ± 822.813
2026-01-23 16:26:01,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5519.0234, 3952.2705, 5486.414, 5512.603, 5494.849, 5515.908, 5555.8633, 3091.622, 5589.1504, 5490.6235]
2026-01-23 16:26:01,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 725.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 582.0, 1000.0, 1000.0]
2026-01-23 16:26:01,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 15 hours, 15 minutes, 30 seconds)
2026-01-23 16:41:18,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:41:18,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:45:13,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3379.24414 ± 2243.014
2026-01-23 16:45:13,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [2256.9165, 5555.9297, 2142.0283, 482.5142, 731.35516, 443.82727, 5519.019, 5551.4087, 5527.417, 5582.022]
2026-01-23 16:45:13,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [412.0, 1000.0, 388.0, 97.0, 132.0, 82.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 16:45:13,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 14 hours, 45 minutes, 10 seconds)
2026-01-23 17:01:58,429 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:01:58,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:07:43,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4914.37109 ± 1369.903
2026-01-23 17:07:43,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5540.0903, 1094.4528, 3833.6511, 5495.0947, 5559.237, 5522.7812, 5485.495, 5524.4414, 5546.22, 5542.25]
2026-01-23 17:07:43,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 218.0, 713.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:07:43,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 14 hours, 21 minutes, 48 seconds)
2026-01-23 17:22:31,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:22:31,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:28:05,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5017.40430 ± 1233.599
2026-01-23 17:28:05,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5736.349, 5726.2607, 5672.519, 1907.7601, 5699.3228, 3984.2878, 5856.3564, 4112.6147, 5772.324, 5706.2485]
2026-01-23 17:28:05,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 343.0, 1000.0, 688.0, 1000.0, 683.0, 1000.0, 1000.0]
2026-01-23 17:28:05,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 13 hours, 56 minutes, 38 seconds)
2026-01-23 17:44:55,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:44:55,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:49:07,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3482.27979 ± 2394.568
2026-01-23 17:49:07,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5352.1914, 963.7888, 974.21655, 157.06548, 170.8899, 5414.008, 5404.4453, 5443.9375, 5503.049, 5439.2056]
2026-01-23 17:49:07,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 174.0, 193.0, 30.0, 33.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 17:49:07,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 13 hours, 22 minutes, 48 seconds)
2026-01-23 18:05:17,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:05:17,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:11:45,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5581.81445 ± 18.050
2026-01-23 18:11:45,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5594.0073, 5583.49, 5581.053, 5585.8604, 5610.5996, 5603.3174, 5574.07, 5556.3516, 5548.82, 5580.5737]
2026-01-23 18:11:45,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:11:45,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (5581.81) for latency DatasetOffice
2026-01-23 18:11:45,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 13 hours, 2 minutes, 27 seconds)
2026-01-23 18:27:10,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:27:10,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:32:55,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4912.32129 ± 1365.597
2026-01-23 18:32:55,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5552.0425, 5476.551, 5487.6465, 3527.1443, 5596.623, 5526.163, 5548.4375, 5620.822, 5551.2397, 1236.5447]
2026-01-23 18:32:55,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 647.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 216.0]
2026-01-23 18:32:55,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 12 hours, 55 minutes, 23 seconds)
2026-01-23 18:47:57,701 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:47:57,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:53:55,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5120.02832 ± 1260.102
2026-01-23 18:53:55,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1341.6757, 5506.5005, 5519.093, 5578.626, 5499.0117, 5514.5625, 5604.741, 5568.911, 5479.1377, 5588.0234]
2026-01-23 18:53:55,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [256.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 18:53:55,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 12 hours, 23 minutes, 22 seconds)
2026-01-23 19:09:41,612 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:09:41,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:16:01,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5380.29395 ± 282.730
2026-01-23 19:16:01,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [4538.957, 5463.9053, 5564.9287, 5474.902, 5427.6655, 5439.7744, 5497.583, 5451.3633, 5464.8086, 5479.0493]
2026-01-23 19:16:01,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [838.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:16:01,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 12 hours, 13 minutes, 58 seconds)
2026-01-23 19:31:51,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:31:51,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:36:48,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4202.18262 ± 1984.157
2026-01-23 19:36:48,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [1194.07, 5503.2554, 5455.2188, 5483.347, 5478.683, 5426.7803, 5514.443, 5527.341, 1931.3602, 507.32703]
2026-01-23 19:36:48,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [222.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 360.0, 97.0]
2026-01-23 19:36:48,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 11 hours, 50 minutes, 46 seconds)
2026-01-23 19:52:53,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:52:53,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:59:10,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5328.39160 ± 365.719
2026-01-23 19:59:10,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5425.7627, 5469.7812, 5386.7827, 5412.025, 4235.5615, 5466.601, 5491.8545, 5460.211, 5441.6313, 5493.7075]
2026-01-23 19:59:10,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 774.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:59:10,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 11 hours, 27 minutes, 25 seconds)
2026-01-23 20:15:06,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:15:06,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:21:13,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5252.92432 ± 542.374
2026-01-23 20:21:13,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5470.068, 5515.257, 5504.3105, 5601.819, 5438.3525, 5515.805, 5519.519, 5553.1553, 4506.9927, 3903.9653]
2026-01-23 20:21:13,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 832.0, 698.0]
2026-01-23 20:21:13,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 11 hours, 11 minutes, 27 seconds)
2026-01-23 20:37:40,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:37:40,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:40:47,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2751.09473 ± 2145.321
2026-01-23 20:40:47,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [4777.688, 5734.8154, 3934.1086, 3477.6973, 140.39912, 2049.1997, 485.84158, 361.0172, 786.5684, 5763.613]
2026-01-23 20:40:47,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [856.0, 1000.0, 708.0, 618.0, 27.0, 371.0, 105.0, 70.0, 145.0, 1000.0]
2026-01-23 20:40:47,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 10 hours, 41 minutes, 13 seconds)
2026-01-23 20:56:42,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:56:42,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:03:01,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5522.16113 ± 478.615
2026-01-23 21:03:01,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5769.99, 5702.39, 5648.6934, 5665.918, 5714.2944, 5695.7856, 5567.262, 5653.4585, 4094.323, 5709.4937]
2026-01-23 21:03:01,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 726.0, 1000.0]
2026-01-23 21:03:01,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 10 hours, 20 minutes, 36 seconds)
2026-01-23 21:19:36,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:19:36,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:25:34,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5086.23975 ± 1210.475
2026-01-23 21:25:34,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5561.0693, 5525.0806, 5546.4907, 5404.849, 1457.2723, 5497.3745, 5466.7466, 5444.455, 5490.0586, 5469.0044]
2026-01-23 21:25:34,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 267.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:25:34,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 10 hours, 9 minutes, 2 seconds)
2026-01-23 21:40:01,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:40:01,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:46:31,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5705.16113 ± 23.091
2026-01-23 21:46:31,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5671.789, 5722.376, 5687.818, 5735.7617, 5667.109, 5730.6133, 5712.674, 5706.4175, 5692.583, 5724.47]
2026-01-23 21:46:31,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:46:31,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (5705.16) for latency DatasetOffice
2026-01-23 21:46:31,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 9 hours, 39 minutes, 41 seconds)
2026-01-23 22:02:37,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:02:37,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:08:58,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5525.67285 ± 353.893
2026-01-23 22:08:58,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5594.046, 5514.295, 5705.95, 5713.474, 5708.5654, 5582.0996, 5675.8604, 4479.8325, 5648.907, 5633.6978]
2026-01-23 22:08:58,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 801.0, 1000.0, 1000.0]
2026-01-23 22:08:58,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 9 hours, 20 minutes, 18 seconds)
2026-01-23 22:25:16,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:25:16,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:30:35,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4695.21094 ± 1900.298
2026-01-23 22:30:35,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5769.026, 5720.224, 5615.139, 962.9252, 847.9777, 5695.1143, 5802.2153, 5592.9507, 5263.2603, 5683.2764]
2026-01-23 22:30:35,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 174.0, 157.0, 1000.0, 1000.0, 1000.0, 931.0, 1000.0]
2026-01-23 22:30:35,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 9 hours, 8 minutes, 57 seconds)
2026-01-23 22:46:31,097 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:46:31,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:52:58,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5428.69580 ± 44.013
2026-01-23 22:52:58,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5452.2075, 5477.7344, 5393.006, 5471.871, 5348.495, 5388.0483, 5474.8115, 5456.4966, 5441.671, 5382.614]
2026-01-23 22:52:58,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:52:58,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 8 hours, 47 minutes, 42 seconds)
2026-01-23 23:08:17,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:08:17,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:14:22,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5487.39307 ± 576.151
2026-01-23 23:14:22,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5624.877, 5605.779, 5691.1577, 5622.1265, 5620.1694, 5707.667, 5764.5493, 3766.7402, 5746.7764, 5724.084]
2026-01-23 23:14:22,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 667.0, 1000.0, 1000.0]
2026-01-23 23:14:22,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 8 hours, 20 minutes, 29 seconds)
2026-01-23 23:29:58,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:29:58,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:36:16,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5482.80566 ± 56.201
2026-01-23 23:36:16,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5486.3506, 5544.2583, 5481.1587, 5427.6284, 5519.896, 5409.664, 5573.772, 5458.534, 5398.565, 5528.2256]
2026-01-23 23:36:16,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:36:17,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 8 hours, 2 minutes, 56 seconds)
2026-01-23 23:51:43,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:51:43,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:58:06,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5681.89551 ± 66.652
2026-01-23 23:58:06,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5710.4277, 5653.417, 5620.9033, 5682.476, 5821.254, 5702.6, 5677.234, 5551.1016, 5672.738, 5726.8037]
2026-01-23 23:58:06,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:58:06,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 7 hours, 38 minutes, 20 seconds)
2026-01-24 00:14:21,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:14:21,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:17:34,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2846.01123 ± 2278.987
2026-01-24 00:17:34,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5485.658, 5536.106, 934.5941, 5555.0215, 525.7467, 5498.6206, 2928.223, 495.8034, 695.8168, 804.5216]
2026-01-24 00:17:34,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 172.0, 1000.0, 101.0, 1000.0, 527.0, 84.0, 125.0, 156.0]
2026-01-24 00:17:34,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 7 hours, 7 minutes, 57 seconds)
2026-01-24 00:32:42,605 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:32:42,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:38:25,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4953.56738 ± 1460.680
2026-01-24 00:38:25,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [4886.6724, 5477.4775, 5515.2705, 5465.792, 5561.1875, 5555.4683, 5460.0967, 5386.7075, 610.337, 5616.6665]
2026-01-24 00:38:25,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [915.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0, 1000.0]
2026-01-24 00:38:25,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 6 hours, 40 minutes, 44 seconds)
2026-01-24 00:53:31,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:53:31,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:59:31,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5408.31641 ± 793.320
2026-01-24 00:59:31,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5615.949, 5706.184, 5768.072, 5699.684, 5652.475, 5692.458, 5617.4624, 3032.6523, 5691.3975, 5606.8286]
2026-01-24 00:59:31,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 555.0, 1000.0, 1000.0]
2026-01-24 00:59:31,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 6 hours, 18 minutes, 34 seconds)
2026-01-24 01:15:05,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:15:05,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:20:11,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4450.97119 ± 1962.782
2026-01-24 01:20:11,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5478.794, 5508.2124, 5506.076, 5571.185, 4758.2314, 771.5728, 341.16583, 5525.4404, 5540.0605, 5508.9683]
2026-01-24 01:20:11,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 877.0, 148.0, 64.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:20:11,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 53 minutes, 18 seconds)
2026-01-24 01:35:36,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:35:36,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:41:46,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5460.74951 ± 453.453
2026-01-24 01:41:46,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5598.879, 4107.1855, 5633.343, 5538.272, 5638.139, 5688.2314, 5563.76, 5672.609, 5570.9126, 5596.1597]
2026-01-24 01:41:46,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 728.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 01:41:46,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 31 minutes, 44 seconds)
2026-01-24 01:57:12,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 01:57:12,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:02:41,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4783.88184 ± 1527.683
2026-01-24 02:02:41,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5454.4863, 5447.7744, 650.419, 5457.4707, 5495.2935, 5432.6855, 3279.2852, 5530.5464, 5535.983, 5554.875]
2026-01-24 02:02:41,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 129.0, 1000.0, 1000.0, 1000.0, 607.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:02:41,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 5 hours, 15 minutes, 21 seconds)
2026-01-24 02:18:23,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:18:23,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:22:35,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3844.76880 ± 1641.330
2026-01-24 02:22:35,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [3263.8115, 3784.1433, 2481.8079, 1383.2108, 5796.2524, 2723.6763, 5632.33, 5671.518, 5737.4595, 1973.4778]
2026-01-24 02:22:35,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [571.0, 665.0, 428.0, 236.0, 1000.0, 492.0, 1000.0, 1000.0, 1000.0, 343.0]
2026-01-24 02:22:35,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 51 minutes, 39 seconds)
2026-01-24 02:38:03,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:38:03,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 02:44:17,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5471.48584 ± 250.496
2026-01-24 02:44:17,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5532.629, 5629.7153, 5533.412, 5649.8877, 4737.313, 5580.6484, 5517.0923, 5531.468, 5453.084, 5549.6104]
2026-01-24 02:44:17,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 864.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 02:44:17,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 32 minutes, 22 seconds)
2026-01-24 02:58:52,671 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 02:58:52,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:04:45,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5244.95605 ± 1198.293
2026-01-24 03:04:45,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5691.4644, 1652.421, 5669.3633, 5665.4043, 5631.498, 5655.7383, 5548.3184, 5668.2397, 5583.0815, 5684.037]
2026-01-24 03:04:45,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 293.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:04:45,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 4 hours, 10 minutes, 56 seconds)
2026-01-24 03:20:22,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:20:22,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:26:42,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5618.27637 ± 78.534
2026-01-24 03:26:42,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5655.3584, 5655.3687, 5699.6553, 5605.056, 5432.853, 5640.236, 5682.243, 5654.8374, 5512.6885, 5644.4707]
2026-01-24 03:26:42,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:26:42,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 50 minutes, 51 seconds)
2026-01-24 03:42:13,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 03:42:13,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 03:48:36,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5684.22119 ± 33.151
2026-01-24 03:48:36,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5693.197, 5672.8594, 5674.923, 5729.25, 5734.3955, 5679.8887, 5608.7383, 5684.1455, 5667.8145, 5696.999]
2026-01-24 03:48:36,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 03:48:36,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 31 minutes, 49 seconds)
2026-01-24 04:04:58,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:04:58,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:08:52,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3294.19604 ± 2086.815
2026-01-24 04:08:52,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5393.4893, 5349.747, 5561.339, 5399.3584, 3477.2615, 3267.2405, 171.62378, 302.29547, 3295.9824, 723.6234]
2026-01-24 04:08:52,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 643.0, 616.0, 33.0, 57.0, 589.0, 159.0]
2026-01-24 04:08:52,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 3 hours, 11 minutes, 17 seconds)
2026-01-24 04:23:28,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:23:28,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:29:52,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5839.00537 ± 64.120
2026-01-24 04:29:52,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5807.2505, 5873.696, 5878.419, 5727.789, 5766.4424, 5918.771, 5819.459, 5861.418, 5941.6826, 5795.13]
2026-01-24 04:29:52,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 04:29:52,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1274 [INFO]: New best (5839.01) for latency DatasetOffice
2026-01-24 04:29:52,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 48 minutes, 56 seconds)
2026-01-24 04:45:02,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 04:45:02,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 04:50:53,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5092.17871 ± 1388.194
2026-01-24 04:50:53,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5504.065, 5529.525, 5617.0156, 5576.1587, 5481.1797, 5559.1855, 929.83026, 5519.571, 5571.9326, 5633.319]
2026-01-24 04:50:53,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 170.0, 1000.0, 1000.0, 1000.0]
2026-01-24 04:50:53,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 28 minutes, 34 seconds)
2026-01-24 05:06:34,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:06:34,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:12:51,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5717.75684 ± 40.988
2026-01-24 05:12:51,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5763.5054, 5754.806, 5738.325, 5646.023, 5777.07, 5688.0107, 5671.3057, 5685.738, 5722.4526, 5730.3315]
2026-01-24 05:12:51,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:12:51,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 2 hours, 7 minutes, 23 seconds)
2026-01-24 05:28:22,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:28:22,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:34:43,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5628.13281 ± 46.663
2026-01-24 05:34:43,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5643.9346, 5636.601, 5621.1235, 5668.6235, 5597.0527, 5639.582, 5503.7905, 5669.9805, 5665.65, 5634.9937]
2026-01-24 05:34:43,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:34:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 46 minutes, 6 seconds)
2026-01-24 05:51:00,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 05:51:00,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 05:55:28,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4024.09888 ± 2255.385
2026-01-24 05:55:28,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5819.896, 1017.84186, 5661.8413, 5616.499, 3623.3232, 504.72873, 632.3096, 5799.421, 5772.9385, 5792.1914]
2026-01-24 05:55:28,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 184.0, 1000.0, 1000.0, 630.0, 93.0, 132.0, 1000.0, 1000.0, 1000.0]
2026-01-24 05:55:28,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 25 minutes, 17 seconds)
2026-01-24 06:10:55,204 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:10:55,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:17:17,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5463.57910 ± 65.212
2026-01-24 06:17:17,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5363.6196, 5497.6167, 5551.447, 5488.9067, 5485.0186, 5420.6997, 5352.1685, 5445.0903, 5551.2915, 5479.9326]
2026-01-24 06:17:17,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 06:17:17,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 1 hour, 4 minutes, 26 seconds)
2026-01-24 06:32:29,267 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:32:29,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:38:44,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5692.62451 ± 24.858
2026-01-24 06:38:44,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5673.578, 5728.647, 5720.609, 5687.093, 5642.501, 5692.3286, 5695.1475, 5719.185, 5696.17, 5670.987]
2026-01-24 06:38:44,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 06:38:44,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 43 minutes, 8 seconds)
2026-01-24 06:54:02,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 06:54:02,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 06:58:25,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3950.62939 ± 2326.099
2026-01-24 06:58:25,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [3568.1611, 355.30722, 313.137, 950.4666, 5703.728, 5686.6514, 5833.4536, 5757.7896, 5658.8335, 5678.7646]
2026-01-24 06:58:25,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [622.0, 64.0, 57.0, 183.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 06:58:25,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 21 minutes, 6 seconds)
2026-01-24 07:15:00,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 07:15:00,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 07:21:18,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5709.40967 ± 33.397
2026-01-24 07:21:18,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1270 [DEBUG]: All rewards: [5719.3247, 5723.631, 5723.7793, 5752.7466, 5704.5464, 5662.5425, 5752.6475, 5712.9336, 5700.2295, 5641.7183]
2026-01-24 07:21:18,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-24 07:21:18,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1299 [DEBUG]: Training session finished
