2025-09-11 19:24:43,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:24:43,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:24:43,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154ec7894550>}
2025-09-11 19:24:43,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1111 [DEBUG]: using device: cuda
2025-09-11 19:24:43,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1133 [INFO]: Creating new trainer
2025-09-11 19:24:43,975 baseline-mbpac-noiseperc5-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-11 19:24:43,975 baseline-mbpac-noiseperc5-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:24:43,986 baseline-mbpac-noiseperc5-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 19:24:45,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1194 [DEBUG]: Starting training session...
2025-09-11 19:24:45,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 1/100
2025-09-11 19:37:42,952 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:37:42,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:38:01,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 325.81339 ± 76.906
2025-09-11 19:38:01,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [308.45, 271.06058, 472.58224, 229.48895, 362.73578, 267.92322, 254.62753, 365.4316, 289.0214, 436.81256]
2025-09-11 19:38:01,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 58.0, 88.0, 53.0, 66.0, 49.0, 46.0, 67.0, 53.0, 86.0]
2025-09-11 19:38:01,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (325.81) for latency ExtremeClogL1U23
2025-09-11 19:38:01,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 53 minutes, 13 seconds)
2025-09-11 19:52:28,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:52:28,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:52:47,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 344.83527 ± 99.252
2025-09-11 19:52:47,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [280.41028, 147.31319, 402.44553, 492.64362, 358.94073, 504.42023, 309.5521, 343.62622, 306.7843, 302.21634]
2025-09-11 19:52:47,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 28.0, 76.0, 106.0, 67.0, 96.0, 58.0, 64.0, 58.0, 57.0]
2025-09-11 19:52:47,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (344.84) for latency ExtremeClogL1U23
2025-09-11 19:52:47,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 54 minutes, 13 seconds)
2025-09-11 20:07:23,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:07:23,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:07:52,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 493.91406 ± 115.980
2025-09-11 20:07:52,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [714.13477, 373.80792, 555.11774, 585.2493, 542.8331, 518.1329, 437.2402, 542.6673, 320.5957, 349.36182]
2025-09-11 20:07:52,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 80.0, 114.0, 112.0, 102.0, 112.0, 82.0, 104.0, 60.0, 65.0]
2025-09-11 20:07:52,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (493.91) for latency ExtremeClogL1U23
2025-09-11 20:07:52,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 23 hours, 14 minutes, 3 seconds)
2025-09-11 20:22:38,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:22:38,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:22:59,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 377.15704 ± 43.001
2025-09-11 20:22:59,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [391.40735, 375.86374, 366.15558, 399.32263, 340.76392, 298.59674, 353.00534, 451.60895, 437.60742, 357.2389]
2025-09-11 20:22:59,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 70.0, 68.0, 73.0, 62.0, 55.0, 65.0, 85.0, 85.0, 78.0]
2025-09-11 20:22:59,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 23 hours, 17 minutes, 31 seconds)
2025-09-11 20:37:47,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:37:47,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:38:12,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 449.70709 ± 86.136
2025-09-11 20:38:12,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [313.63147, 532.3021, 589.0857, 350.67825, 518.0177, 484.3028, 486.82822, 412.4064, 345.8511, 463.9669]
2025-09-11 20:38:12,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 101.0, 120.0, 78.0, 112.0, 92.0, 95.0, 87.0, 68.0, 84.0]
2025-09-11 20:38:12,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 23 hours, 15 minutes, 45 seconds)
2025-09-11 20:52:53,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:52:53,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:53:19,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 451.88916 ± 47.751
2025-09-11 20:53:19,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [484.19885, 446.35806, 444.20468, 431.3163, 401.25656, 557.6242, 444.48962, 381.94305, 430.041, 497.4591]
2025-09-11 20:53:19,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 89.0, 82.0, 80.0, 75.0, 103.0, 86.0, 70.0, 80.0, 101.0]
2025-09-11 20:53:19,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 23 hours, 35 minutes, 38 seconds)
2025-09-11 21:08:12,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:08:12,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:08:38,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 459.96704 ± 105.807
2025-09-11 21:08:38,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [551.69727, 418.5701, 319.84573, 408.459, 365.9667, 634.8323, 451.79855, 425.6725, 641.0028, 381.82562]
2025-09-11 21:08:38,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 79.0, 66.0, 90.0, 70.0, 133.0, 83.0, 80.0, 129.0, 84.0]
2025-09-11 21:08:38,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 23 hours, 30 minutes, 46 seconds)
2025-09-11 21:23:22,267 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:23:22,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:23:51,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 526.24402 ± 127.531
2025-09-11 21:23:51,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [520.1904, 657.5359, 528.63556, 458.4099, 825.4737, 539.30444, 436.70224, 421.81644, 527.36536, 347.00632]
2025-09-11 21:23:51,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 128.0, 102.0, 93.0, 164.0, 102.0, 81.0, 79.0, 99.0, 64.0]
2025-09-11 21:23:51,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (526.24) for latency ExtremeClogL1U23
2025-09-11 21:23:51,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 23 hours, 18 minutes, 15 seconds)
2025-09-11 21:38:39,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:38:39,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:39:05,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 448.57559 ± 83.768
2025-09-11 21:39:05,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [463.8229, 415.11505, 499.56287, 337.88068, 429.86713, 433.74286, 495.52704, 360.20276, 399.1304, 650.9038]
2025-09-11 21:39:05,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 79.0, 93.0, 63.0, 80.0, 93.0, 95.0, 67.0, 89.0, 126.0]
2025-09-11 21:39:05,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 23 hours, 5 minutes, 5 seconds)
2025-09-11 21:53:49,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:53:49,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:54:14,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 461.48480 ± 76.381
2025-09-11 21:54:14,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [348.97568, 413.39822, 404.7922, 586.3342, 463.57767, 464.77368, 446.21176, 611.3849, 456.6507, 418.74915]
2025-09-11 21:54:14,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 76.0, 75.0, 112.0, 88.0, 89.0, 82.0, 115.0, 83.0, 83.0]
2025-09-11 21:54:14,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 22 hours, 48 minutes, 29 seconds)
2025-09-11 22:08:55,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:08:55,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:09:22,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 476.88763 ± 58.067
2025-09-11 22:09:22,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [474.84332, 541.5321, 486.83484, 610.5374, 493.5035, 433.34915, 448.79202, 401.55325, 444.46756, 433.4628]
2025-09-11 22:09:22,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 105.0, 94.0, 115.0, 93.0, 84.0, 86.0, 77.0, 83.0, 80.0]
2025-09-11 22:09:22,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 22 hours, 33 minutes, 44 seconds)
2025-09-11 22:24:18,801 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:24:18,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:24:43,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 432.41144 ± 158.997
2025-09-11 22:24:43,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [483.91687, 358.08066, 162.2484, 500.766, 826.77466, 393.63983, 388.80493, 454.0987, 406.3632, 349.421]
2025-09-11 22:24:43,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 69.0, 31.0, 109.0, 171.0, 76.0, 82.0, 85.0, 76.0, 68.0]
2025-09-11 22:24:43,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 22 hours, 18 minutes, 57 seconds)
2025-09-11 22:39:22,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:39:22,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:39:47,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 465.70850 ± 62.681
2025-09-11 22:39:47,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [546.1711, 429.87265, 464.71545, 564.94403, 461.51334, 500.27997, 407.1207, 411.04407, 513.96906, 357.4544]
2025-09-11 22:39:47,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 79.0, 95.0, 107.0, 86.0, 92.0, 74.0, 75.0, 96.0, 66.0]
2025-09-11 22:39:47,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 1 minute, 18 seconds)
2025-09-11 22:54:20,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:54:20,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:54:46,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 473.91943 ± 95.201
2025-09-11 22:54:46,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [373.47922, 592.3349, 501.1387, 447.89578, 458.99554, 652.8306, 530.81915, 310.62238, 438.8394, 432.23883]
2025-09-11 22:54:46,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 115.0, 108.0, 84.0, 89.0, 128.0, 101.0, 59.0, 91.0, 82.0]
2025-09-11 22:54:46,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 41 minutes, 52 seconds)
2025-09-11 23:09:00,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:09:00,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:09:24,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 468.97406 ± 88.977
2025-09-11 23:09:24,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [427.53055, 402.11472, 490.38907, 397.006, 617.6135, 409.19806, 635.4071, 492.70544, 353.24695, 464.52936]
2025-09-11 23:09:24,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 74.0, 91.0, 72.0, 117.0, 74.0, 120.0, 93.0, 65.0, 85.0]
2025-09-11 23:09:24,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 21 hours, 17 minutes, 54 seconds)
2025-09-11 23:23:48,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:23:48,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:24:18,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 509.22241 ± 131.358
2025-09-11 23:24:18,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [725.6202, 399.6428, 710.62213, 566.29913, 468.8042, 396.438, 372.92957, 400.60944, 414.65674, 636.60187]
2025-09-11 23:24:18,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 89.0, 138.0, 104.0, 102.0, 88.0, 75.0, 82.0, 77.0, 128.0]
2025-09-11 23:24:18,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 20 hours, 58 minutes, 58 seconds)
2025-09-11 23:39:00,128 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:39:00,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:39:30,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 536.59717 ± 84.504
2025-09-11 23:39:30,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [562.79224, 515.2506, 548.9067, 656.89874, 493.89886, 356.60046, 514.13586, 662.533, 576.9269, 478.02808]
2025-09-11 23:39:30,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 95.0, 103.0, 143.0, 91.0, 66.0, 100.0, 144.0, 124.0, 90.0]
2025-09-11 23:39:30,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (536.60) for latency ExtremeClogL1U23
2025-09-11 23:39:30,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 41 minutes, 23 seconds)
2025-09-11 23:54:07,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:54:07,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:54:36,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 528.79797 ± 104.319
2025-09-11 23:54:36,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [557.6358, 362.5945, 513.71606, 433.75366, 733.326, 612.22437, 459.37015, 446.43073, 631.9798, 536.9486]
2025-09-11 23:54:36,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 78.0, 95.0, 81.0, 142.0, 128.0, 93.0, 94.0, 121.0, 97.0]
2025-09-11 23:54:36,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 26 minutes, 59 seconds)
2025-09-12 00:09:15,033 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:09:15,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:09:45,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 550.96619 ± 68.562
2025-09-12 00:09:45,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [565.02966, 511.75192, 616.9129, 573.2516, 615.5284, 387.0198, 586.61206, 621.71136, 497.92096, 533.9236]
2025-09-12 00:09:45,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 91.0, 117.0, 106.0, 114.0, 71.0, 110.0, 119.0, 92.0, 113.0]
2025-09-12 00:09:45,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (550.97) for latency ExtremeClogL1U23
2025-09-12 00:09:45,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 14 minutes, 36 seconds)
2025-09-12 00:24:26,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:24:26,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:24:55,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 530.38245 ± 70.895
2025-09-12 00:24:55,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [531.0076, 475.9674, 439.3955, 564.0347, 515.66846, 557.7361, 511.02628, 427.2765, 668.32544, 613.3869]
2025-09-12 00:24:55,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 87.0, 82.0, 109.0, 107.0, 101.0, 110.0, 78.0, 129.0, 114.0]
2025-09-12 00:24:55,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 20 hours, 8 minutes, 10 seconds)
2025-09-12 00:39:32,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:39:32,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:40:02,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 554.26312 ± 106.703
2025-09-12 00:40:02,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [407.22485, 637.6433, 571.4963, 688.4536, 742.68933, 565.1611, 395.33722, 496.0122, 507.88663, 530.72705]
2025-09-12 00:40:02,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 120.0, 105.0, 126.0, 143.0, 118.0, 73.0, 92.0, 109.0, 99.0]
2025-09-12 00:40:02,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (554.26) for latency ExtremeClogL1U23
2025-09-12 00:40:02,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 56 minutes, 39 seconds)
2025-09-12 00:54:48,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:54:48,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:55:21,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 584.76288 ± 202.906
2025-09-12 00:55:21,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [140.37462, 709.8928, 570.98364, 721.158, 444.94574, 878.54364, 822.2864, 576.3961, 496.2867, 486.76144]
2025-09-12 00:55:21,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 135.0, 116.0, 135.0, 97.0, 170.0, 162.0, 111.0, 92.0, 100.0]
2025-09-12 00:55:21,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (584.76) for latency ExtremeClogL1U23
2025-09-12 00:55:21,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 43 minutes, 11 seconds)
2025-09-12 01:10:01,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:10:01,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:10:31,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 553.14001 ± 146.691
2025-09-12 01:10:31,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [512.4875, 561.8537, 665.4431, 466.53073, 587.316, 646.4114, 465.49857, 883.92456, 371.64737, 370.28702]
2025-09-12 01:10:31,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 102.0, 131.0, 86.0, 114.0, 120.0, 85.0, 172.0, 68.0, 70.0]
2025-09-12 01:10:31,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 28 minutes, 58 seconds)
2025-09-12 01:25:04,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:25:04,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:25:37,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 535.05634 ± 132.820
2025-09-12 01:25:37,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [470.97723, 617.6699, 524.7342, 449.76166, 407.0658, 501.1668, 698.02423, 816.5086, 349.60193, 515.05334]
2025-09-12 01:25:37,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 131.0, 112.0, 90.0, 87.0, 93.0, 141.0, 172.0, 74.0, 106.0]
2025-09-12 01:25:37,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 13 minutes, 7 seconds)
2025-09-12 01:40:24,945 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:40:24,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:40:55,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 532.51501 ± 139.019
2025-09-12 01:40:55,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [575.046, 369.56393, 554.8779, 496.37347, 409.07956, 477.98035, 465.11774, 889.2482, 463.86325, 623.9994]
2025-09-12 01:40:55,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 83.0, 107.0, 92.0, 91.0, 91.0, 104.0, 172.0, 85.0, 119.0]
2025-09-12 01:40:55,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 59 minutes, 55 seconds)
2025-09-12 01:55:31,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:55:31,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:55:58,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 493.73102 ± 83.252
2025-09-12 01:55:58,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [447.56924, 509.7982, 425.5692, 710.4896, 521.99695, 402.9437, 501.24164, 529.34534, 441.6464, 446.71017]
2025-09-12 01:55:58,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 94.0, 78.0, 134.0, 95.0, 75.0, 91.0, 99.0, 80.0, 82.0]
2025-09-12 01:55:58,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 43 minutes, 38 seconds)
2025-09-12 02:10:44,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:10:44,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:11:21,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 667.21936 ± 120.394
2025-09-12 02:11:21,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [847.78406, 553.67725, 765.24677, 630.1376, 638.9419, 742.21295, 724.98444, 523.48987, 787.3463, 458.3724]
2025-09-12 02:11:21,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 101.0, 162.0, 135.0, 116.0, 146.0, 137.0, 102.0, 145.0, 93.0]
2025-09-12 02:11:21,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (667.22) for latency ExtremeClogL1U23
2025-09-12 02:11:21,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 29 minutes, 34 seconds)
2025-09-12 02:25:48,884 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:25:48,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:26:30,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 741.04559 ± 118.701
2025-09-12 02:26:30,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [603.11255, 891.94824, 884.6631, 625.2898, 621.6523, 757.7114, 919.47266, 799.36865, 650.537, 656.7001]
2025-09-12 02:26:30,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 186.0, 173.0, 130.0, 116.0, 142.0, 179.0, 150.0, 139.0, 121.0]
2025-09-12 02:26:30,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (741.05) for latency ExtremeClogL1U23
2025-09-12 02:26:30,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 14 minutes, 14 seconds)
2025-09-12 02:41:15,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:41:15,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:41:44,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 517.30469 ± 79.766
2025-09-12 02:41:44,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [450.29962, 413.68518, 629.7472, 581.70526, 532.05945, 422.38043, 498.11194, 607.0658, 602.5101, 435.48172]
2025-09-12 02:41:44,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 75.0, 135.0, 112.0, 100.0, 91.0, 93.0, 115.0, 128.0, 81.0]
2025-09-12 02:41:44,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 1 minute)
2025-09-12 02:56:14,321 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:56:14,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:56:47,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 605.52063 ± 106.190
2025-09-12 02:56:47,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [730.7015, 675.0862, 611.444, 636.5166, 429.3678, 688.86414, 624.14136, 380.71225, 631.1736, 647.1999]
2025-09-12 02:56:47,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 127.0, 128.0, 119.0, 90.0, 129.0, 114.0, 76.0, 116.0, 125.0]
2025-09-12 02:56:47,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 42 minutes, 16 seconds)
2025-09-12 03:11:40,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:11:40,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:12:21,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 723.65356 ± 168.806
2025-09-12 03:12:21,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [552.6154, 952.7744, 646.7379, 604.4807, 821.57697, 997.4432, 624.9434, 903.6362, 507.88174, 624.4458]
2025-09-12 03:12:21,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 200.0, 131.0, 110.0, 156.0, 182.0, 121.0, 171.0, 94.0, 119.0]
2025-09-12 03:12:21,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 34 minutes, 15 seconds)
2025-09-12 03:27:30,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:27:30,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:28:08,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 646.82794 ± 180.818
2025-09-12 03:28:08,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [384.3707, 520.5559, 702.3853, 706.5337, 824.6256, 495.01584, 897.66974, 832.15515, 371.17767, 733.7896]
2025-09-12 03:28:08,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 107.0, 133.0, 132.0, 174.0, 90.0, 170.0, 163.0, 76.0, 138.0]
2025-09-12 03:28:08,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 24 minutes, 18 seconds)
2025-09-12 03:43:09,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:43:09,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:43:45,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 641.06750 ± 149.845
2025-09-12 03:43:45,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [663.6691, 819.04254, 902.53705, 570.6845, 629.97766, 517.68964, 646.2299, 396.87823, 483.88724, 780.0793]
2025-09-12 03:43:45,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 151.0, 172.0, 121.0, 115.0, 112.0, 120.0, 73.0, 92.0, 140.0]
2025-09-12 03:43:45,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 15 minutes, 10 seconds)
2025-09-12 03:58:41,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:58:41,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:59:11,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 558.95227 ± 85.760
2025-09-12 03:59:11,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [578.3551, 739.8405, 474.18137, 467.30798, 610.9908, 622.95844, 598.3357, 520.46765, 539.10895, 437.97647]
2025-09-12 03:59:11,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 136.0, 86.0, 84.0, 113.0, 130.0, 109.0, 95.0, 98.0, 79.0]
2025-09-12 03:59:11,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 2 minutes, 24 seconds)
2025-09-12 04:14:08,471 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:14:08,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:14:49,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 726.21179 ± 146.609
2025-09-12 04:14:49,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [754.04175, 754.2896, 828.81946, 747.8115, 546.56134, 606.9027, 1075.7634, 739.9484, 646.5507, 561.4283]
2025-09-12 04:14:49,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 140.0, 158.0, 139.0, 100.0, 113.0, 216.0, 137.0, 119.0, 102.0]
2025-09-12 04:14:49,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 54 minutes, 19 seconds)
2025-09-12 04:29:45,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:29:45,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:30:26,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 727.44043 ± 228.189
2025-09-12 04:30:26,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [607.47174, 410.3055, 852.64557, 686.83014, 971.37616, 716.36346, 535.7509, 932.47253, 431.06024, 1130.1283]
2025-09-12 04:30:26,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 74.0, 179.0, 137.0, 194.0, 134.0, 99.0, 167.0, 80.0, 219.0]
2025-09-12 04:30:26,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 39 minutes, 22 seconds)
2025-09-12 04:45:22,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:45:22,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:46:00,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 696.10693 ± 189.952
2025-09-12 04:46:00,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [652.00165, 829.2506, 553.15265, 1067.3683, 610.4836, 428.98227, 583.10046, 555.65344, 718.12555, 962.95044]
2025-09-12 04:46:00,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 150.0, 100.0, 217.0, 113.0, 80.0, 106.0, 101.0, 124.0, 178.0]
2025-09-12 04:46:00,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 21 minutes, 12 seconds)
2025-09-12 05:00:54,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:00:54,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:01:30,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 672.53424 ± 235.160
2025-09-12 05:01:30,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [595.9183, 350.86926, 856.39685, 504.07242, 617.36676, 1216.256, 577.53485, 856.22943, 678.1702, 472.52792]
2025-09-12 05:01:30,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 65.0, 154.0, 93.0, 111.0, 230.0, 108.0, 153.0, 127.0, 85.0]
2025-09-12 05:01:30,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 4 minutes, 5 seconds)
2025-09-12 05:16:38,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:16:38,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:17:17,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 660.84485 ± 143.473
2025-09-12 05:17:17,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [531.80676, 742.8372, 715.46765, 495.87805, 926.384, 445.80945, 829.7827, 606.2903, 612.50366, 701.68866]
2025-09-12 05:17:17,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 157.0, 131.0, 104.0, 173.0, 92.0, 157.0, 121.0, 126.0, 147.0]
2025-09-12 05:17:17,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 52 minutes, 43 seconds)
2025-09-12 05:32:09,608 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:32:09,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:32:49,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 701.25293 ± 195.109
2025-09-12 05:32:49,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [997.2464, 888.9117, 536.913, 336.5182, 644.6438, 683.1696, 779.86584, 510.56927, 928.6556, 706.0357]
2025-09-12 05:32:49,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 162.0, 106.0, 63.0, 122.0, 130.0, 146.0, 92.0, 193.0, 128.0]
2025-09-12 05:32:49,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 35 minutes, 57 seconds)
2025-09-12 05:47:46,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:47:46,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:48:41,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 928.48254 ± 229.121
2025-09-12 05:48:41,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [674.37714, 1175.4825, 1168.4744, 907.9053, 1108.7169, 1025.9279, 846.78503, 1163.2589, 482.8024, 731.0945]
2025-09-12 05:48:41,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 229.0, 220.0, 173.0, 208.0, 200.0, 159.0, 237.0, 103.0, 154.0]
2025-09-12 05:48:41,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (928.48) for latency ExtremeClogL1U23
2025-09-12 05:48:41,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 23 minutes, 16 seconds)
2025-09-12 06:03:44,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:03:44,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:04:33,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 853.73633 ± 316.056
2025-09-12 06:04:33,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [697.7965, 1095.8074, 566.50336, 1602.7302, 918.2564, 564.66644, 962.6835, 530.50684, 622.44934, 975.9633]
2025-09-12 06:04:33,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 221.0, 117.0, 307.0, 171.0, 105.0, 176.0, 111.0, 118.0, 204.0]
2025-09-12 06:04:33,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 11 minutes, 5 seconds)
2025-09-12 06:19:37,047 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:19:37,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:20:42,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1166.40503 ± 472.789
2025-09-12 06:20:42,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1052.8978, 752.0653, 1315.982, 1441.3724, 962.7706, 1002.42474, 786.09186, 967.916, 933.1733, 2449.357]
2025-09-12 06:20:42,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 142.0, 261.0, 275.0, 183.0, 188.0, 148.0, 183.0, 173.0, 475.0]
2025-09-12 06:20:42,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1166.41) for latency ExtremeClogL1U23
2025-09-12 06:20:42,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 2 minutes, 54 seconds)
2025-09-12 06:35:43,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:35:43,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:36:32,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 835.52167 ± 294.355
2025-09-12 06:36:32,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [854.41626, 471.06546, 1302.0088, 1257.8085, 944.669, 839.40656, 549.52936, 1073.7332, 560.85315, 501.72668]
2025-09-12 06:36:32,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 87.0, 251.0, 253.0, 174.0, 171.0, 102.0, 217.0, 103.0, 107.0]
2025-09-12 06:36:32,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 47 minutes, 35 seconds)
2025-09-12 06:51:33,085 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:51:33,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:52:21,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 832.60095 ± 337.804
2025-09-12 06:52:21,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1108.7839, 146.16206, 1369.8822, 532.44446, 1064.4269, 535.2157, 753.0052, 958.7175, 802.2228, 1055.1495]
2025-09-12 06:52:21,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 28.0, 257.0, 112.0, 197.0, 114.0, 156.0, 181.0, 149.0, 195.0]
2025-09-12 06:52:21,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 34 minutes, 52 seconds)
2025-09-12 07:07:34,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:07:34,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:08:12,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 692.85956 ± 307.569
2025-09-12 07:08:12,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [721.9903, 477.52014, 144.62317, 464.02023, 1087.8402, 1022.1605, 1012.658, 1003.183, 441.05228, 553.548]
2025-09-12 07:08:12,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 88.0, 28.0, 85.0, 198.0, 187.0, 185.0, 182.0, 82.0, 100.0]
2025-09-12 07:08:12,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 18 minutes, 53 seconds)
2025-09-12 07:22:58,517 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:22:58,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:23:39,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 747.84381 ± 209.519
2025-09-12 07:23:39,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1054.2137, 1010.3596, 902.7986, 585.6665, 715.3902, 583.05786, 983.012, 474.14548, 500.14407, 669.6506]
2025-09-12 07:23:39,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 185.0, 168.0, 108.0, 129.0, 110.0, 174.0, 86.0, 92.0, 127.0]
2025-09-12 07:23:39,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 58 minutes, 28 seconds)
2025-09-12 07:38:47,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:38:47,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:39:47,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1072.08679 ± 438.739
2025-09-12 07:39:47,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1161.1742, 1440.7139, 1071.5289, 632.3956, 532.4854, 459.67996, 1253.1261, 1437.0619, 1905.9761, 826.72516]
2025-09-12 07:39:47,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 258.0, 204.0, 120.0, 100.0, 87.0, 232.0, 275.0, 365.0, 176.0]
2025-09-12 07:39:47,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 42 minutes, 20 seconds)
2025-09-12 07:54:45,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:54:45,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:55:35,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 899.57800 ± 409.585
2025-09-12 07:55:35,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1758.1891, 786.7038, 777.63104, 593.91705, 1122.0778, 1226.669, 384.12732, 1265.0859, 595.2975, 486.0825]
2025-09-12 07:55:35,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [326.0, 140.0, 143.0, 119.0, 208.0, 227.0, 69.0, 236.0, 116.0, 91.0]
2025-09-12 07:55:35,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 26 minutes, 21 seconds)
2025-09-12 08:10:32,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:10:32,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:11:19,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 789.96112 ± 266.562
2025-09-12 08:11:19,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [650.5495, 969.22565, 572.9967, 549.68304, 1254.3843, 426.5298, 541.2026, 1117.2616, 976.3833, 841.39526]
2025-09-12 08:11:19,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 186.0, 123.0, 119.0, 256.0, 91.0, 114.0, 216.0, 176.0, 174.0]
2025-09-12 08:11:19,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 9 minutes, 44 seconds)
2025-09-12 08:26:29,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:26:29,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:27:31,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1055.03247 ± 557.258
2025-09-12 08:27:31,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1125.8856, 160.66533, 2041.2657, 526.37006, 1758.8423, 740.77893, 1035.342, 1538.3403, 586.0981, 1036.7365]
2025-09-12 08:27:31,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 31.0, 380.0, 95.0, 332.0, 138.0, 208.0, 325.0, 126.0, 214.0]
2025-09-12 08:27:31,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 57 minutes, 16 seconds)
2025-09-12 08:42:31,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:42:31,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:43:30,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1088.90759 ± 369.581
2025-09-12 08:43:30,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [379.93146, 1815.6133, 984.0079, 1217.2732, 829.4158, 1077.8835, 1310.0945, 1033.7157, 815.41785, 1425.7231]
2025-09-12 08:43:30,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 334.0, 185.0, 212.0, 151.0, 194.0, 238.0, 196.0, 148.0, 266.0]
2025-09-12 08:43:30,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 46 minutes, 36 seconds)
2025-09-12 08:58:39,366 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:58:39,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:59:52,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1279.18091 ± 528.933
2025-09-12 08:59:52,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [863.8032, 2267.2969, 589.8157, 1132.4689, 1341.4338, 2200.6028, 1052.721, 786.9021, 1225.8317, 1330.9316]
2025-09-12 08:59:52,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 431.0, 112.0, 216.0, 245.0, 414.0, 197.0, 150.0, 225.0, 270.0]
2025-09-12 08:59:52,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1279.18) for latency ExtremeClogL1U23
2025-09-12 08:59:52,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 32 minutes, 51 seconds)
2025-09-12 09:15:04,508 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:15:04,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:16:13,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1204.06934 ± 567.030
2025-09-12 09:16:13,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1479.7473, 1304.6514, 984.3854, 485.95038, 2000.4318, 766.4453, 785.6496, 1830.1543, 1967.2695, 436.00906]
2025-09-12 09:16:13,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 244.0, 203.0, 104.0, 376.0, 151.0, 146.0, 338.0, 388.0, 79.0]
2025-09-12 09:16:14,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 21 minutes, 51 seconds)
2025-09-12 09:31:16,711 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:31:16,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:32:23,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1169.26404 ± 588.458
2025-09-12 09:32:23,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [692.2067, 795.587, 1570.077, 560.9597, 557.36273, 1355.7622, 1627.1831, 1199.69, 815.4941, 2518.3188]
2025-09-12 09:32:23,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 142.0, 295.0, 117.0, 108.0, 251.0, 316.0, 235.0, 169.0, 461.0]
2025-09-12 09:32:23,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 9 minutes, 35 seconds)
2025-09-12 09:47:05,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:47:05,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:48:50,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1913.51782 ± 915.502
2025-09-12 09:48:50,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1368.795, 1616.0178, 2168.532, 840.7901, 1286.9508, 3581.982, 3559.8303, 1392.955, 1162.7445, 2156.5796]
2025-09-12 09:48:50,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [248.0, 296.0, 411.0, 154.0, 238.0, 663.0, 687.0, 292.0, 222.0, 428.0]
2025-09-12 09:48:50,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1913.52) for latency ExtremeClogL1U23
2025-09-12 09:48:50,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 55 minutes, 33 seconds)
2025-09-12 10:04:03,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:04:03,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:05:40,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1685.46643 ± 662.566
2025-09-12 10:05:40,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1161.1042, 2402.3398, 912.41187, 3150.9976, 1279.7732, 1833.2614, 988.7523, 1967.1184, 1781.9713, 1376.934]
2025-09-12 10:05:40,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [247.0, 447.0, 190.0, 602.0, 235.0, 384.0, 200.0, 392.0, 326.0, 268.0]
2025-09-12 10:05:40,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 46 minutes, 31 seconds)
2025-09-12 10:21:18,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:21:18,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:23:02,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1896.16479 ± 1054.734
2025-09-12 10:23:02,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2977.7122, 3142.4875, 1052.581, 1395.9838, 420.25708, 3301.3652, 1156.1814, 494.5534, 2611.6108, 2408.9158]
2025-09-12 10:23:02,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [586.0, 592.0, 190.0, 258.0, 79.0, 602.0, 216.0, 101.0, 480.0, 431.0]
2025-09-12 10:23:02,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 38 minutes, 35 seconds)
2025-09-12 10:36:59,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:36:59,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:38:19,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1461.01685 ± 626.888
2025-09-12 10:38:19,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1279.329, 1691.5358, 2773.0356, 661.82117, 1515.2935, 1559.2395, 411.71988, 1203.2906, 2010.2789, 1504.6238]
2025-09-12 10:38:19,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 318.0, 501.0, 119.0, 289.0, 313.0, 77.0, 229.0, 384.0, 274.0]
2025-09-12 10:38:19,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 13 minutes, 10 seconds)
2025-09-12 10:53:11,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:53:11,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:55:33,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2549.82788 ± 1593.743
2025-09-12 10:55:33,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1858.4841, 155.87296, 2651.4932, 5241.6, 3553.2722, 1728.9261, 2009.1571, 996.31775, 2051.4211, 5251.735]
2025-09-12 10:55:33,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [360.0, 30.0, 500.0, 1000.0, 687.0, 342.0, 369.0, 190.0, 426.0, 1000.0]
2025-09-12 10:55:33,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2549.83) for latency ExtremeClogL1U23
2025-09-12 10:55:33,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 5 minutes, 22 seconds)
2025-09-12 11:10:27,096 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:10:27,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:11:54,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1498.16309 ± 861.834
2025-09-12 11:11:54,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1404.5271, 609.59686, 421.59564, 2415.666, 1066.8163, 3553.9553, 1370.2178, 1056.4563, 1520.1664, 1562.6338]
2025-09-12 11:11:54,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [270.0, 110.0, 93.0, 456.0, 221.0, 710.0, 262.0, 208.0, 329.0, 322.0]
2025-09-12 11:11:54,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 47 minutes, 52 seconds)
2025-09-12 11:26:44,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:26:44,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:28:44,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2134.88037 ± 1433.596
2025-09-12 11:28:44,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [270.81757, 4888.0957, 1345.3969, 3708.9097, 946.39813, 789.61115, 1290.7872, 1933.9957, 2536.7031, 3638.089]
2025-09-12 11:28:44,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 960.0, 252.0, 705.0, 178.0, 158.0, 242.0, 372.0, 495.0, 678.0]
2025-09-12 11:28:44,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 31 minutes, 19 seconds)
2025-09-12 11:43:53,129 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:43:53,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:46:50,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3181.12378 ± 1698.840
2025-09-12 11:46:50,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2327.025, 1379.2028, 5131.295, 676.31085, 5200.331, 5344.4355, 4418.631, 2404.7917, 3666.4617, 1262.7542]
2025-09-12 11:46:50,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [472.0, 267.0, 1000.0, 124.0, 1000.0, 1000.0, 835.0, 472.0, 691.0, 248.0]
2025-09-12 11:46:50,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (3181.12) for latency ExtremeClogL1U23
2025-09-12 11:46:50,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 20 minutes, 9 seconds)
2025-09-12 12:02:35,946 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:02:35,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:04:38,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2071.02930 ± 1456.956
2025-09-12 12:04:38,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1604.4835, 1257.892, 1998.0033, 3230.0266, 2585.1143, 493.5349, 5069.946, 561.68646, 381.85614, 3527.7502]
2025-09-12 12:04:38,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [321.0, 244.0, 388.0, 644.0, 494.0, 100.0, 962.0, 100.0, 75.0, 701.0]
2025-09-12 12:04:38,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 21 minutes, 24 seconds)
2025-09-12 12:18:28,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:18:28,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:20:40,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2477.50854 ± 1844.940
2025-09-12 12:20:40,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1276.9642, 1900.0392, 1030.7795, 773.3633, 5488.0034, 516.1268, 1760.9535, 5281.362, 4852.6787, 1894.8156]
2025-09-12 12:20:40,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 349.0, 181.0, 140.0, 1000.0, 106.0, 352.0, 1000.0, 874.0, 348.0]
2025-09-12 12:20:40,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 55 minutes, 45 seconds)
2025-09-12 12:36:04,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:36:04,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:38:41,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2849.53418 ± 1781.111
2025-09-12 12:38:41,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [684.1817, 135.0431, 3093.4211, 5297.27, 4462.2905, 691.09485, 3595.1636, 2946.7146, 5209.662, 2380.4993]
2025-09-12 12:38:41,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 26.0, 589.0, 1000.0, 879.0, 124.0, 667.0, 571.0, 1000.0, 452.0]
2025-09-12 12:38:41,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 50 minutes, 7 seconds)
2025-09-12 12:52:43,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:52:43,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:54:50,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2306.65991 ± 1469.616
2025-09-12 12:54:50,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1078.0632, 5140.5244, 2016.2335, 2123.2812, 1655.9122, 2975.6895, 4142.354, 3084.6064, 347.78857, 502.14597]
2025-09-12 12:54:50,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 1000.0, 365.0, 392.0, 336.0, 564.0, 767.0, 589.0, 64.0, 106.0]
2025-09-12 12:54:50,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 28 minutes, 21 seconds)
2025-09-12 13:10:47,863 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:10:47,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:12:50,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2234.86694 ± 1461.827
2025-09-12 13:12:50,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [934.1498, 1531.184, 2544.278, 4224.892, 1767.1542, 5266.83, 314.4465, 2042.9545, 936.4433, 2786.3374]
2025-09-12 13:12:50,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 293.0, 477.0, 779.0, 353.0, 1000.0, 61.0, 377.0, 175.0, 524.0]
2025-09-12 13:12:50,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 10 minutes, 21 seconds)
2025-09-12 13:27:05,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:27:05,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:29:56,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3097.99927 ± 1622.719
2025-09-12 13:29:56,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1532.9248, 5307.317, 1945.3495, 1234.9614, 850.29974, 5214.9, 2900.318, 3601.1716, 5187.2246, 3205.5247]
2025-09-12 13:29:56,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [307.0, 1000.0, 364.0, 232.0, 173.0, 1000.0, 554.0, 691.0, 971.0, 608.0]
2025-09-12 13:29:56,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 48 minutes, 51 seconds)
2025-09-12 13:44:15,345 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:44:15,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:46:49,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2815.37109 ± 1805.415
2025-09-12 13:46:49,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [129.83646, 426.14328, 5243.3374, 5208.498, 2060.2002, 2464.972, 3116.7869, 2155.2766, 2037.6559, 5311.005]
2025-09-12 13:46:49,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 81.0, 1000.0, 1000.0, 384.0, 481.0, 576.0, 414.0, 384.0, 1000.0]
2025-09-12 13:46:49,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 36 minutes, 57 seconds)
2025-09-12 14:01:39,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:01:39,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:03:50,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2187.83447 ± 1449.934
2025-09-12 14:03:50,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1241.4357, 1637.1073, 4863.635, 1578.8853, 1195.4857, 612.2304, 2529.8267, 4796.83, 2562.457, 860.4524]
2025-09-12 14:03:50,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [250.0, 339.0, 1000.0, 310.0, 244.0, 125.0, 514.0, 1000.0, 528.0, 175.0]
2025-09-12 14:03:50,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 13 minutes, 54 seconds)
2025-09-12 14:19:12,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:19:12,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:21:23,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2256.49463 ± 1113.007
2025-09-12 14:21:23,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2155.6572, 2378.17, 1318.2336, 1870.0415, 2365.646, 2454.721, 903.7644, 992.0594, 3249.6858, 4876.9663]
2025-09-12 14:21:23,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [399.0, 464.0, 247.0, 359.0, 478.0, 509.0, 196.0, 185.0, 658.0, 992.0]
2025-09-12 14:21:23,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 4 minutes, 36 seconds)
2025-09-12 14:35:39,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:35:39,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:38:23,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2874.31567 ± 1279.671
2025-09-12 14:38:23,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1420.6963, 5242.722, 2036.1693, 3112.9998, 1849.6333, 3192.2024, 2522.535, 5084.025, 1566.2344, 2715.938]
2025-09-12 14:38:23,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 1000.0, 389.0, 616.0, 393.0, 665.0, 520.0, 1000.0, 284.0, 518.0]
2025-09-12 14:38:23,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 42 minutes)
2025-09-12 14:53:46,307 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:53:46,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:56:31,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2858.48950 ± 1417.237
2025-09-12 14:56:31,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2982.26, 2366.4338, 3082.393, 5069.0454, 2773.9216, 3053.9717, 5020.226, 918.4899, 330.12408, 2988.028]
2025-09-12 14:56:31,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [596.0, 475.0, 644.0, 1000.0, 546.0, 629.0, 1000.0, 171.0, 61.0, 572.0]
2025-09-12 14:56:31,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 30 minutes, 18 seconds)
2025-09-12 15:11:11,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:11:11,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:13:34,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2647.57446 ± 1951.027
2025-09-12 15:13:34,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1183.3229, 607.4689, 4632.273, 3918.8833, 2997.4036, 1235.2943, 155.63019, 5404.0786, 964.85834, 5376.531]
2025-09-12 15:13:34,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 120.0, 852.0, 727.0, 558.0, 224.0, 30.0, 1000.0, 180.0, 1000.0]
2025-09-12 15:13:34,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 13 minutes, 43 seconds)
2025-09-12 15:28:06,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:28:06,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:30:59,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3092.79370 ± 1497.960
2025-09-12 15:30:59,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2396.4663, 1048.5889, 2028.9192, 5309.806, 2377.835, 2661.5654, 1687.4397, 5247.867, 5190.1416, 2979.3083]
2025-09-12 15:30:59,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [471.0, 203.0, 382.0, 1000.0, 448.0, 523.0, 319.0, 1000.0, 1000.0, 584.0]
2025-09-12 15:30:59,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 58 minutes, 20 seconds)
2025-09-12 15:46:31,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:46:31,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:47:50,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1573.67798 ± 673.109
2025-09-12 15:47:50,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1015.02344, 1296.2979, 2318.9087, 2412.8813, 1720.5056, 639.39886, 1928.4528, 1274.2144, 624.2785, 2506.8196]
2025-09-12 15:47:50,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 234.0, 423.0, 440.0, 320.0, 117.0, 343.0, 231.0, 115.0, 431.0]
2025-09-12 15:47:50,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 37 minutes, 41 seconds)
2025-09-12 16:02:19,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:02:19,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:04:43,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2575.81006 ± 2184.570
2025-09-12 16:04:43,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3462.0503, 5345.983, 166.30943, 239.45341, 5169.777, 5143.6143, 4252.96, 1393.9708, 426.96027, 157.02359]
2025-09-12 16:04:43,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [666.0, 1000.0, 32.0, 45.0, 1000.0, 1000.0, 823.0, 292.0, 78.0, 30.0]
2025-09-12 16:04:43,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 19 minutes, 51 seconds)
2025-09-12 16:18:31,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:18:31,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:21:16,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2932.50684 ± 1972.825
2025-09-12 16:21:16,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5065.3945, 2359.3037, 3444.513, 653.5278, 453.3835, 4952.7837, 5177.4033, 5190.1064, 1671.5924, 357.0633]
2025-09-12 16:21:16,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 441.0, 686.0, 115.0, 85.0, 1000.0, 1000.0, 1000.0, 297.0, 66.0]
2025-09-12 16:21:16,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 55 minutes, 54 seconds)
2025-09-12 16:37:08,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:37:08,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:39:04,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2178.36890 ± 1395.006
2025-09-12 16:39:04,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5435.7573, 1318.7213, 899.5453, 1562.5658, 1306.8038, 2437.4358, 861.4449, 1508.7273, 2559.1523, 3893.5347]
2025-09-12 16:39:04,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 238.0, 170.0, 292.0, 245.0, 462.0, 158.0, 271.0, 456.0, 731.0]
2025-09-12 16:39:04,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 42 minutes, 1 second)
2025-09-12 16:53:19,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:53:19,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:56:20,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3228.72510 ± 1208.393
2025-09-12 16:56:20,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2895.1772, 2273.3318, 3131.6619, 5119.1953, 2258.1497, 2876.4307, 5201.6655, 2947.9631, 4304.391, 1279.2864]
2025-09-12 16:56:20,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [603.0, 445.0, 614.0, 979.0, 438.0, 548.0, 1000.0, 608.0, 802.0, 268.0]
2025-09-12 16:56:20,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (3228.73) for latency ExtremeClogL1U23
2025-09-12 16:56:20,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 24 minutes, 20 seconds)
2025-09-12 17:10:51,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:10:51,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:12:47,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1971.96716 ± 1425.249
2025-09-12 17:12:47,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1744.5302, 888.7857, 4958.0566, 2272.2258, 4115.5547, 2128.311, 1370.2842, 1218.0718, 878.13293, 145.71938]
2025-09-12 17:12:47,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [365.0, 181.0, 1000.0, 476.0, 854.0, 448.0, 270.0, 249.0, 161.0, 28.0]
2025-09-12 17:12:47,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 5 minutes, 49 seconds)
2025-09-12 17:28:12,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:28:12,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:30:55,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2963.30859 ± 1723.557
2025-09-12 17:30:55,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3167.1501, 578.4086, 3089.3225, 1265.6761, 4134.678, 603.69086, 4448.1904, 5276.0073, 5257.2437, 1812.7172]
2025-09-12 17:30:55,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [588.0, 124.0, 588.0, 242.0, 847.0, 112.0, 799.0, 1000.0, 1000.0, 355.0]
2025-09-12 17:30:55,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 53 minutes, 4 seconds)
2025-09-12 17:45:14,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:45:14,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:47:43,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2758.56665 ± 1756.505
2025-09-12 17:47:43,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [438.39532, 3446.2527, 5359.819, 1640.0388, 640.66254, 4199.706, 3377.2615, 5391.9014, 1264.5713, 1827.0574]
2025-09-12 17:47:43,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 634.0, 1000.0, 305.0, 115.0, 768.0, 672.0, 1000.0, 232.0, 357.0]
2025-09-12 17:47:43,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 36 minutes, 37 seconds)
2025-09-12 18:03:16,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:03:16,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:07:20,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4395.08691 ± 1488.327
2025-09-12 18:07:20,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5167.9526, 5026.612, 2899.2395, 5161.9556, 4440.46, 444.33997, 5293.207, 5282.923, 4915.2817, 5318.9033]
2025-09-12 18:07:20,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 569.0, 1000.0, 839.0, 86.0, 1000.0, 1000.0, 984.0, 1000.0]
2025-09-12 18:07:20,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (4395.09) for latency ExtremeClogL1U23
2025-09-12 18:07:20,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 24 minutes, 48 seconds)
2025-09-12 18:21:13,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:21:13,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:22:57,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1918.16113 ± 1268.161
2025-09-12 18:22:57,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2100.7178, 2063.588, 897.5151, 3284.4812, 359.4042, 4650.6284, 1559.0498, 512.6127, 2623.5518, 1130.0631]
2025-09-12 18:22:57,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [394.0, 376.0, 180.0, 609.0, 67.0, 900.0, 284.0, 92.0, 477.0, 206.0]
2025-09-12 18:22:57,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 2 minutes, 31 seconds)
2025-09-12 18:37:51,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:37:51,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:40:20,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2736.17432 ± 1900.707
2025-09-12 18:40:20,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [4497.8467, 5245.101, 3237.982, 5435.1733, 767.02295, 1448.2379, 179.5704, 387.73593, 3907.967, 2255.108]
2025-09-12 18:40:20,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [841.0, 1000.0, 609.0, 1000.0, 159.0, 267.0, 35.0, 69.0, 737.0, 423.0]
2025-09-12 18:40:20,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 47 minutes, 38 seconds)
2025-09-12 18:56:17,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:56:17,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:59:08,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3063.02100 ± 1783.982
2025-09-12 18:59:08,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5285.747, 5215.7046, 933.8076, 4123.5063, 2008.6981, 3516.83, 5248.3857, 2065.109, 135.43114, 2096.9905]
2025-09-12 18:59:08,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 173.0, 821.0, 411.0, 708.0, 1000.0, 414.0, 26.0, 425.0]
2025-09-12 18:59:08,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 31 minutes, 42 seconds)
2025-09-12 19:13:48,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:13:48,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:15:16,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1626.61768 ± 1491.994
2025-09-12 19:15:16,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5325.1587, 673.9553, 581.31085, 771.64874, 2135.59, 2038.0044, 736.8034, 192.13379, 3004.9626, 806.6083]
2025-09-12 19:15:16,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 126.0, 110.0, 140.0, 381.0, 380.0, 139.0, 37.0, 579.0, 152.0]
2025-09-12 19:15:16,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 12 minutes, 37 seconds)
2025-09-12 19:29:42,294 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:29:42,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:32:36,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3322.47412 ± 1858.462
2025-09-12 19:32:36,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3362.6584, 3252.19, 5426.2305, 5432.7485, 1238.6173, 5503.431, 703.27893, 1532.7548, 1586.0397, 5186.794]
2025-09-12 19:32:36,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [611.0, 587.0, 1000.0, 1000.0, 225.0, 1000.0, 141.0, 295.0, 295.0, 933.0]
2025-09-12 19:32:36,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 50 minutes, 31 seconds)
2025-09-12 19:47:19,605 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:47:19,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:50:05,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3089.15625 ± 1384.946
2025-09-12 19:50:05,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1587.5967, 5246.095, 4116.5225, 1485.2109, 3161.3687, 3200.205, 3055.5527, 5394.792, 1598.3685, 2045.8491]
2025-09-12 19:50:05,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [293.0, 1000.0, 763.0, 291.0, 577.0, 594.0, 566.0, 1000.0, 298.0, 371.0]
2025-09-12 19:50:05,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 36 minutes, 49 seconds)
2025-09-12 20:04:39,512 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:04:39,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:06:50,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2297.55884 ± 1444.019
2025-09-12 20:06:50,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [736.5589, 4712.3228, 669.5489, 1510.6876, 1808.7615, 1338.6912, 5103.034, 2645.4258, 1947.7745, 2502.7805]
2025-09-12 20:06:50,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 940.0, 136.0, 296.0, 345.0, 267.0, 1000.0, 544.0, 377.0, 494.0]
2025-09-12 20:06:50,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 18 minutes, 23 seconds)
2025-09-12 20:21:22,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:21:22,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:24:06,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3093.05151 ± 2070.715
2025-09-12 20:24:06,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5404.542, 3210.9731, 423.8943, 5634.993, 5434.6323, 475.309, 1486.0967, 4469.756, 535.30804, 3855.009]
2025-09-12 20:24:06,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 562.0, 89.0, 1000.0, 989.0, 89.0, 276.0, 820.0, 102.0, 737.0]
2025-09-12 20:24:06,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 58 minutes, 57 seconds)
2025-09-12 20:38:49,437 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:38:49,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:41:53,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3386.40381 ± 1745.808
2025-09-12 20:41:53,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1322.2445, 4224.255, 3154.6755, 5452.93, 4402.7363, 5316.459, 389.28104, 2750.879, 5301.9595, 1548.6184]
2025-09-12 20:41:53,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [244.0, 770.0, 612.0, 1000.0, 837.0, 1000.0, 71.0, 521.0, 1000.0, 288.0]
2025-09-12 20:41:53,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 43 minutes, 56 seconds)
2025-09-12 20:57:22,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:57:22,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:00:15,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3246.13330 ± 1661.646
2025-09-12 21:00:15,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [389.46857, 5397.583, 2343.6257, 5038.874, 1215.9188, 3643.922, 3514.3438, 5435.2793, 3495.695, 1986.6223]
2025-09-12 21:00:15,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 1000.0, 442.0, 914.0, 220.0, 670.0, 645.0, 1000.0, 639.0, 378.0]
2025-09-12 21:00:15,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 27 minutes, 38 seconds)
2025-09-12 21:15:42,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:15:42,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:18:37,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3236.81104 ± 1817.698
2025-09-12 21:18:37,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [876.1098, 5399.9526, 5261.128, 5284.9175, 2177.95, 1966.7372, 2722.103, 2782.889, 589.8047, 5306.518]
2025-09-12 21:18:37,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 1000.0, 1000.0, 1000.0, 415.0, 359.0, 506.0, 504.0, 120.0, 1000.0]
2025-09-12 21:18:37,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 10 minutes, 49 seconds)
2025-09-12 21:32:14,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:32:14,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:35:18,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3371.32764 ± 1702.029
2025-09-12 21:35:18,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5385.068, 2809.7927, 5290.7295, 5254.5903, 1868.9568, 1114.3479, 1968.2107, 3214.0208, 5406.416, 1401.1459]
2025-09-12 21:35:18,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 514.0, 1000.0, 1000.0, 346.0, 210.0, 366.0, 602.0, 1000.0, 272.0]
2025-09-12 21:35:18,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 53 minutes, 4 seconds)
2025-09-12 21:50:28,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:50:28,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:53:17,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3225.55078 ± 2075.798
2025-09-12 21:53:17,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5449.098, 5393.9424, 5477.9272, 2288.912, 453.39456, 567.4064, 1970.3834, 4052.536, 1012.7771, 5589.129]
2025-09-12 21:53:17,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 986.0, 407.0, 82.0, 104.0, 343.0, 732.0, 201.0, 1000.0]
2025-09-12 21:53:17,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 35 minutes, 40 seconds)
2025-09-12 22:08:21,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:08:21,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:11:12,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3184.69287 ± 1893.286
2025-09-12 22:11:12,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5398.005, 496.19272, 1476.0309, 2812.7615, 2010.6268, 5470.942, 5397.3394, 4955.405, 3097.796, 731.83295]
2025-09-12 22:11:12,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 109.0, 267.0, 520.0, 368.0, 1000.0, 1000.0, 964.0, 574.0, 137.0]
2025-09-12 22:11:12,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 51 seconds)
2025-09-12 22:25:45,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:25:45,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:28:36,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3108.76123 ± 1908.918
2025-09-12 22:28:36,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [404.54016, 711.81323, 5247.748, 1968.1135, 3514.242, 5159.417, 5256.682, 5293.515, 1499.6765, 2031.8644]
2025-09-12 22:28:36,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 145.0, 1000.0, 387.0, 670.0, 1000.0, 1000.0, 1000.0, 297.0, 367.0]
2025-09-12 22:28:36,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1251 [DEBUG]: Training session finished
