2025-09-11 19:32:59,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:32:59,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:32:59,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1492b0805550>}
2025-09-11 19:32:59,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-11 19:32:59,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1133 [INFO]: Creating new trainer
2025-09-11 19:32:59,226 baseline-mbpac-noiseperc25-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-11 19:32:59,226 baseline-mbpac-noiseperc25-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:32:59,237 baseline-mbpac-noiseperc25-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 19:33:01,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-11 19:33:01,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-11 19:45:39,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:45:39,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:45:59,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 318.54565 ± 90.630
2025-09-11 19:45:59,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [341.31592, 461.46704, 445.72372, 309.01407, 294.6632, 170.9892, 328.3766, 215.34816, 232.9064, 385.65198]
2025-09-11 19:45:59,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 98.0, 97.0, 69.0, 63.0, 33.0, 73.0, 46.0, 53.0, 70.0]
2025-09-11 19:45:59,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (318.55) for latency ExtremeClogL1U23
2025-09-11 19:45:59,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 23 minutes, 10 seconds)
2025-09-11 20:00:20,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:00:20,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:00:39,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 354.89636 ± 113.239
2025-09-11 20:00:39,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [521.2854, 111.93161, 302.4297, 286.98892, 518.1953, 352.77213, 322.29065, 415.24985, 395.92636, 321.89374]
2025-09-11 20:00:39,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 22.0, 59.0, 54.0, 99.0, 69.0, 61.0, 86.0, 76.0, 64.0]
2025-09-11 20:00:39,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (354.90) for latency ExtremeClogL1U23
2025-09-11 20:00:39,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 34 minutes, 23 seconds)
2025-09-11 20:15:02,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:15:02,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:15:21,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 360.04233 ± 61.692
2025-09-11 20:15:21,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [326.06158, 409.82407, 310.40045, 312.8174, 501.33606, 423.34637, 324.37872, 312.56, 366.4765, 313.22223]
2025-09-11 20:15:21,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 75.0, 57.0, 57.0, 96.0, 81.0, 59.0, 57.0, 68.0, 57.0]
2025-09-11 20:15:21,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (360.04) for latency ExtremeClogL1U23
2025-09-11 20:15:21,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 48 minutes, 56 seconds)
2025-09-11 20:29:37,005 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:29:37,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:29:53,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 303.50171 ± 37.571
2025-09-11 20:29:53,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [319.61066, 275.38156, 304.16125, 269.4157, 329.67612, 263.38168, 284.17834, 398.82245, 292.81555, 297.57382]
2025-09-11 20:29:53,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 53.0, 55.0, 58.0, 60.0, 49.0, 65.0, 74.0, 54.0, 57.0]
2025-09-11 20:29:53,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 22 hours, 44 minutes, 46 seconds)
2025-09-11 20:44:02,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:44:02,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:44:22,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 380.07263 ± 85.154
2025-09-11 20:44:22,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [436.05667, 388.0887, 380.54807, 369.4642, 251.31049, 326.27826, 319.73712, 593.8963, 373.11792, 362.22852]
2025-09-11 20:44:22,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 76.0, 78.0, 68.0, 46.0, 71.0, 69.0, 114.0, 69.0, 67.0]
2025-09-11 20:44:22,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (380.07) for latency ExtremeClogL1U23
2025-09-11 20:44:22,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 22 hours, 35 minutes, 47 seconds)
2025-09-11 20:58:36,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:58:36,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:58:54,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 349.72543 ± 69.211
2025-09-11 20:58:54,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [319.28558, 294.2097, 369.1168, 265.53766, 521.472, 349.9418, 343.39264, 412.0634, 307.53122, 314.70352]
2025-09-11 20:58:54,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 54.0, 73.0, 49.0, 98.0, 65.0, 65.0, 76.0, 58.0, 58.0]
2025-09-11 20:58:54,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 22 hours, 51 minutes, 6 seconds)
2025-09-11 21:13:10,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:13:10,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:13:28,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 322.35541 ± 119.124
2025-09-11 21:13:28,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [356.425, 398.01102, 458.5127, 422.44128, 108.98698, 399.85214, 89.545456, 323.95303, 317.0053, 348.8214]
2025-09-11 21:13:28,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 75.0, 85.0, 79.0, 22.0, 83.0, 18.0, 61.0, 59.0, 74.0]
2025-09-11 21:13:28,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 22 hours, 34 minutes, 21 seconds)
2025-09-11 21:27:42,967 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:27:42,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:28:07,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 427.25928 ± 72.809
2025-09-11 21:28:07,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [292.16385, 482.54367, 467.7028, 517.7434, 474.25394, 395.84818, 333.8641, 380.60312, 514.4157, 413.4536]
2025-09-11 21:28:07,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 88.0, 94.0, 110.0, 88.0, 73.0, 74.0, 80.0, 98.0, 87.0]
2025-09-11 21:28:07,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (427.26) for latency ExtremeClogL1U23
2025-09-11 21:28:07,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 22 hours, 18 minutes, 45 seconds)
2025-09-11 21:42:29,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:42:29,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:42:53,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 444.05542 ± 173.339
2025-09-11 21:42:53,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [352.53848, 107.82088, 292.194, 443.8193, 638.5875, 488.67456, 327.43472, 682.12177, 449.2095, 658.1535]
2025-09-11 21:42:53,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 21.0, 54.0, 96.0, 122.0, 93.0, 60.0, 130.0, 96.0, 125.0]
2025-09-11 21:42:53,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (444.06) for latency ExtremeClogL1U23
2025-09-11 21:42:53,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 22 hours, 8 minutes, 34 seconds)
2025-09-11 21:56:54,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:56:54,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:57:14,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 372.43890 ± 161.989
2025-09-11 21:57:14,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [444.53384, 429.44864, 652.3725, 352.5548, 367.72906, 101.18054, 101.4966, 338.44724, 402.55664, 534.0693]
2025-09-11 21:57:14,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 81.0, 125.0, 66.0, 73.0, 20.0, 20.0, 69.0, 76.0, 101.0]
2025-09-11 21:57:14,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 21 hours, 51 minutes, 29 seconds)
2025-09-11 22:11:30,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:11:30,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:11:49,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 351.29645 ± 134.295
2025-09-11 22:11:49,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [381.0371, 373.20087, 412.82092, 311.62915, 429.9392, 459.36606, 534.2038, 398.28085, 105.93274, 106.55377]
2025-09-11 22:11:49,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 70.0, 85.0, 57.0, 80.0, 86.0, 113.0, 75.0, 21.0, 21.0]
2025-09-11 22:11:49,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 21 hours, 37 minutes, 42 seconds)
2025-09-11 22:25:58,097 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:25:58,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:26:19,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 401.27216 ± 125.614
2025-09-11 22:26:19,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [124.96046, 367.80695, 369.28796, 350.56857, 422.32626, 384.66107, 395.7815, 658.65564, 484.59863, 454.0745]
2025-09-11 22:26:19,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 68.0, 67.0, 71.0, 80.0, 74.0, 78.0, 137.0, 90.0, 84.0]
2025-09-11 22:26:19,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 21 hours, 22 minutes, 9 seconds)
2025-09-11 22:40:39,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:40:39,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:40:58,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 327.95020 ± 130.623
2025-09-11 22:40:58,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [353.0826, 613.4448, 387.18097, 382.6486, 284.9902, 353.17484, 83.79388, 187.89229, 319.74472, 313.54913]
2025-09-11 22:40:58,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 118.0, 84.0, 84.0, 63.0, 72.0, 17.0, 37.0, 71.0, 60.0]
2025-09-11 22:40:58,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 7 minutes, 38 seconds)
2025-09-11 22:55:08,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:55:08,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:55:30,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 408.83093 ± 53.512
2025-09-11 22:55:30,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [400.89197, 306.547, 447.3303, 392.6887, 379.76068, 497.35028, 348.47733, 451.70898, 407.767, 455.78693]
2025-09-11 22:55:30,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 56.0, 87.0, 72.0, 71.0, 95.0, 65.0, 83.0, 76.0, 86.0]
2025-09-11 22:55:30,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 20 hours, 49 minutes, 1 second)
2025-09-11 23:09:47,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:09:47,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:10:06,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 360.41156 ± 167.664
2025-09-11 23:10:06,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [417.8992, 478.11954, 452.32553, 586.37, 426.9825, 337.84842, 131.77562, 117.89839, 119.786354, 535.11]
2025-09-11 23:10:06,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 94.0, 89.0, 113.0, 79.0, 63.0, 26.0, 23.0, 23.0, 100.0]
2025-09-11 23:10:06,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 20 hours, 38 minutes, 44 seconds)
2025-09-11 23:24:14,899 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:24:14,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:24:36,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 411.43390 ± 188.213
2025-09-11 23:24:36,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [399.21188, 457.32318, 398.7909, 449.15985, 624.391, 746.3976, 453.65524, 378.30936, 89.662186, 117.437645]
2025-09-11 23:24:36,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 92.0, 75.0, 84.0, 116.0, 143.0, 99.0, 69.0, 18.0, 23.0]
2025-09-11 23:24:36,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 20 hours, 22 minutes, 57 seconds)
2025-09-11 23:38:45,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:38:45,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:39:09,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 477.25308 ± 157.636
2025-09-11 23:39:09,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [535.9641, 574.9063, 385.42422, 667.8114, 89.15291, 526.30145, 592.8639, 528.22705, 531.748, 340.1315]
2025-09-11 23:39:09,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 107.0, 73.0, 125.0, 18.0, 99.0, 114.0, 98.0, 100.0, 62.0]
2025-09-11 23:39:09,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (477.25) for latency ExtremeClogL1U23
2025-09-11 23:39:09,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 9 minutes, 4 seconds)
2025-09-11 23:53:22,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:53:22,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:53:46,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 437.91733 ± 218.057
2025-09-11 23:53:46,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [520.24585, 758.14514, 479.16144, 466.1886, 360.71677, 345.9758, 84.06575, 95.95081, 756.6738, 512.0494]
2025-09-11 23:53:46,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 146.0, 102.0, 102.0, 78.0, 76.0, 17.0, 19.0, 147.0, 100.0]
2025-09-11 23:53:46,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 19 hours, 54 minutes, 4 seconds)
2025-09-12 00:07:53,769 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:07:53,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:08:18,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 478.88495 ± 83.310
2025-09-12 00:08:18,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [399.81323, 410.90036, 529.2864, 520.8956, 506.56567, 420.56815, 686.96173, 469.7235, 432.63776, 411.49722]
2025-09-12 00:08:18,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 77.0, 110.0, 98.0, 94.0, 82.0, 135.0, 88.0, 81.0, 76.0]
2025-09-12 00:08:18,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (478.88) for latency ExtremeClogL1U23
2025-09-12 00:08:18,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 19 hours, 39 minutes, 29 seconds)
2025-09-12 00:22:29,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:22:29,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:23:01,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 588.65405 ± 189.913
2025-09-12 00:23:01,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [921.65564, 769.4074, 375.85977, 459.4231, 517.5034, 558.22375, 366.61862, 782.1373, 392.44852, 743.2625]
2025-09-12 00:23:01,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 148.0, 84.0, 85.0, 98.0, 105.0, 69.0, 167.0, 76.0, 144.0]
2025-09-12 00:23:01,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (588.65) for latency ExtremeClogL1U23
2025-09-12 00:23:01,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 26 minutes, 38 seconds)
2025-09-12 00:37:20,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:37:20,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:37:44,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 450.57822 ± 166.618
2025-09-12 00:37:44,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [117.60667, 548.97314, 499.4106, 360.1165, 399.96326, 753.38495, 410.6907, 622.9689, 309.59525, 483.07236]
2025-09-12 00:37:44,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 116.0, 102.0, 67.0, 86.0, 138.0, 83.0, 119.0, 57.0, 90.0]
2025-09-12 00:37:44,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 15 minutes, 25 seconds)
2025-09-12 00:51:53,013 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:51:53,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:52:15,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 410.09781 ± 176.031
2025-09-12 00:52:15,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [112.79943, 365.41605, 513.7559, 497.6768, 396.09756, 417.8249, 774.0213, 428.60297, 151.4015, 443.38147]
2025-09-12 00:52:15,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 79.0, 108.0, 94.0, 85.0, 91.0, 147.0, 81.0, 30.0, 83.0]
2025-09-12 00:52:15,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 19 seconds)
2025-09-12 01:06:45,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:06:45,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:07:09,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 419.63403 ± 185.954
2025-09-12 01:07:09,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.59179, 518.96454, 136.3034, 368.1474, 528.1199, 448.92438, 381.82745, 725.59015, 379.1978, 614.6735]
2025-09-12 01:07:09,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 112.0, 26.0, 81.0, 99.0, 97.0, 70.0, 138.0, 76.0, 114.0]
2025-09-12 01:07:09,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 18 hours, 49 minutes, 55 seconds)
2025-09-12 01:22:01,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:22:01,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:22:32,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 549.36713 ± 134.672
2025-09-12 01:22:32,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [909.1172, 501.7625, 529.0234, 623.47217, 435.3793, 503.40176, 444.14233, 478.04623, 612.3733, 456.95355]
2025-09-12 01:22:32,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 94.0, 98.0, 117.0, 95.0, 102.0, 81.0, 90.0, 116.0, 94.0]
2025-09-12 01:22:32,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 18 hours, 48 minutes, 21 seconds)
2025-09-12 01:37:21,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:37:21,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:37:45,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 431.25537 ± 128.877
2025-09-12 01:37:45,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [505.68103, 477.72403, 412.33075, 456.33737, 84.09124, 463.55487, 409.89676, 445.22202, 618.68, 439.03592]
2025-09-12 01:37:45,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 89.0, 75.0, 85.0, 17.0, 85.0, 76.0, 82.0, 128.0, 81.0]
2025-09-12 01:37:45,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 41 minutes)
2025-09-12 01:52:39,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:52:39,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:53:09,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 551.96350 ± 113.001
2025-09-12 01:53:09,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [471.951, 531.55554, 579.6186, 478.78415, 743.191, 440.5561, 751.9244, 612.58344, 494.79242, 414.6786]
2025-09-12 01:53:09,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 99.0, 108.0, 88.0, 157.0, 82.0, 159.0, 115.0, 95.0, 77.0]
2025-09-12 01:53:09,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 36 minutes, 14 seconds)
2025-09-12 02:07:46,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:07:46,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:08:16,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 540.22571 ± 94.900
2025-09-12 02:08:16,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [565.1098, 403.8763, 750.86786, 502.3476, 641.63, 581.87134, 506.71747, 452.37247, 479.45245, 518.01117]
2025-09-12 02:08:16,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 87.0, 145.0, 91.0, 137.0, 108.0, 94.0, 83.0, 88.0, 97.0]
2025-09-12 02:08:16,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 29 minutes, 44 seconds)
2025-09-12 02:22:58,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:22:58,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:23:26,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 511.10822 ± 85.490
2025-09-12 02:23:26,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [557.6041, 520.2878, 509.96024, 569.8363, 698.36694, 458.6488, 461.17587, 465.56827, 516.5567, 353.07693]
2025-09-12 02:23:26,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 99.0, 96.0, 108.0, 139.0, 85.0, 85.0, 101.0, 96.0, 66.0]
2025-09-12 02:23:26,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 18 minutes, 37 seconds)
2025-09-12 02:38:12,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:38:12,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:38:37,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 440.01709 ± 163.024
2025-09-12 02:38:37,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [682.0984, 584.03406, 573.6838, 479.95187, 480.97665, 105.719215, 236.17418, 437.0582, 338.83466, 481.64005]
2025-09-12 02:38:37,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 109.0, 106.0, 89.0, 104.0, 21.0, 46.0, 89.0, 65.0, 89.0]
2025-09-12 02:38:37,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 16 seconds)
2025-09-12 02:53:18,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:53:18,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:53:49,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 534.36902 ± 86.001
2025-09-12 02:53:49,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [482.77267, 526.3907, 512.3888, 440.0627, 398.6179, 656.17816, 506.64487, 660.88885, 517.91754, 641.828]
2025-09-12 02:53:49,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 99.0, 96.0, 93.0, 85.0, 141.0, 94.0, 140.0, 98.0, 122.0]
2025-09-12 02:53:49,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 44 minutes, 52 seconds)
2025-09-12 03:08:34,694 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:08:34,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:09:08,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 588.29382 ± 177.173
2025-09-12 03:09:08,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [919.01434, 561.7519, 491.96106, 530.7663, 570.52997, 385.58182, 827.9467, 481.49994, 358.17062, 755.716]
2025-09-12 03:09:08,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 120.0, 91.0, 112.0, 106.0, 71.0, 158.0, 104.0, 65.0, 162.0]
2025-09-12 03:09:08,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 28 minutes, 29 seconds)
2025-09-12 03:23:47,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:23:47,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:24:13,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 486.58276 ± 96.608
2025-09-12 03:24:13,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [368.4161, 530.0788, 568.75354, 322.05478, 657.8532, 446.66974, 567.4464, 524.4478, 454.6243, 425.4829]
2025-09-12 03:24:13,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 103.0, 106.0, 60.0, 123.0, 81.0, 105.0, 98.0, 85.0, 80.0]
2025-09-12 03:24:13,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 13 minutes, 1 second)
2025-09-12 03:38:56,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:38:56,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:39:26,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 551.17297 ± 175.160
2025-09-12 03:39:26,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [479.91876, 454.50995, 631.58276, 566.975, 95.34878, 656.069, 598.8045, 649.556, 598.13257, 780.83246]
2025-09-12 03:39:26,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 84.0, 121.0, 105.0, 19.0, 123.0, 123.0, 120.0, 111.0, 147.0]
2025-09-12 03:39:26,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 58 minutes, 19 seconds)
2025-09-12 03:54:08,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:54:08,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:54:36,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 480.75723 ± 162.454
2025-09-12 03:54:36,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [455.98087, 752.7575, 613.668, 439.66342, 528.984, 486.40863, 551.8787, 299.91553, 127.333145, 550.98224]
2025-09-12 03:54:36,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 143.0, 119.0, 97.0, 97.0, 89.0, 118.0, 63.0, 25.0, 119.0]
2025-09-12 03:54:36,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 43 minutes, 1 second)
2025-09-12 04:09:22,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:09:22,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:09:51,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 518.85669 ± 191.238
2025-09-12 04:09:51,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [428.94763, 549.9399, 864.7776, 451.92627, 323.0757, 733.4879, 181.16043, 681.48425, 433.96082, 539.80597]
2025-09-12 04:09:51,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 115.0, 163.0, 84.0, 70.0, 139.0, 35.0, 131.0, 83.0, 100.0]
2025-09-12 04:09:51,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 28 minutes, 29 seconds)
2025-09-12 04:24:39,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:24:39,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:25:12,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 570.99957 ± 264.663
2025-09-12 04:25:12,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [799.8625, 485.8456, 408.4151, 489.2157, 563.0626, 368.5111, 1028.0139, 513.3629, 945.52216, 108.18431]
2025-09-12 04:25:12,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 106.0, 91.0, 92.0, 123.0, 71.0, 190.0, 99.0, 180.0, 21.0]
2025-09-12 04:25:12,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 13 minutes, 42 seconds)
2025-09-12 04:39:48,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:39:48,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:40:20,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 544.93469 ± 125.564
2025-09-12 04:40:20,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [400.12488, 646.50037, 640.4452, 648.3637, 493.1486, 452.2688, 575.374, 779.2963, 432.77838, 381.0471]
2025-09-12 04:40:20,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 137.0, 118.0, 121.0, 107.0, 96.0, 107.0, 147.0, 95.0, 83.0]
2025-09-12 04:40:20,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 58 minutes, 59 seconds)
2025-09-12 04:55:03,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:55:03,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:55:37,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 603.89417 ± 208.821
2025-09-12 04:55:37,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [594.8881, 589.74713, 546.9876, 702.5926, 925.15875, 818.96967, 89.48737, 585.3415, 529.1275, 656.64136]
2025-09-12 04:55:37,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 112.0, 102.0, 136.0, 180.0, 151.0, 18.0, 110.0, 116.0, 123.0]
2025-09-12 04:55:37,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (603.89) for latency ExtremeClogL1U23
2025-09-12 04:55:37,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 44 minutes, 35 seconds)
2025-09-12 05:10:28,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:10:28,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:10:59,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 561.82233 ± 108.840
2025-09-12 05:10:59,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [592.355, 527.8341, 507.86923, 540.71234, 527.5544, 804.05994, 461.6765, 522.7108, 710.8744, 422.5767]
2025-09-12 05:10:59,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 112.0, 95.0, 116.0, 98.0, 153.0, 92.0, 95.0, 133.0, 79.0]
2025-09-12 05:10:59,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 31 minutes, 54 seconds)
2025-09-12 05:25:45,055 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:25:45,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:26:16,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 550.85529 ± 158.785
2025-09-12 05:26:16,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [411.88528, 618.8217, 570.13635, 563.0472, 131.0525, 609.93854, 581.5676, 642.9465, 685.2622, 693.895]
2025-09-12 05:26:16,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 114.0, 124.0, 105.0, 25.0, 113.0, 123.0, 140.0, 126.0, 135.0]
2025-09-12 05:26:16,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 17 minutes, 2 seconds)
2025-09-12 05:40:56,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:40:56,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:41:26,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 558.35474 ± 187.311
2025-09-12 05:41:26,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [502.40375, 526.7026, 700.317, 646.7431, 522.6442, 405.03183, 139.96323, 884.0526, 587.94507, 667.7437]
2025-09-12 05:41:26,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 96.0, 131.0, 121.0, 95.0, 74.0, 27.0, 169.0, 108.0, 127.0]
2025-09-12 05:41:26,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 59 minutes, 24 seconds)
2025-09-12 05:56:17,515 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:56:17,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:56:46,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 525.47644 ± 190.225
2025-09-12 05:56:46,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [550.12823, 894.4175, 625.58105, 557.42017, 493.69406, 564.70605, 584.6973, 515.35614, 95.71204, 373.05173]
2025-09-12 05:56:46,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 171.0, 116.0, 118.0, 92.0, 107.0, 124.0, 94.0, 19.0, 68.0]
2025-09-12 05:56:46,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 46 minutes, 41 seconds)
2025-09-12 06:11:27,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:11:27,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:11:56,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 539.07727 ± 171.991
2025-09-12 06:11:56,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [774.66376, 383.26956, 242.87383, 698.728, 656.9069, 771.1864, 426.8809, 535.7403, 508.02557, 392.49817]
2025-09-12 06:11:56,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 69.0, 51.0, 131.0, 121.0, 145.0, 95.0, 98.0, 93.0, 72.0]
2025-09-12 06:11:56,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 30 minutes, 9 seconds)
2025-09-12 06:26:45,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:26:45,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:27:15,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 550.03394 ± 126.035
2025-09-12 06:27:15,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [399.79877, 573.9795, 373.9139, 557.2838, 446.03107, 656.2755, 489.0746, 825.1314, 585.51025, 593.3404]
2025-09-12 06:27:15,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 106.0, 69.0, 102.0, 84.0, 122.0, 104.0, 155.0, 106.0, 108.0]
2025-09-12 06:27:15,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 14 minutes, 14 seconds)
2025-09-12 06:42:04,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:42:04,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:42:40,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 630.29456 ± 175.960
2025-09-12 06:42:40,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [422.1027, 984.69507, 552.60583, 694.4265, 687.6091, 480.0613, 593.7333, 541.87024, 887.26184, 458.58005]
2025-09-12 06:42:40,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 180.0, 119.0, 141.0, 147.0, 91.0, 111.0, 99.0, 168.0, 98.0]
2025-09-12 06:42:40,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (630.29) for latency ExtremeClogL1U23
2025-09-12 06:42:40,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 19 seconds)
2025-09-12 06:57:29,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:57:29,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:58:15,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 846.34875 ± 298.242
2025-09-12 06:58:15,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [727.128, 579.2076, 585.537, 1098.2073, 609.45746, 1576.5781, 754.4455, 986.9681, 921.48627, 624.4714]
2025-09-12 06:58:15,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 107.0, 107.0, 205.0, 111.0, 304.0, 160.0, 191.0, 176.0, 116.0]
2025-09-12 06:58:15,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (846.35) for latency ExtremeClogL1U23
2025-09-12 06:58:15,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 49 minutes, 44 seconds)
2025-09-12 07:12:53,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:12:53,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:13:25,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 571.79272 ± 189.518
2025-09-12 07:13:25,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [497.88574, 775.322, 723.5454, 461.05093, 527.0847, 143.79375, 533.86017, 557.385, 632.03796, 865.9613]
2025-09-12 07:13:25,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 141.0, 157.0, 101.0, 111.0, 28.0, 101.0, 101.0, 128.0, 155.0]
2025-09-12 07:13:25,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 32 minutes, 28 seconds)
2025-09-12 07:28:14,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:28:14,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:28:47,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 606.28571 ± 148.223
2025-09-12 07:28:47,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [766.91144, 786.8209, 631.2169, 752.60675, 604.7863, 327.08688, 604.00946, 700.5528, 440.06927, 448.7957]
2025-09-12 07:28:47,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 142.0, 118.0, 138.0, 118.0, 63.0, 113.0, 128.0, 84.0, 82.0]
2025-09-12 07:28:47,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 19 minutes, 12 seconds)
2025-09-12 07:43:37,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:43:37,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:44:16,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 711.61658 ± 95.911
2025-09-12 07:44:16,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [622.2853, 719.3601, 638.39966, 837.0843, 601.3569, 778.66724, 792.7165, 696.6395, 574.52466, 855.1321]
2025-09-12 07:44:16,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 137.0, 124.0, 155.0, 112.0, 149.0, 150.0, 125.0, 111.0, 165.0]
2025-09-12 07:44:16,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 5 minutes, 24 seconds)
2025-09-12 07:59:12,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:12,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:59:39,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 478.14478 ± 247.088
2025-09-12 07:59:39,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [863.15485, 106.15316, 619.1725, 321.7163, 458.61865, 832.091, 466.1297, 458.3721, 560.7068, 95.33236]
2025-09-12 07:59:39,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 21.0, 116.0, 72.0, 84.0, 153.0, 83.0, 85.0, 105.0, 19.0]
2025-09-12 07:59:39,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 49 minutes, 49 seconds)
2025-09-12 08:14:28,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:14:28,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:15:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 696.46912 ± 163.072
2025-09-12 08:15:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [448.89548, 673.2429, 778.8223, 749.3022, 497.4945, 1037.5348, 849.32275, 678.09644, 576.16724, 675.81256]
2025-09-12 08:15:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 127.0, 148.0, 133.0, 110.0, 200.0, 178.0, 125.0, 107.0, 124.0]
2025-09-12 08:15:07,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 33 minutes, 11 seconds)
2025-09-12 08:29:54,308 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:29:54,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:30:28,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 607.31744 ± 257.142
2025-09-12 08:30:28,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [526.8036, 942.68463, 517.2233, 865.6306, 1016.36707, 399.1218, 157.14612, 393.83228, 564.97687, 689.3876]
2025-09-12 08:30:28,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 190.0, 106.0, 159.0, 189.0, 87.0, 31.0, 74.0, 118.0, 129.0]
2025-09-12 08:30:28,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 19 minutes, 42 seconds)
2025-09-12 08:45:15,983 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:45:15,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:45:50,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 654.10657 ± 164.723
2025-09-12 08:45:50,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [717.7441, 434.34637, 669.7026, 727.70715, 710.21985, 945.3639, 482.70688, 370.33136, 750.9666, 731.97656]
2025-09-12 08:45:50,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 93.0, 121.0, 139.0, 131.0, 176.0, 88.0, 66.0, 135.0, 133.0]
2025-09-12 08:45:50,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 4 minutes, 17 seconds)
2025-09-12 09:00:40,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:00:40,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:01:21,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 732.92999 ± 185.390
2025-09-12 09:01:21,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [862.8331, 600.3688, 559.10815, 505.90814, 1166.5989, 750.716, 833.7912, 590.81226, 663.56885, 795.5945]
2025-09-12 09:01:21,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 111.0, 100.0, 92.0, 231.0, 153.0, 166.0, 108.0, 125.0, 147.0]
2025-09-12 09:01:21,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 49 minutes, 9 seconds)
2025-09-12 09:16:17,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:16:17,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:17:01,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 795.50397 ± 334.708
2025-09-12 09:17:01,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [689.48914, 825.95764, 659.3074, 813.245, 723.2021, 274.30725, 883.0668, 839.1421, 581.69916, 1665.6232]
2025-09-12 09:17:01,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 149.0, 126.0, 169.0, 135.0, 56.0, 162.0, 156.0, 108.0, 321.0]
2025-09-12 09:17:01,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 36 minutes, 21 seconds)
2025-09-12 09:31:42,920 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:31:42,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:32:24,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 733.23865 ± 268.692
2025-09-12 09:32:24,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [899.07153, 376.9883, 986.2428, 924.9568, 418.66956, 593.2229, 366.4244, 1086.9276, 658.567, 1021.31616]
2025-09-12 09:32:24,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 82.0, 187.0, 171.0, 89.0, 108.0, 80.0, 205.0, 138.0, 209.0]
2025-09-12 09:32:24,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 20 minutes, 9 seconds)
2025-09-12 09:47:23,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:47:23,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:48:03,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 727.71985 ± 281.679
2025-09-12 09:48:03,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [380.67938, 680.8541, 445.1891, 1113.0332, 267.75977, 994.0735, 748.7116, 709.6501, 1115.5322, 821.71533]
2025-09-12 09:48:03,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 143.0, 81.0, 233.0, 48.0, 191.0, 140.0, 134.0, 204.0, 156.0]
2025-09-12 09:48:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 7 minutes, 13 seconds)
2025-09-12 10:02:46,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:02:46,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:03:22,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 653.02667 ± 227.830
2025-09-12 10:03:22,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.22125, 615.7359, 709.88434, 692.7216, 902.57184, 678.8632, 504.60812, 683.0879, 624.3974, 1011.17456]
2025-09-12 10:03:22,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 131.0, 131.0, 128.0, 167.0, 126.0, 92.0, 126.0, 116.0, 184.0]
2025-09-12 10:03:22,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 51 minutes, 11 seconds)
2025-09-12 10:18:18,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:18:18,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:18:59,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 747.30084 ± 322.871
2025-09-12 10:18:59,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [472.09763, 573.6551, 800.4445, 118.56603, 987.9374, 771.5081, 1230.2201, 456.81616, 1003.3785, 1058.3848]
2025-09-12 10:18:59,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 104.0, 145.0, 23.0, 185.0, 160.0, 226.0, 84.0, 183.0, 201.0]
2025-09-12 10:18:59,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 36 minutes, 43 seconds)
2025-09-12 10:33:57,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:33:57,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:34:33,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 669.62000 ± 298.205
2025-09-12 10:34:33,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [842.79376, 482.53168, 128.08206, 729.5049, 576.9639, 449.486, 506.1345, 702.396, 1110.419, 1167.8888]
2025-09-12 10:34:33,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 88.0, 25.0, 133.0, 106.0, 98.0, 91.0, 131.0, 202.0, 216.0]
2025-09-12 10:34:33,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 20 minutes, 16 seconds)
2025-09-12 10:49:18,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:49:18,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:50:07,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 857.39972 ± 263.299
2025-09-12 10:50:07,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [784.22327, 1048.2941, 640.9817, 868.47644, 612.94257, 1100.9061, 1307.9723, 657.10736, 437.98758, 1115.1058]
2025-09-12 10:50:07,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 218.0, 119.0, 178.0, 130.0, 200.0, 247.0, 135.0, 97.0, 208.0]
2025-09-12 10:50:07,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (857.40) for latency ExtremeClogL1U23
2025-09-12 10:50:07,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 6 minutes, 8 seconds)
2025-09-12 11:04:51,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:04:51,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:05:29,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 704.97754 ± 456.035
2025-09-12 11:05:29,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [111.22176, 89.587746, 422.79648, 1635.1667, 601.6401, 1258.0641, 857.7946, 508.83035, 701.19727, 863.4767]
2025-09-12 11:05:29,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 80.0, 314.0, 115.0, 234.0, 158.0, 94.0, 131.0, 158.0]
2025-09-12 11:05:29,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 48 minutes, 27 seconds)
2025-09-12 11:20:27,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:20:27,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:21:17,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 905.16534 ± 430.337
2025-09-12 11:21:17,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [574.3118, 1232.906, 753.60724, 89.43947, 1291.5408, 915.0357, 571.3273, 684.3817, 1468.1786, 1470.925]
2025-09-12 11:21:17,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 233.0, 145.0, 18.0, 249.0, 189.0, 107.0, 130.0, 277.0, 273.0]
2025-09-12 11:21:17,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (905.17) for latency ExtremeClogL1U23
2025-09-12 11:21:17,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 36 minutes, 33 seconds)
2025-09-12 11:36:10,637 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:36:10,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:36:53,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 772.57178 ± 289.386
2025-09-12 11:36:53,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [641.1848, 799.1017, 525.14417, 619.8645, 516.8024, 489.79086, 1423.8002, 850.23175, 1164.3358, 695.4614]
2025-09-12 11:36:53,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 149.0, 96.0, 117.0, 95.0, 89.0, 264.0, 153.0, 227.0, 146.0]
2025-09-12 11:36:53,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 20 minutes, 46 seconds)
2025-09-12 11:51:46,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:51:46,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:52:34,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 880.05096 ± 397.149
2025-09-12 11:52:34,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [796.6985, 1244.6932, 986.46265, 939.687, 505.23346, 106.7275, 678.0391, 728.9335, 1587.2059, 1226.8285]
2025-09-12 11:52:34,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 228.0, 189.0, 175.0, 93.0, 21.0, 135.0, 136.0, 301.0, 227.0]
2025-09-12 11:52:34,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 6 minutes, 2 seconds)
2025-09-12 12:07:15,783 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:07:15,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:07:44,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 545.59369 ± 353.648
2025-09-12 12:07:44,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [838.7521, 592.1337, 460.17587, 1115.8405, 996.50415, 539.25464, 89.93993, 101.26963, 83.94853, 638.11755]
2025-09-12 12:07:44,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 107.0, 98.0, 201.0, 186.0, 101.0, 18.0, 20.0, 17.0, 115.0]
2025-09-12 12:07:44,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 47 minutes, 51 seconds)
2025-09-12 12:22:45,059 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:22:45,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:23:34,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 919.44855 ± 364.591
2025-09-12 12:23:34,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1534.408, 952.6848, 825.45483, 1173.4083, 938.37354, 1443.3416, 639.0353, 860.86993, 475.55176, 351.35782]
2025-09-12 12:23:34,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [278.0, 170.0, 156.0, 224.0, 167.0, 262.0, 134.0, 158.0, 103.0, 62.0]
2025-09-12 12:23:34,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (919.45) for latency ExtremeClogL1U23
2025-09-12 12:23:34,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 35 minutes, 20 seconds)
2025-09-12 12:38:29,136 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:38:29,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:39:06,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 674.07092 ± 345.299
2025-09-12 12:39:06,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [792.718, 116.58185, 338.79407, 1162.5302, 924.00085, 331.02472, 380.79752, 642.55365, 1055.7606, 995.94794]
2025-09-12 12:39:06,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 23.0, 59.0, 227.0, 172.0, 62.0, 82.0, 119.0, 201.0, 192.0]
2025-09-12 12:39:06,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 18 minutes, 3 seconds)
2025-09-12 12:54:15,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:54:15,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:54:59,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 812.44128 ± 304.682
2025-09-12 12:54:59,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [765.1059, 435.07397, 1297.307, 617.34845, 1055.2968, 475.73703, 437.04764, 1133.6421, 789.5612, 1118.2928]
2025-09-12 12:54:59,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 92.0, 239.0, 131.0, 196.0, 102.0, 81.0, 211.0, 145.0, 207.0]
2025-09-12 12:55:00,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 4 minutes, 18 seconds)
2025-09-12 13:09:33,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:09:33,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:10:17,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 777.18176 ± 305.548
2025-09-12 13:10:17,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [673.91864, 1098.2534, 617.1678, 1231.0967, 358.6054, 722.08, 500.82645, 1266.0037, 474.07858, 829.78674]
2025-09-12 13:10:17,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 209.0, 112.0, 252.0, 76.0, 137.0, 94.0, 236.0, 102.0, 161.0]
2025-09-12 13:10:17,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 46 minutes, 19 seconds)
2025-09-12 13:24:58,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:58,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:25:49,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 950.66467 ± 292.079
2025-09-12 13:25:49,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1380.2836, 909.66144, 472.20407, 729.2895, 841.1975, 1022.9248, 876.43823, 867.49396, 1543.6898, 863.46436]
2025-09-12 13:25:49,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 167.0, 100.0, 136.0, 155.0, 188.0, 162.0, 161.0, 281.0, 162.0]
2025-09-12 13:25:49,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (950.66) for latency ExtremeClogL1U23
2025-09-12 13:25:49,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 32 minutes, 50 seconds)
2025-09-12 13:40:51,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:40:51,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:41:41,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 918.79218 ± 435.675
2025-09-12 13:41:41,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1176.2096, 1616.2234, 936.9926, 1004.7563, 724.967, 451.628, 978.88916, 361.0045, 1585.9662, 351.2851]
2025-09-12 13:41:41,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [218.0, 298.0, 172.0, 194.0, 135.0, 85.0, 182.0, 78.0, 292.0, 72.0]
2025-09-12 13:41:41,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 17 minutes, 27 seconds)
2025-09-12 13:56:31,844 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:31,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:57:18,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 882.06409 ± 380.288
2025-09-12 13:57:18,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [557.4451, 822.54, 852.0728, 886.05005, 143.55028, 1571.1218, 728.6037, 818.9411, 1046.9539, 1393.362]
2025-09-12 13:57:18,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 153.0, 159.0, 168.0, 29.0, 296.0, 134.0, 153.0, 199.0, 258.0]
2025-09-12 13:57:18,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 2 minutes, 20 seconds)
2025-09-12 14:12:07,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:12:07,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:12:55,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 902.48914 ± 336.175
2025-09-12 14:12:55,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1273.5151, 1619.7367, 806.33234, 1009.20166, 483.5137, 682.75146, 541.14386, 647.79486, 866.48846, 1094.4128]
2025-09-12 14:12:55,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 307.0, 149.0, 188.0, 102.0, 127.0, 100.0, 128.0, 159.0, 201.0]
2025-09-12 14:12:55,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 45 minutes, 15 seconds)
2025-09-12 14:27:52,217 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:52,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:28:48,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 992.81818 ± 428.833
2025-09-12 14:28:48,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1945.9963, 817.7892, 565.2622, 1019.54083, 525.2211, 764.3168, 768.57385, 1040.8118, 872.70886, 1607.9613]
2025-09-12 14:28:48,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [370.0, 166.0, 105.0, 213.0, 94.0, 164.0, 143.0, 196.0, 184.0, 304.0]
2025-09-12 14:28:48,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (992.82) for latency ExtremeClogL1U23
2025-09-12 14:28:48,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 32 minutes, 36 seconds)
2025-09-12 14:43:32,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:32,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:44:16,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 774.51184 ± 364.681
2025-09-12 14:44:16,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [834.14044, 743.83746, 1163.9789, 494.65164, 806.3279, 101.79479, 1451.6534, 1052.646, 589.2912, 506.79697]
2025-09-12 14:44:16,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 154.0, 210.0, 99.0, 169.0, 20.0, 284.0, 194.0, 105.0, 106.0]
2025-09-12 14:44:16,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 16 minutes, 33 seconds)
2025-09-12 14:59:24,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:59:24,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:00:27,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1189.62659 ± 543.174
2025-09-12 15:00:27,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [728.3399, 1423.6775, 1920.0303, 1935.8379, 1287.3832, 955.9947, 881.52655, 1010.5871, 1652.721, 100.16772]
2025-09-12 15:00:27,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 262.0, 353.0, 359.0, 235.0, 172.0, 162.0, 192.0, 295.0, 20.0]
2025-09-12 15:00:27,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1189.63) for latency ExtremeClogL1U23
2025-09-12 15:00:27,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 2 minutes, 19 seconds)
2025-09-12 15:15:20,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:15:20,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:16:06,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 811.34753 ± 340.727
2025-09-12 15:16:06,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1171.8239, 442.10986, 642.1586, 1059.803, 991.05743, 1302.261, 897.57336, 805.11884, 706.0932, 95.476]
2025-09-12 15:16:06,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 83.0, 137.0, 221.0, 183.0, 245.0, 167.0, 151.0, 143.0, 19.0]
2025-09-12 15:16:06,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 46 minutes, 41 seconds)
2025-09-12 15:30:52,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:30:52,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:32:07,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1381.00159 ± 689.146
2025-09-12 15:32:07,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [338.9301, 1995.5908, 1883.3735, 544.7181, 907.47064, 1322.9083, 2479.8718, 2034.1805, 1579.6895, 723.2827]
2025-09-12 15:32:07,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 374.0, 361.0, 102.0, 169.0, 251.0, 490.0, 376.0, 292.0, 156.0]
2025-09-12 15:32:07,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1381.00) for latency ExtremeClogL1U23
2025-09-12 15:32:07,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 32 minutes, 37 seconds)
2025-09-12 15:47:06,429 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:47:06,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:47:52,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 868.22150 ± 508.685
2025-09-12 15:47:52,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1133.5713, 1202.6754, 1798.6562, 1037.5063, 1117.7032, 318.9614, 911.5274, 935.41534, 102.22985, 123.9692]
2025-09-12 15:47:52,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 215.0, 328.0, 191.0, 203.0, 71.0, 164.0, 180.0, 20.0, 24.0]
2025-09-12 15:47:52,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 16 minutes, 14 seconds)
2025-09-12 16:02:31,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:02:31,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:03:41,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1263.38159 ± 627.059
2025-09-12 16:03:41,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1327.7642, 2451.838, 1643.085, 577.8295, 129.70515, 1637.659, 1200.9359, 698.4543, 1315.5754, 1650.9702]
2025-09-12 16:03:41,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 465.0, 324.0, 117.0, 26.0, 299.0, 228.0, 130.0, 261.0, 310.0]
2025-09-12 16:03:41,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 1 minute, 47 seconds)
2025-09-12 16:18:53,097 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:18:53,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:20:03,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1278.77075 ± 761.985
2025-09-12 16:20:03,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [2207.81, 1189.9856, 1445.8525, 1247.2101, 1163.1034, 587.8495, 502.86523, 1476.8304, 2841.554, 124.647644]
2025-09-12 16:20:03,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [407.0, 222.0, 276.0, 234.0, 213.0, 127.0, 106.0, 267.0, 520.0, 24.0]
2025-09-12 16:20:03,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 46 minutes, 32 seconds)
2025-09-12 16:34:45,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:45,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:36:01,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1380.91492 ± 574.204
2025-09-12 16:36:01,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1339.9084, 2560.681, 1590.3608, 1687.9205, 544.2658, 980.5806, 741.2718, 1039.8573, 1329.3229, 1994.9808]
2025-09-12 16:36:01,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [257.0, 487.0, 292.0, 316.0, 111.0, 179.0, 149.0, 194.0, 245.0, 373.0]
2025-09-12 16:36:01,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 31 minutes, 43 seconds)
2025-09-12 16:50:52,928 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:50:52,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:51:54,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1136.90784 ± 309.856
2025-09-12 16:51:54,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [556.017, 1376.311, 1360.9175, 1156.4427, 899.0222, 1546.7234, 712.5528, 1339.1278, 1381.7388, 1040.2249]
2025-09-12 16:51:54,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 251.0, 255.0, 216.0, 178.0, 289.0, 130.0, 246.0, 265.0, 197.0]
2025-09-12 16:51:54,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 15 minutes, 17 seconds)
2025-09-12 17:06:44,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:06:44,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:08:01,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1402.56226 ± 495.160
2025-09-12 17:08:01,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1672.498, 1774.864, 1357.2141, 315.22836, 1827.3115, 2090.0786, 1164.931, 1572.4766, 839.0974, 1411.9216]
2025-09-12 17:08:01,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [313.0, 331.0, 250.0, 66.0, 342.0, 388.0, 224.0, 295.0, 178.0, 272.0]
2025-09-12 17:08:01,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1402.56) for latency ExtremeClogL1U23
2025-09-12 17:08:01,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 28 seconds)
2025-09-12 17:23:07,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:23:07,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:23:54,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 858.56201 ± 751.425
2025-09-12 17:23:54,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [461.93457, 2182.0835, 1692.077, 847.8161, 116.511566, 84.01541, 132.13321, 1384.7897, 1579.2834, 104.97588]
2025-09-12 17:23:54,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 414.0, 319.0, 165.0, 23.0, 17.0, 27.0, 251.0, 297.0, 21.0]
2025-09-12 17:23:54,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 44 minutes, 36 seconds)
2025-09-12 17:38:53,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:38:53,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:39:46,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 954.41125 ± 538.987
2025-09-12 17:39:46,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [582.4864, 1081.7491, 1734.9861, 911.58563, 1221.2499, 556.91425, 1954.4358, 120.45253, 883.72565, 496.526]
2025-09-12 17:39:46,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 206.0, 330.0, 175.0, 232.0, 102.0, 373.0, 24.0, 165.0, 108.0]
2025-09-12 17:39:46,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 27 minutes, 17 seconds)
2025-09-12 17:54:33,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:54:33,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:55:18,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 813.79639 ± 823.676
2025-09-12 17:55:18,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [2331.6533, 2423.267, 78.83699, 487.77756, 626.75934, 919.11194, 95.16285, 471.67773, 613.9147, 89.80202]
2025-09-12 17:55:18,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [438.0, 463.0, 16.0, 87.0, 115.0, 168.0, 19.0, 87.0, 115.0, 18.0]
2025-09-12 17:55:18,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 10 minutes, 15 seconds)
2025-09-12 18:10:18,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:10:18,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:11:18,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1112.87964 ± 529.554
2025-09-12 18:11:18,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [104.52809, 1157.795, 1331.2094, 863.3821, 532.4695, 756.67456, 1220.7526, 1746.2091, 1494.5918, 1921.1858]
2025-09-12 18:11:18,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 216.0, 245.0, 152.0, 106.0, 142.0, 222.0, 334.0, 267.0, 356.0]
2025-09-12 18:11:18,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 54 minutes, 40 seconds)
2025-09-12 18:25:59,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:25:59,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:27:14,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1353.53564 ± 1020.705
2025-09-12 18:27:14,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1315.7397, 4187.2954, 131.41428, 1403.6921, 1363.7465, 979.4021, 1518.8431, 730.99866, 1034.0493, 870.17487]
2025-09-12 18:27:14,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 799.0, 26.0, 261.0, 256.0, 175.0, 288.0, 132.0, 195.0, 164.0]
2025-09-12 18:27:14,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 38 minutes, 24 seconds)
2025-09-12 18:42:06,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:42:06,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:43:05,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1062.84778 ± 400.131
2025-09-12 18:43:05,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1747.3309, 662.8682, 1624.8273, 1020.2434, 656.3375, 633.48944, 1356.0084, 1279.1069, 995.35077, 652.9153]
2025-09-12 18:43:05,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [333.0, 121.0, 315.0, 194.0, 137.0, 130.0, 262.0, 243.0, 181.0, 135.0]
2025-09-12 18:43:05,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 22 minutes, 31 seconds)
2025-09-12 18:57:58,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:57:58,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:58:43,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 820.38342 ± 302.737
2025-09-12 18:58:43,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [534.3454, 1225.8052, 780.0105, 301.04404, 939.5122, 748.49255, 1016.2438, 1345.2926, 734.09875, 578.98956]
2025-09-12 18:58:43,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 228.0, 162.0, 59.0, 189.0, 148.0, 189.0, 249.0, 148.0, 113.0]
2025-09-12 18:58:44,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 6 minutes, 19 seconds)
2025-09-12 19:13:46,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:13:46,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:14:49,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1136.24194 ± 587.187
2025-09-12 19:14:49,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1554.4875, 1083.6835, 2119.063, 1338.986, 970.12646, 402.76068, 106.85218, 1881.3702, 1039.765, 865.3253]
2025-09-12 19:14:49,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [308.0, 207.0, 413.0, 254.0, 180.0, 74.0, 21.0, 351.0, 197.0, 162.0]
2025-09-12 19:14:49,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 51 minutes, 19 seconds)
2025-09-12 19:29:44,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:29:44,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:31:05,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1465.25781 ± 709.199
2025-09-12 19:31:05,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [620.90027, 1512.814, 507.836, 1313.978, 1516.0146, 1047.6351, 2276.5178, 2407.4573, 2595.7437, 853.68176]
2025-09-12 19:31:05,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 281.0, 100.0, 247.0, 287.0, 215.0, 431.0, 447.0, 487.0, 160.0]
2025-09-12 19:31:05,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1465.26) for latency ExtremeClogL1U23
2025-09-12 19:31:05,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 35 minutes, 44 seconds)
2025-09-12 19:46:11,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:46:11,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:47:10,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1068.14185 ± 613.190
2025-09-12 19:47:10,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.59635, 461.02484, 280.23904, 2115.8813, 903.16785, 1611.9773, 1236.2474, 1068.1587, 1287.6749, 1628.4502]
2025-09-12 19:47:10,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 81.0, 54.0, 399.0, 166.0, 317.0, 245.0, 202.0, 241.0, 317.0]
2025-09-12 19:47:10,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 19 minutes, 56 seconds)
2025-09-12 20:01:57,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:01:57,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:03:27,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1597.84851 ± 1143.371
2025-09-12 20:03:27,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [561.19183, 875.0172, 4671.828, 2119.0645, 1312.8853, 742.3251, 1643.5729, 964.17145, 2060.1162, 1028.3124]
2025-09-12 20:03:27,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 165.0, 885.0, 405.0, 271.0, 155.0, 320.0, 189.0, 405.0, 201.0]
2025-09-12 20:03:27,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1597.85) for latency ExtremeClogL1U23
2025-09-12 20:03:27,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 4 minutes, 17 seconds)
2025-09-12 20:18:12,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:18:12,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:19:09,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1042.04639 ± 653.803
2025-09-12 20:19:09,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1372.3755, 484.48254, 2307.6992, 662.6344, 1051.1501, 2153.2, 684.2521, 371.7392, 581.11725, 751.8137]
2025-09-12 20:19:09,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 87.0, 420.0, 123.0, 207.0, 398.0, 145.0, 71.0, 108.0, 150.0]
2025-09-12 20:19:09,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 48 minutes, 15 seconds)
2025-09-12 20:33:57,321 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:33:57,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:35:04,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1214.90784 ± 774.176
2025-09-12 20:35:04,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [534.6229, 88.95938, 1903.8833, 1206.8901, 373.82657, 1394.0099, 1832.5906, 2714.1763, 622.85364, 1477.2653]
2025-09-12 20:35:04,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 18.0, 379.0, 229.0, 69.0, 285.0, 342.0, 513.0, 126.0, 275.0]
2025-09-12 20:35:04,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 32 minutes, 5 seconds)
2025-09-12 20:50:32,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:50:32,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:51:58,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1494.08435 ± 674.319
2025-09-12 20:51:58,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1236.5859, 972.58594, 1003.4137, 2938.6304, 1730.8827, 2273.036, 612.25867, 1072.7672, 1908.8368, 1191.8472]
2025-09-12 20:51:58,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [247.0, 182.0, 197.0, 571.0, 335.0, 442.0, 124.0, 207.0, 394.0, 245.0]
2025-09-12 20:51:58,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 10 seconds)
2025-09-12 21:06:28,612 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:06:28,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:07:53,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1586.23853 ± 748.182
2025-09-12 21:07:53,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [3329.3933, 1337.0847, 1647.9603, 1693.1759, 1545.2932, 733.5747, 720.11725, 909.2794, 2323.355, 1623.1517]
2025-09-12 21:07:53,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [618.0, 251.0, 312.0, 304.0, 282.0, 134.0, 130.0, 169.0, 430.0, 309.0]
2025-09-12 21:07:53,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1251 [DEBUG]: Training session finished
