2025-09-11 19:20:11,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:20:11,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:20:11,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1516d362f850>}
2025-09-11 19:20:11,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1111 [DEBUG]: using device: cuda
2025-09-11 19:20:11,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1133 [INFO]: Creating new trainer
2025-09-11 19:20:11,176 baseline-mbpac-noiseperc0-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-11 19:20:11,176 baseline-mbpac-noiseperc0-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:20:11,187 baseline-mbpac-noiseperc0-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 19:20:12,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1194 [DEBUG]: Starting training session...
2025-09-11 19:20:12,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 1/100
2025-09-11 19:33:02,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:33:02,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:33:11,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 164.85529 ± 76.341
2025-09-11 19:33:11,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [91.581665, 193.79398, 275.16046, 111.4903, 110.532295, 103.95922, 326.28632, 106.2272, 143.53297, 185.98854]
2025-09-11 19:33:11,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 38.0, 55.0, 21.0, 21.0, 20.0, 62.0, 21.0, 29.0, 37.0]
2025-09-11 19:33:11,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (164.86) for latency ExtremeClogL1U23
2025-09-11 19:33:11,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 25 minutes, 30 seconds)
2025-09-11 19:47:55,168 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:47:55,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:48:18,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 408.85455 ± 166.968
2025-09-11 19:48:18,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [292.04733, 326.94894, 844.7057, 354.08197, 364.1013, 231.77878, 570.0186, 396.4384, 352.64563, 355.779]
2025-09-11 19:48:18,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 60.0, 162.0, 67.0, 74.0, 49.0, 105.0, 76.0, 70.0, 75.0]
2025-09-11 19:48:18,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (408.85) for latency ExtremeClogL1U23
2025-09-11 19:48:18,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 56 minutes, 34 seconds)
2025-09-11 20:03:00,581 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:03:00,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:03:19,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 360.14911 ± 33.453
2025-09-11 20:03:19,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [375.157, 431.2729, 331.8845, 308.38806, 341.75726, 397.2609, 372.01828, 351.5204, 353.33942, 338.89236]
2025-09-11 20:03:19,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 79.0, 64.0, 57.0, 63.0, 74.0, 68.0, 65.0, 65.0, 62.0]
2025-09-11 20:03:19,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 23 hours, 14 minutes, 8 seconds)
2025-09-11 20:18:04,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:18:04,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:18:27,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 404.15863 ± 51.839
2025-09-11 20:18:27,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [415.00677, 419.09534, 337.62457, 343.6311, 520.74536, 427.51907, 396.48505, 368.6287, 367.87448, 444.9762]
2025-09-11 20:18:27,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 90.0, 70.0, 70.0, 98.0, 92.0, 75.0, 75.0, 74.0, 95.0]
2025-09-11 20:18:27,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 23 hours, 18 minutes, 6 seconds)
2025-09-11 20:33:09,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:33:09,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:33:31,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 356.34885 ± 74.008
2025-09-11 20:33:31,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [287.31757, 298.74966, 475.01764, 317.37842, 298.58243, 425.72632, 474.24542, 266.86246, 337.3729, 382.23584]
2025-09-11 20:33:31,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 67.0, 103.0, 70.0, 63.0, 88.0, 105.0, 61.0, 73.0, 86.0]
2025-09-11 20:33:31,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 23 hours, 12 minutes, 57 seconds)
2025-09-11 20:47:57,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:47:57,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:48:24,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 489.23633 ± 69.377
2025-09-11 20:48:24,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [441.10947, 486.1621, 542.47644, 441.68484, 354.0755, 543.4452, 435.64066, 504.48013, 534.1195, 609.1697]
2025-09-11 20:48:24,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 93.0, 114.0, 91.0, 64.0, 101.0, 92.0, 95.0, 101.0, 116.0]
2025-09-11 20:48:24,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (489.24) for latency ExtremeClogL1U23
2025-09-11 20:48:24,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 23 hours, 33 minutes, 56 seconds)
2025-09-11 21:02:56,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:02:56,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:03:23,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 475.08511 ± 59.107
2025-09-11 21:03:23,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [484.3091, 442.75214, 569.68854, 368.42007, 462.93582, 503.88974, 489.39285, 527.0794, 386.18793, 516.19543]
2025-09-11 21:03:23,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 84.0, 111.0, 69.0, 88.0, 110.0, 106.0, 114.0, 87.0, 103.0]
2025-09-11 21:03:23,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 23 hours, 16 minutes, 45 seconds)
2025-09-11 21:17:44,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:17:44,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:18:08,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 453.91119 ± 53.600
2025-09-11 21:18:08,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [472.76324, 542.5647, 421.56064, 417.1022, 434.9017, 479.95642, 516.96075, 473.11343, 437.97272, 342.21585]
2025-09-11 21:18:08,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 103.0, 77.0, 76.0, 80.0, 94.0, 95.0, 88.0, 84.0, 63.0]
2025-09-11 21:18:08,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 22 hours, 56 minutes, 41 seconds)
2025-09-11 21:32:26,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:32:26,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:32:49,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 429.62323 ± 67.253
2025-09-11 21:32:49,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [321.3572, 417.7329, 428.75784, 364.9579, 385.56348, 394.99377, 560.0166, 517.2892, 440.88174, 464.68167]
2025-09-11 21:32:49,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 77.0, 91.0, 68.0, 70.0, 72.0, 106.0, 95.0, 81.0, 87.0]
2025-09-11 21:32:49,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 22 hours, 33 minutes, 18 seconds)
2025-09-11 21:47:07,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:47:07,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:47:39,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 576.35327 ± 188.433
2025-09-11 21:47:39,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [589.26794, 392.6562, 722.5457, 872.83374, 402.0619, 786.9413, 394.9837, 370.67484, 787.32745, 444.2396]
2025-09-11 21:47:39,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 85.0, 141.0, 164.0, 87.0, 152.0, 86.0, 68.0, 155.0, 84.0]
2025-09-11 21:47:39,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (576.35) for latency ExtremeClogL1U23
2025-09-11 21:47:39,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 22 hours, 14 minutes, 23 seconds)
2025-09-11 22:02:02,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:02:02,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:02:28,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 470.34424 ± 69.429
2025-09-11 22:02:28,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [478.2785, 553.2627, 317.8084, 383.075, 540.9241, 461.41803, 522.60443, 518.7582, 454.80957, 472.50354]
2025-09-11 22:02:28,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 103.0, 61.0, 74.0, 106.0, 91.0, 104.0, 101.0, 98.0, 95.0]
2025-09-11 22:02:28,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 21 hours, 58 minutes, 24 seconds)
2025-09-11 22:16:58,280 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:16:58,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:17:31,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 623.20483 ± 198.878
2025-09-11 22:17:31,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [468.9454, 773.1335, 574.0405, 557.99695, 558.82916, 557.7506, 1169.0073, 462.96484, 530.3259, 579.0544]
2025-09-11 22:17:31,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 160.0, 110.0, 106.0, 103.0, 103.0, 225.0, 96.0, 99.0, 106.0]
2025-09-11 22:17:31,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (623.20) for latency ExtremeClogL1U23
2025-09-11 22:17:31,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 21 hours, 44 minutes, 38 seconds)
2025-09-11 22:31:45,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:31:45,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:32:13,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 531.28503 ± 132.471
2025-09-11 22:32:13,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [565.5086, 291.6259, 829.3397, 553.93414, 522.3374, 579.3347, 418.15506, 521.8748, 438.73483, 592.0047]
2025-09-11 22:32:13,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 54.0, 176.0, 106.0, 100.0, 111.0, 80.0, 99.0, 82.0, 113.0]
2025-09-11 22:32:13,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 29 minutes, 2 seconds)
2025-09-11 22:46:30,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:46:30,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:47:03,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 601.78259 ± 113.669
2025-09-11 22:47:03,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [589.32007, 551.125, 506.43066, 827.8837, 774.84247, 418.63602, 622.37256, 589.1963, 573.9144, 564.1047]
2025-09-11 22:47:03,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 109.0, 106.0, 171.0, 151.0, 90.0, 120.0, 128.0, 122.0, 123.0]
2025-09-11 22:47:03,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 17 minutes, 2 seconds)
2025-09-11 23:01:25,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:01:25,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:02:02,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 695.52216 ± 109.153
2025-09-11 23:02:02,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [679.14, 844.9062, 629.9807, 666.6137, 837.71857, 536.1198, 711.983, 852.9845, 568.9965, 626.7788]
2025-09-11 23:02:02,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 162.0, 118.0, 130.0, 162.0, 100.0, 133.0, 162.0, 122.0, 129.0]
2025-09-11 23:02:02,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (695.52) for latency ExtremeClogL1U23
2025-09-11 23:02:02,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 21 hours, 4 minutes, 42 seconds)
2025-09-11 23:16:15,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:16:15,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:16:53,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 677.12305 ± 241.776
2025-09-11 23:16:53,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [684.1288, 563.4425, 576.047, 530.6202, 1353.6678, 569.0932, 653.528, 669.2188, 748.84174, 422.6424]
2025-09-11 23:16:53,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 119.0, 107.0, 104.0, 266.0, 107.0, 126.0, 142.0, 147.0, 88.0]
2025-09-11 23:16:53,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 20 hours, 50 minutes, 16 seconds)
2025-09-11 23:31:13,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:31:13,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:31:53,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 688.34412 ± 126.813
2025-09-11 23:31:53,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [863.18787, 447.108, 818.059, 752.90234, 545.1729, 736.96436, 718.7051, 658.38434, 557.88824, 785.06915]
2025-09-11 23:31:53,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 94.0, 173.0, 147.0, 118.0, 154.0, 149.0, 135.0, 119.0, 155.0]
2025-09-11 23:31:53,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 34 minutes, 26 seconds)
2025-09-11 23:46:11,982 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:46:11,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:46:51,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 741.06323 ± 162.428
2025-09-11 23:46:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [834.823, 570.29504, 603.5429, 995.5525, 914.0695, 859.6176, 707.6888, 573.1954, 502.0967, 849.7509]
2025-09-11 23:46:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 109.0, 116.0, 196.0, 182.0, 163.0, 133.0, 108.0, 113.0, 159.0]
2025-09-11 23:46:51,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (741.06) for latency ExtremeClogL1U23
2025-09-11 23:46:51,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 24 minutes, 4 seconds)
2025-09-12 00:01:15,102 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:01:15,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:01:48,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 635.43884 ± 123.296
2025-09-12 00:01:48,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [613.07306, 677.76575, 527.2979, 739.5669, 573.51544, 596.98486, 740.7652, 653.36694, 850.2354, 381.81717]
2025-09-12 00:01:48,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 128.0, 105.0, 139.0, 114.0, 112.0, 148.0, 123.0, 164.0, 72.0]
2025-09-12 00:01:48,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 10 minutes, 51 seconds)
2025-09-12 00:16:13,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:16:13,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:16:52,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 730.56775 ± 206.479
2025-09-12 00:16:52,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [617.1965, 1127.6523, 455.48834, 903.85156, 466.6466, 621.4641, 978.03845, 739.05786, 748.8117, 647.4706]
2025-09-12 00:16:52,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 223.0, 87.0, 174.0, 88.0, 132.0, 196.0, 139.0, 142.0, 124.0]
2025-09-12 00:16:52,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 57 minutes, 15 seconds)
2025-09-12 00:31:09,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:31:09,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:32:01,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 924.94373 ± 317.616
2025-09-12 00:32:01,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [579.87317, 612.3448, 1350.7373, 1204.3025, 709.1729, 1166.5922, 1360.4352, 743.9645, 485.7269, 1036.2883]
2025-09-12 00:32:01,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 130.0, 265.0, 237.0, 153.0, 216.0, 268.0, 154.0, 104.0, 220.0]
2025-09-12 00:32:01,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (924.94) for latency ExtremeClogL1U23
2025-09-12 00:32:01,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 47 minutes, 4 seconds)
2025-09-12 00:46:24,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:46:24,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:47:06,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 758.95551 ± 145.414
2025-09-12 00:47:06,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [528.7816, 872.07916, 636.19806, 610.6403, 877.4744, 688.91113, 635.10187, 899.0101, 955.2636, 886.0947]
2025-09-12 00:47:06,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 161.0, 120.0, 113.0, 163.0, 128.0, 116.0, 167.0, 174.0, 177.0]
2025-09-12 00:47:06,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 33 minutes, 23 seconds)
2025-09-12 01:02:07,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:02:07,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:02:56,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 828.13281 ± 185.546
2025-09-12 01:02:56,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [863.2849, 740.1956, 715.131, 552.0687, 1228.4734, 955.91565, 778.4151, 615.3028, 859.62787, 972.91266]
2025-09-12 01:02:56,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 158.0, 139.0, 103.0, 245.0, 198.0, 148.0, 114.0, 172.0, 186.0]
2025-09-12 01:02:56,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 31 minutes, 32 seconds)
2025-09-12 01:18:09,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:18:09,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:18:55,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 799.19843 ± 186.968
2025-09-12 01:18:55,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1038.9735, 966.1003, 766.4446, 919.70605, 987.42633, 728.54596, 642.7949, 805.139, 371.9624, 764.89124]
2025-09-12 01:18:55,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 178.0, 148.0, 176.0, 192.0, 141.0, 119.0, 144.0, 79.0, 155.0]
2025-09-12 01:18:55,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 32 minutes, 4 seconds)
2025-09-12 01:33:58,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:33:58,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:34:49,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 898.11511 ± 478.377
2025-09-12 01:34:49,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [795.8523, 365.13037, 447.4133, 518.8686, 858.2056, 616.71216, 1421.2463, 1201.1194, 1986.3864, 770.21674]
2025-09-12 01:34:49,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 70.0, 83.0, 95.0, 161.0, 112.0, 274.0, 242.0, 394.0, 148.0]
2025-09-12 01:34:49,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 29 minutes, 12 seconds)
2025-09-12 01:50:11,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:50:11,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:51:21,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1178.83643 ± 403.162
2025-09-12 01:51:21,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1291.1075, 2089.4622, 1249.5468, 745.6416, 730.3902, 1063.2461, 784.2378, 1425.261, 923.0811, 1486.3896]
2025-09-12 01:51:21,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 427.0, 244.0, 150.0, 148.0, 211.0, 143.0, 277.0, 194.0, 306.0]
2025-09-12 01:51:21,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1178.84) for latency ExtremeClogL1U23
2025-09-12 01:51:21,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 34 minutes, 12 seconds)
2025-09-12 02:06:18,107 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:06:18,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:07:06,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 815.74121 ± 406.913
2025-09-12 02:07:06,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [857.06067, 374.9369, 480.18707, 333.19092, 1498.7122, 1452.639, 716.34985, 697.2505, 1198.8314, 548.25397]
2025-09-12 02:07:06,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 69.0, 89.0, 70.0, 286.0, 290.0, 137.0, 143.0, 245.0, 117.0]
2025-09-12 02:07:06,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 28 minutes)
2025-09-12 02:22:15,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:22:15,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:23:19,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1123.09924 ± 509.492
2025-09-12 02:23:19,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1524.316, 422.99023, 2088.3193, 563.5051, 913.4033, 662.1138, 1673.8038, 1119.6603, 866.87134, 1396.009]
2025-09-12 02:23:19,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [299.0, 80.0, 407.0, 105.0, 172.0, 132.0, 306.0, 224.0, 170.0, 272.0]
2025-09-12 02:23:19,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 17 minutes, 35 seconds)
2025-09-12 02:39:20,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:39:20,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:40:35,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1248.03174 ± 842.953
2025-09-12 02:40:35,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1092.1194, 577.42993, 1243.9148, 1528.5327, 3439.2493, 577.5673, 1027.0964, 531.78064, 623.18506, 1839.4404]
2025-09-12 02:40:35,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 120.0, 263.0, 316.0, 682.0, 110.0, 210.0, 110.0, 125.0, 360.0]
2025-09-12 02:40:35,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1248.03) for latency ExtremeClogL1U23
2025-09-12 02:40:35,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 19 minutes, 49 seconds)
2025-09-12 02:54:54,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:54:54,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:56:07,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1295.99109 ± 955.810
2025-09-12 02:56:07,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1215.1147, 2130.171, 415.91864, 630.8934, 916.1864, 1652.2626, 1151.5349, 3713.7334, 477.04944, 657.04584]
2025-09-12 02:56:07,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 413.0, 88.0, 120.0, 172.0, 305.0, 218.0, 696.0, 92.0, 122.0]
2025-09-12 02:56:07,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1295.99) for latency ExtremeClogL1U23
2025-09-12 02:56:07,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 58 minutes, 17 seconds)
2025-09-12 03:11:16,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:11:16,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:12:38,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1425.44312 ± 877.946
2025-09-12 03:12:38,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1049.9601, 1072.2004, 578.4476, 3722.9905, 1334.8398, 1536.3708, 1018.24536, 577.7178, 2134.3599, 1229.2992]
2025-09-12 03:12:38,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 206.0, 107.0, 723.0, 270.0, 304.0, 185.0, 109.0, 407.0, 228.0]
2025-09-12 03:12:38,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1425.44) for latency ExtremeClogL1U23
2025-09-12 03:12:38,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 41 minutes, 41 seconds)
2025-09-12 03:27:48,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:27:48,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:29:36,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1805.64526 ± 1261.293
2025-09-12 03:29:36,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [563.61194, 873.84045, 1490.6865, 3469.167, 499.62015, 866.4896, 1086.577, 4239.325, 3068.569, 1898.5647]
2025-09-12 03:29:36,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 174.0, 305.0, 663.0, 98.0, 184.0, 221.0, 846.0, 612.0, 381.0]
2025-09-12 03:29:36,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1805.65) for latency ExtremeClogL1U23
2025-09-12 03:29:36,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 42 minutes, 2 seconds)
2025-09-12 03:44:46,441 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:44:46,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:46:05,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1324.15393 ± 669.968
2025-09-12 03:46:05,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [858.2579, 1683.6499, 1130.3962, 949.90656, 637.43427, 2779.9558, 2196.9563, 1267.7822, 557.38794, 1179.8115]
2025-09-12 03:46:05,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 347.0, 233.0, 179.0, 119.0, 542.0, 435.0, 246.0, 102.0, 249.0]
2025-09-12 03:46:05,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 28 minutes, 59 seconds)
2025-09-12 04:01:58,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:01:58,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:03:47,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1874.52856 ± 1042.436
2025-09-12 04:03:47,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2339.5881, 1403.1901, 2298.238, 4235.0825, 554.5969, 1368.8793, 1783.745, 2423.8848, 1997.6483, 340.43115]
2025-09-12 04:03:47,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [463.0, 282.0, 453.0, 803.0, 109.0, 261.0, 337.0, 483.0, 396.0, 64.0]
2025-09-12 04:03:47,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1874.53) for latency ExtremeClogL1U23
2025-09-12 04:03:47,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 18 minutes, 8 seconds)
2025-09-12 04:18:11,097 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:18:11,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:20:19,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2220.31445 ± 1410.126
2025-09-12 04:20:19,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [926.5016, 3929.4478, 667.9202, 607.48254, 1369.065, 1873.2144, 5219.312, 2437.9915, 2336.8774, 2835.3315]
2025-09-12 04:20:19,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 749.0, 126.0, 120.0, 266.0, 350.0, 1000.0, 466.0, 443.0, 538.0]
2025-09-12 04:20:19,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2220.31) for latency ExtremeClogL1U23
2025-09-12 04:20:19,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 18 hours, 14 minutes, 35 seconds)
2025-09-12 04:36:14,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:36:14,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:37:54,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1744.71021 ± 1365.846
2025-09-12 04:37:54,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [381.77307, 2948.5461, 1874.1492, 490.44888, 1980.6223, 1746.9299, 1269.9252, 5147.183, 548.8732, 1058.65]
2025-09-12 04:37:54,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 560.0, 358.0, 105.0, 382.0, 332.0, 238.0, 998.0, 102.0, 210.0]
2025-09-12 04:37:54,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 18 hours, 11 minutes, 30 seconds)
2025-09-12 04:52:50,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:52:50,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:55:16,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2558.49731 ± 1724.453
2025-09-12 04:55:16,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1612.3866, 633.28845, 1268.8875, 1390.2595, 3400.5452, 432.4176, 5296.95, 3987.3813, 5255.663, 2307.1912]
2025-09-12 04:55:16,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [300.0, 121.0, 253.0, 261.0, 650.0, 84.0, 1000.0, 762.0, 993.0, 437.0]
2025-09-12 04:55:16,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2558.50) for latency ExtremeClogL1U23
2025-09-12 04:55:16,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 59 minutes, 20 seconds)
2025-09-12 05:09:34,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:09:34,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:13:04,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3548.64380 ± 1620.962
2025-09-12 05:13:04,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5121.6763, 4533.0425, 2498.197, 5089.857, 5099.4873, 5124.1445, 3440.4468, 1396.9801, 551.6224, 2630.9856]
2025-09-12 05:13:04,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 891.0, 499.0, 1000.0, 1000.0, 1000.0, 650.0, 282.0, 106.0, 523.0]
2025-09-12 05:13:04,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3548.64) for latency ExtremeClogL1U23
2025-09-12 05:13:04,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 58 minutes, 40 seconds)
2025-09-12 05:28:12,518 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:28:12,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:31:45,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3666.84717 ± 1685.071
2025-09-12 05:31:45,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5242.4385, 5270.866, 5222.94, 5214.987, 2864.7063, 1368.0067, 3305.3303, 1990.2826, 5215.3296, 973.58746]
2025-09-12 05:31:45,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 546.0, 263.0, 629.0, 388.0, 1000.0, 185.0]
2025-09-12 05:31:45,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3666.85) for latency ExtremeClogL1U23
2025-09-12 05:31:45,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 17 hours, 53 minutes, 6 seconds)
2025-09-12 05:47:16,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:47:16,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:51:01,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3918.18042 ± 1582.510
2025-09-12 05:51:01,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3314.0654, 1953.3615, 2127.0295, 5326.4976, 1017.5826, 5344.8574, 4792.884, 5309.522, 4729.2183, 5266.7856]
2025-09-12 05:51:01,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [633.0, 383.0, 430.0, 1000.0, 195.0, 1000.0, 901.0, 1000.0, 899.0, 1000.0]
2025-09-12 05:51:01,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3918.18) for latency ExtremeClogL1U23
2025-09-12 05:51:01,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 18 hours, 8 minutes, 26 seconds)
2025-09-12 06:05:53,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:05:53,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:08:55,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3119.32959 ± 1801.274
2025-09-12 06:08:55,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2355.215, 5219.7866, 5223.3027, 2525.3538, 5211.0107, 700.976, 5231.103, 849.3773, 2090.9832, 1786.1891]
2025-09-12 06:08:55,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [450.0, 1000.0, 1000.0, 499.0, 1000.0, 146.0, 1000.0, 173.0, 403.0, 341.0]
2025-09-12 06:08:55,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 17 hours, 53 minutes, 51 seconds)
2025-09-12 06:24:30,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:24:30,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:28:06,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3682.38672 ± 1824.039
2025-09-12 06:28:06,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3962.9094, 5193.7397, 5193.4424, 5199.858, 1544.8682, 5224.77, 5171.36, 1023.86224, 3800.8696, 508.1887]
2025-09-12 06:28:06,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [768.0, 1000.0, 1000.0, 1000.0, 302.0, 1000.0, 1000.0, 208.0, 726.0, 111.0]
2025-09-12 06:28:06,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 17 hours, 56 minutes, 57 seconds)
2025-09-12 06:42:58,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:42:58,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:46:10,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3250.96558 ± 1722.304
2025-09-12 06:46:10,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3569.2593, 2614.932, 2512.8086, 5180.565, 5172.721, 1142.3322, 5067.5635, 897.3402, 1188.7361, 5163.398]
2025-09-12 06:46:10,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [684.0, 516.0, 495.0, 1000.0, 1000.0, 229.0, 1000.0, 184.0, 229.0, 1000.0]
2025-09-12 06:46:10,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 17 hours, 41 minutes, 19 seconds)
2025-09-12 07:01:46,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:01:46,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:04:49,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3157.72021 ± 1989.375
2025-09-12 07:04:49,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5143.342, 1325.1256, 543.3191, 5282.244, 668.5762, 5288.293, 5287.9067, 3517.5173, 861.5494, 3659.3293]
2025-09-12 07:04:49,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 254.0, 102.0, 1000.0, 135.0, 1000.0, 1000.0, 667.0, 180.0, 678.0]
2025-09-12 07:04:49,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 17 hours, 22 minutes, 26 seconds)
2025-09-12 07:19:48,679 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:48,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:22:12,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2456.59131 ± 1887.301
2025-09-12 07:22:12,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5209.054, 5225.3687, 1580.0817, 2667.121, 5185.5713, 618.9681, 1578.9862, 800.48444, 1031.5428, 668.7369]
2025-09-12 07:22:12,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 298.0, 514.0, 1000.0, 119.0, 304.0, 154.0, 194.0, 142.0]
2025-09-12 07:22:12,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 16 hours, 42 minutes, 57 seconds)
2025-09-12 07:37:44,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:37:44,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:41:18,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3702.41602 ± 1782.588
2025-09-12 07:41:18,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1346.7749, 1895.7903, 5302.896, 1984.378, 5273.518, 5352.094, 5005.3516, 1015.37915, 5347.0586, 4500.9194]
2025-09-12 07:41:18,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 359.0, 1000.0, 373.0, 1000.0, 1000.0, 933.0, 187.0, 1000.0, 858.0]
2025-09-12 07:41:18,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 16 hours, 37 minutes, 46 seconds)
2025-09-12 07:57:22,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:57:22,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:00:03,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2727.26392 ± 1814.046
2025-09-12 08:00:03,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3622.3577, 836.3272, 5168.7188, 5135.794, 785.7318, 1101.9299, 1998.3096, 816.5099, 2598.011, 5208.9497]
2025-09-12 08:00:03,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [707.0, 154.0, 1000.0, 1000.0, 153.0, 213.0, 402.0, 155.0, 514.0, 1000.0]
2025-09-12 08:00:03,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 16 hours, 14 minutes, 41 seconds)
2025-09-12 08:15:03,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:03,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:18:18,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3385.55005 ± 1480.590
2025-09-12 08:18:18,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5234.624, 5220.8755, 2831.6274, 4728.982, 1330.3129, 2566.3687, 2842.305, 1539.1903, 5214.3022, 2346.9111]
2025-09-12 08:18:18,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 548.0, 902.0, 277.0, 489.0, 547.0, 277.0, 1000.0, 472.0]
2025-09-12 08:18:18,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 15 hours, 58 minutes, 9 seconds)
2025-09-12 08:32:40,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:32:40,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:36:05,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3513.52148 ± 1667.712
2025-09-12 08:36:05,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5146.721, 733.1737, 5254.5415, 5208.7983, 4082.7014, 1622.7156, 2379.7883, 3923.524, 5141.99, 1641.2626]
2025-09-12 08:36:05,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 150.0, 1000.0, 1000.0, 818.0, 330.0, 463.0, 761.0, 1000.0, 304.0]
2025-09-12 08:36:05,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 15 hours, 30 minutes, 51 seconds)
2025-09-12 08:51:51,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:51:51,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:55:05,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3293.89771 ± 1443.784
2025-09-12 08:55:05,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1149.9395, 2475.453, 2965.177, 5173.743, 3661.2168, 1329.0503, 5135.136, 5200.8545, 2357.9395, 3490.467]
2025-09-12 08:55:05,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 475.0, 563.0, 1000.0, 705.0, 255.0, 1000.0, 1000.0, 449.0, 679.0]
2025-09-12 08:55:05,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 15 hours, 28 minutes, 45 seconds)
2025-09-12 09:10:50,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:10:50,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:14:04,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3270.25244 ± 1622.910
2025-09-12 09:14:04,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2218.098, 5092.642, 3090.943, 5160.4946, 5077.9097, 2164.6958, 648.65485, 1675.091, 5176.9297, 2397.066]
2025-09-12 09:14:04,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [421.0, 990.0, 591.0, 1000.0, 1000.0, 437.0, 123.0, 325.0, 1000.0, 460.0]
2025-09-12 09:14:04,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 15 hours, 9 minutes, 6 seconds)
2025-09-12 09:28:00,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:28:00,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:31:20,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3439.70898 ± 1437.988
2025-09-12 09:31:20,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3109.8162, 2034.0557, 3258.7283, 5233.783, 2124.999, 690.4921, 5277.0806, 4021.916, 4947.453, 3698.7664]
2025-09-12 09:31:20,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [591.0, 388.0, 642.0, 1000.0, 405.0, 123.0, 1000.0, 762.0, 953.0, 708.0]
2025-09-12 09:31:20,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 14 hours, 36 minutes, 16 seconds)
2025-09-12 09:47:40,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:47:40,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:50:51,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3282.56689 ± 1755.746
2025-09-12 09:50:51,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [947.238, 2289.4185, 5342.951, 5194.5093, 2956.2988, 1631.4823, 1000.3389, 5340.9834, 2837.5344, 5284.9126]
2025-09-12 09:50:51,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 439.0, 1000.0, 989.0, 571.0, 327.0, 212.0, 1000.0, 562.0, 1000.0]
2025-09-12 09:50:51,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 14 hours, 29 minutes, 54 seconds)
2025-09-12 10:05:04,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:05:04,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:09:01,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4132.15967 ± 1577.756
2025-09-12 10:09:01,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1035.8209, 5291.5073, 5334.115, 2989.9697, 3955.3623, 5308.161, 5227.5815, 5242.7446, 5268.5977, 1667.7382]
2025-09-12 10:09:01,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 1000.0, 1000.0, 556.0, 760.0, 1000.0, 1000.0, 1000.0, 1000.0, 315.0]
2025-09-12 10:09:01,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4132.16) for latency ExtremeClogL1U23
2025-09-12 10:09:01,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 14 hours, 14 minutes, 59 seconds)
2025-09-12 10:24:05,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:24:05,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:27:05,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3167.55713 ± 1895.737
2025-09-12 10:27:05,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [984.7255, 5314.4185, 936.4771, 879.4518, 5453.3335, 2049.0867, 5311.62, 3180.844, 2279.2632, 5286.3506]
2025-09-12 10:27:05,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 1000.0, 174.0, 180.0, 1000.0, 388.0, 1000.0, 605.0, 413.0, 1000.0]
2025-09-12 10:27:05,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 13 hours, 48 minutes)
2025-09-12 10:42:15,215 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:42:15,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:45:50,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3798.69971 ± 1645.995
2025-09-12 10:45:50,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1840.552, 4335.992, 2781.655, 5214.336, 5329.557, 645.2502, 5224.708, 2371.4338, 5311.815, 4931.697]
2025-09-12 10:45:50,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [347.0, 812.0, 519.0, 1000.0, 1000.0, 122.0, 1000.0, 444.0, 1000.0, 920.0]
2025-09-12 10:45:50,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 13 hours, 27 minutes, 33 seconds)
2025-09-12 11:00:56,633 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:00:56,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:04:19,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3623.69409 ± 1849.198
2025-09-12 11:04:19,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5378.968, 5257.4653, 5430.7417, 2653.7903, 5286.9556, 1442.5784, 3188.6477, 1533.2372, 638.90375, 5425.655]
2025-09-12 11:04:19,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 967.0, 1000.0, 486.0, 968.0, 272.0, 580.0, 300.0, 123.0, 1000.0]
2025-09-12 11:04:19,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 13 hours, 19 minutes, 40 seconds)
2025-09-12 11:20:00,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:20:00,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:23:57,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4044.25708 ± 1618.514
2025-09-12 11:23:57,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5173.1475, 2043.8945, 5033.892, 5221.6045, 4387.6724, 5179.7456, 5185.0537, 2443.25, 584.4579, 5189.852]
2025-09-12 11:23:57,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 387.0, 1000.0, 1000.0, 839.0, 1000.0, 1000.0, 492.0, 125.0, 1000.0]
2025-09-12 11:23:57,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 13 hours, 2 minutes, 8 seconds)
2025-09-12 11:39:06,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:39:06,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:42:19,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3368.58057 ± 1922.175
2025-09-12 11:42:19,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1279.6741, 1237.333, 961.4747, 1182.203, 5130.924, 2901.811, 5261.7036, 5234.379, 5328.764, 5167.539]
2025-09-12 11:42:19,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [239.0, 232.0, 179.0, 224.0, 967.0, 560.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:42:19,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 12 hours, 45 minutes, 8 seconds)
2025-09-12 11:57:37,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:57:37,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:01:25,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3949.33398 ± 1322.425
2025-09-12 12:01:25,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1507.3706, 3551.997, 5171.8433, 5130.493, 3897.2834, 3898.1475, 5228.508, 4265.679, 1663.5405, 5178.482]
2025-09-12 12:01:25,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [291.0, 667.0, 1000.0, 1000.0, 754.0, 725.0, 1000.0, 816.0, 324.0, 1000.0]
2025-09-12 12:01:25,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 12 hours, 34 minutes, 40 seconds)
2025-09-12 12:16:13,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:16:13,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:18:29,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2325.20850 ± 1485.157
2025-09-12 12:18:29,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [478.14542, 5240.48, 1271.4695, 2114.6726, 3264.1787, 1229.5352, 3646.529, 795.237, 3760.328, 1451.5084]
2025-09-12 12:18:29,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 1000.0, 265.0, 397.0, 645.0, 226.0, 713.0, 166.0, 735.0, 287.0]
2025-09-12 12:18:29,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 12 hours, 2 minutes, 41 seconds)
2025-09-12 12:33:58,163 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:33:58,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:38:15,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4496.99512 ± 1171.452
2025-09-12 12:38:15,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5343.9253, 5275.1655, 5253.9272, 2186.3835, 5279.4385, 4872.958, 4379.247, 2265.4045, 5295.7197, 4817.783]
2025-09-12 12:38:15,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 986.0, 1000.0, 424.0, 1000.0, 909.0, 839.0, 449.0, 1000.0, 909.0]
2025-09-12 12:38:15,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4497.00) for latency ExtremeClogL1U23
2025-09-12 12:38:15,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 11 hours, 53 minutes, 53 seconds)
2025-09-12 12:53:48,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:53:48,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:57:41,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3982.05542 ± 1372.888
2025-09-12 12:57:41,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5030.5815, 1946.4596, 4801.9746, 3144.9558, 5171.956, 5083.8086, 1893.1611, 2419.8152, 5115.09, 5212.748]
2025-09-12 12:57:41,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 369.0, 937.0, 598.0, 1000.0, 1000.0, 367.0, 455.0, 1000.0, 1000.0]
2025-09-12 12:57:41,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 11 hours, 33 minutes, 34 seconds)
2025-09-12 13:13:38,363 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:13:38,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:16:55,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3440.57349 ± 1706.394
2025-09-12 13:16:55,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1831.297, 1599.3784, 1050.795, 5280.5854, 1798.6425, 5278.3457, 5340.65, 2795.9543, 5303.9604, 4126.124]
2025-09-12 13:16:55,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [337.0, 302.0, 214.0, 1000.0, 337.0, 1000.0, 1000.0, 533.0, 1000.0, 766.0]
2025-09-12 13:16:55,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 11 hours, 21 minutes, 5 seconds)
2025-09-12 13:31:45,222 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:31:45,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:35:57,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4253.67480 ± 1457.948
2025-09-12 13:35:57,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5121.1855, 5175.7188, 5046.742, 5119.903, 5018.0356, 5220.24, 1894.2079, 5139.5073, 3730.2793, 1070.931]
2025-09-12 13:35:57,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 990.0, 1000.0, 1000.0, 1000.0, 1000.0, 371.0, 1000.0, 737.0, 204.0]
2025-09-12 13:35:57,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 11 hours, 1 minute, 42 seconds)
2025-09-12 13:51:19,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:51:19,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:54:52,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3758.30859 ± 1497.568
2025-09-12 13:54:52,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5256.1934, 5216.51, 5332.2163, 4119.182, 5163.5444, 2355.6853, 2588.402, 1709.0577, 4427.4434, 1414.8517]
2025-09-12 13:54:52,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 771.0, 1000.0, 457.0, 511.0, 323.0, 840.0, 271.0]
2025-09-12 13:54:52,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 10 hours, 55 minutes, 25 seconds)
2025-09-12 14:10:19,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:10:19,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:14:11,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4023.52026 ± 1659.800
2025-09-12 14:14:11,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1180.5859, 4255.697, 5278.007, 5199.877, 1628.3503, 5243.489, 5242.5537, 5287.975, 1799.6736, 5119.002]
2025-09-12 14:14:11,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [244.0, 811.0, 1000.0, 1000.0, 293.0, 1000.0, 1000.0, 1000.0, 349.0, 1000.0]
2025-09-12 14:14:11,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 10 hours, 33 minutes, 7 seconds)
2025-09-12 14:29:12,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:29:12,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:33:09,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4062.15430 ± 1540.577
2025-09-12 14:33:09,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3690.0017, 2909.868, 5210.953, 5182.244, 1003.83356, 5227.3125, 5171.6055, 5240.916, 1802.334, 5182.4736]
2025-09-12 14:33:09,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [704.0, 549.0, 1000.0, 1000.0, 205.0, 1000.0, 1000.0, 1000.0, 352.0, 1000.0]
2025-09-12 14:33:09,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 10 hours, 11 minutes, 2 seconds)
2025-09-12 14:48:21,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:48:21,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:52:08,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3909.98633 ± 1711.716
2025-09-12 14:52:08,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5172.057, 5266.2275, 5240.005, 5188.756, 2277.4146, 3176.2676, 5253.7085, 767.52985, 5251.7847, 1506.1118]
2025-09-12 14:52:08,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 441.0, 611.0, 1000.0, 152.0, 1000.0, 290.0]
2025-09-12 14:52:08,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 9 hours, 50 minutes, 18 seconds)
2025-09-12 15:07:40,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:07:40,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:10:50,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3133.70850 ± 1876.770
2025-09-12 15:10:50,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4960.919, 5055.4854, 1191.0801, 2113.1958, 1028.4713, 807.0566, 4841.7573, 5001.297, 5056.0186, 1281.8037]
2025-09-12 15:10:50,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 249.0, 419.0, 205.0, 170.0, 964.0, 1000.0, 1000.0, 240.0]
2025-09-12 15:10:50,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 9 hours, 29 minutes, 21 seconds)
2025-09-12 15:25:49,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:25:49,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:30:16,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4581.72852 ± 1122.550
2025-09-12 15:30:16,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2251.1365, 5258.44, 5232.714, 5226.626, 5278.9375, 5145.997, 4436.198, 5217.2275, 2530.1025, 5239.9116]
2025-09-12 15:30:16,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [428.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 857.0, 1000.0, 477.0, 1000.0]
2025-09-12 15:30:16,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4581.73) for latency ExtremeClogL1U23
2025-09-12 15:30:16,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 9 hours, 13 minutes, 14 seconds)
2025-09-12 15:45:30,253 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:45:30,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:49:21,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3912.98755 ± 1529.934
2025-09-12 15:49:21,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5213.945, 5027.7417, 5176.5557, 3631.843, 1725.2909, 2726.497, 4886.5493, 841.16895, 5133.345, 4766.9395]
2025-09-12 15:49:21,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 689.0, 332.0, 529.0, 952.0, 155.0, 1000.0, 929.0]
2025-09-12 15:49:21,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 52 minutes, 56 seconds)
2025-09-12 16:04:09,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:04:09,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:08:11,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4208.88379 ± 1705.970
2025-09-12 16:08:11,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5182.6343, 5269.278, 2438.8655, 5289.9194, 5313.071, 646.7641, 5305.7886, 1960.7224, 5320.3857, 5361.4106]
2025-09-12 16:08:11,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 451.0, 1000.0, 1000.0, 118.0, 1000.0, 386.0, 1000.0, 1000.0]
2025-09-12 16:08:11,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 33 minutes, 8 seconds)
2025-09-12 16:23:26,481 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:23:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:26:42,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3516.08350 ± 1524.406
2025-09-12 16:26:42,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3688.3086, 1016.77954, 5388.24, 2737.915, 5354.741, 4283.117, 5403.1343, 2833.5, 1351.4987, 3103.603]
2025-09-12 16:26:42,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [710.0, 190.0, 1000.0, 507.0, 1000.0, 793.0, 1000.0, 523.0, 250.0, 571.0]
2025-09-12 16:26:42,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 8 hours, 11 minutes, 48 seconds)
2025-09-12 16:42:27,836 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:42:27,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:45:52,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3524.01245 ± 1526.318
2025-09-12 16:45:52,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5127.418, 5103.404, 1079.9419, 5219.6753, 5256.889, 2206.0796, 2654.6138, 4123.33, 1917.1912, 2551.5803]
2025-09-12 16:45:52,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 209.0, 1000.0, 1000.0, 444.0, 506.0, 787.0, 381.0, 492.0]
2025-09-12 16:45:52,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 55 minutes, 9 seconds)
2025-09-12 17:00:12,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:00:12,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:03:58,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4001.83594 ± 1495.785
2025-09-12 17:03:58,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5395.132, 4176.4414, 5236.7407, 5308.162, 5250.2656, 3692.7559, 2298.1956, 1642.9766, 5320.4316, 1697.2565]
2025-09-12 17:03:58,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 783.0, 1000.0, 1000.0, 1000.0, 701.0, 440.0, 297.0, 1000.0, 319.0]
2025-09-12 17:03:58,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 29 minutes, 48 seconds)
2025-09-12 17:20:10,398 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:20:10,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:23:21,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3302.97974 ± 1680.324
2025-09-12 17:23:21,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5266.0806, 5308.286, 4206.8975, 1945.059, 3424.4355, 3899.3936, 870.2872, 1338.9054, 5326.7812, 1443.6716]
2025-09-12 17:23:21,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 820.0, 376.0, 653.0, 728.0, 162.0, 257.0, 1000.0, 273.0]
2025-09-12 17:23:21,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 7 hours, 12 minutes, 23 seconds)
2025-09-12 17:38:13,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:38:13,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:42:15,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4144.41357 ± 1439.873
2025-09-12 17:42:15,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1265.6238, 5103.0522, 4189.098, 5192.8413, 1941.1641, 5131.9653, 5187.0957, 5217.81, 3023.0566, 5192.433]
2025-09-12 17:42:15,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [244.0, 1000.0, 793.0, 1000.0, 369.0, 1000.0, 1000.0, 1000.0, 595.0, 1000.0]
2025-09-12 17:42:15,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 53 minutes, 55 seconds)
2025-09-12 17:57:43,186 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:57:43,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:01:09,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3603.40942 ± 2019.810
2025-09-12 18:01:09,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [832.14233, 584.89923, 5252.6323, 5292.0493, 3686.9521, 4048.425, 5193.831, 5322.93, 483.07947, 5337.152]
2025-09-12 18:01:09,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 111.0, 1000.0, 1000.0, 683.0, 778.0, 1000.0, 1000.0, 88.0, 1000.0]
2025-09-12 18:01:09,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 36 minutes, 41 seconds)
2025-09-12 18:15:40,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:15:40,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:19:18,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3774.28906 ± 1717.880
2025-09-12 18:19:18,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1259.7579, 5104.4053, 4905.0845, 5045.4624, 5126.2393, 3195.9016, 5209.9243, 5240.3574, 1629.464, 1026.2935]
2025-09-12 18:19:18,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 1000.0, 932.0, 1000.0, 1000.0, 607.0, 1000.0, 1000.0, 317.0, 190.0]
2025-09-12 18:19:18,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 13 minutes, 45 seconds)
2025-09-12 18:34:23,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:34:23,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:38:51,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4752.70605 ± 1331.278
2025-09-12 18:38:51,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1078.0631, 5344.0386, 5292.7744, 5359.755, 3609.4937, 5434.2793, 5357.236, 5355.7217, 5323.406, 5372.294]
2025-09-12 18:38:51,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 1000.0, 1000.0, 1000.0, 674.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:38:51,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4752.71) for latency ExtremeClogL1U23
2025-09-12 18:38:51,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 6 hours, 34 seconds)
2025-09-12 18:54:13,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:54:13,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:57:24,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3261.16797 ± 1430.531
2025-09-12 18:57:24,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1787.2728, 5136.803, 2021.9915, 5212.4414, 1725.5762, 3818.7593, 4632.997, 3997.1494, 1185.0675, 3093.6184]
2025-09-12 18:57:24,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 992.0, 388.0, 1000.0, 341.0, 742.0, 878.0, 798.0, 237.0, 591.0]
2025-09-12 18:57:24,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 38 minutes, 35 seconds)
2025-09-12 19:12:29,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:12:29,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:16:14,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4022.34766 ± 1645.926
2025-09-12 19:16:14,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5191.7686, 5342.1143, 1800.594, 5373.1914, 5342.8926, 5317.867, 5382.934, 1287.4786, 2157.8318, 3026.8054]
2025-09-12 19:16:14,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [969.0, 1000.0, 337.0, 1000.0, 1000.0, 1000.0, 1000.0, 242.0, 393.0, 575.0]
2025-09-12 19:16:14,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 19 minutes, 31 seconds)
2025-09-12 19:31:36,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:31:36,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:36:22,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4842.13184 ± 861.551
2025-09-12 19:36:22,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5161.2256, 5132.1655, 5038.8423, 5215.2026, 2260.6948, 5109.974, 5149.474, 5139.0923, 5103.7954, 5110.8535]
2025-09-12 19:36:22,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 428.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:36:22,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4842.13) for latency ExtremeClogL1U23
2025-09-12 19:36:22,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 4 minutes, 41 seconds)
2025-09-12 19:51:29,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:51:29,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:55:10,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3702.10864 ± 1815.220
2025-09-12 19:55:10,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2805.2249, 5160.798, 963.98035, 5163.6006, 5008.918, 2066.715, 5119.9375, 5083.969, 5119.6704, 528.2752]
2025-09-12 19:55:10,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [534.0, 1000.0, 195.0, 1000.0, 1000.0, 411.0, 1000.0, 1000.0, 1000.0, 105.0]
2025-09-12 19:55:10,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 47 minutes, 33 seconds)
2025-09-12 20:11:04,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:11:04,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:14:56,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3992.10205 ± 1742.902
2025-09-12 20:14:56,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5181.616, 4252.019, 5298.357, 707.9024, 5238.1553, 1785.1453, 5295.175, 1706.335, 5210.669, 5245.6445]
2025-09-12 20:14:56,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 797.0, 1000.0, 134.0, 1000.0, 357.0, 1000.0, 313.0, 1000.0, 1000.0]
2025-09-12 20:14:56,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 29 minutes)
2025-09-12 20:29:24,299 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:29:24,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:33:08,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3783.58740 ± 1449.661
2025-09-12 20:33:08,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2924.3857, 4132.7715, 4816.633, 5100.461, 4967.7144, 5227.0513, 2135.8547, 1490.6678, 1823.605, 5216.7305]
2025-09-12 20:33:08,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [559.0, 802.0, 917.0, 1000.0, 1000.0, 1000.0, 433.0, 308.0, 383.0, 1000.0]
2025-09-12 20:33:08,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 8 minutes, 53 seconds)
2025-09-12 20:48:35,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:48:35,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:50:42,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2154.69971 ± 1718.813
2025-09-12 20:50:42,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1249.4801, 778.8144, 2198.7722, 3760.9526, 5082.725, 557.45294, 688.26575, 4995.083, 503.1814, 1732.2668]
2025-09-12 20:50:42,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [250.0, 153.0, 437.0, 734.0, 1000.0, 105.0, 134.0, 1000.0, 95.0, 332.0]
2025-09-12 20:50:42,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 46 minutes, 43 seconds)
2025-09-12 21:05:35,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:05:35,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:09:40,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4082.77344 ± 1287.472
2025-09-12 21:09:40,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4532.6064, 3571.1638, 3104.896, 5232.8467, 4971.868, 5145.9805, 5218.8545, 2369.1843, 1526.065, 5154.269]
2025-09-12 21:09:40,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [883.0, 731.0, 629.0, 1000.0, 1000.0, 1000.0, 1000.0, 493.0, 289.0, 1000.0]
2025-09-12 21:09:40,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 25 minutes, 14 seconds)
2025-09-12 21:24:59,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:24:59,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:29:12,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4569.94238 ± 1554.063
2025-09-12 21:29:12,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4543.934, 5350.914, 5504.3125, 2195.397, 5487.4814, 5444.6104, 926.8512, 5413.771, 5361.0864, 5471.0654]
2025-09-12 21:29:12,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [847.0, 1000.0, 1000.0, 398.0, 1000.0, 1000.0, 179.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:29:12,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 8 minutes, 5 seconds)
2025-09-12 21:44:58,793 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:44:58,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:48:32,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3768.27783 ± 1825.247
2025-09-12 21:48:32,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5406.0317, 5187.086, 1678.6046, 4509.2905, 5341.663, 3012.3406, 5334.299, 1448.8164, 510.08447, 5254.562]
2025-09-12 21:48:32,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 311.0, 841.0, 1000.0, 562.0, 1000.0, 268.0, 95.0, 1000.0]
2025-09-12 21:48:32,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 48 minutes, 29 seconds)
2025-09-12 22:03:30,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:03:30,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:07:00,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3777.41162 ± 1719.598
2025-09-12 22:07:00,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1803.0088, 5417.834, 5560.6055, 5373.996, 444.36627, 4735.1587, 2506.3977, 2731.1624, 5362.223, 3839.362]
2025-09-12 22:07:00,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [362.0, 1000.0, 1000.0, 1000.0, 95.0, 858.0, 461.0, 500.0, 963.0, 723.0]
2025-09-12 22:07:00,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 30 minutes, 12 seconds)
2025-09-12 22:22:45,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:22:45,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:26:50,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4337.44873 ± 1604.112
2025-09-12 22:26:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5175.778, 5422.893, 1675.3741, 1265.8837, 5325.0396, 5437.8276, 2951.206, 5405.549, 5352.4873, 5362.4497]
2025-09-12 22:26:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [970.0, 1000.0, 309.0, 239.0, 1000.0, 1000.0, 569.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:26:50,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 14 minutes, 34 seconds)
2025-09-12 22:41:58,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:41:58,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:46:01,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4304.67236 ± 1498.905
2025-09-12 22:46:01,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4296.5547, 1855.637, 5407.169, 5313.691, 1334.7714, 5307.442, 5465.5225, 5305.5522, 3367.9922, 5392.396]
2025-09-12 22:46:01,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [811.0, 347.0, 1000.0, 1000.0, 272.0, 1000.0, 1000.0, 1000.0, 621.0, 1000.0]
2025-09-12 22:46:01,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 55 minutes, 37 seconds)
2025-09-12 23:00:45,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:00:45,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 23:05:22,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4628.11133 ± 1087.498
2025-09-12 23:05:22,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5186.5747, 1609.0098, 4851.308, 5210.9146, 5063.701, 5273.218, 5114.953, 5013.669, 5176.7656, 3781.0017]
2025-09-12 23:05:22,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 320.0, 941.0, 1000.0, 1000.0, 1000.0, 1000.0, 961.0, 1000.0, 719.0]
2025-09-12 23:05:22,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 36 minutes, 9 seconds)
2025-09-12 23:21:14,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:21:14,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 23:24:06,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3047.74585 ± 1821.299
2025-09-12 23:24:06,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4697.979, 534.1586, 1334.6213, 5254.4985, 2899.4836, 2226.7847, 1137.9843, 5366.8027, 5269.2812, 1755.8661]
2025-09-12 23:24:06,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [881.0, 99.0, 254.0, 1000.0, 544.0, 416.0, 210.0, 1000.0, 1000.0, 333.0]
2025-09-12 23:24:06,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 16 minutes, 26 seconds)
2025-09-12 23:39:13,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:39:13,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 23:43:12,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4189.38867 ± 1809.495
2025-09-12 23:43:12,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [506.27634, 5291.0312, 1881.5305, 5377.621, 5297.5557, 5336.9434, 5323.3784, 5398.226, 5403.6167, 2077.7073]
2025-09-12 23:43:12,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 1000.0, 349.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 393.0]
2025-09-12 23:43:12,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 57 minutes, 43 seconds)
2025-09-12 23:58:23,222 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:58:23,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-13 00:02:46,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4584.15625 ± 1200.069
2025-09-13 00:02:46,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5310.8115, 5369.7856, 1520.9146, 3641.71, 5363.3125, 4646.721, 5397.5444, 5363.844, 3870.9836, 5355.9355]
2025-09-13 00:02:46,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 304.0, 681.0, 1000.0, 870.0, 992.0, 1000.0, 731.0, 1000.0]
2025-09-13 00:02:46,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 38 minutes, 22 seconds)
2025-09-13 00:18:55,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:18:55,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-13 00:23:40,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4840.22266 ± 917.605
2025-09-13 00:23:40,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5203.8867, 5236.901, 2226.2776, 5215.8643, 5341.37, 4280.4297, 5249.788, 5216.9487, 5194.1543, 5236.6084]
2025-09-13 00:23:40,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 417.0, 1000.0, 1000.0, 802.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:23:40,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 19 minutes, 31 seconds)
2025-09-13 00:38:37,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:38:37,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-13 00:41:05,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2726.62451 ± 1643.971
2025-09-13 00:41:05,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1503.8514, 4078.0642, 5172.399, 1692.2378, 5705.479, 2368.1956, 1743.4634, 3147.2598, 795.6676, 1059.6256]
2025-09-13 00:41:05,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [272.0, 713.0, 911.0, 302.0, 1000.0, 417.0, 311.0, 551.0, 147.0, 187.0]
2025-09-13 00:41:05,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1251 [DEBUG]: Training session finished
