2025-05-01 18:03:55,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-05-01 18:03:55,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-05-01 18:03:55,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7f0f99b23af0>}
2025-05-01 18:03:55,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1009 [DEBUG]: using device: cuda
2025-05-01 18:03:55,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1031 [INFO]: Creating new trainer
2025-05-01 18:03:55,653 baseline-mbpac-noisy-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-01 18:03:55,653 baseline-mbpac-noisy-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-01 18:03:55,683 baseline-mbpac-noisy-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-05-01 18:03:57,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1092 [DEBUG]: Starting training session...
2025-05-01 18:03:57,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 1/100
2025-05-01 18:21:47,452 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 18:21:47,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 18:22:23,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 301.30878 ± 20.258
2025-05-01 18:22:23,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [298.246, 276.70795, 343.13812, 284.95193, 295.01874, 288.48404, 319.97092, 280.6264, 302.48425, 323.45956]
2025-05-01 18:22:23,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [57.0, 51.0, 65.0, 55.0, 55.0, 55.0, 62.0, 56.0, 58.0, 61.0]
2025-05-01 18:22:23,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (301.31) for latency ExtremeClogL1U23
2025-05-01 18:22:23,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-01 18:22:24,001 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 18:22:24,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 2/100 (estimated time remaining: 30 hours, 26 minutes, 14 seconds)
2025-05-01 18:43:13,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 18:43:13,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 18:43:53,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 403.98984 ± 105.899
2025-05-01 18:43:53,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [234.92969, 553.0095, 506.4617, 297.4951, 381.78018, 311.53824, 449.09122, 377.57346, 362.67258, 565.34674]
2025-05-01 18:43:53,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [51.0, 104.0, 109.0, 64.0, 71.0, 58.0, 85.0, 70.0, 68.0, 108.0]
2025-05-01 18:43:53,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (403.99) for latency ExtremeClogL1U23
2025-05-01 18:43:53,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-01 18:43:53,845 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 18:43:53,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 3/100 (estimated time remaining: 32 hours, 37 minutes, 16 seconds)
2025-05-01 19:05:11,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 19:05:11,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 19:05:45,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 411.78326 ± 114.328
2025-05-01 19:05:45,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [345.97348, 385.179, 501.7479, 343.01114, 720.93555, 344.35968, 386.89386, 312.66806, 364.0208, 413.0432]
2025-05-01 19:05:45,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 71.0, 100.0, 62.0, 137.0, 64.0, 72.0, 57.0, 67.0, 75.0]
2025-05-01 19:05:45,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (411.78) for latency ExtremeClogL1U23
2025-05-01 19:05:45,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-01 19:05:45,050 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 19:05:45,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 4/100 (estimated time remaining: 33 hours, 18 minutes, 7 seconds)
2025-05-01 19:27:28,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 19:27:28,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 19:28:11,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 398.25449 ± 53.350
2025-05-01 19:28:11,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [388.90784, 411.56866, 425.68893, 429.10013, 348.0958, 396.7072, 379.19537, 525.7023, 326.6063, 350.97247]
2025-05-01 19:28:11,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [72.0, 77.0, 87.0, 79.0, 70.0, 73.0, 69.0, 94.0, 62.0, 66.0]
2025-05-01 19:28:11,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 5/100 (estimated time remaining: 33 hours, 41 minutes, 50 seconds)
2025-05-01 19:51:10,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 19:51:10,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 19:52:00,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 402.51065 ± 122.367
2025-05-01 19:52:00,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [388.42984, 342.4451, 317.66248, 327.20944, 331.24216, 376.10297, 536.651, 348.76962, 334.9588, 721.63513]
2025-05-01 19:52:00,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [72.0, 64.0, 60.0, 63.0, 63.0, 69.0, 100.0, 65.0, 76.0, 139.0]
2025-05-01 19:52:00,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 6/100 (estimated time remaining: 34 hours, 13 minutes, 3 seconds)
2025-05-01 20:14:09,820 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 20:14:09,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 20:15:08,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 457.02280 ± 66.050
2025-05-01 20:15:08,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [415.80652, 479.61148, 470.564, 458.00067, 343.40533, 503.34454, 611.41315, 434.78827, 427.65457, 425.63913]
2025-05-01 20:15:08,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [81.0, 88.0, 88.0, 87.0, 62.0, 95.0, 115.0, 80.0, 78.0, 80.0]
2025-05-01 20:15:08,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (457.02) for latency ExtremeClogL1U23
2025-05-01 20:15:08,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-01 20:15:08,397 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 20:15:08,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 7/100 (estimated time remaining: 35 hours, 19 minutes, 30 seconds)
2025-05-01 20:36:30,337 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 20:36:30,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 20:37:29,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 513.37549 ± 140.273
2025-05-01 20:37:29,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [395.29984, 772.8194, 384.63394, 699.8548, 361.7623, 539.7938, 660.9342, 445.9063, 470.25134, 402.49847]
2025-05-01 20:37:29,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 150.0, 72.0, 130.0, 76.0, 103.0, 142.0, 84.0, 87.0, 85.0]
2025-05-01 20:37:29,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (513.38) for latency ExtremeClogL1U23
2025-05-01 20:37:29,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-01 20:37:29,451 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 20:37:29,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 8/100 (estimated time remaining: 35 hours, 12 minutes, 50 seconds)
2025-05-01 20:59:19,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 20:59:19,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 21:00:15,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 473.14117 ± 67.496
2025-05-01 21:00:15,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [530.6709, 373.3943, 528.96124, 554.69794, 502.5329, 561.7671, 400.3865, 390.97525, 438.08118, 449.94446]
2025-05-01 21:00:15,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [112.0, 80.0, 99.0, 120.0, 96.0, 121.0, 82.0, 87.0, 87.0, 98.0]
2025-05-01 21:00:15,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 9/100 (estimated time remaining: 35 hours, 6 minutes, 56 seconds)
2025-05-01 21:23:51,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 21:23:51,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 21:25:07,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 555.18500 ± 187.586
2025-05-01 21:25:07,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [516.9275, 571.90625, 424.8577, 663.7825, 444.6695, 1070.7803, 490.1921, 396.97406, 533.0468, 438.71326]
2025-05-01 21:25:07,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [95.0, 120.0, 78.0, 134.0, 82.0, 224.0, 92.0, 86.0, 105.0, 92.0]
2025-05-01 21:25:07,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (555.18) for latency ExtremeClogL1U23
2025-05-01 21:25:07,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-01 21:25:07,788 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 21:25:07,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 10/100 (estimated time remaining: 35 hours, 28 minutes, 11 seconds)
2025-05-01 21:46:11,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 21:46:11,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 21:47:26,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 544.39880 ± 158.365
2025-05-01 21:47:26,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [410.234, 609.80945, 713.2693, 906.0853, 575.12317, 443.11374, 478.74942, 438.1937, 530.4246, 338.98553]
2025-05-01 21:47:26,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [89.0, 132.0, 135.0, 177.0, 121.0, 82.0, 90.0, 82.0, 97.0, 62.0]
2025-05-01 21:47:26,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 11/100 (estimated time remaining: 34 hours, 37 minutes, 49 seconds)
2025-05-01 22:10:24,345 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 22:10:24,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 22:11:14,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 458.41431 ± 79.819
2025-05-01 22:11:14,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [586.6786, 379.5606, 458.72974, 537.99744, 351.89352, 500.92462, 449.89914, 397.31168, 366.39822, 554.74963]
2025-05-01 22:11:14,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [114.0, 70.0, 85.0, 100.0, 64.0, 91.0, 81.0, 73.0, 67.0, 103.0]
2025-05-01 22:11:14,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 12/100 (estimated time remaining: 34 hours, 26 minutes, 33 seconds)
2025-05-01 22:32:18,203 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 22:32:18,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 22:33:31,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 560.99622 ± 115.808
2025-05-01 22:33:31,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [512.569, 754.7324, 400.94788, 467.7065, 599.14404, 524.354, 457.82697, 676.1769, 488.3716, 728.13226]
2025-05-01 22:33:31,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [103.0, 144.0, 74.0, 87.0, 120.0, 96.0, 102.0, 133.0, 94.0, 142.0]
2025-05-01 22:33:31,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (561.00) for latency ExtremeClogL1U23
2025-05-01 22:33:31,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-01 22:33:31,807 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 22:33:31,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 13/100 (estimated time remaining: 34 hours, 2 minutes, 17 seconds)
2025-05-01 22:52:02,549 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 22:52:02,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 22:52:51,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 528.83008 ± 204.010
2025-05-01 22:52:51,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [580.9343, 144.8383, 839.0851, 441.23047, 361.82794, 435.45068, 811.3873, 689.6745, 399.5701, 584.3018]
2025-05-01 22:52:51,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [116.0, 28.0, 178.0, 84.0, 77.0, 94.0, 148.0, 132.0, 86.0, 111.0]
2025-05-01 22:52:51,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 14/100 (estimated time remaining: 32 hours, 39 minutes, 9 seconds)
2025-05-01 23:10:21,074 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 23:10:21,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 23:11:06,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 535.95349 ± 112.145
2025-05-01 23:11:06,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [621.76965, 469.26944, 342.39532, 495.26456, 426.528, 425.87854, 676.95123, 609.407, 670.8572, 621.2137]
2025-05-01 23:11:06,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [125.0, 86.0, 65.0, 93.0, 80.0, 80.0, 133.0, 115.0, 132.0, 114.0]
2025-05-01 23:11:06,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 15/100 (estimated time remaining: 30 hours, 22 minutes, 49 seconds)
2025-05-01 23:28:24,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 23:28:24,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 23:29:11,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 510.44540 ± 93.551
2025-05-01 23:29:11,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [598.2929, 467.6986, 542.04315, 389.97708, 426.37946, 416.7538, 684.40485, 432.20206, 607.40857, 539.29333]
2025-05-01 23:29:11,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [111.0, 99.0, 100.0, 86.0, 94.0, 92.0, 132.0, 84.0, 131.0, 98.0]
2025-05-01 23:29:11,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 16/100 (estimated time remaining: 28 hours, 49 minutes, 38 seconds)
2025-05-01 23:45:56,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 23:45:56,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 23:46:43,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 537.88055 ± 100.802
2025-05-01 23:46:43,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [765.6982, 436.24783, 468.39575, 500.43668, 521.2295, 542.47046, 505.94815, 449.05673, 501.13403, 688.1887]
2025-05-01 23:46:43,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [163.0, 80.0, 88.0, 101.0, 106.0, 105.0, 100.0, 89.0, 94.0, 131.0]
2025-05-01 23:46:43,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 17/100 (estimated time remaining: 26 hours, 44 minutes, 6 seconds)
2025-05-02 00:04:32,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 00:04:32,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 00:05:27,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 632.13947 ± 82.987
2025-05-02 00:05:27,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [623.8336, 540.5412, 597.06683, 792.6456, 645.0375, 528.1558, 604.4344, 587.89905, 772.516, 629.2643]
2025-05-02 00:05:27,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [123.0, 111.0, 119.0, 160.0, 126.0, 103.0, 112.0, 112.0, 149.0, 120.0]
2025-05-02 00:05:27,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (632.14) for latency ExtremeClogL1U23
2025-05-02 00:05:27,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 00:05:27,781 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 00:05:27,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 18/100 (estimated time remaining: 25 hours, 26 minutes, 4 seconds)
2025-05-02 00:22:25,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 00:22:25,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 00:23:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 636.35217 ± 252.228
2025-05-02 00:23:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [480.40622, 557.6547, 662.33673, 1143.5128, 503.59293, 124.662704, 705.61707, 631.0136, 886.32605, 668.3982]
2025-05-02 00:23:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [90.0, 125.0, 125.0, 240.0, 101.0, 24.0, 137.0, 136.0, 167.0, 122.0]
2025-05-02 00:23:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (636.35) for latency ExtremeClogL1U23
2025-05-02 00:23:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 00:23:23,223 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 00:23:23,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 19/100 (estimated time remaining: 24 hours, 44 minutes, 44 seconds)
2025-05-02 00:40:32,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 00:40:32,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 00:41:37,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 736.63281 ± 111.471
2025-05-02 00:41:37,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [771.90985, 676.08185, 648.58124, 646.4564, 595.2811, 761.3039, 730.4466, 801.59125, 718.2633, 1016.41266]
2025-05-02 00:41:37,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [152.0, 125.0, 139.0, 131.0, 110.0, 139.0, 138.0, 155.0, 139.0, 201.0]
2025-05-02 00:41:37,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (736.63) for latency ExtremeClogL1U23
2025-05-02 00:41:37,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 00:41:37,448 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 00:41:37,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 20/100 (estimated time remaining: 24 hours, 26 minutes, 21 seconds)
2025-05-02 00:58:38,190 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 00:58:38,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 00:59:38,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 662.53827 ± 155.822
2025-05-02 00:59:38,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [692.3501, 522.95416, 739.8482, 495.7247, 644.54095, 907.1436, 867.8459, 372.16458, 678.40344, 704.40735]
2025-05-02 00:59:38,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [131.0, 118.0, 160.0, 101.0, 139.0, 169.0, 169.0, 70.0, 141.0, 148.0]
2025-05-02 00:59:38,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 21/100 (estimated time remaining: 24 hours, 7 minutes, 11 seconds)
2025-05-02 01:17:54,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 01:17:54,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 01:18:56,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 679.36743 ± 160.392
2025-05-02 01:18:56,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [609.8489, 822.84296, 724.5698, 589.50336, 947.7997, 487.75558, 912.6983, 612.9679, 634.54266, 451.145]
2025-05-02 01:18:56,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [129.0, 175.0, 140.0, 114.0, 186.0, 94.0, 190.0, 127.0, 135.0, 83.0]
2025-05-02 01:18:56,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 22/100 (estimated time remaining: 24 hours, 16 minutes, 56 seconds)
2025-05-02 01:36:12,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 01:36:12,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 01:37:10,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 689.27875 ± 260.610
2025-05-02 01:37:10,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [476.33945, 630.91656, 1326.5127, 568.1328, 693.9272, 490.40427, 486.01193, 469.14066, 949.2777, 802.1245]
2025-05-02 01:37:10,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [85.0, 112.0, 261.0, 103.0, 130.0, 89.0, 101.0, 100.0, 175.0, 152.0]
2025-05-02 01:37:10,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 23/100 (estimated time remaining: 23 hours, 50 minutes, 47 seconds)
2025-05-02 01:55:14,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 01:55:14,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 01:56:31,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 807.20935 ± 193.585
2025-05-02 01:56:31,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1062.8936, 611.7986, 531.6941, 847.76465, 769.78107, 756.1172, 737.7177, 608.12665, 1116.5046, 1029.6951]
2025-05-02 01:56:31,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [203.0, 131.0, 116.0, 160.0, 141.0, 142.0, 146.0, 119.0, 225.0, 191.0]
2025-05-02 01:56:31,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (807.21) for latency ExtremeClogL1U23
2025-05-02 01:56:31,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 01:56:31,496 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 01:56:31,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 24/100 (estimated time remaining: 23 hours, 54 minutes, 19 seconds)
2025-05-02 02:13:27,463 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 02:13:27,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 02:14:33,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 789.33203 ± 171.937
2025-05-02 02:14:33,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [573.14197, 721.01776, 1155.9606, 933.4566, 709.15643, 649.1513, 976.5669, 826.43365, 676.9397, 671.496]
2025-05-02 02:14:33,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [120.0, 161.0, 219.0, 174.0, 132.0, 121.0, 181.0, 158.0, 132.0, 121.0]
2025-05-02 02:14:33,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 25/100 (estimated time remaining: 23 hours, 32 minutes, 34 seconds)
2025-05-02 02:31:26,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 02:31:26,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 02:32:33,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 812.67615 ± 206.967
2025-05-02 02:32:33,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [595.89087, 426.52264, 985.3432, 640.99915, 1110.1378, 742.73004, 974.6384, 1036.3643, 766.7741, 847.3614]
2025-05-02 02:32:33,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [115.0, 80.0, 182.0, 137.0, 218.0, 149.0, 192.0, 188.0, 137.0, 157.0]
2025-05-02 02:32:33,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (812.68) for latency ExtremeClogL1U23
2025-05-02 02:32:33,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 02:32:33,191 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 02:32:33,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 26/100 (estimated time remaining: 23 hours, 13 minutes, 44 seconds)
2025-05-02 02:49:52,507 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 02:49:52,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 02:50:55,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 747.40131 ± 197.522
2025-05-02 02:50:55,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [745.1224, 503.8633, 1232.9978, 682.6337, 790.1134, 795.6426, 751.6761, 850.6416, 616.61414, 504.7078]
2025-05-02 02:50:55,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [155.0, 98.0, 229.0, 148.0, 145.0, 151.0, 145.0, 158.0, 134.0, 108.0]
2025-05-02 02:50:55,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 27/100 (estimated time remaining: 22 hours, 41 minutes, 31 seconds)
2025-05-02 03:08:32,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 03:08:32,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 03:09:30,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 728.72845 ± 178.266
2025-05-02 03:09:30,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [702.0074, 768.2314, 541.17523, 753.7395, 1003.1476, 873.55475, 381.8047, 801.3273, 561.1583, 901.1383]
2025-05-02 03:09:30,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [136.0, 148.0, 99.0, 141.0, 189.0, 164.0, 72.0, 148.0, 106.0, 164.0]
2025-05-02 03:09:30,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 28/100 (estimated time remaining: 22 hours, 27 minutes, 58 seconds)
2025-05-02 03:26:05,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 03:26:05,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 03:27:33,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1040.98230 ± 364.588
2025-05-02 03:27:33,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [696.67804, 786.97046, 741.6935, 1297.8739, 932.9097, 1408.411, 1131.3959, 956.7851, 1844.3025, 612.80316]
2025-05-02 03:27:33,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [150.0, 169.0, 138.0, 247.0, 171.0, 261.0, 226.0, 202.0, 339.0, 131.0]
2025-05-02 03:27:33,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1040.98) for latency ExtremeClogL1U23
2025-05-02 03:27:33,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 03:27:33,143 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 03:27:33,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 29/100 (estimated time remaining: 21 hours, 50 minutes, 47 seconds)
2025-05-02 03:42:35,549 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 03:42:35,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 03:43:54,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 939.18866 ± 372.253
2025-05-02 03:43:54,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [671.7596, 1513.5508, 1457.5868, 1374.0693, 526.7631, 1095.5859, 696.43463, 540.1864, 602.9097, 913.0396]
2025-05-02 03:43:54,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [144.0, 283.0, 296.0, 257.0, 103.0, 206.0, 128.0, 101.0, 119.0, 177.0]
2025-05-02 03:43:54,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 30/100 (estimated time remaining: 21 hours, 8 minutes, 42 seconds)
2025-05-02 03:58:52,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 03:58:52,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:00:13,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 972.49963 ± 561.511
2025-05-02 04:00:13,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [623.2754, 1334.9712, 1156.2075, 791.1232, 1881.7924, 214.97484, 652.9612, 166.10129, 1783.813, 1119.7753]
2025-05-02 04:00:13,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [136.0, 261.0, 220.0, 146.0, 354.0, 41.0, 131.0, 32.0, 353.0, 228.0]
2025-05-02 04:00:13,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 31/100 (estimated time remaining: 20 hours, 27 minutes, 21 seconds)
2025-05-02 04:15:24,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 04:15:24,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:16:40,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 985.97937 ± 176.279
2025-05-02 04:16:40,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1140.1641, 879.592, 853.87036, 1047.8636, 924.1704, 1025.5546, 1143.8931, 583.22815, 1230.5609, 1030.8966]
2025-05-02 04:16:40,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [217.0, 169.0, 159.0, 190.0, 180.0, 194.0, 211.0, 109.0, 222.0, 211.0]
2025-05-02 04:16:40,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 32/100 (estimated time remaining: 19 hours, 43 minutes, 14 seconds)
2025-05-02 04:32:20,063 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 04:32:20,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:33:56,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1152.35168 ± 461.042
2025-05-02 04:33:56,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1095.817, 980.2992, 732.0634, 978.6831, 611.994, 770.5252, 1735.6586, 1310.1526, 2192.4194, 1115.9037]
2025-05-02 04:33:56,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [200.0, 194.0, 155.0, 184.0, 114.0, 146.0, 322.0, 244.0, 420.0, 216.0]
2025-05-02 04:33:56,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1152.35) for latency ExtremeClogL1U23
2025-05-02 04:33:56,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 04:33:56,395 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 04:33:56,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 33/100 (estimated time remaining: 19 hours, 8 minutes, 17 seconds)
2025-05-02 04:48:46,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 04:48:46,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:49:56,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 887.47675 ± 197.761
2025-05-02 04:49:56,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [807.73535, 575.4215, 958.8468, 832.84283, 933.04175, 902.0182, 789.1129, 1090.194, 673.3675, 1312.186]
2025-05-02 04:49:56,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [169.0, 106.0, 181.0, 182.0, 197.0, 181.0, 149.0, 214.0, 141.0, 247.0]
2025-05-02 04:49:56,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 24 minutes, 5 seconds)
2025-05-02 05:03:58,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:03:58,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:06:19,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1698.07349 ± 635.187
2025-05-02 05:06:19,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2525.5627, 1471.8937, 1471.9667, 2193.9353, 1269.2126, 1942.9028, 1040.7269, 1303.8053, 854.72906, 2905.9993]
2025-05-02 05:06:19,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [487.0, 280.0, 283.0, 446.0, 233.0, 379.0, 206.0, 257.0, 158.0, 574.0]
2025-05-02 05:06:19,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1698.07) for latency ExtremeClogL1U23
2025-05-02 05:06:19,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 05:06:19,337 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 05:06:19,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 7 minutes, 56 seconds)
2025-05-02 05:20:43,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:20:43,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:22:35,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1261.96704 ± 700.190
2025-05-02 05:22:35,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1405.172, 2586.9805, 1760.5874, 1239.5112, 1209.1543, 439.88748, 437.12695, 2178.6912, 723.8709, 638.68835]
2025-05-02 05:22:35,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [272.0, 501.0, 335.0, 251.0, 235.0, 97.0, 97.0, 456.0, 157.0, 139.0]
2025-05-02 05:22:35,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 50 minutes, 48 seconds)
2025-05-02 05:37:03,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:37:03,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:39:36,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1762.81091 ± 812.300
2025-05-02 05:39:36,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1720.9081, 861.202, 1219.9904, 1243.2832, 1056.8522, 1704.6407, 2223.3484, 2972.012, 3416.974, 1208.8992]
2025-05-02 05:39:36,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [341.0, 191.0, 242.0, 254.0, 210.0, 353.0, 441.0, 610.0, 674.0, 241.0]
2025-05-02 05:39:36,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1762.81) for latency ExtremeClogL1U23
2025-05-02 05:39:36,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 05:39:36,853 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 05:39:36,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 41 minutes, 40 seconds)
2025-05-02 05:53:51,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:53:51,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:55:41,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1287.37573 ± 775.596
2025-05-02 05:55:41,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1063.9716, 840.0746, 1447.4761, 1027.5518, 3051.2559, 748.5328, 2441.6917, 751.7021, 556.5764, 944.9237]
2025-05-02 05:55:41,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [217.0, 161.0, 285.0, 212.0, 608.0, 137.0, 485.0, 154.0, 125.0, 180.0]
2025-05-02 05:55:41,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 9 minutes, 59 seconds)
2025-05-02 06:10:03,226 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 06:10:03,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 06:12:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1989.62866 ± 790.815
2025-05-02 06:12:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2809.357, 2503.1594, 1898.3638, 1463.2734, 1947.6758, 1230.353, 1174.1597, 1240.8873, 1848.0787, 3780.9783]
2025-05-02 06:12:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [538.0, 488.0, 366.0, 286.0, 369.0, 235.0, 237.0, 260.0, 349.0, 737.0]
2025-05-02 06:12:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1989.63) for latency ExtremeClogL1U23
2025-05-02 06:12:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 06:12:58,593 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 06:12:58,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 9 minutes, 33 seconds)
2025-05-02 06:26:56,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 06:26:56,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 06:28:56,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1289.45032 ± 650.425
2025-05-02 06:28:56,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [485.97678, 2176.5352, 1674.6871, 956.593, 1229.3751, 1882.5165, 170.73239, 2045.2451, 1532.5919, 740.2497]
2025-05-02 06:28:56,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [94.0, 417.0, 318.0, 200.0, 235.0, 351.0, 33.0, 407.0, 310.0, 156.0]
2025-05-02 06:28:56,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 47 minutes, 54 seconds)
2025-05-02 06:43:22,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 06:43:22,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 06:46:47,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2275.60767 ± 977.323
2025-05-02 06:46:47,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1169.3983, 1434.4723, 3920.2388, 855.494, 2256.955, 1823.3951, 2707.5312, 1941.165, 3638.8008, 3008.6252]
2025-05-02 06:46:47,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [227.0, 286.0, 771.0, 185.0, 445.0, 366.0, 544.0, 378.0, 705.0, 587.0]
2025-05-02 06:46:47,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (2275.61) for latency ExtremeClogL1U23
2025-05-02 06:46:47,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 06:46:47,132 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 06:46:47,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 41/100 (estimated time remaining: 16 hours, 50 minutes, 19 seconds)
2025-05-02 07:01:06,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 07:01:06,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:03:33,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1703.20276 ± 1123.270
2025-05-02 07:03:33,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1704.9906, 1132.8461, 426.86316, 4248.089, 2057.357, 1948.4885, 504.09314, 434.6987, 1918.7579, 2655.8442]
2025-05-02 07:03:33,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [334.0, 229.0, 81.0, 835.0, 420.0, 356.0, 109.0, 83.0, 391.0, 520.0]
2025-05-02 07:03:33,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 30 minutes, 28 seconds)
2025-05-02 07:17:55,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 07:17:55,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:20:34,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1909.17737 ± 621.636
2025-05-02 07:20:34,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1341.8801, 1108.2998, 1336.7076, 2086.1218, 2454.217, 2634.3252, 2457.7014, 923.6315, 2458.1685, 2290.721]
2025-05-02 07:20:34,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [254.0, 230.0, 265.0, 400.0, 467.0, 516.0, 480.0, 179.0, 462.0, 432.0]
2025-05-02 07:20:34,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 43/100 (estimated time remaining: 16 hours, 24 minutes, 37 seconds)
2025-05-02 07:35:15,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 07:35:15,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:38:23,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2260.64697 ± 1545.412
2025-05-02 07:38:23,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5118.137, 1193.6154, 1009.0759, 1111.5363, 3465.755, 1645.5322, 1920.9037, 810.94727, 1432.4016, 4898.563]
2025-05-02 07:38:23,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [980.0, 229.0, 194.0, 214.0, 662.0, 314.0, 389.0, 152.0, 274.0, 931.0]
2025-05-02 07:38:23,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 44/100 (estimated time remaining: 16 hours, 13 minutes, 46 seconds)
2025-05-02 07:52:33,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 07:52:33,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:55:27,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2140.99414 ± 1284.759
2025-05-02 07:55:27,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5279.05, 1710.2584, 1924.5181, 3292.5427, 778.25336, 1002.30164, 908.9632, 1656.9695, 2308.1628, 2548.9243]
2025-05-02 07:55:27,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [991.0, 330.0, 371.0, 630.0, 149.0, 194.0, 173.0, 313.0, 443.0, 484.0]
2025-05-02 07:55:27,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 45/100 (estimated time remaining: 16 hours, 9 minutes, 6 seconds)
2025-05-02 08:09:37,421 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 08:09:37,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 08:12:58,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2461.10693 ± 1658.316
2025-05-02 08:12:58,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5161.782, 1160.2257, 5130.5986, 1694.7856, 2739.8862, 1332.9052, 196.5729, 3090.7285, 3391.6753, 711.9093]
2025-05-02 08:12:58,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 229.0, 1000.0, 320.0, 521.0, 261.0, 38.0, 594.0, 634.0, 137.0]
2025-05-02 08:12:58,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (2461.11) for latency ExtremeClogL1U23
2025-05-02 08:12:58,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 08:12:58,223 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 08:12:58,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 48 minutes, 2 seconds)
2025-05-02 08:27:25,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 08:27:25,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 08:30:57,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2597.14038 ± 1072.523
2025-05-02 08:30:57,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2373.3838, 2332.4634, 4353.6025, 1157.976, 1892.8026, 4458.513, 1521.4573, 3304.3315, 2745.5854, 1831.2878]
2025-05-02 08:30:57,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [458.0, 448.0, 842.0, 229.0, 355.0, 851.0, 296.0, 637.0, 541.0, 355.0]
2025-05-02 08:30:57,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (2597.14) for latency ExtremeClogL1U23
2025-05-02 08:30:57,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 08:30:57,087 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 08:30:57,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 47/100 (estimated time remaining: 15 hours, 43 minutes, 54 seconds)
2025-05-02 08:45:07,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 08:45:07,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 08:49:15,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3014.23315 ± 1541.509
2025-05-02 08:49:15,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2055.511, 5190.7495, 4389.6963, 3205.5132, 2062.4705, 466.45392, 1193.2106, 2599.7124, 3804.0166, 5174.9956]
2025-05-02 08:49:15,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [386.0, 1000.0, 842.0, 603.0, 396.0, 84.0, 230.0, 503.0, 707.0, 1000.0]
2025-05-02 08:49:15,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (3014.23) for latency ExtremeClogL1U23
2025-05-02 08:49:15,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 08:49:15,238 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 08:49:15,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 48/100 (estimated time remaining: 15 hours, 40 minutes, 4 seconds)
2025-05-02 09:03:43,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:03:43,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 09:07:09,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2490.95947 ± 1034.059
2025-05-02 09:07:09,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1179.6257, 3472.593, 4126.0225, 1180.5692, 2420.376, 1509.8145, 3169.1094, 2757.2495, 1518.9412, 3575.2944]
2025-05-02 09:07:09,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [226.0, 656.0, 799.0, 219.0, 473.0, 284.0, 605.0, 530.0, 303.0, 687.0]
2025-05-02 09:07:09,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 49/100 (estimated time remaining: 15 hours, 23 minutes, 6 seconds)
2025-05-02 09:21:22,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:21:22,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 09:24:46,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2459.82227 ± 1619.685
2025-05-02 09:24:46,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [927.8106, 5182.335, 144.40018, 1620.7401, 5127.784, 2025.7906, 2826.8232, 3621.1929, 1524.6356, 1596.7106]
2025-05-02 09:24:46,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [176.0, 1000.0, 28.0, 309.0, 1000.0, 398.0, 524.0, 701.0, 304.0, 312.0]
2025-05-02 09:24:46,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 50/100 (estimated time remaining: 15 hours, 10 minutes, 58 seconds)
2025-05-02 09:38:42,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:38:42,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 09:41:55,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2213.58057 ± 723.500
2025-05-02 09:41:55,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1836.3542, 3526.6008, 1485.8195, 3174.1458, 1244.5021, 1886.8329, 2077.969, 1793.1234, 2085.45, 3025.0083]
2025-05-02 09:41:55,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [349.0, 678.0, 284.0, 617.0, 249.0, 375.0, 423.0, 349.0, 398.0, 580.0]
2025-05-02 09:41:55,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 49 minutes, 33 seconds)
2025-05-02 09:57:15,515 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:57:15,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 10:01:13,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2749.08838 ± 1778.106
2025-05-02 10:01:13,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5185.0073, 3374.5513, 2049.6033, 2342.568, 1234.8335, 2477.8794, 150.49037, 554.9094, 4965.4614, 5155.5796]
2025-05-02 10:01:13,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 655.0, 397.0, 450.0, 233.0, 481.0, 29.0, 109.0, 972.0, 1000.0]
2025-05-02 10:01:13,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 52/100 (estimated time remaining: 14 hours, 44 minutes, 36 seconds)
2025-05-02 10:14:31,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 10:14:31,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 10:17:52,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2246.71753 ± 1997.194
2025-05-02 10:17:52,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2993.315, 4519.981, 647.85626, 182.34862, 326.63034, 5149.169, 670.3376, 5079.546, 2762.6074, 135.3843]
2025-05-02 10:17:52,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [587.0, 887.0, 130.0, 35.0, 64.0, 1000.0, 121.0, 1000.0, 554.0, 26.0]
2025-05-02 10:17:52,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 53/100 (estimated time remaining: 14 hours, 10 minutes, 42 seconds)
2025-05-02 10:32:14,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 10:32:14,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 10:36:24,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2928.78906 ± 1498.568
2025-05-02 10:36:24,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1581.5323, 2813.9355, 5106.566, 3796.2737, 1558.0311, 145.92432, 5103.005, 2730.685, 2693.4585, 3758.479]
2025-05-02 10:36:24,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [303.0, 544.0, 1000.0, 731.0, 288.0, 28.0, 1000.0, 535.0, 514.0, 734.0]
2025-05-02 10:36:24,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 59 minutes)
2025-05-02 10:51:13,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 10:51:13,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 10:55:27,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3022.19141 ± 1505.333
2025-05-02 10:55:27,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [3099.2292, 621.0822, 3965.3154, 5185.418, 1616.7096, 2879.0437, 3473.2727, 5159.0503, 908.19867, 3314.594]
2025-05-02 10:55:27,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [609.0, 134.0, 766.0, 1000.0, 305.0, 554.0, 669.0, 1000.0, 171.0, 652.0]
2025-05-02 10:55:27,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (3022.19) for latency ExtremeClogL1U23
2025-05-02 10:55:27,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 10:55:27,466 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 10:55:28,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 55/100 (estimated time remaining: 13 hours, 54 minutes, 21 seconds)
2025-05-02 11:09:37,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 11:09:37,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 11:14:37,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3595.36523 ± 1726.223
2025-05-02 11:14:37,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5217.5127, 3845.3708, 4995.423, 2037.7798, 2175.0068, 1726.6129, 461.80322, 5161.986, 5144.827, 5187.3335]
2025-05-02 11:14:37,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 751.0, 1000.0, 390.0, 421.0, 335.0, 92.0, 1000.0, 1000.0, 1000.0]
2025-05-02 11:14:37,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (3595.37) for latency ExtremeClogL1U23
2025-05-02 11:14:37,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 11:14:37,535 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 11:14:37,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 56/100 (estimated time remaining: 13 hours, 54 minutes, 17 seconds)
2025-05-02 11:29:22,105 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 11:29:22,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 11:34:38,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3759.13403 ± 1589.734
2025-05-02 11:34:38,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5104.3115, 5103.663, 2903.5056, 683.4452, 1978.3494, 4441.501, 2144.2915, 5116.2573, 5020.1235, 5095.891]
2025-05-02 11:34:38,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 580.0, 147.0, 396.0, 885.0, 428.0, 1000.0, 1000.0, 1000.0]
2025-05-02 11:34:38,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (3759.13) for latency ExtremeClogL1U23
2025-05-02 11:34:38,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 11:34:38,603 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 11:34:38,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 57/100 (estimated time remaining: 13 hours, 42 minutes, 8 seconds)
2025-05-02 11:49:24,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 11:49:24,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 11:53:18,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2688.06787 ± 1582.836
2025-05-02 11:53:18,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5142.127, 1721.8303, 1219.878, 1882.7095, 1710.7219, 5083.178, 1738.934, 1693.6036, 1633.3156, 5054.3784]
2025-05-02 11:53:18,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 336.0, 226.0, 367.0, 332.0, 1000.0, 348.0, 329.0, 332.0, 1000.0]
2025-05-02 11:53:19,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 58/100 (estimated time remaining: 13 hours, 40 minutes, 50 seconds)
2025-05-02 12:07:00,872 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 12:07:00,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 12:12:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4211.41553 ± 1203.063
2025-05-02 12:12:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [4725.27, 5105.422, 2676.4016, 5079.0103, 5178.373, 2446.6313, 5161.271, 4426.417, 2126.3152, 5189.041]
2025-05-02 12:12:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [922.0, 969.0, 532.0, 1000.0, 993.0, 469.0, 1000.0, 843.0, 413.0, 1000.0]
2025-05-02 12:12:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (4211.42) for latency ExtremeClogL1U23
2025-05-02 12:12:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 12:12:58,715 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 12:12:58,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 59/100 (estimated time remaining: 13 hours, 31 minutes, 10 seconds)
2025-05-02 12:27:20,177 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 12:27:20,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 12:31:55,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3351.13208 ± 1863.130
2025-05-02 12:31:55,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [491.96957, 5313.762, 5190.078, 5227.2036, 5230.74, 2025.5728, 4544.017, 1420.7021, 1065.8325, 3001.4426]
2025-05-02 12:31:55,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [96.0, 1000.0, 1000.0, 1000.0, 1000.0, 380.0, 886.0, 262.0, 201.0, 562.0]
2025-05-02 12:31:55,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 60/100 (estimated time remaining: 13 hours, 10 minutes, 53 seconds)
2025-05-02 12:46:54,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 12:46:54,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 12:51:45,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3457.39990 ± 1423.434
2025-05-02 12:51:45,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2767.8442, 2257.0076, 2159.6765, 2769.3386, 877.19867, 5083.5503, 4665.7866, 3826.2266, 5078.8403, 5088.5303]
2025-05-02 12:51:45,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [535.0, 433.0, 412.0, 547.0, 168.0, 1000.0, 917.0, 757.0, 1000.0, 1000.0]
2025-05-02 12:51:45,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 61/100 (estimated time remaining: 12 hours, 57 minutes, 1 second)
2025-05-02 13:06:22,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 13:06:22,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 13:12:14,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4114.22705 ± 1245.933
2025-05-02 13:12:14,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5068.0464, 5112.064, 5060.6987, 5082.5317, 5113.119, 5172.735, 2935.4968, 2087.5757, 2267.8157, 3242.189]
2025-05-02 13:12:14,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 575.0, 412.0, 423.0, 645.0]
2025-05-02 13:12:14,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 62/100 (estimated time remaining: 12 hours, 41 minutes, 12 seconds)
2025-05-02 13:25:25,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 13:25:25,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 13:30:49,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3627.71558 ± 1957.364
2025-05-02 13:30:49,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [4971.0464, 4908.9766, 4714.4614, 5050.114, 5008.88, 134.242, 194.93993, 1848.4812, 4405.6626, 5040.354]
2025-05-02 13:30:49,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 904.0, 1000.0, 1000.0, 26.0, 37.0, 365.0, 855.0, 1000.0]
2025-05-02 13:30:49,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 63/100 (estimated time remaining: 12 hours, 21 minutes, 2 seconds)
2025-05-02 13:45:14,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 13:45:14,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 13:50:41,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3866.08936 ± 1610.680
2025-05-02 13:50:41,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5085.7563, 5041.4473, 5079.19, 5026.9854, 4959.376, 4734.5254, 1334.5347, 3201.574, 485.095, 3712.4119]
2025-05-02 13:50:41,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 980.0, 919.0, 258.0, 629.0, 96.0, 725.0]
2025-05-02 13:50:41,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 64/100 (estimated time remaining: 12 hours, 3 minutes, 3 seconds)
2025-05-02 14:06:01,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 14:06:01,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 14:09:59,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2843.03442 ± 1521.990
2025-05-02 14:09:59,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2703.4897, 3499.3723, 2769.733, 4858.768, 734.77216, 4639.007, 985.24506, 3233.2441, 613.7775, 4392.9365]
2025-05-02 14:09:59,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [524.0, 688.0, 534.0, 932.0, 136.0, 873.0, 178.0, 662.0, 114.0, 849.0]
2025-05-02 14:09:59,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 65/100 (estimated time remaining: 11 hours, 46 minutes, 5 seconds)
2025-05-02 14:23:55,580 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 14:23:55,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 14:27:35,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2644.26074 ± 1610.820
2025-05-02 14:27:35,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1119.6279, 2924.2876, 5052.688, 2707.1858, 5225.384, 1241.368, 1086.3481, 316.99454, 3676.734, 3091.99]
2025-05-02 14:27:35,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [220.0, 554.0, 982.0, 527.0, 1000.0, 235.0, 212.0, 62.0, 711.0, 595.0]
2025-05-02 14:27:35,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 66/100 (estimated time remaining: 11 hours, 10 minutes, 49 seconds)
2025-05-02 14:41:56,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 14:41:56,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 14:47:27,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3581.96631 ± 1797.969
2025-05-02 14:47:27,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [727.21045, 5053.544, 5132.418, 420.0756, 1998.5942, 3669.878, 5143.269, 3463.3962, 5123.22, 5088.0596]
2025-05-02 14:47:27,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [135.0, 1000.0, 1000.0, 85.0, 406.0, 711.0, 1000.0, 675.0, 1000.0, 1000.0]
2025-05-02 14:47:27,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 67/100 (estimated time remaining: 10 hours, 47 minutes, 32 seconds)
2025-05-02 15:04:52,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 15:04:52,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 15:10:18,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3508.50000 ± 1617.600
2025-05-02 15:10:18,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5093.3555, 2380.2698, 4228.9473, 5222.5967, 3531.4878, 1926.0526, 5213.659, 1527.2969, 5105.5054, 855.834]
2025-05-02 15:10:18,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 461.0, 807.0, 1000.0, 671.0, 375.0, 1000.0, 290.0, 1000.0, 161.0]
2025-05-02 15:10:18,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 68/100 (estimated time remaining: 10 hours, 56 minutes, 35 seconds)
2025-05-02 15:28:30,467 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 15:28:30,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 15:35:47,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4608.76416 ± 1084.524
2025-05-02 15:35:47,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5080.9478, 5081.4653, 4113.2153, 5048.501, 5056.731, 5039.548, 5143.318, 5042.255, 5012.0107, 1469.6475]
2025-05-02 15:35:47,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 820.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 295.0]
2025-05-02 15:35:47,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (4608.76) for latency ExtremeClogL1U23
2025-05-02 15:35:47,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-02 15:35:47,548 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 15:35:47,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 69/100 (estimated time remaining: 11 hours, 12 minutes, 40 seconds)
2025-05-02 15:52:28,656 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 15:52:28,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 15:58:51,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3837.16919 ± 1761.693
2025-05-02 15:58:51,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5051.597, 4500.834, 5137.539, 5089.752, 4947.0923, 818.2256, 1447.5348, 1237.2317, 5067.0547, 5074.833]
2025-05-02 15:58:51,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 876.0, 1000.0, 1000.0, 1000.0, 141.0, 282.0, 246.0, 1000.0, 1000.0]
2025-05-02 15:58:52,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 70/100 (estimated time remaining: 11 hours, 15 minutes, 4 seconds)
2025-05-02 16:17:28,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 16:17:28,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 16:21:57,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2339.53198 ± 2053.915
2025-05-02 16:21:57,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5245.2896, 1615.5093, 386.4444, 5132.2554, 451.0888, 5127.39, 3447.5527, 392.88464, 1297.8657, 299.0413]
2025-05-02 16:21:57,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 323.0, 76.0, 1000.0, 100.0, 1000.0, 677.0, 77.0, 251.0, 56.0]
2025-05-02 16:21:57,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 71/100 (estimated time remaining: 11 hours, 26 minutes, 11 seconds)
2025-05-02 16:40:41,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 16:40:41,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 16:46:42,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3613.83911 ± 1913.006
2025-05-02 16:46:42,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [3023.4053, 5062.1567, 5039.5303, 778.38763, 5083.848, 1694.2806, 192.23892, 5043.341, 5122.241, 5098.9624]
2025-05-02 16:46:42,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [605.0, 1000.0, 1000.0, 154.0, 1000.0, 322.0, 37.0, 1000.0, 1000.0, 1000.0]
2025-05-02 16:46:42,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 72/100 (estimated time remaining: 11 hours, 31 minutes, 35 seconds)
2025-05-02 17:05:12,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 17:05:12,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 17:11:09,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3967.54248 ± 1403.005
2025-05-02 17:11:09,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5150.6616, 2724.0083, 5136.851, 1242.9396, 4126.8223, 5159.08, 5115.123, 5168.3916, 3812.525, 2039.0227]
2025-05-02 17:11:09,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 528.0, 1000.0, 236.0, 795.0, 1000.0, 1000.0, 1000.0, 732.0, 374.0]
2025-05-02 17:11:09,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 73/100 (estimated time remaining: 11 hours, 16 minutes, 46 seconds)
2025-05-02 17:28:31,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 17:28:31,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 17:33:24,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3276.03174 ± 1875.342
2025-05-02 17:33:24,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [4588.624, 172.17923, 4080.1367, 4075.9104, 1129.1984, 5151.76, 797.578, 5215.086, 5188.8184, 2361.0266]
2025-05-02 17:33:24,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [882.0, 33.0, 787.0, 799.0, 212.0, 1000.0, 172.0, 1000.0, 1000.0, 456.0]
2025-05-02 17:33:24,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 74/100 (estimated time remaining: 10 hours, 35 minutes, 6 seconds)
2025-05-02 17:51:25,433 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 17:51:25,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 17:56:08,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3113.70166 ± 1455.350
2025-05-02 17:56:08,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5225.576, 2356.4792, 2857.5837, 2618.5989, 3161.7893, 603.9825, 4136.9097, 5170.8755, 3827.286, 1177.9364]
2025-05-02 17:56:08,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 457.0, 567.0, 512.0, 602.0, 116.0, 799.0, 1000.0, 740.0, 216.0]
2025-05-02 17:56:08,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 75/100 (estimated time remaining: 10 hours, 9 minutes, 49 seconds)
2025-05-02 18:14:05,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 18:14:05,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 18:18:51,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3082.75635 ± 1755.498
2025-05-02 18:18:51,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1783.2583, 3808.6821, 832.105, 1618.1079, 5296.7275, 4495.2803, 565.3803, 4806.5947, 5247.011, 2374.4167]
2025-05-02 18:18:51,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [352.0, 721.0, 161.0, 297.0, 1000.0, 856.0, 112.0, 928.0, 1000.0, 445.0]
2025-05-02 18:18:51,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 76/100 (estimated time remaining: 9 hours, 44 minutes, 33 seconds)
2025-05-02 18:37:27,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 18:37:27,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 18:42:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3205.35376 ± 1753.913
2025-05-02 18:42:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5097.841, 4455.995, 621.56885, 1562.7289, 992.15967, 2354.208, 5069.927, 2084.6028, 5124.0845, 4690.422]
2025-05-02 18:42:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 875.0, 131.0, 297.0, 188.0, 460.0, 1000.0, 404.0, 1000.0, 923.0]
2025-05-02 18:42:32,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 77/100 (estimated time remaining: 9 hours, 16 minutes, 2 seconds)
2025-05-02 18:59:09,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 18:59:09,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 19:05:41,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4012.64209 ± 1539.954
2025-05-02 19:05:41,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5001.598, 1410.5852, 5174.081, 5128.67, 5165.8906, 2398.213, 5155.078, 1421.342, 4070.1682, 5200.7974]
2025-05-02 19:05:41,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 296.0, 1000.0, 1000.0, 1000.0, 462.0, 1000.0, 276.0, 794.0, 1000.0]
2025-05-02 19:05:41,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 78/100 (estimated time remaining: 8 hours, 46 minutes, 49 seconds)
2025-05-02 19:23:24,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 19:23:24,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 19:29:02,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3596.73047 ± 1783.494
2025-05-02 19:29:02,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5105.1206, 4155.083, 5061.5767, 877.4119, 1443.0386, 5074.333, 5060.2563, 671.8361, 3457.96, 5060.686]
2025-05-02 19:29:02,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 810.0, 1000.0, 173.0, 283.0, 1000.0, 1000.0, 133.0, 674.0, 1000.0]
2025-05-02 19:29:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 79/100 (estimated time remaining: 8 hours, 28 minutes, 46 seconds)
2025-05-02 19:46:58,063 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 19:46:58,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 19:51:51,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3253.34058 ± 1483.733
2025-05-02 19:51:51,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5161.26, 5094.4336, 1050.4736, 1919.634, 5002.703, 4138.6675, 2700.8872, 2043.9535, 1673.2377, 3748.1555]
2025-05-02 19:51:51,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 221.0, 391.0, 971.0, 819.0, 518.0, 404.0, 326.0, 736.0]
2025-05-02 19:51:51,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 80/100 (estimated time remaining: 8 hours, 6 minutes, 2 seconds)
2025-05-02 20:10:30,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 20:10:30,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 20:15:03,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3056.55737 ± 1650.652
2025-05-02 20:15:03,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [3197.7593, 1007.22107, 3823.2134, 5087.381, 5130.139, 381.1197, 2389.2622, 4478.8555, 3921.9443, 1148.6799]
2025-05-02 20:15:03,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [625.0, 199.0, 752.0, 1000.0, 1000.0, 71.0, 448.0, 874.0, 759.0, 209.0]
2025-05-02 20:15:03,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 81/100 (estimated time remaining: 7 hours, 44 minutes, 48 seconds)
2025-05-02 20:31:01,808 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 20:31:01,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 20:36:12,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3248.22046 ± 1565.499
2025-05-02 20:36:12,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1200.4655, 5099.5728, 5095.926, 2109.7612, 1943.0894, 2147.153, 5157.3296, 2694.2773, 5124.312, 1910.3182]
2025-05-02 20:36:12,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [247.0, 1000.0, 1000.0, 416.0, 375.0, 424.0, 1000.0, 524.0, 1000.0, 383.0]
2025-05-02 20:36:12,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 82/100 (estimated time remaining: 7 hours, 11 minutes, 53 seconds)
2025-05-02 20:54:14,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 20:54:14,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 20:59:01,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3082.25171 ± 1763.819
2025-05-02 20:59:01,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5165.659, 2156.8003, 5182.176, 839.4865, 5045.3926, 5159.889, 1967.8833, 1748.0936, 2748.7197, 808.42017]
2025-05-02 20:59:01,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 419.0, 1000.0, 163.0, 1000.0, 998.0, 404.0, 329.0, 537.0, 161.0]
2025-05-02 20:59:01,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 83/100 (estimated time remaining: 6 hours, 48 minutes, 1 second)
2025-05-02 21:16:13,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 21:16:13,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 21:21:32,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3297.93433 ± 1817.887
2025-05-02 21:21:32,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1045.1235, 5052.491, 379.35373, 3806.9402, 5029.377, 2871.9624, 4996.078, 785.63873, 5147.2275, 3865.1526]
2025-05-02 21:21:32,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [224.0, 1000.0, 70.0, 751.0, 1000.0, 564.0, 1000.0, 142.0, 1000.0, 751.0]
2025-05-02 21:21:32,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 84/100 (estimated time remaining: 6 hours, 22 minutes, 32 seconds)
2025-05-02 21:40:08,943 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 21:40:08,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 21:45:35,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3511.70166 ± 2093.476
2025-05-02 21:45:35,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5149.484, 5071.293, 5111.8887, 5064.8687, 2847.8684, 954.198, 5223.2275, 140.48645, 391.17227, 5162.5254]
2025-05-02 21:45:35,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 986.0, 1000.0, 1000.0, 558.0, 177.0, 1000.0, 27.0, 72.0, 1000.0]
2025-05-02 21:45:35,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 85/100 (estimated time remaining: 6 hours, 3 minutes, 57 seconds)
2025-05-02 22:02:39,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 22:02:39,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 22:09:20,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4418.27881 ± 1303.927
2025-05-02 22:09:20,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [4988.2305, 3998.4197, 5123.5615, 4852.6772, 5227.826, 5053.143, 5159.84, 3883.523, 5138.22, 757.35266]
2025-05-02 22:09:20,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [960.0, 772.0, 1000.0, 954.0, 1000.0, 1000.0, 1000.0, 780.0, 1000.0, 147.0]
2025-05-02 22:09:20,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 86/100 (estimated time remaining: 5 hours, 42 minutes, 50 seconds)
2025-05-02 22:27:33,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 22:27:33,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 22:33:58,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4162.91113 ± 1389.901
2025-05-02 22:33:58,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [3844.1143, 5142.05, 1493.5822, 5171.8374, 5110.8, 1512.5278, 5103.7017, 4688.245, 4416.306, 5145.95]
2025-05-02 22:33:58,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [737.0, 1000.0, 282.0, 1000.0, 1000.0, 296.0, 1000.0, 904.0, 869.0, 1000.0]
2025-05-02 22:33:58,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 87/100 (estimated time remaining: 5 hours, 29 minutes, 47 seconds)
2025-05-02 22:50:58,928 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 22:50:58,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 22:56:45,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3567.89697 ± 1726.717
2025-05-02 22:56:45,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5085.6885, 1603.9564, 3262.4045, 5063.093, 5050.0444, 5106.6943, 1246.2649, 3467.8335, 5183.37, 609.61896]
2025-05-02 22:56:45,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 319.0, 656.0, 1000.0, 1000.0, 1000.0, 241.0, 686.0, 1000.0, 125.0]
2025-05-02 22:56:45,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 88/100 (estimated time remaining: 5 hours, 6 minutes, 5 seconds)
2025-05-02 23:15:14,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 23:15:14,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 23:18:56,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2212.75610 ± 1798.401
2025-05-02 23:18:56,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5102.5674, 338.59988, 3750.544, 1734.8184, 439.082, 855.0585, 5145.722, 2341.37, 2274.378, 145.41905]
2025-05-02 23:18:56,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 71.0, 753.0, 340.0, 80.0, 183.0, 1000.0, 473.0, 467.0, 28.0]
2025-05-02 23:18:56,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 89/100 (estimated time remaining: 4 hours, 41 minutes, 45 seconds)
2025-05-02 23:36:07,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 23:36:07,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 23:42:28,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4030.23169 ± 1683.257
2025-05-02 23:42:28,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5109.1704, 5128.3916, 5152.5156, 5108.734, 1035.6345, 5099.551, 1344.4163, 5087.9897, 5160.2476, 2075.6643]
2025-05-02 23:42:28,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 199.0, 1000.0, 245.0, 1000.0, 1000.0, 401.0]
2025-05-02 23:42:28,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 90/100 (estimated time remaining: 4 hours, 17 minutes, 8 seconds)
2025-05-03 00:00:27,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 00:00:27,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 00:07:20,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4189.36035 ± 1527.738
2025-05-03 00:07:20,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5137.23, 5128.72, 533.4061, 5003.8076, 5006.6284, 5008.34, 5091.5493, 3962.851, 2026.6012, 4994.474]
2025-05-03 00:07:20,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 115.0, 1000.0, 1000.0, 1000.0, 1000.0, 799.0, 405.0, 1000.0]
2025-05-03 00:07:20,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 55 minutes, 59 seconds)
2025-05-03 00:25:43,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 00:25:43,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 00:31:01,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3302.25537 ± 1863.965
2025-05-03 00:31:01,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5016.044, 5043.487, 1175.3174, 662.0144, 2991.8826, 1919.9562, 4987.016, 5003.4697, 5287.4243, 935.94244]
2025-05-03 00:31:01,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 234.0, 125.0, 582.0, 366.0, 1000.0, 1000.0, 1000.0, 188.0]
2025-05-03 00:31:01,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 92/100 (estimated time remaining: 3 hours, 30 minutes, 40 seconds)
2025-05-03 00:48:14,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 00:48:14,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 00:52:49,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2990.66797 ± 1902.121
2025-05-03 00:52:49,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1536.6155, 1619.061, 3329.3894, 5177.8335, 5185.0327, 829.96466, 5090.253, 1483.278, 5180.19, 475.06268]
2025-05-03 00:52:49,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [313.0, 312.0, 643.0, 1000.0, 1000.0, 170.0, 998.0, 271.0, 1000.0, 98.0]
2025-05-03 00:52:49,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 93/100 (estimated time remaining: 3 hours, 5 minutes, 42 seconds)
2025-05-03 01:11:00,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 01:11:00,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 01:18:22,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4490.61621 ± 1472.667
2025-05-03 01:18:22,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [4655.9326, 5140.3535, 4683.5776, 5157.3027, 5208.3843, 4504.3096, 5190.42, 5163.2725, 5067.037, 135.56729]
2025-05-03 01:18:22,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [910.0, 1000.0, 900.0, 1000.0, 1000.0, 873.0, 1000.0, 1000.0, 1000.0, 26.0]
2025-05-03 01:18:22,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 47 minutes, 12 seconds)
2025-05-03 01:35:17,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 01:35:17,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 01:41:05,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3810.89185 ± 1765.799
2025-05-03 01:41:05,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5168.6704, 5219.999, 1109.8552, 703.5628, 2628.0664, 2571.1812, 5182.735, 5189.7856, 5167.517, 5167.5474]
2025-05-03 01:41:05,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 214.0, 131.0, 483.0, 482.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 01:41:05,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 95/100 (estimated time remaining: 2 hours, 22 minutes, 20 seconds)
2025-05-03 01:58:54,037 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 01:58:54,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 02:04:46,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3968.74097 ± 1592.174
2025-05-03 02:04:46,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1968.5889, 5150.395, 5157.9814, 5110.4175, 3188.1228, 5106.028, 3284.8801, 5099.782, 5145.4863, 475.7314]
2025-05-03 02:04:46,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [366.0, 1000.0, 1000.0, 1000.0, 603.0, 1000.0, 633.0, 1000.0, 1000.0, 89.0]
2025-05-03 02:04:46,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 57 minutes, 25 seconds)
2025-05-03 02:20:56,878 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 02:20:56,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 02:25:48,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3423.78906 ± 1562.617
2025-05-03 02:25:48,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5218.873, 5222.4233, 1733.2068, 5221.435, 1990.424, 3329.441, 345.94833, 3682.441, 3917.9282, 3575.7708]
2025-05-03 02:25:48,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 330.0, 1000.0, 377.0, 645.0, 66.0, 701.0, 740.0, 679.0]
2025-05-03 02:25:48,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 31 minutes, 49 seconds)
2025-05-03 02:42:25,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 02:42:25,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 02:48:55,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 4581.09277 ± 1216.586
2025-05-03 02:48:55,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5251.677, 5116.3823, 2856.9297, 5195.5063, 5138.9116, 5245.464, 5167.3857, 5082.747, 5176.767, 1579.1638]
2025-05-03 02:48:55,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 560.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 304.0]
2025-05-03 02:48:55,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 98/100 (estimated time remaining: 1 hour, 9 minutes, 39 seconds)
2025-05-03 03:03:52,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 03:03:52,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 03:06:15,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1629.52441 ± 1214.769
2025-05-03 03:06:15,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [448.93842, 2770.9487, 1058.1246, 691.7637, 407.91904, 4376.2725, 747.01764, 1767.3207, 1411.5869, 2615.3523]
2025-05-03 03:06:15,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [83.0, 544.0, 197.0, 131.0, 82.0, 793.0, 139.0, 336.0, 286.0, 499.0]
2025-05-03 03:06:15,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 99/100 (estimated time remaining: 43 minutes, 9 seconds)
2025-05-03 03:21:11,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 03:21:11,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 03:26:11,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3513.04443 ± 1412.628
2025-05-03 03:26:11,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2465.0278, 4402.85, 4440.245, 1219.9496, 2029.75, 5142.3877, 5127.987, 5121.715, 2901.1472, 2279.388]
2025-05-03 03:26:11,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [494.0, 845.0, 870.0, 245.0, 389.0, 1000.0, 1000.0, 1000.0, 563.0, 437.0]
2025-05-03 03:26:11,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 100/100 (estimated time remaining: 21 minutes, 1 second)
2025-05-03 03:40:59,364 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 03:40:59,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 03:45:59,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3540.36060 ± 1616.732
2025-05-03 03:45:59,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5252.4414, 4079.8337, 1577.373, 327.71304, 4976.292, 2045.0536, 4577.6304, 5321.7876, 3322.143, 3923.339]
2025-05-03 03:45:59,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 785.0, 306.0, 64.0, 947.0, 388.0, 871.0, 1000.0, 635.0, 737.0]
2025-05-03 03:45:59,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1149 [DEBUG]: Training session finished
