2025-05-06 15:36:34,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-05-06 15:36:34,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-05-06 15:36:34,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x151478ad3810>}
2025-05-06 15:36:34,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1009 [DEBUG]: using device: cuda
2025-05-06 15:36:34,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1031 [INFO]: Creating new trainer
2025-05-06 15:36:34,316 baseline-mbpac-noisy-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-06 15:36:34,316 baseline-mbpac-noisy-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 15:36:34,326 baseline-mbpac-noisy-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-05-06 15:36:35,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1092 [DEBUG]: Starting training session...
2025-05-06 15:36:35,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 1/100
2025-05-06 15:49:39,358 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:49:39,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:49:54,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 233.79956 ± 24.235
2025-05-06 15:49:54,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [252.10051, 246.59277, 220.62392, 201.40083, 209.74803, 215.9242, 221.19917, 240.15862, 241.75671, 288.49075]
2025-05-06 15:49:55,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [54.0, 50.0, 47.0, 44.0, 43.0, 45.0, 45.0, 51.0, 50.0, 59.0]
2025-05-06 15:49:55,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (233.80) for latency ExtremeSparseL4U32
2025-05-06 15:49:56,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 15:49:56,206 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 15:49:56,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 2/100 (estimated time remaining: 22 hours, 1 minute, 38 seconds)
2025-05-06 16:02:32,894 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:02:33,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:02:54,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 350.81476 ± 89.200
2025-05-06 16:02:54,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [389.8185, 329.05078, 291.792, 425.74225, 481.99768, 301.08917, 408.57935, 385.13632, 140.42445, 354.51715]
2025-05-06 16:02:54,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [83.0, 62.0, 54.0, 82.0, 90.0, 63.0, 87.0, 71.0, 27.0, 66.0]
2025-05-06 16:02:54,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (350.81) for latency ExtremeSparseL4U32
2025-05-06 16:02:54,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 16:02:54,360 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 16:02:54,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 3/100 (estimated time remaining: 21 hours, 29 minutes, 36 seconds)
2025-05-06 16:15:01,139 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:15:01,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:15:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 377.74005 ± 35.124
2025-05-06 16:15:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [378.19177, 437.10776, 379.22375, 306.1104, 416.5747, 381.26978, 342.34616, 386.84683, 394.94287, 354.78616]
2025-05-06 16:15:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [71.0, 81.0, 70.0, 70.0, 78.0, 82.0, 62.0, 73.0, 73.0, 65.0]
2025-05-06 16:15:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (377.74) for latency ExtremeSparseL4U32
2025-05-06 16:15:22,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 16:15:22,855 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 16:15:22,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 4/100 (estimated time remaining: 20 hours, 54 minutes, 19 seconds)
2025-05-06 16:27:15,439 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:27:15,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:27:39,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 425.53409 ± 79.222
2025-05-06 16:27:39,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [347.52267, 414.6459, 552.0633, 430.5934, 512.716, 343.03577, 421.24466, 327.0364, 367.04715, 539.4354]
2025-05-06 16:27:39,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 76.0, 110.0, 92.0, 95.0, 75.0, 77.0, 59.0, 73.0, 100.0]
2025-05-06 16:27:39,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (425.53) for latency ExtremeSparseL4U32
2025-05-06 16:27:39,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 16:27:39,666 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 16:27:39,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 25 minutes, 48 seconds)
2025-05-06 16:39:30,928 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:39:30,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:39:51,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 349.04236 ± 96.302
2025-05-06 16:39:51,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [246.82607, 150.2653, 401.94955, 411.28033, 329.3749, 344.84247, 341.2909, 535.0192, 348.06036, 381.51437]
2025-05-06 16:39:51,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [50.0, 29.0, 74.0, 79.0, 70.0, 71.0, 66.0, 111.0, 72.0, 77.0]
2025-05-06 16:39:51,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 6/100 (estimated time remaining: 20 hours, 2 minutes, 11 seconds)
2025-05-06 16:51:47,574 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:51:47,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:52:10,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 388.41101 ± 129.397
2025-05-06 16:52:10,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [554.02264, 410.55038, 152.28395, 150.58176, 389.0944, 422.17737, 383.02252, 489.4507, 509.1077, 423.81894]
2025-05-06 16:52:10,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [104.0, 75.0, 29.0, 29.0, 71.0, 78.0, 70.0, 92.0, 109.0, 92.0]
2025-05-06 16:52:10,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 29 minutes, 56 seconds)
2025-05-06 17:04:02,320 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:04:02,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:04:28,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 466.32529 ± 90.711
2025-05-06 17:04:28,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [465.737, 509.92117, 451.3794, 689.4951, 418.94943, 490.58145, 480.03378, 359.69733, 457.15488, 340.30338]
2025-05-06 17:04:28,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [85.0, 94.0, 84.0, 133.0, 76.0, 91.0, 89.0, 69.0, 87.0, 62.0]
2025-05-06 17:04:28,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (466.33) for latency ExtremeSparseL4U32
2025-05-06 17:04:28,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 17:04:28,856 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 17:04:28,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 5 minutes, 17 seconds)
2025-05-06 17:16:24,754 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:16:25,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:16:54,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 477.47559 ± 116.271
2025-05-06 17:16:54,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [520.5707, 630.17236, 482.4507, 356.5614, 708.8713, 515.29974, 346.01617, 363.69833, 376.6729, 474.44226]
2025-05-06 17:16:54,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [109.0, 132.0, 99.0, 75.0, 151.0, 96.0, 68.0, 68.0, 77.0, 89.0]
2025-05-06 17:16:54,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (477.48) for latency ExtremeSparseL4U32
2025-05-06 17:16:54,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 17:16:54,682 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 17:16:54,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 52 minutes, 9 seconds)
2025-05-06 17:28:40,755 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:28:40,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:29:13,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 552.90070 ± 166.720
2025-05-06 17:29:13,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [683.0389, 897.6635, 404.61945, 470.53842, 772.9194, 461.9183, 552.8016, 404.15622, 509.18573, 372.16568]
2025-05-06 17:29:13,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [130.0, 188.0, 74.0, 87.0, 150.0, 86.0, 117.0, 86.0, 95.0, 81.0]
2025-05-06 17:29:13,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (552.90) for latency ExtremeSparseL4U32
2025-05-06 17:29:13,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 17:29:13,325 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 17:29:13,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 40 minutes, 22 seconds)
2025-05-06 17:41:07,195 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:41:07,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:41:37,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 520.28748 ± 77.459
2025-05-06 17:41:37,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [414.5684, 437.4372, 541.0641, 454.60614, 572.05316, 655.0927, 559.275, 621.44434, 462.24213, 485.09137]
2025-05-06 17:41:37,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [79.0, 81.0, 102.0, 84.0, 107.0, 124.0, 119.0, 118.0, 89.0, 91.0]
2025-05-06 17:41:37,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 31 minutes, 42 seconds)
2025-05-06 17:53:25,370 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:53:25,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:53:56,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 528.45886 ± 76.492
2025-05-06 17:53:56,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [550.3504, 494.73227, 559.67993, 530.9189, 399.73148, 696.5782, 483.70425, 461.02237, 513.7569, 594.11365]
2025-05-06 17:53:56,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [102.0, 104.0, 121.0, 112.0, 74.0, 132.0, 102.0, 85.0, 95.0, 114.0]
2025-05-06 17:53:56,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 19 minutes, 37 seconds)
2025-05-06 18:05:48,059 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:05:48,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:06:14,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 454.28253 ± 119.454
2025-05-06 18:06:14,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [526.3562, 514.4945, 404.8774, 440.23544, 176.16743, 425.127, 490.18277, 536.37585, 377.11, 651.8986]
2025-05-06 18:06:14,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [98.0, 96.0, 75.0, 83.0, 34.0, 90.0, 104.0, 101.0, 69.0, 137.0]
2025-05-06 18:06:14,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 6 minutes, 53 seconds)
2025-05-06 18:18:07,719 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:18:07,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:18:30,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 376.10492 ± 152.623
2025-05-06 18:18:30,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [344.7108, 283.3209, 331.81186, 339.27234, 388.83463, 780.0388, 145.16486, 424.25092, 379.70215, 343.94202]
2025-05-06 18:18:30,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [70.0, 60.0, 73.0, 71.0, 84.0, 152.0, 28.0, 82.0, 71.0, 74.0]
2025-05-06 18:18:30,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 51 minutes, 45 seconds)
2025-05-06 18:30:09,966 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:30:09,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:30:36,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 461.48428 ± 45.556
2025-05-06 18:30:36,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [482.3495, 411.54, 484.61307, 418.25912, 404.96832, 516.6465, 519.24994, 409.0867, 451.8031, 516.3267]
2025-05-06 18:30:36,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [91.0, 89.0, 91.0, 77.0, 85.0, 95.0, 96.0, 75.0, 97.0, 111.0]
2025-05-06 18:30:36,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 35 minutes, 58 seconds)
2025-05-06 18:42:28,256 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:42:28,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:42:59,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 548.32922 ± 88.631
2025-05-06 18:42:59,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [557.5536, 543.4371, 585.9935, 705.25415, 655.3441, 372.8686, 564.02814, 541.02655, 489.39, 468.39642]
2025-05-06 18:42:59,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [107.0, 116.0, 113.0, 134.0, 124.0, 79.0, 106.0, 105.0, 91.0, 87.0]
2025-05-06 18:42:59,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 23 minutes, 22 seconds)
2025-05-06 18:54:48,649 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:54:49,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:55:17,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 479.26154 ± 192.325
2025-05-06 18:55:17,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [822.1532, 475.06836, 529.6349, 683.1205, 224.38452, 446.9217, 384.21808, 614.34515, 472.4224, 140.34612]
2025-05-06 18:55:17,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [160.0, 96.0, 112.0, 137.0, 43.0, 89.0, 84.0, 132.0, 88.0, 27.0]
2025-05-06 18:55:17,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 10 minutes, 38 seconds)
2025-05-06 19:07:02,153 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:07:02,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:07:33,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 518.78687 ± 96.294
2025-05-06 19:07:33,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [454.4158, 532.778, 504.70776, 433.53488, 674.5608, 460.56897, 352.1348, 547.6795, 556.19666, 671.2917]
2025-05-06 19:07:33,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [99.0, 97.0, 108.0, 80.0, 141.0, 100.0, 76.0, 102.0, 119.0, 142.0]
2025-05-06 19:07:34,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 58 minutes, 16 seconds)
2025-05-06 19:19:24,845 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:19:24,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:19:49,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 450.32098 ± 180.024
2025-05-06 19:19:49,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [513.24133, 663.52313, 501.39764, 390.45886, 170.94427, 156.38762, 303.00766, 696.7161, 543.29663, 564.23663]
2025-05-06 19:19:49,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [95.0, 127.0, 93.0, 72.0, 33.0, 30.0, 57.0, 131.0, 100.0, 106.0]
2025-05-06 19:19:49,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 45 minutes, 40 seconds)
2025-05-06 19:31:31,642 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:31:31,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:32:03,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 518.24237 ± 223.518
2025-05-06 19:32:03,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [399.03555, 707.0589, 480.17786, 521.0454, 629.605, 175.31548, 613.87573, 584.43024, 925.8952, 145.98462]
2025-05-06 19:32:03,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [85.0, 148.0, 93.0, 97.0, 120.0, 34.0, 131.0, 115.0, 194.0, 28.0]
2025-05-06 19:32:03,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 35 minutes, 23 seconds)
2025-05-06 19:43:56,864 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:43:57,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:44:36,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 670.30750 ± 260.338
2025-05-06 19:44:36,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [495.58853, 740.5514, 432.71832, 988.70526, 652.34674, 698.418, 963.76465, 579.9718, 1006.26324, 144.74756]
2025-05-06 19:44:36,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [104.0, 139.0, 81.0, 197.0, 129.0, 129.0, 184.0, 110.0, 208.0, 28.0]
2025-05-06 19:44:36,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (670.31) for latency ExtremeSparseL4U32
2025-05-06 19:44:36,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 19:44:36,293 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 19:44:36,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 25 minutes, 54 seconds)
2025-05-06 19:56:21,108 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:56:21,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:56:49,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 509.75674 ± 138.345
2025-05-06 19:56:49,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [717.62195, 521.7155, 551.1606, 532.8966, 619.5237, 155.58868, 497.73932, 572.98035, 461.91656, 466.42426]
2025-05-06 19:56:49,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [137.0, 96.0, 100.0, 99.0, 116.0, 30.0, 94.0, 106.0, 99.0, 85.0]
2025-05-06 19:56:49,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 12 minutes, 12 seconds)
2025-05-06 20:08:30,296 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:08:30,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:09:01,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 552.81476 ± 92.663
2025-05-06 20:09:01,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [719.79767, 467.73053, 537.0572, 513.08246, 439.26926, 690.00824, 536.1331, 596.61096, 591.7707, 436.6877]
2025-05-06 20:09:01,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [138.0, 86.0, 98.0, 93.0, 82.0, 130.0, 100.0, 127.0, 110.0, 81.0]
2025-05-06 20:09:01,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 58 minutes, 27 seconds)
2025-05-06 20:20:47,201 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:20:47,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:21:17,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 534.99115 ± 94.405
2025-05-06 20:21:17,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [606.14935, 451.41003, 623.87177, 641.60706, 551.7112, 644.04376, 461.01294, 343.7016, 549.4016, 477.00235]
2025-05-06 20:21:17,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [115.0, 84.0, 119.0, 120.0, 104.0, 120.0, 100.0, 75.0, 99.0, 90.0]
2025-05-06 20:21:17,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 46 minutes, 33 seconds)
2025-05-06 20:33:04,493 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:33:05,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:33:43,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 652.28772 ± 129.522
2025-05-06 20:33:43,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [733.2007, 547.3036, 475.69394, 561.6646, 932.15515, 698.01105, 710.0006, 584.2494, 535.10876, 745.4888]
2025-05-06 20:33:43,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [142.0, 100.0, 88.0, 106.0, 186.0, 129.0, 134.0, 108.0, 114.0, 149.0]
2025-05-06 20:33:43,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 37 minutes, 22 seconds)
2025-05-06 20:45:27,835 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:45:27,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:45:56,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 507.65161 ± 255.254
2025-05-06 20:45:56,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [565.02014, 200.81358, 218.12268, 197.03946, 431.7621, 719.8704, 1036.8263, 473.5547, 521.86743, 711.63934]
2025-05-06 20:45:56,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [105.0, 39.0, 42.0, 38.0, 80.0, 138.0, 203.0, 89.0, 98.0, 135.0]
2025-05-06 20:45:56,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 19 minutes, 57 seconds)
2025-05-06 20:57:38,445 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:57:38,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:58:19,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 696.48938 ± 396.394
2025-05-06 20:58:19,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [546.19666, 753.84766, 353.38766, 495.83066, 804.5636, 493.43307, 823.6738, 856.7939, 1691.2352, 145.93176]
2025-05-06 20:58:19,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [105.0, 158.0, 77.0, 105.0, 166.0, 92.0, 160.0, 158.0, 336.0, 28.0]
2025-05-06 20:58:19,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (696.49) for latency ExtremeSparseL4U32
2025-05-06 20:58:19,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 20:58:19,399 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 20:58:20,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 10 minutes, 20 seconds)
2025-05-06 21:10:11,114 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:10:11,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:10:47,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 622.81970 ± 144.445
2025-05-06 21:10:47,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [527.85956, 668.4331, 725.7049, 790.8717, 518.93134, 796.34314, 490.41516, 815.588, 421.35696, 472.6928]
2025-05-06 21:10:47,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [105.0, 127.0, 151.0, 146.0, 97.0, 153.0, 92.0, 155.0, 91.0, 89.0]
2025-05-06 21:10:48,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 2 minutes, 2 seconds)
2025-05-06 21:22:30,956 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:22:31,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:23:04,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 594.23816 ± 115.049
2025-05-06 21:23:04,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [568.6299, 757.45624, 620.0143, 519.1614, 390.72946, 585.41943, 761.96796, 715.205, 490.15417, 533.64386]
2025-05-06 21:23:04,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [109.0, 143.0, 116.0, 97.0, 72.0, 125.0, 140.0, 139.0, 98.0, 101.0]
2025-05-06 21:23:04,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 49 minutes, 43 seconds)
2025-05-06 21:34:47,135 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:34:47,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:35:17,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 516.67639 ± 141.977
2025-05-06 21:35:17,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [522.5552, 461.89023, 611.5297, 632.0089, 625.0573, 161.88681, 580.11096, 373.24564, 585.2919, 613.18677]
2025-05-06 21:35:17,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [96.0, 102.0, 117.0, 118.0, 135.0, 31.0, 105.0, 82.0, 110.0, 127.0]
2025-05-06 21:35:17,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 34 minutes, 6 seconds)
2025-05-06 21:46:57,748 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:46:57,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:47:32,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 627.93945 ± 83.906
2025-05-06 21:47:32,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [624.939, 649.252, 535.8386, 740.917, 575.8066, 450.40677, 629.81775, 731.8536, 664.4007, 676.16235]
2025-05-06 21:47:32,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [118.0, 121.0, 100.0, 143.0, 110.0, 82.0, 118.0, 134.0, 124.0, 128.0]
2025-05-06 21:47:32,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 22 minutes, 15 seconds)
2025-05-06 21:59:16,064 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:59:16,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:59:45,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 511.41193 ± 193.039
2025-05-06 21:59:45,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [430.96304, 166.7798, 698.8214, 523.93384, 631.29736, 150.63603, 677.5437, 595.0394, 688.84344, 550.26105]
2025-05-06 21:59:45,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [94.0, 32.0, 148.0, 96.0, 115.0, 29.0, 135.0, 112.0, 140.0, 101.0]
2025-05-06 21:59:45,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 7 minutes, 44 seconds)
2025-05-06 22:11:33,402 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:11:33,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:12:11,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 655.76129 ± 115.093
2025-05-06 22:12:11,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [654.3948, 827.3182, 700.1491, 593.339, 773.4747, 681.3808, 562.38025, 578.3592, 766.85004, 419.96686]
2025-05-06 22:12:11,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [123.0, 170.0, 132.0, 111.0, 165.0, 129.0, 105.0, 108.0, 146.0, 79.0]
2025-05-06 22:12:12,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 55 minutes, 3 seconds)
2025-05-06 22:23:51,406 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:23:51,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:24:29,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 672.24353 ± 115.899
2025-05-06 22:24:29,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [595.90356, 659.01117, 867.6346, 695.2373, 634.38684, 591.8534, 843.4878, 446.1855, 681.81744, 706.9177]
2025-05-06 22:24:29,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [111.0, 123.0, 181.0, 129.0, 129.0, 109.0, 157.0, 82.0, 128.0, 138.0]
2025-05-06 22:24:29,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 43 minutes, 1 second)
2025-05-06 22:36:20,844 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:36:21,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:37:00,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 656.28748 ± 336.147
2025-05-06 22:37:00,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [624.02075, 716.0057, 554.95264, 691.17426, 472.28445, 1321.192, 166.27782, 180.91194, 786.1391, 1049.9164]
2025-05-06 22:37:00,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [132.0, 137.0, 112.0, 143.0, 96.0, 264.0, 32.0, 35.0, 144.0, 208.0]
2025-05-06 22:37:00,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 34 minutes, 44 seconds)
2025-05-06 22:48:46,437 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:48:46,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:49:25,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 687.40167 ± 97.845
2025-05-06 22:49:25,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [802.5728, 577.82806, 726.261, 694.22815, 620.8476, 754.504, 666.62244, 816.8378, 729.4546, 484.8597]
2025-05-06 22:49:25,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [170.0, 126.0, 135.0, 128.0, 115.0, 141.0, 124.0, 152.0, 137.0, 102.0]
2025-05-06 22:49:26,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 24 minutes, 46 seconds)
2025-05-06 23:01:13,454 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:01:13,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:01:59,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 824.93811 ± 112.106
2025-05-06 23:01:59,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [656.9736, 746.68854, 844.9827, 865.2196, 642.8633, 831.58624, 993.5843, 808.5657, 982.1978, 876.71967]
2025-05-06 23:01:59,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [126.0, 141.0, 163.0, 164.0, 120.0, 153.0, 199.0, 150.0, 185.0, 168.0]
2025-05-06 23:01:59,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (824.94) for latency ExtremeSparseL4U32
2025-05-06 23:01:59,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 23:01:59,959 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 23:02:00,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 16 minutes, 47 seconds)
2025-05-06 23:13:40,655 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:13:41,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:14:20,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 695.63599 ± 156.700
2025-05-06 23:14:20,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [706.3574, 756.84674, 501.37994, 529.3833, 830.23425, 578.9933, 841.558, 553.12756, 1012.66425, 645.81525]
2025-05-06 23:14:20,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [148.0, 142.0, 92.0, 100.0, 150.0, 104.0, 160.0, 103.0, 190.0, 119.0]
2025-05-06 23:14:20,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 2 minutes, 54 seconds)
2025-05-06 23:25:57,288 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:25:57,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:26:34,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 650.22180 ± 247.014
2025-05-06 23:26:34,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [156.48996, 381.88925, 899.1408, 794.97345, 534.3465, 869.181, 675.27545, 435.9125, 866.7414, 888.26776]
2025-05-06 23:26:34,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 74.0, 171.0, 148.0, 113.0, 161.0, 128.0, 100.0, 175.0, 169.0]
2025-05-06 23:26:35,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 49 minutes, 50 seconds)
2025-05-06 23:38:23,331 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:38:23,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:39:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 946.55841 ± 219.365
2025-05-06 23:39:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [782.17126, 1136.2117, 985.3345, 669.5652, 973.9215, 873.6677, 1426.8973, 637.2355, 947.2881, 1033.2908]
2025-05-06 23:39:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [154.0, 216.0, 183.0, 130.0, 188.0, 164.0, 292.0, 131.0, 184.0, 196.0]
2025-05-06 23:39:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (946.56) for latency ExtremeSparseL4U32
2025-05-06 23:39:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 23:39:18,430 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 23:39:18,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 40 minutes, 3 seconds)
2025-05-06 23:51:08,931 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:51:08,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:51:48,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 683.45471 ± 365.428
2025-05-06 23:51:48,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [800.22205, 590.8458, 173.92581, 876.67596, 529.8901, 1036.6874, 146.73352, 535.7345, 714.03326, 1429.7994]
2025-05-06 23:51:48,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [153.0, 112.0, 34.0, 177.0, 99.0, 192.0, 28.0, 97.0, 136.0, 291.0]
2025-05-06 23:51:49,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 28 minutes, 29 seconds)
2025-05-07 00:03:31,688 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:03:32,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:04:07,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 616.82245 ± 297.188
2025-05-07 00:04:07,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [414.40845, 786.30444, 720.94385, 512.4495, 1017.0995, 1066.1372, 172.07881, 595.0688, 150.16563, 733.5686]
2025-05-07 00:04:07,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [87.0, 146.0, 133.0, 98.0, 191.0, 206.0, 33.0, 116.0, 29.0, 156.0]
2025-05-07 00:04:07,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 12 minutes, 56 seconds)
2025-05-07 00:15:57,853 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:15:58,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:16:37,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 708.23187 ± 345.129
2025-05-07 00:16:37,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [144.69926, 482.66492, 761.7304, 1502.0093, 745.14984, 371.0766, 871.2143, 677.57324, 906.5975, 619.6031]
2025-05-07 00:16:37,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 90.0, 145.0, 281.0, 141.0, 75.0, 164.0, 125.0, 168.0, 112.0]
2025-05-07 00:16:38,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 2 minutes, 40 seconds)
2025-05-07 00:28:24,553 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:28:24,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:29:10,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 811.48840 ± 134.985
2025-05-07 00:29:10,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1097.7345, 783.34235, 762.29095, 768.8704, 774.12866, 967.08026, 641.08344, 685.7348, 928.4821, 706.1361]
2025-05-07 00:29:10,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [207.0, 151.0, 142.0, 148.0, 143.0, 180.0, 129.0, 126.0, 175.0, 130.0]
2025-05-07 00:29:10,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 53 minutes, 33 seconds)
2025-05-07 00:40:54,802 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:40:55,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:41:40,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 785.33929 ± 183.607
2025-05-07 00:41:40,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [962.70746, 801.5674, 601.2893, 867.2832, 630.8036, 977.04553, 803.2355, 532.9085, 578.928, 1097.6246]
2025-05-07 00:41:40,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [189.0, 156.0, 125.0, 176.0, 127.0, 193.0, 154.0, 96.0, 114.0, 209.0]
2025-05-07 00:41:40,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 38 minutes, 35 seconds)
2025-05-07 00:53:40,228 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:53:40,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:54:19,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 688.23236 ± 410.698
2025-05-07 00:54:19,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1215.4424, 1012.9813, 1248.0651, 813.0432, 518.80835, 160.62099, 140.78296, 165.6395, 619.2351, 987.7048]
2025-05-07 00:54:19,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [221.0, 187.0, 247.0, 156.0, 111.0, 31.0, 27.0, 32.0, 114.0, 192.0]
2025-05-07 00:54:19,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 27 minutes, 37 seconds)
2025-05-07 01:05:58,558 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:05:58,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:06:45,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 784.86041 ± 515.824
2025-05-07 01:06:45,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [673.7648, 2013.9637, 1299.239, 151.66475, 446.67734, 679.1037, 533.70593, 451.4138, 506.33618, 1092.7349]
2025-05-07 01:06:45,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [128.0, 388.0, 270.0, 29.0, 82.0, 147.0, 111.0, 97.0, 109.0, 211.0]
2025-05-07 01:06:45,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 16 minutes, 27 seconds)
2025-05-07 01:18:39,740 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:18:39,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:19:32,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 896.03308 ± 460.501
2025-05-07 01:19:32,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1151.4038, 1696.436, 701.11127, 151.98413, 449.76913, 681.84705, 1207.2404, 510.6675, 938.5504, 1471.322]
2025-05-07 01:19:32,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [233.0, 319.0, 130.0, 29.0, 83.0, 146.0, 231.0, 108.0, 186.0, 277.0]
2025-05-07 01:19:32,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 6 minutes, 46 seconds)
2025-05-07 01:31:25,677 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:31:25,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:32:24,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 994.13281 ± 410.886
2025-05-07 01:32:24,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1463.8152, 1052.9701, 986.49786, 1405.5385, 146.68721, 1511.4907, 1091.9309, 822.8918, 474.41882, 985.0865]
2025-05-07 01:32:24,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [276.0, 219.0, 201.0, 273.0, 28.0, 300.0, 225.0, 157.0, 87.0, 210.0]
2025-05-07 01:32:24,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (994.13) for latency ExtremeSparseL4U32
2025-05-07 01:32:24,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 01:32:24,824 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 01:32:24,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 57 minutes, 40 seconds)
2025-05-07 01:47:20,494 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:47:20,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:48:20,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1025.51880 ± 201.880
2025-05-07 01:48:20,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1012.8631, 723.41394, 1282.0538, 945.92975, 1117.5282, 1039.97, 1239.1353, 1223.0178, 637.44434, 1033.8315]
2025-05-07 01:48:20,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [210.0, 135.0, 245.0, 194.0, 222.0, 215.0, 223.0, 248.0, 118.0, 190.0]
2025-05-07 01:48:20,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1025.52) for latency ExtremeSparseL4U32
2025-05-07 01:48:20,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 01:48:20,251 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 01:48:20,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 19 minutes, 53 seconds)
2025-05-07 02:00:00,786 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:00:00,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:00:46,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 817.79327 ± 488.967
2025-05-07 02:00:46,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1081.1366, 529.6796, 145.9679, 837.78766, 998.53253, 1241.1355, 789.8731, 546.1887, 1849.7289, 157.90169]
2025-05-07 02:00:46,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [200.0, 97.0, 28.0, 157.0, 187.0, 232.0, 152.0, 110.0, 343.0, 31.0]
2025-05-07 02:00:46,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 4 minutes, 29 seconds)
2025-05-07 02:12:55,372 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:12:55,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:13:51,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 963.59814 ± 273.624
2025-05-07 02:13:51,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [756.65656, 1398.1646, 1069.6892, 1304.9719, 871.83344, 711.1348, 1174.5829, 1089.5973, 759.44196, 499.9079]
2025-05-07 02:13:51,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [154.0, 262.0, 206.0, 254.0, 179.0, 133.0, 245.0, 224.0, 143.0, 93.0]
2025-05-07 02:13:51,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 57 minutes, 30 seconds)
2025-05-07 02:25:26,622 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:25:26,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:26:22,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 981.40588 ± 396.827
2025-05-07 02:26:22,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1473.7102, 796.3721, 1177.3672, 172.3379, 1071.1017, 641.4337, 716.8038, 903.7448, 1461.517, 1399.6705]
2025-05-07 02:26:22,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [283.0, 162.0, 221.0, 33.0, 204.0, 122.0, 134.0, 183.0, 273.0, 265.0]
2025-05-07 02:26:23,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 41 minutes, 45 seconds)
2025-05-07 02:38:11,912 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:38:11,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:39:03,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 905.13361 ± 586.291
2025-05-07 02:39:03,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2510.2444, 819.00226, 587.1639, 997.3657, 170.98706, 1107.9972, 799.45764, 608.4918, 723.95245, 726.6738]
2025-05-07 02:39:03,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [487.0, 167.0, 109.0, 205.0, 33.0, 211.0, 154.0, 113.0, 135.0, 136.0]
2025-05-07 02:39:03,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 26 minutes, 30 seconds)
2025-05-07 02:50:49,753 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:50:49,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:51:48,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1015.42303 ± 307.334
2025-05-07 02:51:48,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [857.32513, 372.85883, 1180.8844, 1259.9159, 1413.2997, 896.2425, 907.73254, 1431.1128, 770.1292, 1064.7291]
2025-05-07 02:51:48,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [175.0, 72.0, 226.0, 238.0, 274.0, 175.0, 177.0, 279.0, 149.0, 202.0]
2025-05-07 02:51:48,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 43 minutes, 54 seconds)
2025-05-07 03:03:26,716 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:03:26,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:04:35,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1195.57739 ± 367.480
2025-05-07 03:04:35,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [902.31683, 1173.248, 1449.5319, 1523.8906, 1116.8032, 697.8684, 1038.4288, 691.49695, 1890.2238, 1471.9657]
2025-05-07 03:04:35,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [182.0, 229.0, 267.0, 296.0, 234.0, 131.0, 206.0, 129.0, 359.0, 281.0]
2025-05-07 03:04:35,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1195.58) for latency ExtremeSparseL4U32
2025-05-07 03:04:35,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 03:04:35,835 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 03:04:35,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 34 minutes, 21 seconds)
2025-05-07 03:16:21,358 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:16:21,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:17:23,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1103.69263 ± 289.346
2025-05-07 03:17:23,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1097.0704, 795.3132, 935.7263, 1346.1088, 1485.5767, 678.0313, 1209.3616, 937.2757, 937.35785, 1615.1056]
2025-05-07 03:17:23,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [203.0, 155.0, 176.0, 259.0, 279.0, 127.0, 228.0, 179.0, 175.0, 320.0]
2025-05-07 03:17:23,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 19 minutes, 7 seconds)
2025-05-07 03:29:13,356 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:29:14,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:30:26,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1229.04688 ± 671.356
2025-05-07 03:30:26,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [151.5812, 1504.7694, 1751.7218, 1961.4944, 1027.763, 691.40625, 1404.2208, 2447.9502, 665.9081, 683.654]
2025-05-07 03:30:26,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 275.0, 361.0, 375.0, 212.0, 142.0, 279.0, 478.0, 124.0, 150.0]
2025-05-07 03:30:26,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1229.05) for latency ExtremeSparseL4U32
2025-05-07 03:30:26,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 03:30:26,449 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 03:30:26,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 10 minutes, 49 seconds)
2025-05-07 03:42:07,461 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:42:07,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:43:13,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1145.65405 ± 598.043
2025-05-07 03:43:13,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1601.3464, 812.8288, 1501.4159, 1014.1221, 2059.3848, 359.32095, 410.63766, 1063.6177, 2043.3555, 590.5097]
2025-05-07 03:43:13,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [309.0, 172.0, 303.0, 193.0, 392.0, 68.0, 74.0, 198.0, 390.0, 126.0]
2025-05-07 03:43:13,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 58 minutes, 54 seconds)
2025-05-07 03:54:54,472 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:54:54,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:56:09,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1306.90076 ± 1203.647
2025-05-07 03:56:09,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1107.8308, 1525.0989, 941.3228, 1070.14, 1284.7852, 4687.539, 969.4178, 167.10184, 145.0312, 1170.7402]
2025-05-07 03:56:09,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [223.0, 290.0, 190.0, 211.0, 238.0, 895.0, 179.0, 32.0, 28.0, 223.0]
2025-05-07 03:56:09,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1306.90) for latency ExtremeSparseL4U32
2025-05-07 03:56:09,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 03:56:09,431 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 03:56:09,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 47 minutes, 41 seconds)
2025-05-07 04:08:08,594 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:08:08,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:09:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1217.21069 ± 771.527
2025-05-07 04:09:17,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2845.2544, 756.7497, 458.05124, 480.66925, 1608.2054, 1426.8228, 1370.3042, 1984.4413, 1095.6117, 145.9972]
2025-05-07 04:09:17,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [564.0, 142.0, 87.0, 89.0, 307.0, 272.0, 252.0, 372.0, 197.0, 28.0]
2025-05-07 04:09:17,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 37 minutes, 35 seconds)
2025-05-07 04:20:44,485 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:20:44,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:21:57,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1235.01587 ± 456.102
2025-05-07 04:21:57,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [933.43414, 1897.7683, 917.10913, 1270.782, 1551.7123, 841.7913, 1691.7986, 911.8076, 1826.6974, 507.25876]
2025-05-07 04:21:57,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [176.0, 364.0, 195.0, 259.0, 330.0, 159.0, 316.0, 179.0, 360.0, 97.0]
2025-05-07 04:21:57,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 23 minutes, 35 seconds)
2025-05-07 04:34:05,589 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:34:05,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:35:11,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1146.35962 ± 431.079
2025-05-07 04:35:11,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1219.7476, 927.52167, 733.04205, 1297.3093, 557.5347, 519.55695, 1312.9584, 1922.5475, 1416.2235, 1557.1544]
2025-05-07 04:35:11,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [230.0, 169.0, 140.0, 246.0, 106.0, 97.0, 247.0, 377.0, 279.0, 303.0]
2025-05-07 04:35:11,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 12 minutes, 3 seconds)
2025-05-07 04:46:39,011 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:46:39,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:47:27,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 822.04138 ± 434.368
2025-05-07 04:47:27,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1045.1725, 1273.1862, 185.51096, 534.53357, 145.69313, 1553.7743, 865.0678, 1121.4198, 589.4503, 906.6055]
2025-05-07 04:47:27,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [198.0, 245.0, 36.0, 99.0, 28.0, 318.0, 170.0, 220.0, 120.0, 173.0]
2025-05-07 04:47:27,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 55 minutes, 22 seconds)
2025-05-07 04:59:08,725 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:59:08,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:00:32,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1457.90125 ± 637.245
2025-05-07 05:00:32,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1554.3878, 1188.8381, 1174.4286, 754.66315, 510.5299, 1627.6744, 2247.1638, 2811.3071, 1322.9264, 1387.0938]
2025-05-07 05:00:32,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [300.0, 220.0, 223.0, 139.0, 95.0, 303.0, 432.0, 556.0, 244.0, 280.0]
2025-05-07 05:00:32,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (1457.90) for latency ExtremeSparseL4U32
2025-05-07 05:00:32,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 05:00:32,330 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 05:00:32,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 43 minutes, 32 seconds)
2025-05-07 05:12:37,345 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:12:37,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:13:27,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 884.66766 ± 611.672
2025-05-07 05:13:27,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1001.6983, 738.35815, 1762.3705, 1981.3066, 164.90567, 135.71234, 762.0956, 1077.4078, 1087.2506, 135.5713]
2025-05-07 05:13:27,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [206.0, 158.0, 326.0, 372.0, 32.0, 26.0, 137.0, 194.0, 201.0, 26.0]
2025-05-07 05:13:27,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 29 minutes, 7 seconds)
2025-05-07 05:24:58,915 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:24:58,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:26:14,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1292.30127 ± 491.334
2025-05-07 05:26:14,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [895.0874, 555.681, 2451.4763, 1243.6342, 1237.9518, 1054.2955, 1429.6447, 1756.2496, 1020.7598, 1278.2327]
2025-05-07 05:26:14,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [162.0, 101.0, 482.0, 268.0, 237.0, 203.0, 273.0, 349.0, 197.0, 253.0]
2025-05-07 05:26:14,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 17 minutes, 8 seconds)
2025-05-07 05:38:20,506 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:38:20,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:40:32,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2311.50513 ± 1422.299
2025-05-07 05:40:32,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2510.2717, 3837.6724, 3174.1926, 5143.483, 176.55467, 2266.1501, 2674.3142, 941.0544, 1060.8518, 1330.5056]
2025-05-07 05:40:32,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [481.0, 745.0, 619.0, 1000.0, 34.0, 426.0, 527.0, 185.0, 200.0, 260.0]
2025-05-07 05:40:32,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (2311.51) for latency ExtremeSparseL4U32
2025-05-07 05:40:32,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 05:40:32,482 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 05:40:32,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 11 minutes, 21 seconds)
2025-05-07 05:51:45,297 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:51:45,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:53:04,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1353.39478 ± 936.956
2025-05-07 05:53:04,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1684.605, 1028.7324, 831.44666, 2251.593, 3186.675, 2372.332, 554.0534, 1081.1693, 144.97614, 398.36398]
2025-05-07 05:53:04,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [331.0, 195.0, 154.0, 432.0, 630.0, 474.0, 123.0, 202.0, 28.0, 75.0]
2025-05-07 05:53:04,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 59 minutes, 54 seconds)
2025-05-07 06:05:00,179 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:05:00,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:06:15,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1288.02307 ± 678.787
2025-05-07 06:06:15,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [754.5953, 1965.7593, 617.426, 976.9499, 1791.0968, 1637.4429, 874.15015, 1657.6838, 2450.0913, 155.03535]
2025-05-07 06:06:15,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [164.0, 377.0, 121.0, 182.0, 338.0, 309.0, 178.0, 319.0, 477.0, 30.0]
2025-05-07 06:06:15,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 47 minutes, 28 seconds)
2025-05-07 06:18:12,489 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:18:12,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:18:59,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 833.13831 ± 526.013
2025-05-07 06:18:59,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1174.1132, 430.08673, 1870.7256, 141.15927, 615.23413, 937.896, 1382.924, 1053.1608, 565.0967, 160.98659]
2025-05-07 06:18:59,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [216.0, 80.0, 362.0, 27.0, 115.0, 171.0, 250.0, 205.0, 111.0, 31.0]
2025-05-07 06:18:59,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 33 minutes, 10 seconds)
2025-05-07 06:30:31,842 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:30:32,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:32:22,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1896.20044 ± 1053.561
2025-05-07 06:32:22,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2089.5908, 1673.2451, 2905.5134, 1924.9908, 1424.8441, 1088.3787, 4357.3594, 2103.761, 430.03052, 964.2895]
2025-05-07 06:32:22,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [403.0, 321.0, 576.0, 385.0, 286.0, 228.0, 849.0, 405.0, 79.0, 197.0]
2025-05-07 06:32:22,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 23 minutes, 36 seconds)
2025-05-07 06:44:00,980 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:44:01,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:45:35,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1642.83069 ± 1044.707
2025-05-07 06:45:35,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2075.8486, 880.81714, 2043.944, 3689.7942, 1113.5062, 3146.847, 1534.1147, 838.18726, 223.70146, 881.54535]
2025-05-07 06:45:35,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [406.0, 193.0, 380.0, 705.0, 213.0, 596.0, 285.0, 178.0, 43.0, 164.0]
2025-05-07 06:45:35,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 4 minutes, 19 seconds)
2025-05-07 06:57:49,018 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:57:49,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:58:50,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1083.44788 ± 613.876
2025-05-07 06:58:50,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1516.6226, 1389.4583, 1028.6925, 1563.4093, 2090.4182, 1542.5496, 482.0964, 180.576, 850.4929, 190.16345]
2025-05-07 06:58:50,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [287.0, 261.0, 196.0, 297.0, 402.0, 280.0, 91.0, 35.0, 153.0, 37.0]
2025-05-07 06:58:50,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 55 minutes, 12 seconds)
2025-05-07 07:10:12,119 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:10:12,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:11:41,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1557.95667 ± 655.413
2025-05-07 07:11:41,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [151.2935, 1712.9471, 2328.9778, 1494.1085, 2042.763, 2566.5842, 1494.0707, 1378.8726, 925.0853, 1484.8639]
2025-05-07 07:11:41,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 329.0, 432.0, 280.0, 374.0, 485.0, 278.0, 277.0, 192.0, 289.0]
2025-05-07 07:11:42,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 40 minutes, 21 seconds)
2025-05-07 07:23:30,280 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:23:30,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:24:57,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1490.70837 ± 960.772
2025-05-07 07:24:57,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [860.0184, 1931.0864, 155.82518, 175.56763, 3024.1082, 1022.2236, 2076.1929, 1216.714, 2953.226, 1492.1206]
2025-05-07 07:24:57,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [178.0, 370.0, 30.0, 34.0, 579.0, 199.0, 417.0, 232.0, 576.0, 292.0]
2025-05-07 07:24:57,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 29 minutes, 52 seconds)
2025-05-07 07:36:51,074 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:36:51,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:37:47,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 995.33594 ± 1006.437
2025-05-07 07:37:47,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2938.631, 924.7413, 141.01343, 130.52528, 130.43962, 460.32285, 567.49664, 751.3193, 1018.1131, 2890.7573]
2025-05-07 07:37:47,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [579.0, 171.0, 27.0, 25.0, 25.0, 85.0, 117.0, 152.0, 199.0, 540.0]
2025-05-07 07:37:48,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 14 minutes, 2 seconds)
2025-05-07 07:49:26,830 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:49:27,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:51:00,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1651.67578 ± 768.209
2025-05-07 07:51:01,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [861.2073, 1443.3176, 1628.7593, 760.60425, 1905.7634, 3603.2195, 1308.7938, 1219.3936, 1623.7635, 2161.9363]
2025-05-07 07:51:01,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [161.0, 271.0, 316.0, 142.0, 351.0, 672.0, 268.0, 230.0, 315.0, 398.0]
2025-05-07 07:51:01,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 59 seconds)
2025-05-07 08:02:51,553 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:02:51,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:05:02,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2234.37451 ± 1090.725
2025-05-07 08:05:02,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [3336.5225, 1292.7261, 3106.93, 1756.0265, 4572.116, 1183.8522, 2113.9902, 992.25586, 2525.842, 1463.4829]
2025-05-07 08:05:02,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [640.0, 263.0, 582.0, 330.0, 882.0, 256.0, 409.0, 200.0, 512.0, 309.0]
2025-05-07 08:05:02,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 51 minutes, 15 seconds)
2025-05-07 08:17:05,055 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:17:05,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:18:52,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1875.64221 ± 1190.460
2025-05-07 08:18:52,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1347.6667, 1351.4379, 140.66513, 4040.666, 1420.1206, 3786.6428, 2568.457, 1624.1613, 644.22516, 1832.3794]
2025-05-07 08:18:52,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [258.0, 264.0, 27.0, 755.0, 270.0, 702.0, 478.0, 322.0, 120.0, 351.0]
2025-05-07 08:18:52,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 42 minutes, 4 seconds)
2025-05-07 08:30:37,026 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:30:37,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:32:15,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1736.06995 ± 692.954
2025-05-07 08:32:15,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [552.9151, 1403.001, 1036.6656, 2287.425, 2110.3096, 2257.1052, 899.6274, 2416.1257, 1700.7194, 2696.8044]
2025-05-07 08:32:15,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [102.0, 271.0, 203.0, 451.0, 393.0, 433.0, 176.0, 453.0, 316.0, 502.0]
2025-05-07 08:32:15,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 29 minutes, 10 seconds)
2025-05-07 08:44:45,334 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:44:45,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:46:01,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1321.68396 ± 810.424
2025-05-07 08:46:01,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1135.1449, 838.3209, 1032.105, 1707.6467, 909.2418, 161.62585, 1352.7972, 3441.8652, 1153.8049, 1484.2875]
2025-05-07 08:46:01,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [219.0, 178.0, 201.0, 318.0, 170.0, 31.0, 259.0, 681.0, 235.0, 294.0]
2025-05-07 08:46:01,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 19 minutes, 14 seconds)
2025-05-07 08:57:12,043 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:57:12,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:59:34,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2456.64795 ± 1040.514
2025-05-07 08:59:34,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1670.6986, 3229.9736, 2022.1404, 1700.9532, 2578.197, 1601.6699, 1728.9266, 2248.5574, 5198.0625, 2587.3013]
2025-05-07 08:59:34,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [320.0, 618.0, 387.0, 321.0, 509.0, 314.0, 350.0, 442.0, 1000.0, 510.0]
2025-05-07 08:59:34,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (2456.65) for latency ExtremeSparseL4U32
2025-05-07 08:59:34,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 08:59:34,912 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 08:59:35,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 6 minutes, 47 seconds)
2025-05-07 09:11:01,835 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:11:01,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:12:50,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1954.03870 ± 686.829
2025-05-07 09:12:50,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1403.9075, 475.96835, 2020.6506, 2690.8, 2057.6501, 2749.8723, 1344.4272, 2719.1506, 2222.3018, 1855.6567]
2025-05-07 09:12:50,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [256.0, 105.0, 370.0, 521.0, 379.0, 514.0, 251.0, 493.0, 413.0, 361.0]
2025-05-07 09:12:50,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 50 minutes, 31 seconds)
2025-05-07 09:24:31,313 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:24:31,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:26:48,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2386.30542 ± 1579.591
2025-05-07 09:26:48,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2397.483, 4212.1787, 536.01733, 1749.9291, 3476.2063, 1315.5267, 844.4804, 3389.657, 5356.8896, 584.687]
2025-05-07 09:26:48,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [448.0, 817.0, 109.0, 352.0, 669.0, 248.0, 183.0, 652.0, 1000.0, 119.0]
2025-05-07 09:26:49,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 37 minutes, 25 seconds)
2025-05-07 09:38:40,063 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:38:40,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:40:27,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1883.27954 ± 994.001
2025-05-07 09:40:27,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [3470.2302, 2426.3706, 3041.517, 477.48578, 933.1195, 2945.91, 1077.1986, 2064.986, 979.7776, 1416.2013]
2025-05-07 09:40:27,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [659.0, 451.0, 586.0, 89.0, 199.0, 562.0, 203.0, 382.0, 212.0, 260.0]
2025-05-07 09:40:27,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 24 minutes, 37 seconds)
2025-05-07 09:52:16,848 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:52:16,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:54:31,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2354.34033 ± 1470.504
2025-05-07 09:54:31,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [407.09128, 4439.311, 616.1647, 5294.5327, 2826.8845, 2428.461, 2034.1361, 1583.5941, 1419.4031, 2493.8262]
2025-05-07 09:54:31,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [77.0, 833.0, 112.0, 1000.0, 524.0, 471.0, 385.0, 293.0, 279.0, 486.0]
2025-05-07 09:54:31,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 11 minutes, 46 seconds)
2025-05-07 10:07:01,186 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:07:01,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:09:18,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2377.22998 ± 1668.173
2025-05-07 10:09:18,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [562.3144, 3114.1458, 4720.9155, 5246.013, 2019.8989, 3844.597, 1348.0192, 1499.9008, 1250.199, 166.29636]
2025-05-07 10:09:18,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [117.0, 600.0, 890.0, 1000.0, 396.0, 718.0, 267.0, 307.0, 246.0, 32.0]
2025-05-07 10:09:18,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 1 minute, 15 seconds)
2025-05-07 10:20:31,860 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:20:31,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:22:17,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1843.91382 ± 1076.993
2025-05-07 10:22:17,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [937.54016, 2440.1196, 2844.0369, 1470.8922, 1034.5934, 1814.1497, 741.78, 450.7613, 2674.8218, 4030.4429]
2025-05-07 10:22:17,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [172.0, 469.0, 553.0, 278.0, 203.0, 354.0, 157.0, 98.0, 502.0, 769.0]
2025-05-07 10:22:17,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 46 minutes, 40 seconds)
2025-05-07 10:34:52,277 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:34:52,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:37:48,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3086.76172 ± 1616.804
2025-05-07 10:37:48,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5050.416, 965.0176, 3116.684, 1499.7106, 3836.2026, 5346.4893, 2612.5774, 1743.9517, 5275.7666, 1420.8013]
2025-05-07 10:37:48,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [950.0, 200.0, 593.0, 283.0, 739.0, 1000.0, 496.0, 328.0, 1000.0, 268.0]
2025-05-07 10:37:48,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (3086.76) for latency ExtremeSparseL4U32
2025-05-07 10:37:48,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 10:37:48,205 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 10:37:48,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 36 minutes, 9 seconds)
2025-05-07 10:49:25,296 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:49:25,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:50:43,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1321.80139 ± 1248.206
2025-05-07 10:50:43,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1528.4573, 583.18243, 913.5923, 4516.387, 1047.4131, 1333.6962, 135.39412, 151.07147, 582.5628, 2426.2573]
2025-05-07 10:50:43,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [304.0, 125.0, 166.0, 891.0, 223.0, 266.0, 26.0, 29.0, 104.0, 465.0]
2025-05-07 10:50:43,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 20 minutes, 30 seconds)
2025-05-07 11:02:08,164 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:02:08,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:05:36,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3725.59302 ± 1513.380
2025-05-07 11:05:36,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1241.1692, 3063.825, 4094.4224, 5309.448, 3971.9053, 5274.903, 4728.866, 3157.6338, 1048.1803, 5365.5757]
2025-05-07 11:05:36,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [237.0, 580.0, 778.0, 1000.0, 751.0, 1000.0, 890.0, 596.0, 195.0, 1000.0]
2025-05-07 11:05:36,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1124 [INFO]: New best (3725.59) for latency ExtremeSparseL4U32
2025-05-07 11:05:36,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 11:05:36,866 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:05:36,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 7 minutes, 58 seconds)
2025-05-07 11:18:03,073 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:18:03,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:19:25,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1459.48999 ± 912.339
2025-05-07 11:19:25,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1583.4955, 1292.358, 935.578, 3179.6753, 753.6362, 1722.5728, 1219.808, 723.449, 2978.694, 205.63338]
2025-05-07 11:19:25,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [300.0, 257.0, 192.0, 583.0, 152.0, 315.0, 228.0, 133.0, 557.0, 40.0]
2025-05-07 11:19:25,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 52 minutes, 11 seconds)
2025-05-07 11:30:52,754 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:30:53,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:33:14,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2499.98242 ± 1899.876
2025-05-07 11:33:14,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [4202.157, 3609.4226, 1987.0074, 185.4489, 2877.0764, 382.43518, 160.65073, 5267.2925, 1201.023, 5127.3086]
2025-05-07 11:33:14,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [810.0, 694.0, 382.0, 36.0, 540.0, 83.0, 31.0, 1000.0, 223.0, 993.0]
2025-05-07 11:33:15,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 39 minutes, 20 seconds)
2025-05-07 11:44:56,028 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:44:56,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:46:20,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1467.58411 ± 970.093
2025-05-07 11:46:20,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1133.7731, 1376.7032, 3581.6858, 1506.5948, 2440.9856, 1897.6311, 1012.89716, 162.39491, 1428.2228, 134.95177]
2025-05-07 11:46:20,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [215.0, 277.0, 709.0, 281.0, 452.0, 375.0, 201.0, 31.0, 265.0, 26.0]
2025-05-07 11:46:20,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 22 minutes, 14 seconds)
2025-05-07 11:58:52,847 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:58:52,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:00:44,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 1960.73792 ± 1656.214
2025-05-07 12:00:45,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1080.2135, 1180.4323, 882.50824, 4333.225, 1479.0885, 140.4809, 3472.3882, 530.6654, 5258.3647, 1250.0125]
2025-05-07 12:00:45,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [230.0, 232.0, 168.0, 824.0, 293.0, 27.0, 678.0, 96.0, 1000.0, 240.0]
2025-05-07 12:00:45,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 10 minutes, 1 second)
2025-05-07 12:11:46,529 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:11:46,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:14:39,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3043.81714 ± 1146.583
2025-05-07 12:14:39,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [2080.1511, 4173.5503, 3986.4111, 5163.354, 3806.233, 2796.8303, 2226.9478, 1735.8439, 3018.668, 1450.182]
2025-05-07 12:14:39,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [409.0, 785.0, 750.0, 1000.0, 738.0, 559.0, 441.0, 326.0, 576.0, 294.0]
2025-05-07 12:14:39,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 97/100 (estimated time remaining: 55 minutes, 14 seconds)
2025-05-07 12:27:05,911 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:27:05,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:29:08,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2121.35596 ± 1685.702
2025-05-07 12:29:08,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [4591.333, 5117.486, 135.56483, 2611.8357, 803.7082, 2407.2388, 3192.1309, 1418.3822, 130.18796, 805.6906]
2025-05-07 12:29:08,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [880.0, 1000.0, 26.0, 513.0, 162.0, 482.0, 622.0, 285.0, 25.0, 162.0]
2025-05-07 12:29:08,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 98/100 (estimated time remaining: 41 minutes, 50 seconds)
2025-05-07 12:40:41,439 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:40:41,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:42:47,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2247.82666 ± 1555.761
2025-05-07 12:42:47,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [748.73334, 2930.5298, 2028.7427, 806.2893, 699.66644, 1089.1243, 5405.432, 4153.248, 1304.077, 3312.4236]
2025-05-07 12:42:47,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [140.0, 562.0, 376.0, 168.0, 147.0, 211.0, 1000.0, 803.0, 244.0, 627.0]
2025-05-07 12:42:47,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 99/100 (estimated time remaining: 27 minutes, 49 seconds)
2025-05-07 12:54:29,906 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:54:29,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:56:38,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 2296.61646 ± 1719.117
2025-05-07 12:56:38,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [5281.7617, 1175.839, 852.5353, 1083.8558, 890.01434, 3969.0527, 5155.0244, 881.4015, 1358.077, 2318.6028]
2025-05-07 12:56:38,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [1000.0, 228.0, 154.0, 199.0, 166.0, 743.0, 1000.0, 160.0, 261.0, 435.0]
2025-05-07 12:56:38,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1097 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 3 seconds)
2025-05-07 13:07:56,999 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:07:57,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:11:28,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1119 [DEBUG]: Total Reward: 3681.50708 ± 1651.862
2025-05-07 13:11:29,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1120 [DEBUG]: All rewards: [1700.1641, 5266.735, 1529.5065, 5177.316, 2732.2668, 5301.6797, 5045.0547, 3709.6108, 1166.0355, 5186.7007]
2025-05-07 13:11:29,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [338.0, 1000.0, 310.0, 1000.0, 523.0, 1000.0, 946.0, 714.0, 208.0, 1000.0]
2025-05-07 13:11:29,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1149 [DEBUG]: Training session finished
