2025-09-13 06:13:48,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 06:13:48,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 06:13:48,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x145ba33e9910>}
2025-09-13 06:13:48,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1111 [DEBUG]: using device: cuda
2025-09-13 06:13:48,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1133 [INFO]: Creating new trainer
2025-09-13 06:13:48,407 baseline-mbpac-noiseperc5-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-13 06:13:48,407 baseline-mbpac-noiseperc5-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 06:13:48,417 baseline-mbpac-noiseperc5-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-13 06:13:49,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1194 [DEBUG]: Starting training session...
2025-09-13 06:13:49,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 1/100
2025-09-13 06:25:52,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:25:52,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:26:14,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 356.22614 ± 57.285
2025-09-13 06:26:14,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [361.59613, 348.19653, 371.978, 285.5809, 292.76273, 409.3677, 291.89645, 386.5755, 335.09332, 479.21417]
2025-09-13 06:26:14,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 65.0, 70.0, 53.0, 54.0, 78.0, 57.0, 75.0, 72.0, 100.0]
2025-09-13 06:26:14,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (356.23) for latency ExtremeSparseL4U32
2025-09-13 06:26:14,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 29 minutes, 18 seconds)
2025-09-13 06:38:03,080 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:38:03,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:38:19,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 270.38907 ± 79.844
2025-09-13 06:38:19,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [389.71124, 307.84854, 175.1671, 135.56055, 389.80615, 301.25974, 245.8669, 273.6992, 200.767, 284.20398]
2025-09-13 06:38:19,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 58.0, 34.0, 26.0, 73.0, 57.0, 49.0, 52.0, 39.0, 54.0]
2025-09-13 06:38:19,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 16 seconds)
2025-09-13 06:50:11,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:50:11,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:50:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 356.08777 ± 106.188
2025-09-13 06:50:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [529.1796, 387.24066, 358.02475, 338.10345, 145.21022, 212.82751, 321.46616, 403.16727, 407.19058, 458.46732]
2025-09-13 06:50:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 71.0, 66.0, 62.0, 28.0, 45.0, 59.0, 74.0, 75.0, 85.0]
2025-09-13 06:50:32,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 47 minutes, 11 seconds)
2025-09-13 07:02:17,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:02:17,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:02:46,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 467.50391 ± 116.350
2025-09-13 07:02:46,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [404.31372, 345.2183, 436.9604, 478.19287, 647.31, 376.75674, 721.36365, 371.6524, 459.09448, 434.17603]
2025-09-13 07:02:46,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 77.0, 83.0, 88.0, 138.0, 70.0, 153.0, 79.0, 85.0, 83.0]
2025-09-13 07:02:46,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (467.50) for latency ExtremeSparseL4U32
2025-09-13 07:02:46,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 34 minutes, 39 seconds)
2025-09-13 07:14:34,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:14:34,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:14:55,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 366.93600 ± 135.937
2025-09-13 07:14:55,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [380.17227, 405.44724, 565.1638, 371.03165, 339.73276, 384.9685, 586.0818, 335.32993, 161.41132, 140.02086]
2025-09-13 07:14:55,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 75.0, 106.0, 69.0, 63.0, 71.0, 110.0, 61.0, 31.0, 27.0]
2025-09-13 07:14:55,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 21 minutes)
2025-09-13 07:26:39,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:26:39,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:27:05,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 410.03043 ± 93.621
2025-09-13 07:27:05,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [185.41095, 456.59204, 385.66876, 529.7768, 402.55112, 407.33902, 382.78662, 548.25446, 397.87106, 404.0535]
2025-09-13 07:27:05,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 93.0, 71.0, 100.0, 84.0, 88.0, 77.0, 103.0, 85.0, 84.0]
2025-09-13 07:27:05,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 3 minutes, 52 seconds)
2025-09-13 07:38:48,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:38:48,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:39:16,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 458.01270 ± 138.410
2025-09-13 07:39:16,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [280.91718, 482.1222, 374.77765, 454.62576, 430.10553, 836.45984, 453.77728, 460.36475, 440.35425, 366.62183]
2025-09-13 07:39:16,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 90.0, 81.0, 91.0, 91.0, 161.0, 85.0, 95.0, 81.0, 80.0]
2025-09-13 07:39:16,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 53 minutes, 48 seconds)
2025-09-13 07:51:01,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:51:01,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:51:29,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 473.97125 ± 78.850
2025-09-13 07:51:29,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [477.50256, 417.29123, 560.8989, 629.8068, 551.4328, 402.4229, 372.66516, 480.53183, 406.21445, 440.94574]
2025-09-13 07:51:29,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 77.0, 119.0, 121.0, 102.0, 74.0, 71.0, 90.0, 75.0, 84.0]
2025-09-13 07:51:29,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (473.97) for latency ExtremeSparseL4U32
2025-09-13 07:51:29,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 41 minutes, 18 seconds)
2025-09-13 08:03:16,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:03:16,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:03:43,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 456.62378 ± 106.997
2025-09-13 08:03:43,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [421.53284, 414.7578, 481.75397, 598.37866, 468.34488, 171.81363, 502.40744, 514.50946, 475.0147, 517.7244]
2025-09-13 08:03:43,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 77.0, 89.0, 113.0, 90.0, 33.0, 93.0, 94.0, 88.0, 97.0]
2025-09-13 08:03:43,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 29 minutes, 17 seconds)
2025-09-13 08:15:26,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:15:26,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:15:58,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 503.77679 ± 85.392
2025-09-13 08:15:58,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [565.9864, 397.44485, 608.08984, 402.99487, 443.7708, 669.71466, 434.80997, 496.41022, 490.61172, 527.93475]
2025-09-13 08:15:58,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 84.0, 117.0, 89.0, 91.0, 128.0, 81.0, 93.0, 93.0, 114.0]
2025-09-13 08:15:58,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (503.78) for latency ExtremeSparseL4U32
2025-09-13 08:15:58,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 18 minutes, 44 seconds)
2025-09-13 08:27:38,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:27:38,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:28:08,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 503.23895 ± 65.367
2025-09-13 08:28:08,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [568.6994, 504.53613, 416.8459, 489.6941, 483.72583, 549.4476, 469.5293, 471.3135, 430.85077, 647.74695]
2025-09-13 08:28:08,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 95.0, 78.0, 92.0, 90.0, 108.0, 89.0, 102.0, 82.0, 122.0]
2025-09-13 08:28:08,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 6 minutes, 52 seconds)
2025-09-13 08:39:57,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:39:57,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:40:23,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 450.21869 ± 164.168
2025-09-13 08:40:23,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [704.1838, 508.3875, 454.75656, 425.00653, 620.6508, 477.42194, 169.54163, 156.06956, 459.86987, 526.2987]
2025-09-13 08:40:23,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 95.0, 85.0, 79.0, 129.0, 89.0, 33.0, 30.0, 90.0, 97.0]
2025-09-13 08:40:23,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 55 minutes, 34 seconds)
2025-09-13 08:52:00,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:52:00,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:52:31,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 524.29126 ± 97.148
2025-09-13 08:52:31,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [505.53024, 472.38226, 518.5903, 431.4925, 416.87894, 402.33926, 667.03, 695.0151, 527.2225, 606.43097]
2025-09-13 08:52:31,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 88.0, 97.0, 81.0, 78.0, 74.0, 126.0, 132.0, 100.0, 129.0]
2025-09-13 08:52:31,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (524.29) for latency ExtremeSparseL4U32
2025-09-13 08:52:31,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 42 minutes, 3 seconds)
2025-09-13 09:04:14,186 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:04:14,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:04:45,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 528.92737 ± 73.138
2025-09-13 09:04:45,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [440.40958, 533.7976, 602.37286, 533.0046, 692.7976, 535.19977, 432.4278, 492.03967, 480.5409, 546.68353]
2025-09-13 09:04:45,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 98.0, 127.0, 99.0, 147.0, 100.0, 84.0, 106.0, 88.0, 101.0]
2025-09-13 09:04:45,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (528.93) for latency ExtremeSparseL4U32
2025-09-13 09:04:45,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 29 minutes, 49 seconds)
2025-09-13 09:16:31,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:16:31,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:17:04,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 538.32062 ± 62.546
2025-09-13 09:17:04,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [531.8778, 511.70352, 645.8149, 493.95248, 550.2695, 477.99103, 485.44553, 499.94864, 519.6368, 666.56573]
2025-09-13 09:17:04,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 95.0, 123.0, 92.0, 118.0, 102.0, 105.0, 92.0, 111.0, 126.0]
2025-09-13 09:17:04,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (538.32) for latency ExtremeSparseL4U32
2025-09-13 09:17:04,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 18 minutes, 34 seconds)
2025-09-13 09:28:43,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:28:43,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:29:15,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 560.89374 ± 122.390
2025-09-13 09:29:15,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [567.2942, 590.9023, 392.29517, 537.1193, 864.149, 565.6362, 453.67047, 485.75052, 647.7224, 504.39746]
2025-09-13 09:29:15,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 112.0, 73.0, 100.0, 160.0, 104.0, 85.0, 91.0, 124.0, 93.0]
2025-09-13 09:29:15,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (560.89) for latency ExtremeSparseL4U32
2025-09-13 09:29:15,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 6 minutes, 41 seconds)
2025-09-13 09:40:58,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:40:58,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:41:26,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 468.52548 ± 152.333
2025-09-13 09:41:26,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [510.32297, 623.30475, 484.96625, 684.5938, 479.51666, 255.51018, 482.04987, 396.0823, 601.6289, 167.27953]
2025-09-13 09:41:26,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 118.0, 95.0, 127.0, 90.0, 49.0, 105.0, 72.0, 113.0, 32.0]
2025-09-13 09:41:26,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 53 minutes, 18 seconds)
2025-09-13 09:53:03,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:53:03,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:53:39,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 610.02747 ± 103.187
2025-09-13 09:53:39,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [647.31537, 583.3404, 596.9489, 820.8957, 463.95953, 565.4448, 501.28473, 553.0568, 756.1088, 611.92]
2025-09-13 09:53:39,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 113.0, 113.0, 157.0, 97.0, 108.0, 98.0, 103.0, 150.0, 128.0]
2025-09-13 09:53:39,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (610.03) for latency ExtremeSparseL4U32
2025-09-13 09:53:39,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 42 minutes, 37 seconds)
2025-09-13 10:05:21,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:05:22,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:05:54,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 526.80731 ± 80.284
2025-09-13 10:05:54,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [532.8796, 453.19458, 433.01138, 628.45984, 613.11096, 517.7134, 500.13885, 393.77307, 554.1882, 641.6037]
2025-09-13 10:05:54,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 98.0, 82.0, 131.0, 127.0, 101.0, 93.0, 86.0, 107.0, 123.0]
2025-09-13 10:05:54,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 30 minutes, 39 seconds)
2025-09-13 10:17:38,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:17:38,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:18:14,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 586.64911 ± 123.191
2025-09-13 10:18:14,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [635.7354, 619.8569, 588.9301, 580.4701, 341.27606, 684.5096, 394.23395, 784.16455, 631.6132, 605.7014]
2025-09-13 10:18:14,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 115.0, 111.0, 107.0, 64.0, 127.0, 83.0, 148.0, 133.0, 127.0]
2025-09-13 10:18:14,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 18 minutes, 48 seconds)
2025-09-13 10:29:54,363 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:29:54,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:30:29,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 583.56873 ± 69.214
2025-09-13 10:30:29,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [654.7547, 547.87976, 562.8083, 693.76105, 491.4288, 598.2099, 470.14642, 570.2841, 577.9093, 668.50494]
2025-09-13 10:30:29,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 114.0, 108.0, 146.0, 104.0, 112.0, 86.0, 104.0, 123.0, 125.0]
2025-09-13 10:30:29,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 7 minutes, 20 seconds)
2025-09-13 10:42:18,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:42:18,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:42:56,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 623.84143 ± 158.940
2025-09-13 10:42:56,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [559.282, 770.5641, 450.00598, 465.18832, 613.16016, 505.77725, 470.0125, 765.19934, 683.2619, 955.9623]
2025-09-13 10:42:56,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 146.0, 96.0, 99.0, 114.0, 108.0, 90.0, 157.0, 129.0, 188.0]
2025-09-13 10:42:56,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (623.84) for latency ExtremeSparseL4U32
2025-09-13 10:42:56,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 59 minutes, 37 seconds)
2025-09-13 10:54:36,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:54:36,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:55:06,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 504.70200 ± 165.958
2025-09-13 10:55:06,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [394.9224, 160.57208, 377.87512, 356.70663, 595.2258, 598.77075, 595.81494, 603.2794, 614.5596, 749.29346]
2025-09-13 10:55:06,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 31.0, 70.0, 67.0, 112.0, 125.0, 120.0, 115.0, 116.0, 158.0]
2025-09-13 10:55:06,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 46 minutes, 19 seconds)
2025-09-13 11:06:52,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:06:52,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:07:29,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 614.36194 ± 179.960
2025-09-13 11:07:29,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [140.61188, 645.0473, 587.00604, 641.99445, 700.51984, 710.5899, 877.5973, 667.68915, 643.5842, 528.9794]
2025-09-13 11:07:29,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 123.0, 108.0, 118.0, 136.0, 152.0, 177.0, 127.0, 124.0, 99.0]
2025-09-13 11:07:29,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 35 minutes, 57 seconds)
2025-09-13 11:19:03,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:19:03,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:19:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 658.76691 ± 115.394
2025-09-13 11:19:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [589.23773, 637.00885, 642.95135, 489.38458, 918.3327, 559.8212, 614.20746, 763.19434, 747.11975, 626.41095]
2025-09-13 11:19:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 126.0, 121.0, 91.0, 193.0, 107.0, 114.0, 163.0, 139.0, 122.0]
2025-09-13 11:19:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (658.77) for latency ExtremeSparseL4U32
2025-09-13 11:19:43,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 22 minutes, 12 seconds)
2025-09-13 11:31:27,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:31:27,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:32:10,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 686.74243 ± 169.257
2025-09-13 11:32:10,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [563.336, 832.64124, 870.6801, 633.48236, 703.4402, 635.5289, 467.347, 979.47626, 761.2349, 420.25775]
2025-09-13 11:32:10,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 175.0, 169.0, 134.0, 153.0, 118.0, 95.0, 188.0, 144.0, 90.0]
2025-09-13 11:32:10,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (686.74) for latency ExtremeSparseL4U32
2025-09-13 11:32:10,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 12 minutes, 58 seconds)
2025-09-13 11:43:49,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:43:49,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:44:26,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 612.85724 ± 191.619
2025-09-13 11:44:26,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [841.73737, 160.5915, 621.8227, 501.63516, 531.44025, 651.2342, 602.3803, 594.06067, 882.7179, 740.9528]
2025-09-13 11:44:26,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 31.0, 116.0, 94.0, 99.0, 131.0, 118.0, 125.0, 172.0, 139.0]
2025-09-13 11:44:26,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 57 minutes, 42 seconds)
2025-09-13 11:56:18,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:56:18,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:56:57,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 660.96625 ± 155.313
2025-09-13 11:56:57,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [524.97125, 556.3877, 742.3671, 822.2361, 483.03522, 615.02216, 562.22864, 565.49884, 727.53174, 1010.384]
2025-09-13 11:56:57,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 107.0, 139.0, 159.0, 104.0, 114.0, 105.0, 105.0, 140.0, 194.0]
2025-09-13 11:56:57,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 50 minutes, 34 seconds)
2025-09-13 12:08:31,760 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:08:31,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:09:13,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 697.85205 ± 142.609
2025-09-13 12:09:13,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [760.70636, 680.40735, 845.3268, 782.28394, 518.9461, 776.3049, 585.77747, 952.7443, 579.54193, 496.48157]
2025-09-13 12:09:13,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 129.0, 164.0, 152.0, 98.0, 155.0, 127.0, 178.0, 123.0, 111.0]
2025-09-13 12:09:13,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (697.85) for latency ExtremeSparseL4U32
2025-09-13 12:09:13,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 36 minutes, 42 seconds)
2025-09-13 12:20:54,687 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:20:54,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:21:37,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 736.81232 ± 284.951
2025-09-13 12:21:37,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1072.9148, 724.74615, 485.11444, 1006.9443, 1052.2916, 554.8722, 763.05743, 985.7969, 567.4014, 154.98386]
2025-09-13 12:21:37,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [218.0, 135.0, 95.0, 188.0, 217.0, 105.0, 142.0, 186.0, 108.0, 30.0]
2025-09-13 12:21:37,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (736.81) for latency ExtremeSparseL4U32
2025-09-13 12:21:37,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 26 minutes, 41 seconds)
2025-09-13 12:33:23,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:33:23,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:34:10,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 810.98297 ± 453.119
2025-09-13 12:34:10,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [956.004, 434.90778, 1439.9652, 630.3846, 514.156, 794.6836, 532.93225, 1710.7781, 965.64624, 130.37148]
2025-09-13 12:34:10,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 92.0, 284.0, 133.0, 96.0, 152.0, 98.0, 325.0, 184.0, 25.0]
2025-09-13 12:34:10,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (810.98) for latency ExtremeSparseL4U32
2025-09-13 12:34:10,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 15 minutes, 43 seconds)
2025-09-13 12:45:56,371 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:45:56,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:46:46,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 866.40265 ± 284.903
2025-09-13 12:46:46,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [581.8647, 913.721, 1431.0776, 686.3499, 465.89322, 930.37634, 636.68726, 794.0576, 981.46295, 1242.536]
2025-09-13 12:46:46,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 168.0, 273.0, 128.0, 84.0, 177.0, 131.0, 147.0, 180.0, 234.0]
2025-09-13 12:46:46,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (866.40) for latency ExtremeSparseL4U32
2025-09-13 12:46:46,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 7 minutes, 42 seconds)
2025-09-13 12:58:34,695 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:58:34,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:59:07,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 561.42590 ± 244.173
2025-09-13 12:59:07,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [909.24664, 641.3335, 401.93622, 583.3948, 666.1385, 842.1024, 633.95905, 134.7083, 639.6511, 161.78822]
2025-09-13 12:59:07,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 119.0, 74.0, 110.0, 123.0, 176.0, 120.0, 26.0, 118.0, 31.0]
2025-09-13 12:59:07,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 53 minutes, 12 seconds)
2025-09-13 13:10:38,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:10:38,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:11:21,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 721.33246 ± 211.262
2025-09-13 13:11:21,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [772.5361, 531.7668, 887.61914, 406.7023, 950.09625, 891.80615, 605.0635, 962.0341, 379.0331, 826.66754]
2025-09-13 13:11:21,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 105.0, 169.0, 89.0, 178.0, 168.0, 114.0, 190.0, 72.0, 156.0]
2025-09-13 13:11:21,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 40 minutes, 5 seconds)
2025-09-13 13:23:06,527 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:23:06,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:23:58,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 892.12891 ± 411.641
2025-09-13 13:23:58,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [610.784, 129.78555, 717.3171, 1194.0363, 988.91754, 576.6026, 983.26196, 1002.7132, 942.29346, 1775.5775]
2025-09-13 13:23:58,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 25.0, 134.0, 227.0, 189.0, 117.0, 186.0, 188.0, 177.0, 336.0]
2025-09-13 13:23:58,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (892.13) for latency ExtremeSparseL4U32
2025-09-13 13:23:58,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 30 minutes, 27 seconds)
2025-09-13 13:35:36,988 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:35:36,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:36:44,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1113.37231 ± 352.152
2025-09-13 13:36:44,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1712.2776, 1088.2039, 1128.8201, 992.49664, 889.0914, 895.55273, 951.2737, 834.0457, 1858.4224, 783.53876]
2025-09-13 13:36:44,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [353.0, 235.0, 221.0, 189.0, 166.0, 168.0, 181.0, 174.0, 371.0, 147.0]
2025-09-13 13:36:44,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1113.37) for latency ExtremeSparseL4U32
2025-09-13 13:36:44,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 20 minutes, 40 seconds)
2025-09-13 13:48:48,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:48:48,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:49:18,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 521.53625 ± 221.972
2025-09-13 13:49:18,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [155.39723, 197.69305, 520.6803, 610.8676, 679.6815, 519.11053, 907.12177, 749.76575, 491.85678, 383.18787]
2025-09-13 13:49:18,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 38.0, 95.0, 110.0, 123.0, 97.0, 168.0, 140.0, 90.0, 73.0]
2025-09-13 13:49:18,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 7 minutes, 56 seconds)
2025-09-13 14:00:35,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:00:35,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:01:33,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 978.14276 ± 403.931
2025-09-13 14:01:33,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [953.892, 2014.4324, 862.26385, 904.0746, 1113.978, 734.8056, 686.93915, 378.0607, 1062.756, 1070.225]
2025-09-13 14:01:33,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 414.0, 162.0, 173.0, 209.0, 148.0, 129.0, 70.0, 202.0, 207.0]
2025-09-13 14:01:33,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 54 minutes, 9 seconds)
2025-09-13 14:13:25,512 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:13:25,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:14:26,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1031.27563 ± 468.829
2025-09-13 14:14:26,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1113.347, 744.3314, 698.6138, 1671.2195, 661.06635, 145.4586, 1108.7666, 1385.1372, 1763.1908, 1021.6258]
2025-09-13 14:14:26,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 142.0, 130.0, 313.0, 124.0, 28.0, 212.0, 266.0, 350.0, 198.0]
2025-09-13 14:14:26,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 49 minutes, 34 seconds)
2025-09-13 14:26:05,376 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:26:05,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:27:17,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1206.71448 ± 467.586
2025-09-13 14:27:17,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1310.9342, 798.8891, 572.2166, 1863.9438, 402.52167, 1870.2797, 1310.1017, 1280.695, 1473.0718, 1184.4927]
2025-09-13 14:27:17,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 150.0, 107.0, 359.0, 73.0, 366.0, 252.0, 243.0, 284.0, 227.0]
2025-09-13 14:27:17,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1206.71) for latency ExtremeSparseL4U32
2025-09-13 14:27:17,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 39 minutes, 44 seconds)
2025-09-13 14:39:06,928 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:39:06,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:40:31,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1425.37305 ± 542.507
2025-09-13 14:40:31,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1038.3066, 1603.971, 2330.9011, 573.8039, 1765.5166, 2215.2095, 1091.6091, 914.38586, 1578.7412, 1141.2858]
2025-09-13 14:40:31,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 303.0, 441.0, 124.0, 350.0, 424.0, 202.0, 190.0, 297.0, 232.0]
2025-09-13 14:40:31,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1425.37) for latency ExtremeSparseL4U32
2025-09-13 14:40:31,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 32 minutes, 44 seconds)
2025-09-13 14:52:16,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:52:16,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:52:58,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 708.31934 ± 629.781
2025-09-13 14:52:58,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [173.49077, 155.34526, 2136.062, 1254.9087, 145.65749, 150.6982, 682.6454, 318.47327, 1185.0488, 880.8635]
2025-09-13 14:52:58,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 30.0, 407.0, 234.0, 28.0, 29.0, 126.0, 66.0, 223.0, 168.0]
2025-09-13 14:52:58,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 18 minutes, 38 seconds)
2025-09-13 15:04:30,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:04:30,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:05:37,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1106.06274 ± 827.199
2025-09-13 15:05:37,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [391.62677, 3074.8533, 1208.6548, 1692.0018, 1071.6616, 1502.6609, 737.441, 165.58652, 145.83302, 1070.3076]
2025-09-13 15:05:37,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 601.0, 241.0, 327.0, 202.0, 307.0, 164.0, 32.0, 28.0, 201.0]
2025-09-13 15:05:37,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 10 minutes, 16 seconds)
2025-09-13 15:17:37,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:17:37,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:18:37,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1017.17627 ± 284.149
2025-09-13 15:18:37,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [964.02094, 918.95154, 1317.8488, 1377.7493, 514.432, 600.04407, 899.27814, 1290.3318, 1260.3118, 1028.7948]
2025-09-13 15:18:37,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 195.0, 248.0, 264.0, 96.0, 115.0, 167.0, 246.0, 252.0, 194.0]
2025-09-13 15:18:37,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 58 minutes, 55 seconds)
2025-09-13 15:30:13,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:30:13,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:31:47,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1605.14954 ± 968.013
2025-09-13 15:31:47,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [897.43854, 3159.1477, 1812.4519, 1126.4569, 734.4942, 1327.354, 966.71875, 1236.0717, 3728.665, 1062.6964]
2025-09-13 15:31:47,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 600.0, 352.0, 208.0, 138.0, 272.0, 197.0, 240.0, 717.0, 198.0]
2025-09-13 15:31:47,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1605.15) for latency ExtremeSparseL4U32
2025-09-13 15:31:47,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 49 minutes, 34 seconds)
2025-09-13 15:43:25,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:43:25,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:44:16,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 888.36072 ± 486.282
2025-09-13 15:44:16,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1150.6251, 1503.1986, 1395.8936, 655.80945, 382.05368, 169.79556, 183.12343, 1440.8435, 948.65283, 1053.6113]
2025-09-13 15:44:16,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 284.0, 260.0, 121.0, 71.0, 33.0, 35.0, 272.0, 194.0, 195.0]
2025-09-13 15:44:16,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 28 minutes, 29 seconds)
2025-09-13 15:55:57,366 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:55:57,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:56:59,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1038.36792 ± 639.639
2025-09-13 15:56:59,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1745.2863, 561.1812, 191.9688, 735.1127, 536.91284, 914.2266, 1425.0258, 1839.5212, 370.64508, 2063.7986]
2025-09-13 15:56:59,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [369.0, 105.0, 37.0, 135.0, 96.0, 191.0, 294.0, 360.0, 72.0, 403.0]
2025-09-13 15:56:59,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 18 minutes, 34 seconds)
2025-09-13 16:09:07,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:09:07,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:10:15,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1178.58545 ± 993.770
2025-09-13 16:10:15,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [178.23836, 176.77939, 3483.324, 1384.0144, 931.6893, 1041.6757, 2174.7542, 710.01337, 140.8499, 1564.5166]
2025-09-13 16:10:15,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 34.0, 668.0, 260.0, 179.0, 191.0, 420.0, 136.0, 27.0, 294.0]
2025-09-13 16:10:15,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 12 minutes, 8 seconds)
2025-09-13 16:22:04,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:22:04,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:23:26,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1350.15759 ± 1146.140
2025-09-13 16:23:26,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1507.6921, 1278.3193, 4326.2183, 518.00415, 1999.4705, 578.8702, 1803.07, 779.6647, 233.19556, 477.07166]
2025-09-13 16:23:26,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [319.0, 260.0, 836.0, 100.0, 385.0, 108.0, 362.0, 149.0, 45.0, 100.0]
2025-09-13 16:23:26,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 1 minute, 10 seconds)
2025-09-13 16:34:55,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:34:55,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:36:19,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1412.85010 ± 828.235
2025-09-13 16:36:19,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1476.2817, 1834.4468, 2445.9497, 956.2155, 972.4463, 2810.3687, 1800.3954, 1495.7897, 180.17447, 156.43214]
2025-09-13 16:36:19,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [297.0, 341.0, 462.0, 183.0, 181.0, 539.0, 343.0, 281.0, 35.0, 30.0]
2025-09-13 16:36:19,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 45 minutes, 16 seconds)
2025-09-13 16:48:03,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:48:03,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:49:30,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1454.46655 ± 1186.231
2025-09-13 16:49:30,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [4089.6443, 1085.6267, 1126.3522, 1923.4863, 655.8492, 181.05724, 339.0813, 1846.3751, 437.9772, 2859.2156]
2025-09-13 16:49:30,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [783.0, 216.0, 217.0, 387.0, 130.0, 35.0, 64.0, 355.0, 80.0, 564.0]
2025-09-13 16:49:30,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 39 minutes, 22 seconds)
2025-09-13 17:00:59,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:00:59,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:02:44,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1714.84937 ± 933.063
2025-09-13 17:02:44,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3669.8564, 1680.4657, 1206.0975, 141.12802, 2575.5132, 1513.6265, 2216.4136, 1890.0372, 683.6194, 1571.737]
2025-09-13 17:02:44,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [723.0, 326.0, 224.0, 27.0, 502.0, 296.0, 449.0, 356.0, 147.0, 306.0]
2025-09-13 17:02:44,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1714.85) for latency ExtremeSparseL4U32
2025-09-13 17:02:44,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 31 minutes, 7 seconds)
2025-09-13 17:14:52,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:14:52,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:17:00,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2109.50073 ± 1068.891
2025-09-13 17:17:00,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2485.0293, 2227.8755, 1627.2372, 2080.7366, 2213.0066, 4860.159, 2272.9055, 573.11176, 1354.258, 1400.6868]
2025-09-13 17:17:00,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [478.0, 435.0, 348.0, 408.0, 429.0, 939.0, 434.0, 118.0, 277.0, 290.0]
2025-09-13 17:17:00,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2109.50) for latency ExtremeSparseL4U32
2025-09-13 17:17:00,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 27 minutes, 28 seconds)
2025-09-13 17:28:13,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:28:13,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:29:50,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1662.41443 ± 980.144
2025-09-13 17:29:50,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [985.1063, 1278.4243, 825.02924, 3203.3462, 3698.1638, 849.811, 1740.5365, 1881.9696, 639.476, 1522.2832]
2025-09-13 17:29:50,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 242.0, 155.0, 606.0, 702.0, 174.0, 335.0, 358.0, 121.0, 289.0]
2025-09-13 17:29:50,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 10 minutes, 46 seconds)
2025-09-13 17:41:39,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:41:39,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:42:43,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1044.29565 ± 1062.205
2025-09-13 17:42:43,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [166.11887, 185.27292, 140.4952, 3777.0461, 1192.7538, 1431.9714, 467.09122, 649.8315, 596.2268, 1836.149]
2025-09-13 17:42:43,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 36.0, 27.0, 754.0, 242.0, 294.0, 95.0, 143.0, 120.0, 359.0]
2025-09-13 17:42:43,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 57 minutes, 37 seconds)
2025-09-13 17:54:43,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:54:43,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:56:51,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2160.00635 ± 1245.318
2025-09-13 17:56:51,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1630.9478, 1762.1455, 2382.5537, 3698.7961, 3212.685, 1801.2695, 1909.2024, 4403.884, 156.79515, 641.783]
2025-09-13 17:56:51,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [299.0, 335.0, 483.0, 686.0, 607.0, 349.0, 357.0, 852.0, 30.0, 121.0]
2025-09-13 17:56:51,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2160.01) for latency ExtremeSparseL4U32
2025-09-13 17:56:51,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 52 minutes, 33 seconds)
2025-09-13 18:08:33,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:08:33,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:10:48,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2231.64087 ± 1497.376
2025-09-13 18:10:48,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2789.227, 2303.2236, 5113.2695, 484.19238, 3177.2903, 643.03796, 171.52463, 3589.0679, 2783.092, 1262.4838]
2025-09-13 18:10:48,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [548.0, 449.0, 993.0, 91.0, 608.0, 134.0, 33.0, 707.0, 517.0, 245.0]
2025-09-13 18:10:48,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2231.64) for latency ExtremeSparseL4U32
2025-09-13 18:10:48,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 45 minutes, 24 seconds)
2025-09-13 18:22:16,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:22:16,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:24:12,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1979.76099 ± 709.294
2025-09-13 18:24:12,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1840.1548, 822.76465, 2572.189, 2036.4265, 3203.9705, 1085.0754, 1448.3114, 1779.763, 2775.055, 2233.9006]
2025-09-13 18:24:12,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [348.0, 162.0, 480.0, 388.0, 607.0, 215.0, 268.0, 353.0, 529.0, 419.0]
2025-09-13 18:24:12,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 24 minutes, 27 seconds)
2025-09-13 18:35:51,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:35:51,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:37:16,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1454.62524 ± 995.385
2025-09-13 18:37:16,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3244.1758, 2072.198, 1443.7076, 1563.7284, 200.00322, 155.55458, 1717.5426, 541.6918, 828.1485, 2779.5012]
2025-09-13 18:37:16,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [622.0, 398.0, 270.0, 302.0, 38.0, 30.0, 333.0, 100.0, 155.0, 533.0]
2025-09-13 18:37:16,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 12 minutes, 59 seconds)
2025-09-13 18:50:00,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:50:00,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:51:31,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1569.89905 ± 1333.439
2025-09-13 18:51:31,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [789.0693, 1109.187, 1782.1156, 3938.7046, 4217.0303, 1096.148, 1501.8232, 491.3677, 613.89856, 159.64684]
2025-09-13 18:51:31,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 218.0, 336.0, 749.0, 805.0, 208.0, 285.0, 92.0, 111.0, 31.0]
2025-09-13 18:51:31,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 10 minutes, 24 seconds)
2025-09-13 19:02:40,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:02:40,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:05:00,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2352.27295 ± 1374.487
2025-09-13 19:05:00,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [170.74352, 3768.7375, 4967.684, 1125.2711, 3598.0974, 943.4978, 2444.431, 1989.519, 2464.7073, 2050.0417]
2025-09-13 19:05:00,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 733.0, 939.0, 207.0, 687.0, 200.0, 494.0, 376.0, 462.0, 409.0]
2025-09-13 19:05:00,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2352.27) for latency ExtremeSparseL4U32
2025-09-13 19:05:00,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 51 minutes, 37 seconds)
2025-09-13 19:16:42,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:16:42,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:19:08,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2440.73169 ± 1171.797
2025-09-13 19:19:08,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1259.1492, 1816.4875, 2330.3616, 3084.9233, 1428.2407, 1825.8201, 3776.562, 1785.3575, 1901.1146, 5199.299]
2025-09-13 19:19:08,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [256.0, 364.0, 441.0, 582.0, 282.0, 354.0, 710.0, 343.0, 354.0, 1000.0]
2025-09-13 19:19:08,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2440.73) for latency ExtremeSparseL4U32
2025-09-13 19:19:08,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 39 minutes, 21 seconds)
2025-09-13 19:30:38,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:30:38,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:32:03,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1355.75977 ± 475.290
2025-09-13 19:32:03,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2165.9834, 864.77515, 2205.3772, 932.7383, 1137.3853, 989.70526, 1224.5889, 1590.8529, 1515.7788, 930.4117]
2025-09-13 19:32:03,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [406.0, 182.0, 446.0, 193.0, 239.0, 215.0, 261.0, 331.0, 288.0, 190.0]
2025-09-13 19:32:03,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 22 minutes, 4 seconds)
2025-09-13 19:43:35,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:43:35,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:46:15,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2754.20190 ± 1833.768
2025-09-13 19:46:15,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5112.285, 423.84216, 5101.676, 1004.07587, 2922.294, 3070.6628, 167.2905, 2554.4985, 5295.0186, 1890.3754]
2025-09-13 19:46:15,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [949.0, 80.0, 973.0, 205.0, 551.0, 577.0, 32.0, 473.0, 1000.0, 367.0]
2025-09-13 19:46:15,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2754.20) for latency ExtremeSparseL4U32
2025-09-13 19:46:15,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 16 minutes, 39 seconds)
2025-09-13 19:58:13,675 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:58:13,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:59:46,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1575.47888 ± 1885.005
2025-09-13 19:59:46,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5248.586, 712.8776, 859.54736, 190.18805, 166.89738, 589.92267, 1278.5765, 1274.8523, 5278.5522, 154.78859]
2025-09-13 19:59:46,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 132.0, 162.0, 37.0, 32.0, 111.0, 237.0, 238.0, 1000.0, 30.0]
2025-09-13 19:59:46,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 57 minutes, 46 seconds)
2025-09-13 20:11:27,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:11:27,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:13:35,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2202.62866 ± 1429.081
2025-09-13 20:13:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [854.98755, 2562.8582, 645.32263, 2820.506, 4989.634, 1380.2252, 1121.9221, 1673.4274, 1481.6699, 4495.7344]
2025-09-13 20:13:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 473.0, 119.0, 523.0, 934.0, 256.0, 216.0, 324.0, 295.0, 850.0]
2025-09-13 20:13:35,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 46 minutes, 24 seconds)
2025-09-13 20:25:55,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:25:55,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:28:24,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2501.63721 ± 2042.999
2025-09-13 20:28:24,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [364.75128, 140.8581, 170.72932, 5150.2217, 2614.2961, 5220.661, 3691.1404, 915.48175, 5147.775, 1600.4579]
2025-09-13 20:28:24,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 27.0, 33.0, 1000.0, 501.0, 1000.0, 691.0, 184.0, 1000.0, 322.0]
2025-09-13 20:28:24,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 37 minutes, 11 seconds)
2025-09-13 20:39:26,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:39:26,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:41:46,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2431.09888 ± 1534.794
2025-09-13 20:41:46,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1176.6847, 5286.4434, 4399.918, 2034.8156, 3679.4104, 2046.6066, 160.4517, 2907.274, 1619.7626, 999.6222]
2025-09-13 20:41:46,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 989.0, 830.0, 383.0, 699.0, 382.0, 31.0, 549.0, 322.0, 206.0]
2025-09-13 20:41:46,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 26 minutes, 14 seconds)
2025-09-13 20:53:28,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:53:28,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:55:35,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2172.48779 ± 1672.596
2025-09-13 20:55:35,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [928.7108, 5282.305, 592.9444, 640.0067, 2303.3206, 2639.9238, 4628.7637, 413.8865, 3277.008, 1018.00745]
2025-09-13 20:55:35,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 1000.0, 124.0, 136.0, 430.0, 499.0, 879.0, 78.0, 603.0, 207.0]
2025-09-13 20:55:35,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 9 minutes, 55 seconds)
2025-09-13 21:07:38,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:07:38,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:08:57,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1361.41724 ± 1521.367
2025-09-13 21:08:57,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [4626.441, 821.55865, 172.27803, 566.48285, 3200.0847, 479.4596, 161.08235, 165.37003, 511.98312, 2909.4329]
2025-09-13 21:08:57,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [890.0, 157.0, 33.0, 120.0, 606.0, 87.0, 31.0, 32.0, 95.0, 561.0]
2025-09-13 21:08:57,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 55 minutes, 8 seconds)
2025-09-13 21:20:36,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:20:36,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:23:05,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2434.51050 ± 1627.031
2025-09-13 21:23:05,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [150.9322, 5247.7646, 2350.9026, 1798.7412, 1653.7969, 445.07224, 2713.0986, 1896.1776, 5230.5366, 2858.085]
2025-09-13 21:23:05,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 453.0, 365.0, 337.0, 90.0, 532.0, 376.0, 1000.0, 588.0]
2025-09-13 21:23:05,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 43 minutes, 2 seconds)
2025-09-13 21:34:38,367 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:34:38,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:35:30,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 911.04474 ± 824.790
2025-09-13 21:35:30,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [145.14839, 744.4171, 124.84698, 1422.4589, 2841.978, 194.45842, 165.06404, 810.97626, 1602.858, 1058.2412]
2025-09-13 21:35:30,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 142.0, 24.0, 267.0, 550.0, 38.0, 32.0, 145.0, 295.0, 201.0]
2025-09-13 21:35:30,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 15 minutes, 43 seconds)
2025-09-13 21:47:24,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:47:24,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:49:39,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2359.75537 ± 1809.214
2025-09-13 21:49:39,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [169.91945, 532.4284, 2635.1506, 2119.5356, 5326.7153, 1674.4467, 5386.0786, 3442.5664, 2145.2405, 165.47372]
2025-09-13 21:49:39,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 95.0, 492.0, 392.0, 1000.0, 321.0, 1000.0, 641.0, 396.0, 32.0]
2025-09-13 21:49:39,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 6 minutes, 30 seconds)
2025-09-13 22:01:31,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:01:31,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:03:53,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2370.00732 ± 1933.507
2025-09-13 22:03:53,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2714.3337, 400.6616, 783.9411, 135.10434, 187.66289, 1647.3243, 5256.8833, 3072.077, 5212.9478, 4289.136]
2025-09-13 22:03:53,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [540.0, 73.0, 165.0, 26.0, 36.0, 333.0, 1000.0, 610.0, 1000.0, 834.0]
2025-09-13 22:03:53,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 55 minutes, 11 seconds)
2025-09-13 22:15:16,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:15:16,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:17:51,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2557.00049 ± 1699.764
2025-09-13 22:17:51,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1139.1298, 4171.6543, 4665.8926, 3136.5288, 796.2541, 5185.1875, 2190.6064, 3315.523, 788.32965, 180.89995]
2025-09-13 22:17:51,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 834.0, 903.0, 626.0, 151.0, 1000.0, 424.0, 646.0, 168.0, 35.0]
2025-09-13 22:17:51,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 44 minutes, 26 seconds)
2025-09-13 22:30:34,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:30:34,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:32:36,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2007.88635 ± 1703.757
2025-09-13 22:32:36,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2863.1423, 1743.949, 1636.9774, 943.83704, 284.36884, 243.78378, 519.81885, 4987.4463, 1732.1672, 5123.373]
2025-09-13 22:32:36,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [575.0, 350.0, 316.0, 176.0, 55.0, 48.0, 95.0, 1000.0, 343.0, 1000.0]
2025-09-13 22:32:36,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 33 minutes, 42 seconds)
2025-09-13 22:43:57,444 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:43:57,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:46:29,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2598.43701 ± 1683.938
2025-09-13 22:46:29,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [141.1136, 1017.12067, 1440.744, 3212.6328, 5394.948, 1729.7255, 3604.9285, 5329.653, 1574.3483, 2539.1545]
2025-09-13 22:46:29,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 194.0, 286.0, 630.0, 1000.0, 337.0, 671.0, 1000.0, 302.0, 494.0]
2025-09-13 22:46:29,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 26 minutes, 32 seconds)
2025-09-13 22:57:30,166 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:57:30,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:00:17,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2802.80518 ± 1837.536
2025-09-13 23:00:17,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2170.7441, 176.6948, 694.7217, 1796.1664, 1205.5592, 4454.091, 5262.395, 4794.3643, 5146.287, 2327.0269]
2025-09-13 23:00:17,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [414.0, 34.0, 127.0, 331.0, 236.0, 869.0, 1000.0, 914.0, 1000.0, 476.0]
2025-09-13 23:00:17,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2802.81) for latency ExtremeSparseL4U32
2025-09-13 23:00:17,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 10 minutes, 50 seconds)
2025-09-13 23:12:45,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:12:45,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:14:52,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2111.14380 ± 2132.214
2025-09-13 23:14:52,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5143.7554, 5211.4673, 141.26324, 5156.518, 1266.0665, 2633.746, 166.14645, 1091.6533, 124.91993, 175.89946]
2025-09-13 23:14:52,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 27.0, 1000.0, 260.0, 510.0, 32.0, 207.0, 24.0, 34.0]
2025-09-13 23:14:52,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 58 minutes, 6 seconds)
2025-09-13 23:26:04,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:26:04,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:27:59,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1970.79822 ± 1886.343
2025-09-13 23:27:59,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3488.813, 129.63623, 5207.5454, 2106.2136, 5117.0757, 1810.4763, 874.4372, 155.7861, 626.1766, 191.8247]
2025-09-13 23:27:59,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [654.0, 25.0, 1000.0, 388.0, 956.0, 334.0, 185.0, 30.0, 116.0, 37.0]
2025-09-13 23:27:59,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 40 minutes, 33 seconds)
2025-09-13 23:39:55,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:39:55,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:42:28,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2626.20752 ± 2134.716
2025-09-13 23:42:28,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [4297.2134, 5337.6743, 5288.8374, 3106.7144, 660.4234, 441.45145, 5161.465, 1620.3683, 202.22957, 145.69586]
2025-09-13 23:42:28,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [817.0, 1000.0, 1000.0, 581.0, 142.0, 87.0, 1000.0, 329.0, 39.0, 28.0]
2025-09-13 23:42:28,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 25 minutes, 28 seconds)
2025-09-13 23:54:55,836 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:54:55,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:57:31,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2693.01318 ± 1973.703
2025-09-13 23:57:31,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2742.434, 1513.8744, 592.63965, 5232.7725, 5267.547, 198.5847, 436.92584, 1796.2946, 3912.856, 5236.2056]
2025-09-13 23:57:31,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [515.0, 284.0, 111.0, 1000.0, 1000.0, 39.0, 80.0, 344.0, 760.0, 1000.0]
2025-09-13 23:57:31,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 15 minutes, 43 seconds)
2025-09-14 00:08:17,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:08:17,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:10:31,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2271.24878 ± 1495.967
2025-09-14 00:10:31,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1382.5619, 3805.4062, 3636.0344, 5110.3784, 2598.557, 2100.3572, 1662.8267, 2059.2368, 144.28131, 212.84843]
2025-09-14 00:10:31,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [255.0, 723.0, 691.0, 1000.0, 486.0, 427.0, 324.0, 389.0, 28.0, 41.0]
2025-09-14 00:10:31,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 58 minutes, 47 seconds)
2025-09-14 00:22:51,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:22:51,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:25:44,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3024.91919 ± 1891.378
2025-09-14 00:25:44,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1270.8016, 1132.2659, 150.10408, 1998.0106, 2063.3127, 5210.6865, 5350.618, 5340.571, 2890.3606, 4842.46]
2025-09-14 00:25:44,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [255.0, 211.0, 29.0, 371.0, 386.0, 999.0, 1000.0, 1000.0, 533.0, 911.0]
2025-09-14 00:25:44,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (3024.92) for latency ExtremeSparseL4U32
2025-09-14 00:25:44,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 46 minutes, 45 seconds)
2025-09-14 00:37:22,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:37:22,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:40:19,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2968.08447 ± 1891.930
2025-09-14 00:40:19,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5116.249, 4414.673, 2399.857, 4700.204, 169.14648, 927.1359, 2749.9978, 152.21527, 5209.4985, 3841.867]
2025-09-14 00:40:19,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 855.0, 486.0, 932.0, 33.0, 178.0, 525.0, 29.0, 1000.0, 757.0]
2025-09-14 00:40:19,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 37 minutes)
2025-09-14 00:51:25,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:51:25,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:52:57,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1544.22083 ± 587.233
2025-09-14 00:52:57,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1400.8047, 1870.832, 923.04047, 872.5709, 2587.7024, 1098.6445, 2501.008, 1576.0171, 1017.1233, 1594.4652]
2025-09-14 00:52:57,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [261.0, 357.0, 184.0, 166.0, 505.0, 228.0, 479.0, 318.0, 192.0, 331.0]
2025-09-14 00:52:57,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 17 minutes, 20 seconds)
2025-09-14 01:04:42,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:04:42,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:06:15,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1563.20410 ± 1009.659
2025-09-14 01:06:15,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [485.21805, 161.96257, 2394.7224, 1324.6951, 3051.7346, 1864.5731, 3146.0762, 1708.8866, 468.76706, 1025.4048]
2025-09-14 01:06:15,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 31.0, 471.0, 262.0, 589.0, 371.0, 591.0, 322.0, 90.0, 213.0]
2025-09-14 01:06:15,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 58 minutes, 40 seconds)
2025-09-14 01:18:07,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:18:07,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:20:09,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1947.66992 ± 1007.666
2025-09-14 01:20:09,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2058.7266, 1129.5957, 2411.165, 2442.3594, 342.9383, 3015.4714, 566.08075, 1784.9613, 1944.1144, 3781.287]
2025-09-14 01:20:09,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [430.0, 226.0, 489.0, 489.0, 64.0, 578.0, 121.0, 378.0, 394.0, 753.0]
2025-09-14 01:20:09,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 47 minutes, 6 seconds)
2025-09-14 01:31:54,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:31:54,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:33:12,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1279.29785 ± 713.237
2025-09-14 01:33:12,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1742.9513, 2067.6614, 1168.6996, 465.38678, 497.1568, 2443.0266, 979.9586, 2129.295, 695.78235, 603.0606]
2025-09-14 01:33:12,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [328.0, 406.0, 229.0, 88.0, 93.0, 485.0, 182.0, 450.0, 132.0, 127.0]
2025-09-14 01:33:12,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 28 minutes, 25 seconds)
2025-09-14 01:45:00,647 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:45:00,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:47:26,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2508.97144 ± 2159.056
2025-09-14 01:47:26,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [124.79479, 4413.791, 5247.172, 5306.3623, 500.2993, 167.14503, 5346.018, 976.5943, 1357.7319, 1649.8071]
2025-09-14 01:47:26,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 834.0, 1000.0, 1000.0, 93.0, 32.0, 1000.0, 189.0, 278.0, 316.0]
2025-09-14 01:47:26,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 14 minutes, 14 seconds)
2025-09-14 01:58:55,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:58:55,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:00:59,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2081.43311 ± 1492.081
2025-09-14 02:00:59,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1773.8099, 2110.2297, 893.34753, 697.3494, 4563.848, 5243.578, 955.0783, 1678.446, 1922.926, 975.7163]
2025-09-14 02:00:59,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 425.0, 168.0, 141.0, 869.0, 1000.0, 192.0, 333.0, 363.0, 208.0]
2025-09-14 02:00:59,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 2 minutes, 27 seconds)
2025-09-14 02:13:05,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:13:05,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:15:12,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2187.08350 ± 1890.311
2025-09-14 02:15:12,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1183.7186, 5204.0884, 5205.512, 3657.4172, 3169.8506, 124.46611, 186.49124, 1236.769, 1716.3372, 186.18332]
2025-09-14 02:15:12,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 1000.0, 1000.0, 696.0, 600.0, 24.0, 36.0, 236.0, 327.0, 36.0]
2025-09-14 02:15:12,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 50 minutes, 19 seconds)
2025-09-14 02:26:44,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:26:44,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:29:32,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2846.56470 ± 1832.994
2025-09-14 02:29:32,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [4990.1094, 1275.6573, 180.69568, 2457.479, 5184.235, 4253.5938, 1923.7571, 3244.49, 4774.7227, 180.90729]
2025-09-14 02:29:32,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [947.0, 246.0, 35.0, 481.0, 1000.0, 838.0, 358.0, 622.0, 915.0, 35.0]
2025-09-14 02:29:32,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 37 minutes, 8 seconds)
2025-09-14 02:41:14,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:41:14,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:42:40,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1434.00415 ± 1923.209
2025-09-14 02:42:40,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1517.2437, 5188.714, 846.5802, 159.45753, 156.09654, 161.25017, 198.08467, 5184.0503, 771.69354, 156.87132]
2025-09-14 02:42:40,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [292.0, 1000.0, 173.0, 31.0, 30.0, 31.0, 39.0, 1000.0, 145.0, 30.0]
2025-09-14 02:42:40,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 23 minutes, 21 seconds)
2025-09-14 02:55:08,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:55:08,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:57:26,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2434.70654 ± 1667.087
2025-09-14 02:57:26,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3377.5923, 5348.433, 2220.5386, 2726.5918, 155.70662, 3990.9797, 185.24413, 393.10934, 2320.5085, 3628.3606]
2025-09-14 02:57:26,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [611.0, 1000.0, 417.0, 510.0, 30.0, 739.0, 36.0, 77.0, 437.0, 680.0]
2025-09-14 02:57:26,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 9 minutes, 59 seconds)
2025-09-14 03:08:16,708 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:08:16,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:10:08,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1881.33923 ± 1777.119
2025-09-14 03:10:08,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2419.9697, 5244.019, 1569.7529, 1571.2444, 581.57025, 4939.379, 1964.5009, 150.88528, 182.09341, 189.97598]
2025-09-14 03:10:08,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [484.0, 1000.0, 298.0, 315.0, 109.0, 937.0, 380.0, 29.0, 35.0, 37.0]
2025-09-14 03:10:08,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 55 minutes, 19 seconds)
2025-09-14 03:22:18,473 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:22:18,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:24:19,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2051.91602 ± 1639.676
2025-09-14 03:24:19,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [729.84406, 165.87404, 1132.9541, 842.2986, 2358.1792, 5243.884, 3960.6455, 2060.1758, 378.0528, 3647.253]
2025-09-14 03:24:19,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 32.0, 212.0, 156.0, 453.0, 1000.0, 751.0, 388.0, 81.0, 684.0]
2025-09-14 03:24:19,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 41 minutes, 28 seconds)
2025-09-14 03:35:59,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:35:59,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:37:20,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1362.80054 ± 1428.312
2025-09-14 03:37:20,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [165.62907, 1626.1091, 2307.5073, 842.5812, 182.7817, 1084.9922, 1650.9381, 485.71643, 176.50607, 5105.244]
2025-09-14 03:37:20,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 327.0, 431.0, 164.0, 35.0, 220.0, 313.0, 89.0, 34.0, 1000.0]
2025-09-14 03:37:20,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 27 minutes, 7 seconds)
2025-09-14 03:48:44,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:48:44,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:50:50,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2099.45923 ± 1737.861
2025-09-14 03:50:50,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [633.07965, 696.43475, 1722.9037, 1278.505, 751.7766, 5074.363, 5117.9785, 2169.8752, 3398.6633, 151.01245]
2025-09-14 03:50:50,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 153.0, 348.0, 264.0, 159.0, 969.0, 1000.0, 414.0, 661.0, 29.0]
2025-09-14 03:50:50,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 38 seconds)
2025-09-14 04:02:48,351 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:02:48,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:05:04,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2302.52661 ± 1555.236
2025-09-14 04:05:04,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2904.2185, 1774.687, 3837.5461, 3399.0618, 1198.1536, 3006.2698, 155.12907, 140.18475, 1437.3331, 5172.681]
2025-09-14 04:05:04,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [564.0, 352.0, 740.0, 679.0, 208.0, 586.0, 30.0, 27.0, 290.0, 1000.0]
2025-09-14 04:05:04,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1251 [DEBUG]: Training session finished
