2025-09-13 12:26:54,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 12:26:54,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 12:26:54,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154159af9010>}
2025-09-13 12:26:54,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-13 12:26:54,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1133 [INFO]: Creating new trainer
2025-09-13 12:26:54,163 baseline-mbpac-noiseperc25-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-13 12:26:54,163 baseline-mbpac-noiseperc25-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 12:26:54,173 baseline-mbpac-noiseperc25-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-13 12:26:55,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-13 12:26:55,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-13 12:38:41,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:38:41,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:39:01,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 361.34155 ± 101.005
2025-09-13 12:39:01,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [334.962, 306.06332, 582.61633, 260.76962, 475.84906, 315.04996, 285.52164, 259.7996, 448.44937, 344.3345]
2025-09-13 12:39:01,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 56.0, 111.0, 48.0, 89.0, 58.0, 54.0, 48.0, 89.0, 64.0]
2025-09-13 12:39:01,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (361.34) for latency ExtremeSparseL4U32
2025-09-13 12:39:01,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 58 minutes, 18 seconds)
2025-09-13 12:50:26,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:50:26,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:50:50,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 392.14371 ± 84.191
2025-09-13 12:50:50,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [376.40207, 367.23596, 351.90356, 323.6612, 368.2573, 301.95303, 557.55286, 538.5366, 418.15808, 317.77634]
2025-09-13 12:50:50,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 68.0, 65.0, 65.0, 73.0, 70.0, 116.0, 103.0, 78.0, 70.0]
2025-09-13 12:50:50,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (392.14) for latency ExtremeSparseL4U32
2025-09-13 12:50:50,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 31 minutes, 48 seconds)
2025-09-13 13:02:12,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:02:12,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:02:26,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 243.94974 ± 162.813
2025-09-13 13:02:26,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.16256, 94.50738, 88.81732, 101.60807, 365.96558, 284.46606, 321.03622, 88.77281, 540.3757, 452.7856]
2025-09-13 13:02:26,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 18.0, 20.0, 69.0, 62.0, 69.0, 18.0, 101.0, 86.0]
2025-09-13 13:02:26,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 8 minutes, 23 seconds)
2025-09-13 13:13:47,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:13:47,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:14:09,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 373.76807 ± 67.493
2025-09-13 13:14:09,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [318.22736, 374.3704, 417.32782, 413.75854, 365.00076, 304.57892, 261.73846, 368.683, 395.04587, 518.94934]
2025-09-13 13:14:09,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 80.0, 87.0, 79.0, 67.0, 70.0, 61.0, 69.0, 74.0, 101.0]
2025-09-13 13:14:09,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 53 minutes, 44 seconds)
2025-09-13 13:25:34,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:25:34,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:25:54,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 331.37213 ± 29.796
2025-09-13 13:25:54,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [313.49097, 344.55692, 345.95337, 313.40753, 373.22934, 310.9685, 289.65503, 385.14917, 299.61533, 337.69553]
2025-09-13 13:25:54,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 65.0, 64.0, 66.0, 78.0, 70.0, 56.0, 71.0, 55.0, 65.0]
2025-09-13 13:25:54,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 40 minutes, 41 seconds)
2025-09-13 13:37:16,309 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:37:16,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:37:41,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 410.73447 ± 129.477
2025-09-13 13:37:41,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [768.7668, 403.5267, 429.05896, 319.96777, 371.40674, 381.79987, 391.19968, 318.6819, 447.30905, 275.62753]
2025-09-13 13:37:41,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 83.0, 92.0, 70.0, 82.0, 82.0, 83.0, 62.0, 86.0, 56.0]
2025-09-13 13:37:41,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (410.73) for latency ExtremeSparseL4U32
2025-09-13 13:37:41,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 22 minutes, 48 seconds)
2025-09-13 13:49:08,365 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:49:08,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:49:35,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 459.19342 ± 135.614
2025-09-13 13:49:35,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [433.9599, 336.3495, 524.8747, 510.18964, 347.1018, 538.4535, 754.55817, 534.6565, 270.6404, 341.14975]
2025-09-13 13:49:35,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 65.0, 102.0, 100.0, 63.0, 100.0, 159.0, 106.0, 51.0, 64.0]
2025-09-13 13:49:35,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (459.19) for latency ExtremeSparseL4U32
2025-09-13 13:49:35,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 12 minutes, 51 seconds)
2025-09-13 14:01:15,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:01:15,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:01:37,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 386.59540 ± 115.179
2025-09-13 14:01:37,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [324.76825, 489.8301, 457.04868, 331.4242, 532.5216, 150.15355, 512.4534, 362.1044, 268.40375, 437.246]
2025-09-13 14:01:37,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 90.0, 86.0, 62.0, 107.0, 29.0, 102.0, 67.0, 50.0, 83.0]
2025-09-13 14:01:37,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 9 minutes, 2 seconds)
2025-09-13 14:13:18,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:13:18,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:13:42,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 396.23230 ± 125.836
2025-09-13 14:13:42,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [386.77356, 614.91626, 493.16666, 329.06607, 357.32645, 149.14693, 380.4821, 277.06644, 525.70123, 448.6772]
2025-09-13 14:13:42,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 131.0, 95.0, 74.0, 66.0, 29.0, 73.0, 59.0, 101.0, 83.0]
2025-09-13 14:13:42,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 3 minutes, 45 seconds)
2025-09-13 14:25:29,444 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:25:29,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:25:53,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 402.25128 ± 96.028
2025-09-13 14:25:53,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [436.11963, 348.821, 386.74622, 613.7514, 525.3248, 418.17847, 301.23032, 294.4368, 336.1362, 361.76794]
2025-09-13 14:25:53,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 66.0, 73.0, 118.0, 98.0, 80.0, 57.0, 55.0, 74.0, 80.0]
2025-09-13 14:25:53,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 59 minutes, 40 seconds)
2025-09-13 14:37:36,259 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:37:36,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:37:57,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 358.18723 ± 138.439
2025-09-13 14:37:57,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [169.27177, 112.772896, 322.45007, 516.8592, 315.75812, 463.89658, 520.2395, 519.5891, 344.58344, 296.45135]
2025-09-13 14:37:57,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 22.0, 61.0, 100.0, 59.0, 86.0, 110.0, 96.0, 74.0, 62.0]
2025-09-13 14:37:57,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 52 minutes, 44 seconds)
2025-09-13 14:49:42,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:49:42,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:50:02,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 360.07452 ± 142.703
2025-09-13 14:50:02,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [427.5727, 96.17186, 320.5087, 508.67868, 495.60446, 390.9471, 367.7892, 94.657814, 447.73615, 451.07855]
2025-09-13 14:50:02,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 19.0, 60.0, 96.0, 95.0, 74.0, 68.0, 19.0, 87.0, 85.0]
2025-09-13 14:50:02,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 43 minutes, 49 seconds)
2025-09-13 15:01:42,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:01:42,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:02:05,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 402.29681 ± 127.738
2025-09-13 15:02:05,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [473.57645, 397.7314, 499.88513, 309.73627, 410.84314, 363.78055, 363.134, 580.7206, 522.3325, 101.22804]
2025-09-13 15:02:05,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 75.0, 94.0, 58.0, 78.0, 71.0, 80.0, 108.0, 99.0, 20.0]
2025-09-13 15:02:05,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 31 minutes, 55 seconds)
2025-09-13 15:13:47,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:13:47,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:14:13,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 439.34976 ± 77.379
2025-09-13 15:14:13,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [382.0774, 391.31766, 392.73587, 460.58978, 578.5017, 381.28488, 442.9732, 569.9929, 332.73642, 461.28772]
2025-09-13 15:14:13,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 84.0, 73.0, 99.0, 114.0, 69.0, 83.0, 113.0, 61.0, 85.0]
2025-09-13 15:14:13,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 20 minutes, 46 seconds)
2025-09-13 15:25:43,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:25:43,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:26:05,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 390.85950 ± 51.708
2025-09-13 15:26:05,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [419.48297, 284.2528, 389.18695, 414.54327, 402.54166, 393.64236, 323.81863, 383.45523, 412.96793, 484.70306]
2025-09-13 15:26:05,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 52.0, 72.0, 76.0, 74.0, 74.0, 59.0, 72.0, 77.0, 93.0]
2025-09-13 15:26:05,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 3 minutes, 21 seconds)
2025-09-13 15:37:29,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:37:29,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:37:53,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 409.27435 ± 150.956
2025-09-13 15:37:53,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [416.3932, 487.39505, 376.8059, 548.02826, 543.6686, 497.48532, 113.38017, 417.67245, 146.04094, 545.8735]
2025-09-13 15:37:53,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 92.0, 68.0, 102.0, 101.0, 92.0, 22.0, 78.0, 28.0, 114.0]
2025-09-13 15:37:53,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 46 minutes, 52 seconds)
2025-09-13 15:49:21,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:49:21,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:49:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 472.18872 ± 137.686
2025-09-13 15:49:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [355.3119, 483.41696, 600.3318, 329.38843, 788.3996, 317.50775, 493.10965, 532.0264, 453.63406, 368.7604]
2025-09-13 15:49:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 89.0, 125.0, 60.0, 161.0, 60.0, 94.0, 99.0, 84.0, 68.0]
2025-09-13 15:49:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (472.19) for latency ExtremeSparseL4U32
2025-09-13 15:49:49,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 32 minutes, 22 seconds)
2025-09-13 16:01:18,422 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:01:18,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:01:44,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 458.17169 ± 78.244
2025-09-13 16:01:44,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [436.9383, 557.2981, 434.91635, 335.60028, 519.8952, 451.71838, 417.7722, 388.7951, 613.165, 425.61795]
2025-09-13 16:01:44,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 107.0, 86.0, 71.0, 98.0, 85.0, 80.0, 71.0, 114.0, 80.0]
2025-09-13 16:01:44,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 18 minutes, 17 seconds)
2025-09-13 16:13:11,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:13:11,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:13:28,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 305.57086 ± 143.222
2025-09-13 16:13:28,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [280.74567, 443.3567, 110.64206, 89.9648, 94.81134, 446.23117, 392.621, 356.17386, 431.62646, 409.53534]
2025-09-13 16:13:28,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 86.0, 22.0, 18.0, 19.0, 82.0, 73.0, 76.0, 80.0, 75.0]
2025-09-13 16:13:28,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 1 second)
2025-09-13 16:24:58,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:24:58,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:25:24,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 439.63751 ± 71.029
2025-09-13 16:25:24,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [301.14624, 466.23666, 414.8107, 346.03577, 504.86072, 461.645, 519.4881, 531.419, 400.46332, 450.26962]
2025-09-13 16:25:24,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 95.0, 86.0, 74.0, 97.0, 87.0, 97.0, 98.0, 73.0, 82.0]
2025-09-13 16:25:24,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 49 minutes, 6 seconds)
2025-09-13 16:36:55,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:36:55,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:37:15,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 351.41101 ± 114.254
2025-09-13 16:37:15,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [411.54556, 375.86545, 276.66397, 448.80667, 370.1357, 541.6978, 180.5526, 411.21286, 146.65877, 350.971]
2025-09-13 16:37:15,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 69.0, 52.0, 86.0, 71.0, 102.0, 35.0, 76.0, 28.0, 64.0]
2025-09-13 16:37:15,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 38 minutes, 6 seconds)
2025-09-13 16:48:42,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:48:42,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:49:10,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 506.51239 ± 147.386
2025-09-13 16:49:10,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [462.8153, 452.66153, 695.92505, 168.20503, 622.2124, 581.11755, 680.00934, 419.72247, 441.0522, 541.4034]
2025-09-13 16:49:10,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 85.0, 134.0, 32.0, 117.0, 109.0, 133.0, 78.0, 81.0, 106.0]
2025-09-13 16:49:10,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (506.51) for latency ExtremeSparseL4U32
2025-09-13 16:49:10,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 26 minutes)
2025-09-13 17:00:37,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:00:37,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:01:05,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 486.28516 ± 126.780
2025-09-13 17:01:05,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [459.77225, 485.00626, 658.6574, 330.6355, 294.57916, 541.53174, 481.44388, 503.2608, 722.51666, 385.44788]
2025-09-13 17:01:05,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 95.0, 125.0, 62.0, 64.0, 113.0, 99.0, 93.0, 137.0, 84.0]
2025-09-13 17:01:05,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 14 minutes, 7 seconds)
2025-09-13 17:12:37,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:12:37,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:13:01,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 408.42911 ± 101.271
2025-09-13 17:13:01,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [172.63066, 317.20178, 473.88052, 426.4511, 378.61285, 471.66553, 474.25137, 366.585, 447.9859, 555.0265]
2025-09-13 17:13:01,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 70.0, 87.0, 79.0, 70.0, 88.0, 99.0, 67.0, 97.0, 106.0]
2025-09-13 17:13:01,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 5 minutes, 3 seconds)
2025-09-13 17:24:32,549 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:24:32,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:24:50,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 314.91577 ± 134.684
2025-09-13 17:24:50,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [329.62973, 95.67813, 236.34372, 135.39677, 476.76727, 391.87946, 505.54187, 350.13577, 202.05678, 425.72815]
2025-09-13 17:24:50,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 19.0, 44.0, 26.0, 90.0, 72.0, 108.0, 65.0, 38.0, 88.0]
2025-09-13 17:24:50,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 51 minutes, 37 seconds)
2025-09-13 17:36:18,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:36:18,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:36:28,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 175.23526 ± 100.157
2025-09-13 17:36:28,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.74834, 121.248055, 99.99973, 218.45248, 102.35707, 118.05797, 168.038, 99.53096, 380.80307, 342.11682]
2025-09-13 17:36:28,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 20.0, 46.0, 20.0, 23.0, 34.0, 20.0, 70.0, 71.0]
2025-09-13 17:36:28,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 36 minutes, 23 seconds)
2025-09-13 17:47:56,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:47:56,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:48:23,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 450.09210 ± 94.380
2025-09-13 17:48:23,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [250.59782, 416.76572, 447.29077, 461.2452, 652.97626, 481.82117, 439.0016, 400.67123, 507.15442, 443.39697]
2025-09-13 17:48:23,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 78.0, 97.0, 85.0, 124.0, 88.0, 92.0, 86.0, 111.0, 96.0]
2025-09-13 17:48:23,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 24 minutes, 34 seconds)
2025-09-13 17:59:50,781 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:59:50,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:00:18,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 492.57870 ± 112.997
2025-09-13 18:00:18,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [322.04962, 530.39746, 450.36847, 536.0714, 454.57065, 774.58954, 491.99353, 493.7181, 491.3542, 380.67456]
2025-09-13 18:00:18,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 109.0, 96.0, 101.0, 85.0, 147.0, 91.0, 104.0, 92.0, 72.0]
2025-09-13 18:00:19,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 12 minutes, 47 seconds)
2025-09-13 18:11:48,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:11:48,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:12:16,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 459.42108 ± 52.025
2025-09-13 18:12:16,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [487.5148, 395.87833, 466.30133, 392.56323, 447.9618, 413.0383, 463.49902, 443.3591, 569.52844, 514.56616]
2025-09-13 18:12:16,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 88.0, 87.0, 75.0, 97.0, 78.0, 90.0, 94.0, 117.0, 109.0]
2025-09-13 18:12:16,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 1 minute, 26 seconds)
2025-09-13 18:23:46,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:23:46,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:24:15,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 510.76035 ± 131.151
2025-09-13 18:24:15,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [779.43353, 416.59964, 471.45346, 598.7792, 355.87424, 505.60742, 388.17157, 683.88416, 398.31732, 509.48315]
2025-09-13 18:24:15,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 76.0, 101.0, 114.0, 69.0, 98.0, 72.0, 130.0, 75.0, 98.0]
2025-09-13 18:24:15,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (510.76) for latency ExtremeSparseL4U32
2025-09-13 18:24:15,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 51 minutes, 43 seconds)
2025-09-13 18:35:50,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:35:50,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:36:13,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 360.76556 ± 120.690
2025-09-13 18:36:13,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [326.17548, 352.39993, 401.81647, 107.656334, 175.2259, 457.46466, 489.9487, 390.1723, 444.5974, 462.19843]
2025-09-13 18:36:13,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 78.0, 87.0, 21.0, 34.0, 98.0, 102.0, 82.0, 97.0, 84.0]
2025-09-13 18:36:13,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 44 minutes, 29 seconds)
2025-09-13 18:47:36,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:47:36,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:48:03,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 460.33966 ± 191.732
2025-09-13 18:48:03,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.481995, 560.7984, 612.44867, 515.0965, 440.93692, 410.34888, 366.86658, 343.3323, 399.6338, 865.4527]
2025-09-13 18:48:03,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 109.0, 115.0, 104.0, 82.0, 82.0, 71.0, 64.0, 83.0, 167.0]
2025-09-13 18:48:03,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 31 minutes, 21 seconds)
2025-09-13 18:59:31,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:59:31,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:59:50,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 317.44678 ± 154.163
2025-09-13 18:59:50,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [436.9871, 363.29813, 414.38043, 83.97985, 400.81485, 478.3369, 89.07802, 89.84972, 364.92264, 452.82004]
2025-09-13 18:59:50,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 67.0, 77.0, 17.0, 72.0, 99.0, 18.0, 18.0, 78.0, 93.0]
2025-09-13 18:59:50,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 17 minutes, 33 seconds)
2025-09-13 19:11:21,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:11:21,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:11:45,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 419.84164 ± 126.329
2025-09-13 19:11:45,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [445.09647, 556.55774, 475.79337, 134.31601, 297.66043, 457.6509, 305.6024, 535.4834, 467.8447, 522.4113]
2025-09-13 19:11:45,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 109.0, 103.0, 26.0, 54.0, 83.0, 57.0, 116.0, 85.0, 97.0]
2025-09-13 19:11:45,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 5 minutes, 11 seconds)
2025-09-13 19:23:16,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:23:16,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:23:43,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 467.49493 ± 69.561
2025-09-13 19:23:43,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [522.15173, 489.36685, 453.5601, 628.6146, 452.11514, 505.43707, 392.88467, 443.19476, 387.62613, 399.99802]
2025-09-13 19:23:43,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 101.0, 83.0, 119.0, 83.0, 98.0, 74.0, 84.0, 73.0, 73.0]
2025-09-13 19:23:43,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 53 minutes, 1 second)
2025-09-13 19:35:12,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:35:12,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:35:41,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 503.47549 ± 87.302
2025-09-13 19:35:41,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [571.49945, 336.08105, 637.9837, 602.1902, 463.1223, 465.17865, 534.6274, 412.56668, 464.57126, 546.93396]
2025-09-13 19:35:41,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 62.0, 124.0, 113.0, 88.0, 86.0, 106.0, 77.0, 87.0, 102.0]
2025-09-13 19:35:41,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 41 minutes, 8 seconds)
2025-09-13 19:47:10,883 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:47:10,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:47:39,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 515.06232 ± 115.276
2025-09-13 19:47:39,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [640.9437, 347.05045, 499.78223, 404.79715, 645.9545, 507.9494, 519.05084, 702.81946, 521.8793, 360.39603]
2025-09-13 19:47:39,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 64.0, 93.0, 75.0, 136.0, 93.0, 106.0, 131.0, 99.0, 66.0]
2025-09-13 19:47:39,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (515.06) for latency ExtremeSparseL4U32
2025-09-13 19:47:39,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 31 minutes, 3 seconds)
2025-09-13 19:59:10,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:59:10,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:59:36,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 462.40967 ± 141.798
2025-09-13 19:59:36,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [502.7021, 378.30917, 432.1086, 129.16107, 490.47968, 455.91504, 630.66754, 590.214, 631.5608, 382.9784]
2025-09-13 19:59:36,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 70.0, 86.0, 25.0, 91.0, 90.0, 122.0, 112.0, 127.0, 72.0]
2025-09-13 19:59:36,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 21 minutes, 12 seconds)
2025-09-13 20:11:02,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:11:02,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:11:26,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 405.91919 ± 170.429
2025-09-13 20:11:26,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [492.8103, 88.80808, 391.36053, 533.00745, 632.56915, 451.3551, 99.65248, 495.08603, 503.74356, 370.79895]
2025-09-13 20:11:26,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 18.0, 74.0, 114.0, 126.0, 88.0, 20.0, 92.0, 93.0, 81.0]
2025-09-13 20:11:26,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 8 minutes, 5 seconds)
2025-09-13 20:22:52,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:22:52,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:23:20,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 479.73770 ± 135.527
2025-09-13 20:23:20,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [333.9875, 396.9494, 783.25464, 409.63217, 673.68494, 444.91888, 404.72565, 362.72104, 494.37744, 493.1254]
2025-09-13 20:23:20,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 72.0, 150.0, 74.0, 128.0, 99.0, 76.0, 80.0, 90.0, 92.0]
2025-09-13 20:23:20,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 55 minutes, 22 seconds)
2025-09-13 20:34:48,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:34:48,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:35:13,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 438.39688 ± 141.299
2025-09-13 20:35:13,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [445.38788, 621.364, 567.8177, 94.552345, 376.6987, 406.93442, 355.22235, 568.41785, 470.37753, 477.1958]
2025-09-13 20:35:13,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 117.0, 113.0, 19.0, 69.0, 76.0, 66.0, 109.0, 87.0, 89.0]
2025-09-13 20:35:13,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 42 minutes, 28 seconds)
2025-09-13 20:46:42,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:46:42,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:47:10,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 490.51718 ± 65.002
2025-09-13 20:47:10,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [604.11414, 531.8184, 501.96344, 457.1137, 562.9062, 406.1432, 520.68164, 496.51163, 431.3076, 392.61203]
2025-09-13 20:47:10,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 103.0, 92.0, 85.0, 105.0, 77.0, 98.0, 93.0, 78.0, 71.0]
2025-09-13 20:47:10,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 30 minutes, 14 seconds)
2025-09-13 20:58:43,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:58:43,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:59:07,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 419.86566 ± 126.039
2025-09-13 20:59:07,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [621.2998, 363.59747, 515.28296, 520.5815, 435.40067, 130.44383, 401.66635, 465.35428, 322.62698, 422.40292]
2025-09-13 20:59:07,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 75.0, 94.0, 105.0, 81.0, 28.0, 74.0, 91.0, 65.0, 87.0]
2025-09-13 20:59:07,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 18 minutes, 24 seconds)
2025-09-13 21:10:36,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:10:36,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:10:54,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 330.30307 ± 206.610
2025-09-13 21:10:54,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [171.55869, 156.10815, 152.9902, 253.06195, 137.55267, 515.4836, 454.9527, 732.1839, 160.74582, 568.3932]
2025-09-13 21:10:54,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 30.0, 30.0, 52.0, 27.0, 97.0, 84.0, 155.0, 31.0, 105.0]
2025-09-13 21:10:54,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 6 minutes, 3 seconds)
2025-09-13 21:22:21,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:22:21,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:22:46,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 445.15228 ± 136.402
2025-09-13 21:22:46,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [429.56216, 510.02365, 397.2491, 491.81525, 533.0991, 613.4424, 95.86937, 355.8227, 544.52405, 480.11514]
2025-09-13 21:22:46,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 93.0, 73.0, 96.0, 99.0, 131.0, 19.0, 69.0, 102.0, 90.0]
2025-09-13 21:22:46,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 53 minutes, 53 seconds)
2025-09-13 21:34:13,406 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:34:13,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:34:30,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 299.36530 ± 168.464
2025-09-13 21:34:30,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [406.064, 334.4147, 96.088165, 419.62683, 100.72436, 88.79685, 134.99739, 367.04105, 526.87305, 519.0267]
2025-09-13 21:34:30,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 71.0, 19.0, 76.0, 20.0, 18.0, 26.0, 68.0, 103.0, 95.0]
2025-09-13 21:34:30,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 40 minutes, 18 seconds)
2025-09-13 21:46:00,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:46:00,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:46:26,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 462.88266 ± 159.939
2025-09-13 21:46:26,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [473.52905, 497.27768, 705.96454, 282.80447, 362.26297, 119.58764, 549.5726, 483.509, 555.3026, 599.0163]
2025-09-13 21:46:26,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 101.0, 136.0, 53.0, 69.0, 23.0, 101.0, 89.0, 105.0, 124.0]
2025-09-13 21:46:26,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 28 minutes, 19 seconds)
2025-09-13 21:57:59,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:57:59,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:58:26,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 503.37607 ± 141.563
2025-09-13 21:58:26,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [476.7154, 688.92114, 353.59595, 606.1294, 474.01047, 476.81427, 292.57568, 771.08813, 511.0979, 382.8126]
2025-09-13 21:58:26,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 128.0, 65.0, 112.0, 87.0, 89.0, 57.0, 145.0, 92.0, 73.0]
2025-09-13 21:58:26,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 16 minutes, 58 seconds)
2025-09-13 22:09:51,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:09:51,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:10:24,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 562.95874 ± 133.731
2025-09-13 22:10:24,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [312.55933, 738.0854, 623.90607, 674.4061, 386.16293, 657.4382, 474.86566, 666.4272, 476.3453, 619.39124]
2025-09-13 22:10:24,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 141.0, 115.0, 135.0, 77.0, 123.0, 88.0, 135.0, 91.0, 115.0]
2025-09-13 22:10:24,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (562.96) for latency ExtremeSparseL4U32
2025-09-13 22:10:24,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 6 minutes, 47 seconds)
2025-09-13 22:21:54,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:21:54,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:22:23,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 491.10376 ± 145.210
2025-09-13 22:22:23,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [522.09076, 589.62537, 741.99634, 413.23114, 466.4603, 397.6242, 168.80861, 472.86475, 621.3261, 517.01025]
2025-09-13 22:22:23,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 111.0, 141.0, 91.0, 85.0, 75.0, 33.0, 87.0, 118.0, 94.0]
2025-09-13 22:22:23,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 56 minutes, 4 seconds)
2025-09-13 22:33:47,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:33:47,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:34:10,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 407.98114 ± 227.222
2025-09-13 22:34:10,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [474.637, 344.11346, 577.529, 145.86162, 95.447136, 83.893654, 789.8962, 499.49484, 431.07196, 637.8665]
2025-09-13 22:34:10,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 65.0, 110.0, 30.0, 19.0, 17.0, 151.0, 92.0, 79.0, 119.0]
2025-09-13 22:34:10,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 44 minutes, 47 seconds)
2025-09-13 22:45:39,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:45:39,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:46:08,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 495.42914 ± 98.396
2025-09-13 22:46:08,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [508.72354, 386.92416, 329.78976, 427.983, 581.51685, 419.7958, 639.9729, 525.489, 505.0146, 629.08185]
2025-09-13 22:46:08,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 71.0, 72.0, 93.0, 124.0, 77.0, 129.0, 100.0, 104.0, 118.0]
2025-09-13 22:46:08,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 33 minutes, 5 seconds)
2025-09-13 22:57:44,020 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:57:44,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:58:11,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 472.59552 ± 100.568
2025-09-13 22:58:11,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [497.3711, 469.35718, 297.6232, 621.8428, 612.37634, 541.4165, 499.79904, 419.51404, 427.2308, 339.42383]
2025-09-13 22:58:11,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 87.0, 63.0, 125.0, 114.0, 102.0, 91.0, 77.0, 88.0, 64.0]
2025-09-13 22:58:11,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 21 minutes, 36 seconds)
2025-09-13 23:09:33,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:09:33,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:09:56,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 413.69562 ± 193.056
2025-09-13 23:09:56,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [90.01727, 667.0379, 452.88052, 84.146095, 412.4688, 588.9848, 412.1745, 486.27853, 311.88962, 631.07806]
2025-09-13 23:09:56,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 122.0, 83.0, 17.0, 82.0, 120.0, 76.0, 91.0, 61.0, 130.0]
2025-09-13 23:09:56,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 7 minutes, 47 seconds)
2025-09-13 23:21:30,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:21:30,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:22:02,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 552.63593 ± 109.600
2025-09-13 23:22:02,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [594.506, 462.3099, 706.3121, 607.4047, 714.5386, 436.54395, 518.21, 611.5642, 517.2552, 357.7142]
2025-09-13 23:22:02,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 93.0, 138.0, 120.0, 135.0, 97.0, 97.0, 115.0, 99.0, 69.0]
2025-09-13 23:22:02,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 56 minutes, 56 seconds)
2025-09-13 23:33:32,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:33:32,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:34:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 477.33209 ± 194.717
2025-09-13 23:34:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [489.3058, 338.51804, 106.9983, 390.8008, 788.36945, 339.70038, 656.6109, 699.7079, 575.9563, 387.3529]
2025-09-13 23:34:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 62.0, 21.0, 79.0, 156.0, 65.0, 127.0, 144.0, 112.0, 85.0]
2025-09-13 23:34:00,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 46 minutes, 32 seconds)
2025-09-13 23:45:32,593 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:45:32,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:45:55,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 369.69980 ± 131.750
2025-09-13 23:45:55,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [342.19153, 490.57562, 365.52087, 501.31317, 458.06656, 415.7064, 497.6808, 361.28247, 102.25571, 162.40515]
2025-09-13 23:45:55,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 106.0, 79.0, 102.0, 91.0, 90.0, 92.0, 79.0, 20.0, 32.0]
2025-09-13 23:45:55,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 34 minutes, 7 seconds)
2025-09-13 23:57:18,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:57:18,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:57:44,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 452.27353 ± 241.683
2025-09-13 23:57:44,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [754.5408, 542.25256, 590.1641, 149.20694, 523.059, 848.648, 148.98878, 113.69726, 468.93787, 383.2398]
2025-09-13 23:57:44,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 100.0, 112.0, 29.0, 95.0, 162.0, 30.0, 22.0, 86.0, 71.0]
2025-09-13 23:57:44,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 20 minutes, 13 seconds)
2025-09-14 00:09:08,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:09:08,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:09:36,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 482.11288 ± 90.623
2025-09-14 00:09:36,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [349.44727, 391.1743, 401.9947, 600.7881, 495.26675, 621.9456, 562.57117, 394.60474, 477.89862, 525.4379]
2025-09-14 00:09:36,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 71.0, 74.0, 115.0, 98.0, 122.0, 102.0, 72.0, 89.0, 112.0]
2025-09-14 00:09:36,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 9 minutes, 10 seconds)
2025-09-14 00:21:00,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:21:00,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:21:25,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 445.18985 ± 154.911
2025-09-14 00:21:25,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [323.8656, 704.35284, 83.8419, 418.95294, 520.0606, 375.45865, 498.9085, 491.88168, 538.4982, 496.07755]
2025-09-14 00:21:25,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 134.0, 17.0, 77.0, 100.0, 69.0, 91.0, 94.0, 101.0, 91.0]
2025-09-14 00:21:25,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 55 minutes, 4 seconds)
2025-09-14 00:33:04,190 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:33:04,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:33:30,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 447.96924 ± 110.174
2025-09-14 00:33:30,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [525.49976, 399.4064, 378.47955, 530.2406, 447.7276, 520.2499, 280.06216, 261.33392, 614.09406, 522.59845]
2025-09-14 00:33:30,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 74.0, 75.0, 104.0, 82.0, 109.0, 55.0, 53.0, 117.0, 111.0]
2025-09-14 00:33:30,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 44 minutes, 5 seconds)
2025-09-14 00:45:00,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:45:00,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:45:20,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 347.79254 ± 180.278
2025-09-14 00:45:20,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [464.4587, 397.50925, 89.30802, 112.37559, 95.96799, 467.77075, 567.2928, 409.0403, 293.9403, 580.2614]
2025-09-14 00:45:20,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 72.0, 18.0, 22.0, 19.0, 100.0, 106.0, 85.0, 55.0, 126.0]
2025-09-14 00:45:20,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 31 minutes, 35 seconds)
2025-09-14 00:56:40,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:56:40,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:57:04,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 436.88617 ± 123.158
2025-09-14 00:57:04,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [452.99362, 495.8214, 531.40045, 470.70615, 434.08496, 531.45087, 545.919, 316.5857, 471.39804, 118.50156]
2025-09-14 00:57:04,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 94.0, 97.0, 95.0, 85.0, 98.0, 103.0, 72.0, 88.0, 23.0]
2025-09-14 00:57:04,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 19 minutes, 7 seconds)
2025-09-14 01:08:32,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:08:32,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:09:03,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 543.10449 ± 109.153
2025-09-14 01:09:03,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [653.45105, 477.08365, 422.67383, 512.75146, 421.54272, 523.3757, 761.075, 442.3904, 662.4312, 554.2702]
2025-09-14 01:09:03,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 86.0, 77.0, 98.0, 93.0, 96.0, 147.0, 82.0, 128.0, 104.0]
2025-09-14 01:09:03,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 8 minutes, 4 seconds)
2025-09-14 01:20:30,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:20:30,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:21:00,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 503.53925 ± 100.502
2025-09-14 01:21:00,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [564.0927, 539.0097, 513.85834, 466.57498, 517.9145, 473.12543, 629.0578, 647.2292, 291.9834, 392.54666]
2025-09-14 01:21:00,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 99.0, 106.0, 92.0, 110.0, 88.0, 121.0, 136.0, 54.0, 73.0]
2025-09-14 01:21:00,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 57 minutes)
2025-09-14 01:32:24,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:32:24,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:32:52,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 497.37311 ± 83.077
2025-09-14 01:32:52,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [618.0423, 495.59378, 422.89246, 478.29095, 371.25513, 422.19998, 485.36554, 653.93195, 482.77612, 543.3825]
2025-09-14 01:32:52,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 89.0, 86.0, 87.0, 78.0, 78.0, 92.0, 122.0, 92.0, 103.0]
2025-09-14 01:32:52,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 43 minutes, 37 seconds)
2025-09-14 01:44:22,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:44:22,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:44:51,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 490.48941 ± 217.446
2025-09-14 01:44:51,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [498.99048, 630.3536, 354.28632, 796.2029, 785.72046, 407.4486, 527.16187, 111.363815, 191.20233, 602.1634]
2025-09-14 01:44:51,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 115.0, 77.0, 149.0, 145.0, 78.0, 96.0, 22.0, 37.0, 133.0]
2025-09-14 01:44:51,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 32 minutes, 47 seconds)
2025-09-14 01:56:19,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:56:19,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:56:51,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 580.96423 ± 145.018
2025-09-14 01:56:51,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [513.484, 551.6444, 321.43668, 554.8452, 542.2658, 659.9354, 468.24182, 767.139, 564.2547, 866.3957]
2025-09-14 01:56:51,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 117.0, 59.0, 103.0, 101.0, 124.0, 85.0, 145.0, 108.0, 169.0]
2025-09-14 01:56:51,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (580.96) for latency ExtremeSparseL4U32
2025-09-14 01:56:52,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 22 minutes, 37 seconds)
2025-09-14 02:08:23,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:08:23,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:08:49,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 440.47754 ± 192.459
2025-09-14 02:08:49,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [383.02466, 573.4465, 607.93317, 89.75543, 88.96104, 471.7882, 579.42035, 601.03644, 605.937, 403.47287]
2025-09-14 02:08:49,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 116.0, 114.0, 18.0, 18.0, 99.0, 107.0, 109.0, 133.0, 87.0]
2025-09-14 02:08:49,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 10 minutes, 31 seconds)
2025-09-14 02:20:14,258 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:20:14,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:20:43,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 509.25946 ± 132.385
2025-09-14 02:20:43,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [592.25024, 770.3311, 574.6257, 328.42172, 443.58594, 400.71408, 408.1303, 409.96964, 672.8153, 491.7505]
2025-09-14 02:20:43,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 149.0, 116.0, 61.0, 84.0, 74.0, 73.0, 75.0, 140.0, 88.0]
2025-09-14 02:20:43,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 58 minutes, 20 seconds)
2025-09-14 02:32:13,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:32:13,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:32:42,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 514.84509 ± 184.907
2025-09-14 02:32:42,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [134.87971, 492.93506, 420.16818, 725.9867, 612.55585, 822.35693, 625.8212, 369.1831, 438.81677, 505.74753]
2025-09-14 02:32:42,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 103.0, 77.0, 135.0, 117.0, 154.0, 131.0, 73.0, 81.0, 93.0]
2025-09-14 02:32:42,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 47 minutes, 4 seconds)
2025-09-14 02:44:14,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:44:14,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:44:46,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 536.06635 ± 89.144
2025-09-14 02:44:46,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [652.7675, 461.15155, 629.31647, 620.78217, 508.1765, 460.10803, 665.9446, 431.17767, 462.4902, 468.74915]
2025-09-14 02:44:46,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 93.0, 127.0, 130.0, 98.0, 84.0, 126.0, 79.0, 84.0, 104.0]
2025-09-14 02:44:46,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 35 minutes, 32 seconds)
2025-09-14 02:56:19,049 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:56:19,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:56:48,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 529.89911 ± 116.327
2025-09-14 02:56:48,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [710.657, 470.04645, 295.31158, 578.74286, 577.01587, 562.07336, 495.18576, 381.13354, 634.0386, 594.7863]
2025-09-14 02:56:48,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 89.0, 54.0, 106.0, 108.0, 102.0, 91.0, 71.0, 122.0, 110.0]
2025-09-14 02:56:48,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 23 minutes, 42 seconds)
2025-09-14 03:08:13,699 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:08:13,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:08:46,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 553.25110 ± 134.055
2025-09-14 03:08:46,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [466.57074, 440.49432, 581.1117, 556.4985, 477.79932, 392.0489, 535.1863, 824.1643, 484.88623, 773.751]
2025-09-14 03:08:46,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 92.0, 125.0, 103.0, 88.0, 73.0, 99.0, 160.0, 108.0, 164.0]
2025-09-14 03:08:46,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 11 minutes, 46 seconds)
2025-09-14 03:20:14,023 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:20:14,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:20:43,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 516.91266 ± 90.710
2025-09-14 03:20:43,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [700.613, 385.12918, 505.31937, 490.6565, 438.58743, 611.65955, 496.14032, 493.33597, 439.72504, 607.9599]
2025-09-14 03:20:43,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 70.0, 93.0, 89.0, 79.0, 113.0, 93.0, 107.0, 83.0, 120.0]
2025-09-14 03:20:43,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 59 minutes, 56 seconds)
2025-09-14 03:32:15,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:32:15,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:32:39,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 430.06812 ± 243.876
2025-09-14 03:32:39,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [741.54816, 556.83966, 713.42554, 105.405304, 331.40048, 101.0586, 105.96743, 406.33853, 600.2728, 638.4243]
2025-09-14 03:32:39,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 104.0, 129.0, 21.0, 62.0, 20.0, 21.0, 74.0, 115.0, 130.0]
2025-09-14 03:32:39,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 47 minutes, 44 seconds)
2025-09-14 03:44:08,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:44:08,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:44:44,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 638.29541 ± 104.370
2025-09-14 03:44:44,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [915.34326, 615.44244, 584.8649, 607.0237, 503.47794, 603.23724, 565.78033, 652.2409, 643.01874, 692.5251]
2025-09-14 03:44:44,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 119.0, 120.0, 112.0, 94.0, 114.0, 104.0, 122.0, 131.0, 129.0]
2025-09-14 03:44:44,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (638.30) for latency ExtremeSparseL4U32
2025-09-14 03:44:44,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 35 minutes, 52 seconds)
2025-09-14 03:56:13,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:56:13,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:56:43,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 518.06775 ± 139.460
2025-09-14 03:56:43,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [485.20374, 695.3888, 612.06, 387.85892, 488.12796, 497.46896, 408.54764, 718.87366, 632.5726, 254.57518]
2025-09-14 03:56:43,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 138.0, 118.0, 72.0, 98.0, 94.0, 77.0, 146.0, 116.0, 46.0]
2025-09-14 03:56:43,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 23 minutes, 36 seconds)
2025-09-14 04:08:15,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:08:15,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:08:45,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 536.90985 ± 111.477
2025-09-14 04:08:45,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [655.4671, 551.5095, 545.42816, 402.56958, 630.6603, 445.8552, 422.65723, 757.7553, 423.02884, 534.1674]
2025-09-14 04:08:45,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 99.0, 105.0, 82.0, 119.0, 96.0, 83.0, 139.0, 76.0, 98.0]
2025-09-14 04:08:45,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 11 minutes, 56 seconds)
2025-09-14 04:20:15,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:20:15,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:20:33,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 302.35788 ± 180.663
2025-09-14 04:20:33,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [502.37366, 445.9937, 102.80049, 152.7209, 200.07741, 456.1909, 576.3125, 395.97406, 101.683525, 89.45145]
2025-09-14 04:20:33,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 80.0, 20.0, 30.0, 37.0, 93.0, 104.0, 86.0, 20.0, 18.0]
2025-09-14 04:20:33,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 59 minutes, 22 seconds)
2025-09-14 04:31:59,650 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:31:59,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:32:28,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 514.41608 ± 91.755
2025-09-14 04:32:28,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [501.12387, 583.6141, 561.2177, 462.102, 500.02036, 546.76544, 695.31244, 373.8723, 374.45197, 545.68]
2025-09-14 04:32:28,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 109.0, 102.0, 83.0, 93.0, 100.0, 129.0, 68.0, 68.0, 120.0]
2025-09-14 04:32:28,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 47 minutes, 17 seconds)
2025-09-14 04:43:57,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:43:57,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:44:26,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 544.16949 ± 74.056
2025-09-14 04:44:26,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [624.9335, 579.6476, 569.7163, 488.93893, 550.8248, 490.82147, 520.4512, 690.6617, 513.93616, 411.76367]
2025-09-14 04:44:26,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 108.0, 108.0, 90.0, 102.0, 90.0, 97.0, 127.0, 93.0, 74.0]
2025-09-14 04:44:27,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 34 minutes, 56 seconds)
2025-09-14 04:55:51,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:55:51,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:56:07,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 274.74487 ± 163.565
2025-09-14 04:56:07,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [113.31447, 100.4322, 130.20221, 106.23446, 134.24115, 378.79303, 490.5702, 487.91238, 459.34882, 346.3999]
2025-09-14 04:56:07,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 26.0, 21.0, 26.0, 85.0, 98.0, 106.0, 88.0, 65.0]
2025-09-14 04:56:07,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 21 minutes, 59 seconds)
2025-09-14 05:07:34,128 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:07:34,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:08:05,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 548.85223 ± 219.805
2025-09-14 05:08:05,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [698.0351, 947.0651, 633.9106, 586.66437, 405.07996, 634.03046, 435.39642, 683.72876, 363.18225, 101.42925]
2025-09-14 05:08:05,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 185.0, 114.0, 110.0, 74.0, 113.0, 94.0, 126.0, 79.0, 20.0]
2025-09-14 05:08:05,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 9 minutes, 51 seconds)
2025-09-14 05:19:38,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:19:38,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:20:09,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 547.96991 ± 98.842
2025-09-14 05:20:09,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [373.82602, 721.93365, 546.0189, 585.1312, 510.6264, 555.2103, 543.13684, 667.6385, 408.20844, 567.96924]
2025-09-14 05:20:09,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 151.0, 99.0, 113.0, 103.0, 100.0, 98.0, 124.0, 89.0, 104.0]
2025-09-14 05:20:09,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 58 minutes, 48 seconds)
2025-09-14 05:31:39,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:31:39,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:32:11,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 557.61633 ± 112.947
2025-09-14 05:32:11,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [701.8284, 409.55884, 491.05966, 549.8765, 632.98706, 388.44443, 659.14386, 444.56384, 704.77216, 593.92914]
2025-09-14 05:32:11,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 74.0, 104.0, 104.0, 119.0, 72.0, 129.0, 97.0, 138.0, 126.0]
2025-09-14 05:32:11,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 47 minutes, 13 seconds)
2025-09-14 05:43:36,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:43:36,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:44:04,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 463.42499 ± 125.134
2025-09-14 05:44:04,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [480.10803, 420.17847, 630.9563, 557.65106, 164.45518, 357.54977, 506.39938, 479.6662, 584.5612, 452.72388]
2025-09-14 05:44:04,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 75.0, 129.0, 120.0, 32.0, 80.0, 92.0, 87.0, 129.0, 82.0]
2025-09-14 05:44:04,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 35 minutes, 1 second)
2025-09-14 05:55:37,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:55:37,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:56:09,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 548.85248 ± 238.209
2025-09-14 05:56:09,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [663.6091, 506.59338, 927.2818, 154.91994, 128.5694, 568.6504, 773.74286, 673.2961, 620.0286, 471.83298]
2025-09-14 05:56:09,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 92.0, 185.0, 33.0, 25.0, 107.0, 153.0, 126.0, 117.0, 88.0]
2025-09-14 05:56:09,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 24 minutes, 3 seconds)
2025-09-14 06:07:30,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:07:30,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:08:00,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 505.60068 ± 195.003
2025-09-14 06:08:00,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [302.20593, 433.1226, 767.4719, 101.161736, 507.14117, 729.72955, 561.2756, 491.6945, 716.30963, 445.89395]
2025-09-14 06:08:00,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 80.0, 143.0, 20.0, 93.0, 156.0, 104.0, 90.0, 131.0, 95.0]
2025-09-14 06:08:00,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 11 minutes, 48 seconds)
2025-09-14 06:19:34,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:19:34,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:19:59,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 424.42828 ± 202.217
2025-09-14 06:19:59,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [408.9473, 485.1592, 630.468, 479.50958, 101.72455, 165.6969, 137.23502, 630.425, 607.39044, 597.72675]
2025-09-14 06:19:59,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 89.0, 120.0, 90.0, 20.0, 33.0, 27.0, 119.0, 117.0, 108.0]
2025-09-14 06:19:59,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 59 minutes, 38 seconds)
2025-09-14 06:31:23,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:31:23,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:31:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 576.37036 ± 134.517
2025-09-14 06:31:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [510.30896, 534.09985, 650.2471, 665.1687, 542.6127, 618.3223, 868.85706, 562.8311, 320.43042, 490.82486]
2025-09-14 06:31:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 96.0, 127.0, 124.0, 100.0, 109.0, 165.0, 105.0, 63.0, 95.0]
2025-09-14 06:31:56,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 47 minutes, 32 seconds)
2025-09-14 06:43:30,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:43:30,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:44:01,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 551.46973 ± 173.649
2025-09-14 06:44:01,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [594.4628, 100.48905, 720.00635, 491.1385, 651.9082, 449.48032, 728.5436, 525.5587, 641.8888, 611.2214]
2025-09-14 06:44:01,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 20.0, 133.0, 106.0, 120.0, 99.0, 134.0, 97.0, 118.0, 114.0]
2025-09-14 06:44:01,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 35 minutes, 55 seconds)
2025-09-14 06:55:29,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:55:29,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:56:02,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 595.85413 ± 143.743
2025-09-14 06:56:02,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [747.01807, 835.9195, 475.90707, 772.4219, 490.38898, 603.38495, 641.7862, 471.28278, 373.6946, 546.7375]
2025-09-14 06:56:02,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 158.0, 100.0, 141.0, 90.0, 112.0, 116.0, 87.0, 69.0, 100.0]
2025-09-14 06:56:02,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 23 minutes, 50 seconds)
2025-09-14 07:07:32,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:07:32,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:08:04,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 548.41034 ± 171.474
2025-09-14 07:08:04,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.102715, 453.76852, 524.8741, 730.0845, 633.35645, 623.746, 678.07074, 567.7553, 682.04926, 489.29572]
2025-09-14 07:08:04,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 102.0, 96.0, 136.0, 117.0, 133.0, 136.0, 105.0, 125.0, 91.0]
2025-09-14 07:08:04,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 12 minutes, 5 seconds)
2025-09-14 07:19:30,025 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:19:30,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:20:02,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 599.11633 ± 144.958
2025-09-14 07:20:02,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [475.72827, 844.77985, 419.23874, 497.1347, 591.4755, 615.83575, 605.31824, 502.06644, 882.3902, 557.1964]
2025-09-14 07:20:02,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 159.0, 78.0, 89.0, 109.0, 113.0, 110.0, 90.0, 175.0, 105.0]
2025-09-14 07:20:02,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 3 seconds)
2025-09-14 07:31:35,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:31:35,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:32:07,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 574.73010 ± 120.562
2025-09-14 07:32:07,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [498.21716, 640.7945, 747.57263, 449.884, 575.3652, 443.12585, 614.15326, 446.051, 803.1002, 529.0371]
2025-09-14 07:32:07,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 117.0, 144.0, 83.0, 108.0, 81.0, 110.0, 80.0, 151.0, 100.0]
2025-09-14 07:32:07,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 48 minutes, 8 seconds)
2025-09-14 07:43:30,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:43:30,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:44:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 649.32166 ± 270.740
2025-09-14 07:44:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1201.1869, 594.91626, 736.371, 714.927, 737.0683, 856.47516, 470.20883, 94.72183, 566.09625, 521.24524]
2025-09-14 07:44:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 107.0, 139.0, 148.0, 135.0, 171.0, 99.0, 19.0, 103.0, 117.0]
2025-09-14 07:44:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (649.32) for latency ExtremeSparseL4U32
2025-09-14 07:44:08,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 36 minutes, 4 seconds)
2025-09-14 07:55:34,446 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:55:34,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:55:53,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 339.31393 ± 169.725
2025-09-14 07:55:53,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [432.327, 517.96155, 397.06155, 154.8423, 89.65442, 643.18335, 379.48502, 236.45769, 140.76805, 401.39853]
2025-09-14 07:55:53,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 93.0, 86.0, 30.0, 18.0, 118.0, 73.0, 49.0, 27.0, 73.0]
2025-09-14 07:55:53,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 23 minutes, 56 seconds)
2025-09-14 08:07:25,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:07:25,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:07:59,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 587.81067 ± 115.245
2025-09-14 08:07:59,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [399.17383, 541.25665, 644.9834, 697.4989, 590.41345, 760.80634, 584.30176, 651.01984, 630.58417, 378.06906]
2025-09-14 08:07:59,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 117.0, 122.0, 130.0, 109.0, 139.0, 108.0, 119.0, 117.0, 69.0]
2025-09-14 08:07:59,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 58 seconds)
2025-09-14 08:19:33,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:19:33,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:19:52,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 328.29633 ± 223.942
2025-09-14 08:19:52,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [416.67126, 124.42776, 113.775085, 648.2725, 612.5977, 533.764, 511.59796, 94.690475, 125.88367, 101.28264]
2025-09-14 08:19:52,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 24.0, 22.0, 122.0, 116.0, 100.0, 97.0, 19.0, 25.0, 20.0]
2025-09-14 08:19:52,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1251 [DEBUG]: Training session finished
