2025-09-13 11:21:35,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 11:21:35,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 11:21:35,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1550d4aa9550>}
2025-09-13 11:21:35,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1111 [DEBUG]: using device: cuda
2025-09-13 11:21:35,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1133 [INFO]: Creating new trainer
2025-09-13 11:21:35,152 baseline-mbpac-noiseperc20-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-13 11:21:35,152 baseline-mbpac-noiseperc20-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 11:21:35,163 baseline-mbpac-noiseperc20-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-13 11:21:36,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1194 [DEBUG]: Starting training session...
2025-09-13 11:21:36,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 1/100
2025-09-13 11:33:31,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:33:31,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:33:43,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 193.95911 ± 64.405
2025-09-13 11:33:43,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [181.08696, 162.69225, 174.3834, 229.44565, 198.60664, 162.10077, 132.19016, 362.55475, 213.52754, 123.002884]
2025-09-13 11:33:43,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 35.0, 37.0, 46.0, 36.0, 35.0, 30.0, 73.0, 46.0, 28.0]
2025-09-13 11:33:43,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (193.96) for latency ExtremeSparseL4U32
2025-09-13 11:33:43,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 14 seconds)
2025-09-13 11:45:36,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:45:36,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:45:57,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 351.72543 ± 79.581
2025-09-13 11:45:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [299.11087, 430.17908, 343.27676, 487.3437, 304.9595, 443.00674, 264.52826, 410.61996, 271.3144, 262.91498]
2025-09-13 11:45:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 81.0, 65.0, 93.0, 62.0, 85.0, 56.0, 80.0, 51.0, 54.0]
2025-09-13 11:45:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (351.73) for latency ExtremeSparseL4U32
2025-09-13 11:45:57,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 53 minutes, 35 seconds)
2025-09-13 11:57:41,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:57:41,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:58:02,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 358.74948 ± 39.789
2025-09-13 11:58:02,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [312.72815, 324.0651, 350.98062, 369.4632, 347.88135, 388.022, 411.48746, 309.59494, 338.638, 434.6343]
2025-09-13 11:58:02,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 59.0, 63.0, 69.0, 66.0, 71.0, 75.0, 56.0, 69.0, 86.0]
2025-09-13 11:58:02,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (358.75) for latency ExtremeSparseL4U32
2025-09-13 11:58:02,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 37 minutes, 58 seconds)
2025-09-13 12:09:55,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:09:55,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:10:22,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 428.97720 ± 155.073
2025-09-13 12:10:22,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [453.50638, 847.6915, 349.21194, 395.42303, 367.78104, 342.3165, 357.3081, 273.22052, 363.9546, 539.35846]
2025-09-13 12:10:22,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 168.0, 74.0, 85.0, 76.0, 64.0, 78.0, 60.0, 80.0, 106.0]
2025-09-13 12:10:22,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (428.98) for latency ExtremeSparseL4U32
2025-09-13 12:10:22,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 30 minutes, 28 seconds)
2025-09-13 12:22:10,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:22:10,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:22:34,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 393.09970 ± 191.734
2025-09-13 12:22:34,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [374.26364, 386.82507, 879.2915, 408.10684, 364.46588, 501.5317, 276.24042, 266.62222, 96.21077, 377.43924]
2025-09-13 12:22:34,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 72.0, 173.0, 88.0, 82.0, 97.0, 53.0, 61.0, 19.0, 69.0]
2025-09-13 12:22:34,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 18 minutes, 23 seconds)
2025-09-13 12:34:16,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:34:16,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:34:36,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 338.18732 ± 89.878
2025-09-13 12:34:36,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [408.6466, 321.7767, 101.17073, 378.26843, 271.86533, 323.89563, 383.40347, 396.929, 381.83047, 414.08652]
2025-09-13 12:34:36,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 60.0, 20.0, 71.0, 59.0, 60.0, 73.0, 75.0, 71.0, 90.0]
2025-09-13 12:34:36,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 4 minutes, 24 seconds)
2025-09-13 12:46:21,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:46:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:46:52,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 481.60141 ± 100.104
2025-09-13 12:46:52,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [686.1072, 373.99316, 455.63693, 447.7335, 368.00757, 622.908, 483.2905, 378.32568, 519.5977, 480.41428]
2025-09-13 12:46:52,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 82.0, 87.0, 85.0, 81.0, 117.0, 91.0, 79.0, 112.0, 104.0]
2025-09-13 12:46:52,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (481.60) for latency ExtremeSparseL4U32
2025-09-13 12:46:52,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 52 minutes, 52 seconds)
2025-09-13 12:58:32,539 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:58:32,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:58:57,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 434.83649 ± 40.526
2025-09-13 12:58:57,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [414.59082, 378.82666, 484.6125, 350.2614, 459.67932, 452.5965, 438.04678, 476.32364, 458.9609, 434.4661]
2025-09-13 12:58:57,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 69.0, 105.0, 67.0, 85.0, 87.0, 83.0, 89.0, 86.0, 84.0]
2025-09-13 12:58:57,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 40 minutes, 58 seconds)
2025-09-13 13:10:40,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:10:40,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:11:05,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 417.61505 ± 176.551
2025-09-13 13:11:05,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [548.0949, 470.0699, 778.44116, 488.7202, 306.98636, 157.80664, 412.87115, 141.74857, 444.5394, 426.87207]
2025-09-13 13:11:05,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 99.0, 150.0, 104.0, 60.0, 30.0, 88.0, 28.0, 82.0, 77.0]
2025-09-13 13:11:05,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 24 minutes, 55 seconds)
2025-09-13 13:22:47,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:22:47,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:23:11,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 410.04428 ± 214.956
2025-09-13 13:23:11,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [521.3567, 456.95697, 620.6525, 510.82907, 103.3155, 129.4459, 106.74211, 652.1915, 335.90115, 663.0514]
2025-09-13 13:23:11,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 96.0, 117.0, 96.0, 20.0, 25.0, 21.0, 124.0, 66.0, 126.0]
2025-09-13 13:23:11,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 11 minutes, 1 second)
2025-09-13 13:34:55,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:34:55,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:35:14,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 313.82797 ± 206.141
2025-09-13 13:35:14,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [573.50256, 101.25168, 113.06234, 152.0885, 113.93283, 145.8806, 284.09824, 594.7016, 482.14066, 577.6206]
2025-09-13 13:35:14,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 20.0, 22.0, 30.0, 22.0, 29.0, 55.0, 112.0, 90.0, 109.0]
2025-09-13 13:35:14,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 59 minutes, 16 seconds)
2025-09-13 13:46:49,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:46:49,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:47:15,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 423.64780 ± 160.656
2025-09-13 13:47:15,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [563.7439, 425.63138, 517.9116, 433.8989, 599.7079, 514.33813, 113.98292, 124.74378, 486.9052, 455.61414]
2025-09-13 13:47:15,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 79.0, 97.0, 82.0, 114.0, 107.0, 22.0, 24.0, 92.0, 85.0]
2025-09-13 13:47:15,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 42 minutes, 37 seconds)
2025-09-13 13:58:57,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:58:57,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:59:25,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 489.80869 ± 131.136
2025-09-13 13:59:25,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [468.25885, 431.96582, 634.2716, 418.70273, 409.0737, 573.02167, 643.2409, 276.84772, 351.1984, 691.50604]
2025-09-13 13:59:25,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 80.0, 120.0, 77.0, 79.0, 108.0, 122.0, 54.0, 64.0, 131.0]
2025-09-13 13:59:25,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (489.81) for latency ExtremeSparseL4U32
2025-09-13 13:59:25,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 32 minutes, 1 second)
2025-09-13 14:11:07,095 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:11:07,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:11:37,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 527.00610 ± 97.087
2025-09-13 14:11:37,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [479.09183, 504.68558, 486.4746, 729.18243, 407.45877, 598.2919, 642.8412, 540.9618, 418.09717, 462.9756]
2025-09-13 14:11:37,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 95.0, 93.0, 141.0, 75.0, 113.0, 122.0, 102.0, 91.0, 99.0]
2025-09-13 14:11:37,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (527.01) for latency ExtremeSparseL4U32
2025-09-13 14:11:37,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 21 minutes, 22 seconds)
2025-09-13 14:23:13,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:23:13,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:23:41,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 487.09219 ± 183.436
2025-09-13 14:23:41,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.37807, 322.22665, 487.84015, 563.91125, 429.1379, 565.22644, 502.52698, 415.32846, 709.90955, 779.43604]
2025-09-13 14:23:41,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 65.0, 91.0, 106.0, 79.0, 107.0, 95.0, 77.0, 138.0, 153.0]
2025-09-13 14:23:41,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 8 minutes, 38 seconds)
2025-09-13 14:35:25,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:35:25,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:35:53,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 472.75391 ± 192.813
2025-09-13 14:35:53,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [726.96814, 524.4364, 489.60406, 673.44385, 543.5987, 139.89606, 107.28437, 442.88232, 491.38156, 588.04376]
2025-09-13 14:35:53,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 99.0, 107.0, 129.0, 115.0, 28.0, 21.0, 84.0, 91.0, 111.0]
2025-09-13 14:35:53,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 59 minutes, 6 seconds)
2025-09-13 14:47:32,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:47:32,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:48:07,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 570.14709 ± 105.507
2025-09-13 14:48:07,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [781.74365, 620.0178, 526.3671, 497.85715, 465.5686, 619.67847, 627.1528, 597.9892, 592.4418, 372.65363]
2025-09-13 14:48:07,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 134.0, 99.0, 109.0, 100.0, 115.0, 119.0, 112.0, 111.0, 72.0]
2025-09-13 14:48:07,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (570.15) for latency ExtremeSparseL4U32
2025-09-13 14:48:07,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 50 minutes, 26 seconds)
2025-09-13 14:59:52,465 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:59:52,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:00:21,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 505.70660 ± 78.664
2025-09-13 15:00:21,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [455.8799, 503.17822, 468.70532, 506.73447, 531.1517, 710.0282, 442.07642, 555.63556, 463.35486, 420.32123]
2025-09-13 15:00:21,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 94.0, 87.0, 94.0, 99.0, 138.0, 82.0, 110.0, 87.0, 78.0]
2025-09-13 15:00:21,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 39 minutes, 23 seconds)
2025-09-13 15:12:02,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:12:02,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:12:25,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 381.32794 ± 183.784
2025-09-13 15:12:25,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [634.3154, 392.9921, 117.12844, 129.84259, 535.565, 518.99286, 459.26938, 438.5941, 102.2608, 484.31863]
2025-09-13 15:12:25,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 73.0, 23.0, 25.0, 99.0, 111.0, 88.0, 81.0, 20.0, 91.0]
2025-09-13 15:12:25,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 24 minutes, 53 seconds)
2025-09-13 15:24:08,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:24:08,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:24:38,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 502.58405 ± 57.381
2025-09-13 15:24:38,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [473.88797, 463.7195, 521.0695, 595.7999, 468.9522, 441.50253, 494.0649, 623.8094, 471.64105, 471.3936]
2025-09-13 15:24:38,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 87.0, 112.0, 127.0, 87.0, 81.0, 93.0, 119.0, 87.0, 88.0]
2025-09-13 15:24:38,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 15 minutes, 2 seconds)
2025-09-13 15:36:16,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:36:16,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:36:52,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 606.62598 ± 140.046
2025-09-13 15:36:52,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [888.05585, 652.7817, 440.19797, 491.97476, 641.9084, 595.535, 543.35846, 796.7794, 427.51453, 588.1533]
2025-09-13 15:36:52,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 123.0, 82.0, 91.0, 132.0, 124.0, 101.0, 151.0, 78.0, 111.0]
2025-09-13 15:36:52,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (606.63) for latency ExtremeSparseL4U32
2025-09-13 15:36:52,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 3 minutes, 27 seconds)
2025-09-13 15:48:36,727 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:48:36,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:48:59,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 377.77203 ± 184.971
2025-09-13 15:48:59,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [452.02527, 506.59528, 555.218, 95.00636, 101.23147, 118.654, 414.5084, 515.35315, 580.1195, 439.00885]
2025-09-13 15:48:59,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 109.0, 105.0, 19.0, 20.0, 23.0, 79.0, 97.0, 112.0, 96.0]
2025-09-13 15:48:59,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 49 minutes, 31 seconds)
2025-09-13 16:00:36,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:00:36,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:01:11,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 560.83459 ± 109.095
2025-09-13 16:01:11,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [556.4129, 352.87164, 451.6801, 421.70337, 650.9274, 702.0109, 638.9182, 598.8818, 652.6054, 582.33374]
2025-09-13 16:01:11,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 79.0, 98.0, 92.0, 139.0, 133.0, 119.0, 126.0, 123.0, 109.0]
2025-09-13 16:01:11,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 36 minutes, 45 seconds)
2025-09-13 16:12:59,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:12:59,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:13:26,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 451.46454 ± 140.945
2025-09-13 16:13:26,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [568.6215, 426.47168, 579.6848, 425.0595, 364.83237, 620.6982, 435.6349, 96.41622, 487.08206, 510.14468]
2025-09-13 16:13:26,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 81.0, 110.0, 90.0, 70.0, 121.0, 83.0, 19.0, 92.0, 96.0]
2025-09-13 16:13:26,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 27 minutes, 20 seconds)
2025-09-13 16:25:07,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:25:07,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:25:43,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 601.23822 ± 145.987
2025-09-13 16:25:43,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [611.5172, 502.59372, 655.1402, 447.0726, 485.2039, 932.3502, 445.68832, 571.9569, 591.2163, 769.64307]
2025-09-13 16:25:43,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 108.0, 134.0, 82.0, 90.0, 196.0, 82.0, 108.0, 126.0, 150.0]
2025-09-13 16:25:43,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 16 minutes, 24 seconds)
2025-09-13 16:37:22,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:37:22,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:37:57,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 576.78235 ± 127.084
2025-09-13 16:37:57,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [604.48395, 609.323, 823.0681, 315.71805, 615.2715, 506.07867, 496.8443, 550.2157, 542.9729, 703.84753]
2025-09-13 16:37:57,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 116.0, 162.0, 59.0, 115.0, 95.0, 93.0, 103.0, 117.0, 134.0]
2025-09-13 16:37:57,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 3 minutes, 54 seconds)
2025-09-13 16:49:40,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:49:40,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:50:09,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 487.99814 ± 171.183
2025-09-13 16:50:09,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [589.86664, 378.187, 100.41176, 511.71927, 307.19922, 467.6886, 627.1115, 655.8593, 664.6818, 577.2565]
2025-09-13 16:50:09,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 81.0, 20.0, 96.0, 66.0, 87.0, 118.0, 125.0, 132.0, 122.0]
2025-09-13 16:50:09,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 53 minutes, 12 seconds)
2025-09-13 17:01:49,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:01:49,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:02:15,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 450.95636 ± 198.487
2025-09-13 17:02:15,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [603.84204, 136.48434, 101.69128, 357.0005, 736.41266, 532.37726, 478.85208, 676.5953, 421.19724, 465.11108]
2025-09-13 17:02:15,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 27.0, 20.0, 67.0, 156.0, 100.0, 88.0, 127.0, 78.0, 88.0]
2025-09-13 17:02:15,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 39 minutes, 27 seconds)
2025-09-13 17:13:52,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:13:52,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:14:26,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 593.47211 ± 105.989
2025-09-13 17:14:26,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [680.3255, 581.39, 470.35815, 740.7676, 757.4232, 579.85406, 541.8405, 439.64383, 489.58774, 653.53046]
2025-09-13 17:14:26,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 108.0, 87.0, 143.0, 146.0, 109.0, 115.0, 81.0, 91.0, 124.0]
2025-09-13 17:14:26,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 26 minutes, 12 seconds)
2025-09-13 17:26:06,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:26:06,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:26:38,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 544.54083 ± 188.201
2025-09-13 17:26:38,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [799.2754, 720.3275, 314.38956, 455.37317, 557.8488, 527.0397, 665.3701, 546.1633, 150.4592, 709.1617]
2025-09-13 17:26:38,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 138.0, 57.0, 97.0, 104.0, 109.0, 126.0, 102.0, 29.0, 132.0]
2025-09-13 17:26:38,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 12 minutes, 43 seconds)
2025-09-13 17:38:19,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:38:19,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:38:39,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 344.04742 ± 239.748
2025-09-13 17:38:39,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [564.362, 178.64984, 140.408, 128.07574, 141.60417, 166.56464, 161.04695, 745.87463, 556.3715, 657.51654]
2025-09-13 17:38:39,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 35.0, 27.0, 25.0, 27.0, 32.0, 31.0, 146.0, 106.0, 126.0]
2025-09-13 17:38:39,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 57 minutes, 49 seconds)
2025-09-13 17:50:31,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:50:31,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:50:58,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 459.13885 ± 237.236
2025-09-13 17:50:58,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [130.80634, 864.2387, 512.70264, 449.37372, 179.07994, 144.9588, 768.9787, 539.09796, 550.4841, 451.6681]
2025-09-13 17:50:58,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 179.0, 95.0, 84.0, 35.0, 28.0, 147.0, 100.0, 103.0, 83.0]
2025-09-13 17:50:58,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 46 minutes, 54 seconds)
2025-09-13 18:02:31,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:02:31,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:03:00,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 496.92017 ± 166.061
2025-09-13 18:03:00,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [532.01587, 776.02045, 582.01544, 111.57973, 358.34607, 521.9414, 588.68884, 590.67175, 463.33322, 444.58902]
2025-09-13 18:03:00,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 165.0, 109.0, 22.0, 68.0, 113.0, 112.0, 112.0, 86.0, 84.0]
2025-09-13 18:03:00,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 34 minutes, 5 seconds)
2025-09-13 18:14:42,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:14:42,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:15:22,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 646.77753 ± 224.377
2025-09-13 18:15:22,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [575.4109, 434.5378, 575.3151, 674.22723, 506.54156, 758.5474, 420.9037, 987.59247, 436.41043, 1098.2887]
2025-09-13 18:15:22,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 92.0, 109.0, 143.0, 95.0, 142.0, 93.0, 206.0, 81.0, 213.0]
2025-09-13 18:15:22,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (646.78) for latency ExtremeSparseL4U32
2025-09-13 18:15:22,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 24 minutes, 21 seconds)
2025-09-13 18:27:04,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:27:04,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:27:34,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 507.73517 ± 243.111
2025-09-13 18:27:34,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [703.4084, 139.53615, 434.71826, 974.3317, 114.586, 394.67462, 630.70996, 590.55585, 518.7207, 576.1099]
2025-09-13 18:27:34,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 28.0, 81.0, 197.0, 22.0, 72.0, 119.0, 110.0, 107.0, 111.0]
2025-09-13 18:27:34,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 12 minutes, 5 seconds)
2025-09-13 18:39:17,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:39:17,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:39:51,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 561.17480 ± 169.705
2025-09-13 18:39:51,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [580.98444, 713.46216, 551.30774, 521.68463, 559.14526, 593.06744, 106.67628, 663.1924, 772.4359, 549.7913]
2025-09-13 18:39:51,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 136.0, 116.0, 97.0, 117.0, 112.0, 21.0, 136.0, 164.0, 102.0]
2025-09-13 18:39:51,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 3 minutes, 21 seconds)
2025-09-13 18:51:35,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:51:35,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:52:04,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 481.73428 ± 138.961
2025-09-13 18:52:04,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [417.09225, 473.4096, 137.32018, 414.3218, 600.0392, 667.1806, 543.26166, 575.28986, 541.7568, 447.67096]
2025-09-13 18:52:04,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 91.0, 27.0, 84.0, 112.0, 126.0, 103.0, 110.0, 101.0, 97.0]
2025-09-13 18:52:04,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 49 minutes, 57 seconds)
2025-09-13 19:03:46,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:03:46,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:04:16,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 484.77701 ± 252.580
2025-09-13 19:04:16,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [431.998, 477.4067, 101.58694, 618.90814, 482.78177, 884.5434, 923.6344, 184.41653, 401.99835, 340.4963]
2025-09-13 19:04:16,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 91.0, 20.0, 129.0, 102.0, 170.0, 184.0, 35.0, 87.0, 70.0]
2025-09-13 19:04:16,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 39 minutes, 34 seconds)
2025-09-13 19:15:56,921 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:15:56,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:16:23,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 450.11246 ± 217.560
2025-09-13 19:16:23,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [618.1677, 495.84705, 598.2453, 644.2055, 616.16003, 642.2575, 129.46397, 84.438324, 168.26398, 504.07498]
2025-09-13 19:16:23,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 107.0, 122.0, 121.0, 113.0, 120.0, 25.0, 17.0, 32.0, 92.0]
2025-09-13 19:16:23,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 24 minutes, 27 seconds)
2025-09-13 19:28:00,622 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:28:00,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:28:37,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 619.31702 ± 133.613
2025-09-13 19:28:37,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [665.32574, 652.89606, 711.98663, 372.85886, 525.5932, 482.86685, 846.0783, 772.99347, 598.8735, 563.69727]
2025-09-13 19:28:37,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 123.0, 147.0, 71.0, 99.0, 89.0, 171.0, 146.0, 123.0, 118.0]
2025-09-13 19:28:37,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 12 minutes, 37 seconds)
2025-09-13 19:40:25,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:40:25,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:41:05,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 655.22009 ± 304.037
2025-09-13 19:41:05,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [837.91223, 94.88274, 840.7867, 964.53827, 717.92255, 602.47675, 581.3147, 100.388275, 882.1853, 929.79407]
2025-09-13 19:41:05,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 19.0, 167.0, 178.0, 145.0, 122.0, 112.0, 20.0, 164.0, 176.0]
2025-09-13 19:41:05,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (655.22) for latency ExtremeSparseL4U32
2025-09-13 19:41:05,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 2 minutes, 25 seconds)
2025-09-13 19:52:45,176 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:52:45,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:53:13,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 465.92072 ± 262.398
2025-09-13 19:53:13,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [671.1863, 135.6642, 302.7303, 893.2392, 104.350685, 243.04494, 832.9709, 577.1682, 453.76608, 445.0865]
2025-09-13 19:53:13,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 26.0, 56.0, 181.0, 21.0, 46.0, 163.0, 111.0, 88.0, 96.0]
2025-09-13 19:53:13,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 49 minutes, 21 seconds)
2025-09-13 20:04:52,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:04:52,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:05:19,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 427.79297 ± 287.082
2025-09-13 20:05:19,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [966.73944, 517.32117, 96.99527, 89.168465, 169.1927, 140.39552, 653.49066, 417.49567, 486.98148, 740.1495]
2025-09-13 20:05:19,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 112.0, 19.0, 18.0, 33.0, 27.0, 136.0, 91.0, 105.0, 151.0]
2025-09-13 20:05:19,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 36 minutes, 2 seconds)
2025-09-13 20:16:58,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:16:58,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:17:36,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 623.85901 ± 146.056
2025-09-13 20:17:36,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [678.9845, 710.7326, 777.21674, 567.7544, 743.05493, 703.55383, 426.5959, 629.5106, 708.17206, 293.015]
2025-09-13 20:17:36,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 134.0, 157.0, 111.0, 159.0, 140.0, 93.0, 118.0, 134.0, 56.0]
2025-09-13 20:17:36,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 25 minutes, 39 seconds)
2025-09-13 20:29:27,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:29:27,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:30:09,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 690.69440 ± 183.480
2025-09-13 20:30:09,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [705.475, 790.805, 915.0239, 848.4564, 594.72833, 301.4, 906.83856, 716.7584, 490.25012, 637.2084]
2025-09-13 20:30:09,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 162.0, 169.0, 167.0, 127.0, 58.0, 170.0, 132.0, 105.0, 124.0]
2025-09-13 20:30:09,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (690.69) for latency ExtremeSparseL4U32
2025-09-13 20:30:09,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 16 minutes, 57 seconds)
2025-09-13 20:41:43,724 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:41:43,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:42:18,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 576.70245 ± 250.872
2025-09-13 20:42:18,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [676.94025, 678.17413, 677.99786, 90.09458, 95.1656, 639.1313, 606.25305, 870.45593, 715.0115, 717.79974]
2025-09-13 20:42:18,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 127.0, 126.0, 18.0, 19.0, 122.0, 127.0, 176.0, 135.0, 142.0]
2025-09-13 20:42:18,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 1 minute, 13 seconds)
2025-09-13 20:53:57,981 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:53:57,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:54:35,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 628.61475 ± 215.110
2025-09-13 20:54:35,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [727.82587, 655.99835, 817.64233, 590.6196, 624.1548, 898.3787, 423.9239, 100.026566, 760.8349, 686.74207]
2025-09-13 20:54:35,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 120.0, 156.0, 126.0, 134.0, 168.0, 91.0, 20.0, 143.0, 145.0]
2025-09-13 20:54:35,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 50 minutes, 29 seconds)
2025-09-13 21:06:17,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:06:17,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:06:49,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 532.87219 ± 270.813
2025-09-13 21:06:49,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [495.81516, 796.6201, 669.1837, 749.28345, 96.095955, 101.955505, 271.5804, 747.9631, 541.8272, 858.3974]
2025-09-13 21:06:49,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 164.0, 127.0, 144.0, 19.0, 20.0, 59.0, 142.0, 101.0, 163.0]
2025-09-13 21:06:49,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 39 minutes, 30 seconds)
2025-09-13 21:18:30,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:18:30,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:19:16,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 766.24182 ± 186.807
2025-09-13 21:19:16,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [769.85986, 623.09937, 1222.4718, 833.7155, 643.94836, 703.7501, 810.6158, 725.91296, 473.97217, 855.07184]
2025-09-13 21:19:16,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 130.0, 246.0, 157.0, 125.0, 131.0, 153.0, 140.0, 86.0, 167.0]
2025-09-13 21:19:16,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (766.24) for latency ExtremeSparseL4U32
2025-09-13 21:19:16,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 28 minutes, 50 seconds)
2025-09-13 21:31:09,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:31:09,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:31:47,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 598.55579 ± 148.003
2025-09-13 21:31:47,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [443.74557, 491.22675, 851.0285, 754.96106, 667.1318, 510.74265, 517.58844, 806.08704, 448.66287, 494.38303]
2025-09-13 21:31:47,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 91.0, 174.0, 140.0, 135.0, 110.0, 104.0, 166.0, 96.0, 106.0]
2025-09-13 21:31:47,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 16 minutes, 13 seconds)
2025-09-13 21:43:23,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:43:23,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:44:02,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 644.71747 ± 226.759
2025-09-13 21:44:02,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [844.9746, 331.68527, 454.2951, 503.30087, 445.48386, 593.13904, 588.1249, 662.6607, 964.1591, 1059.3512]
2025-09-13 21:44:02,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 59.0, 98.0, 111.0, 82.0, 108.0, 109.0, 142.0, 179.0, 214.0]
2025-09-13 21:44:02,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 4 minutes, 58 seconds)
2025-09-13 21:55:37,162 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:55:37,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:56:13,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 627.61353 ± 227.630
2025-09-13 21:56:13,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [855.7654, 640.3875, 343.67422, 597.3137, 662.0522, 860.6921, 763.55444, 106.56656, 621.3465, 824.7827]
2025-09-13 21:56:13,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 114.0, 77.0, 110.0, 126.0, 160.0, 151.0, 21.0, 118.0, 168.0]
2025-09-13 21:56:13,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 51 minutes, 44 seconds)
2025-09-13 22:07:54,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:07:54,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:08:32,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 624.57111 ± 127.343
2025-09-13 22:08:32,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [712.57715, 705.609, 498.0084, 769.4925, 403.5209, 736.0094, 715.6636, 431.41483, 594.01404, 679.40076]
2025-09-13 22:08:32,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 132.0, 90.0, 155.0, 74.0, 139.0, 132.0, 93.0, 121.0, 144.0]
2025-09-13 22:08:32,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 40 minutes, 7 seconds)
2025-09-13 22:20:13,645 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:20:13,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:20:49,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 609.40436 ± 263.048
2025-09-13 22:20:49,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [595.76276, 702.06433, 840.58716, 538.7724, 897.1708, 635.98895, 834.06354, 89.11172, 174.74045, 785.7817]
2025-09-13 22:20:49,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 130.0, 160.0, 104.0, 171.0, 115.0, 157.0, 18.0, 34.0, 144.0]
2025-09-13 22:20:49,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 26 minutes, 19 seconds)
2025-09-13 22:32:34,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:32:34,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:33:17,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 743.96295 ± 136.632
2025-09-13 22:33:17,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [580.3144, 593.96063, 613.4211, 925.5501, 981.56335, 687.57733, 647.5133, 748.8509, 872.0936, 788.78467]
2025-09-13 22:33:17,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 112.0, 129.0, 193.0, 185.0, 131.0, 118.0, 139.0, 161.0, 150.0]
2025-09-13 22:33:17,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 13 minutes, 33 seconds)
2025-09-13 22:44:54,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:44:54,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:45:35,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 666.70911 ± 212.586
2025-09-13 22:45:35,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [755.6962, 485.6236, 729.12616, 610.8016, 950.3736, 689.3337, 851.3817, 712.0949, 136.8106, 745.84875]
2025-09-13 22:45:35,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 104.0, 152.0, 115.0, 189.0, 147.0, 180.0, 138.0, 26.0, 152.0]
2025-09-13 22:45:35,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 1 minute, 35 seconds)
2025-09-13 22:57:15,157 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:57:15,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:57:51,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 623.64655 ± 234.475
2025-09-13 22:57:51,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [356.67188, 664.5935, 751.57306, 654.4969, 754.2585, 932.96344, 798.76495, 740.7637, 89.51777, 492.86194]
2025-09-13 22:57:51,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 125.0, 137.0, 138.0, 144.0, 182.0, 148.0, 156.0, 18.0, 91.0]
2025-09-13 22:57:51,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 50 minutes, 2 seconds)
2025-09-13 23:09:47,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:09:47,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:10:12,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 405.10754 ± 262.482
2025-09-13 23:10:12,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [395.07516, 695.7418, 835.4876, 594.72345, 96.18349, 155.60353, 102.879196, 106.8526, 457.21545, 611.31323]
2025-09-13 23:10:12,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 129.0, 156.0, 114.0, 19.0, 31.0, 20.0, 21.0, 99.0, 131.0]
2025-09-13 23:10:12,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 38 minutes, 2 seconds)
2025-09-13 23:21:37,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:21:37,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:22:14,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 619.55542 ± 356.833
2025-09-13 23:22:14,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [956.1851, 719.37836, 128.20883, 760.3245, 672.7233, 94.842445, 202.99803, 830.9883, 1245.0157, 584.8897]
2025-09-13 23:22:14,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 136.0, 25.0, 149.0, 134.0, 19.0, 40.0, 158.0, 236.0, 115.0]
2025-09-13 23:22:14,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 23 minutes, 34 seconds)
2025-09-13 23:34:03,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:34:03,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:34:37,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 555.05310 ± 304.448
2025-09-13 23:34:37,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [581.2384, 996.8421, 108.747826, 243.10298, 613.25934, 106.6794, 927.689, 464.38968, 796.8638, 711.71893]
2025-09-13 23:34:37,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 193.0, 21.0, 48.0, 115.0, 21.0, 173.0, 86.0, 148.0, 137.0]
2025-09-13 23:34:37,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 10 minutes, 37 seconds)
2025-09-13 23:46:19,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:46:19,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:47:09,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 845.00256 ± 274.453
2025-09-13 23:47:09,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [714.8222, 748.6475, 910.4306, 522.8087, 1090.2827, 1051.8912, 538.82074, 1326.5951, 1072.7571, 472.96988]
2025-09-13 23:47:09,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 136.0, 186.0, 96.0, 213.0, 200.0, 115.0, 252.0, 218.0, 101.0]
2025-09-13 23:47:09,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (845.00) for latency ExtremeSparseL4U32
2025-09-13 23:47:09,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 17 seconds)
2025-09-13 23:58:54,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:58:54,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:59:31,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 628.53796 ± 205.347
2025-09-13 23:59:31,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [695.0218, 668.52734, 593.6492, 528.0073, 143.57133, 546.7432, 879.66064, 547.962, 805.6739, 876.56323]
2025-09-13 23:59:31,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 124.0, 129.0, 115.0, 28.0, 102.0, 167.0, 117.0, 146.0, 163.0]
2025-09-13 23:59:31,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 48 minutes, 34 seconds)
2025-09-14 00:11:03,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:11:03,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:11:35,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 546.73395 ± 272.859
2025-09-14 00:11:35,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [380.10272, 804.0109, 358.05655, 598.9321, 1037.465, 722.2308, 561.7581, 705.2398, 157.36346, 142.18027]
2025-09-14 00:11:35,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 158.0, 65.0, 118.0, 202.0, 130.0, 103.0, 144.0, 31.0, 29.0]
2025-09-14 00:11:35,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 34 minutes, 17 seconds)
2025-09-14 00:23:16,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:23:16,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:24:05,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 840.51935 ± 279.727
2025-09-14 00:24:05,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [533.97253, 417.11972, 1188.5789, 550.40063, 807.9936, 1215.8895, 913.9991, 1119.3618, 630.7759, 1027.1018]
2025-09-14 00:24:05,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 93.0, 219.0, 104.0, 165.0, 231.0, 176.0, 221.0, 134.0, 198.0]
2025-09-14 00:24:05,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 25 minutes, 25 seconds)
2025-09-14 00:36:00,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:36:00,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:36:40,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 676.25433 ± 265.409
2025-09-14 00:36:40,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1106.9263, 809.43243, 622.4809, 568.8891, 873.8243, 709.687, 101.77013, 379.86783, 776.9735, 812.6919]
2025-09-14 00:36:40,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [205.0, 153.0, 120.0, 123.0, 164.0, 133.0, 20.0, 74.0, 144.0, 159.0]
2025-09-14 00:36:40,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 14 minutes, 23 seconds)
2025-09-14 00:48:07,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:48:07,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:48:50,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 719.61713 ± 168.060
2025-09-14 00:48:50,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [796.86017, 591.8767, 1096.3452, 624.8107, 479.5471, 765.9529, 840.2351, 778.27734, 548.14087, 674.1255]
2025-09-14 00:48:50,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 122.0, 211.0, 120.0, 86.0, 146.0, 154.0, 146.0, 105.0, 128.0]
2025-09-14 00:48:50,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 59 minutes, 21 seconds)
2025-09-14 01:00:36,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:00:36,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:01:09,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 527.03986 ± 262.517
2025-09-14 01:01:09,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [495.96863, 473.92856, 797.31476, 618.3175, 762.631, 407.0702, 118.2073, 89.87159, 562.20337, 944.8859]
2025-09-14 01:01:09,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 103.0, 149.0, 132.0, 163.0, 76.0, 23.0, 18.0, 120.0, 195.0]
2025-09-14 01:01:09,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 46 minutes, 46 seconds)
2025-09-14 01:13:00,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:13:00,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:13:40,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 687.58234 ± 254.577
2025-09-14 01:13:40,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [356.18695, 997.4233, 845.63385, 763.4975, 641.98737, 701.61066, 893.9413, 611.5611, 136.72362, 927.25745]
2025-09-14 01:13:40,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 187.0, 178.0, 144.0, 119.0, 147.0, 166.0, 112.0, 26.0, 171.0]
2025-09-14 01:13:40,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 37 minutes, 20 seconds)
2025-09-14 01:25:05,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:25:05,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:25:41,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 578.75305 ± 279.351
2025-09-14 01:25:41,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [675.9064, 780.90875, 291.33795, 619.5861, 112.9148, 118.10165, 691.15607, 909.5678, 763.8126, 824.23895]
2025-09-14 01:25:41,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 148.0, 59.0, 132.0, 22.0, 23.0, 130.0, 188.0, 159.0, 169.0]
2025-09-14 01:25:41,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 21 minutes, 54 seconds)
2025-09-14 01:37:21,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:37:21,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:37:59,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 603.24182 ± 342.976
2025-09-14 01:37:59,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [703.27844, 1029.7025, 540.73584, 1242.8531, 571.92474, 148.34462, 102.52421, 315.50574, 776.4831, 601.0663]
2025-09-14 01:37:59,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 195.0, 101.0, 256.0, 119.0, 29.0, 20.0, 69.0, 145.0, 126.0]
2025-09-14 01:37:59,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 7 minutes, 52 seconds)
2025-09-14 01:49:43,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:49:43,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:50:33,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 833.99854 ± 201.177
2025-09-14 01:50:33,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [691.30444, 1263.7076, 1064.5941, 802.0057, 697.47314, 695.7762, 950.3625, 818.5186, 527.47046, 828.7728]
2025-09-14 01:50:33,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 243.0, 212.0, 169.0, 127.0, 146.0, 174.0, 151.0, 97.0, 155.0]
2025-09-14 01:50:33,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 58 minutes)
2025-09-14 02:02:16,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:02:16,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:03:02,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 788.77112 ± 189.576
2025-09-14 02:03:02,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [721.46234, 978.0382, 846.57135, 983.25433, 447.3206, 467.1879, 717.86523, 828.8777, 939.53955, 957.5939]
2025-09-14 02:03:02,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 182.0, 158.0, 189.0, 84.0, 88.0, 131.0, 175.0, 174.0, 181.0]
2025-09-14 02:03:02,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 46 minutes, 33 seconds)
2025-09-14 02:15:01,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:15:01,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:15:33,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 546.06659 ± 303.630
2025-09-14 02:15:33,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [566.13055, 645.4067, 550.11084, 891.1796, 456.4745, 1031.1509, 188.66997, 117.074356, 164.88733, 849.5815]
2025-09-14 02:15:33,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 138.0, 107.0, 161.0, 99.0, 187.0, 37.0, 23.0, 34.0, 149.0]
2025-09-14 02:15:33,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 34 minutes, 10 seconds)
2025-09-14 02:27:04,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:27:04,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:27:49,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 762.92072 ± 259.527
2025-09-14 02:27:49,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [536.42255, 937.5269, 839.75183, 714.8651, 1306.3967, 447.32404, 367.80804, 724.0022, 829.2753, 925.8342]
2025-09-14 02:27:49,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 178.0, 170.0, 152.0, 240.0, 98.0, 68.0, 134.0, 166.0, 186.0]
2025-09-14 02:27:49,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 23 minutes, 4 seconds)
2025-09-14 02:39:38,515 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:39:38,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:40:20,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 728.72412 ± 193.811
2025-09-14 02:40:20,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [711.6171, 969.87415, 558.86835, 489.53433, 455.91714, 1061.9648, 652.46454, 664.9921, 880.6535, 841.35535]
2025-09-14 02:40:20,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 179.0, 121.0, 92.0, 81.0, 204.0, 128.0, 122.0, 163.0, 158.0]
2025-09-14 02:40:20,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 11 minutes, 48 seconds)
2025-09-14 02:51:52,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:51:52,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:52:41,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 810.99701 ± 198.335
2025-09-14 02:52:41,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [692.0679, 740.4562, 726.468, 826.15497, 746.4419, 810.3625, 1227.385, 1087.5977, 776.10724, 476.92874]
2025-09-14 02:52:41,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 136.0, 150.0, 164.0, 145.0, 154.0, 222.0, 208.0, 156.0, 103.0]
2025-09-14 02:52:41,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 58 minutes, 12 seconds)
2025-09-14 03:04:24,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:04:24,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:05:10,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 787.42810 ± 302.288
2025-09-14 03:05:10,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1428.5537, 967.0244, 803.0091, 973.56586, 275.73422, 853.3585, 503.6311, 512.48315, 859.4535, 697.4669]
2025-09-14 03:05:10,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 182.0, 153.0, 184.0, 52.0, 156.0, 107.0, 106.0, 167.0, 142.0]
2025-09-14 03:05:10,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 45 minutes, 48 seconds)
2025-09-14 03:16:49,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:16:49,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:17:33,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 760.23529 ± 309.439
2025-09-14 03:17:33,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [966.0076, 1065.8833, 775.0336, 923.8352, 1166.2739, 860.91095, 633.2691, 767.14343, 111.98182, 332.01376]
2025-09-14 03:17:33,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 191.0, 147.0, 171.0, 223.0, 163.0, 130.0, 158.0, 22.0, 63.0]
2025-09-14 03:17:33,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 32 minutes, 45 seconds)
2025-09-14 03:29:10,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:29:10,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:29:47,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 600.58246 ± 435.599
2025-09-14 03:29:47,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [108.53018, 930.32385, 683.51373, 1522.2955, 143.29103, 308.9094, 1004.714, 566.8925, 617.657, 119.697586]
2025-09-14 03:29:47,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 167.0, 130.0, 298.0, 28.0, 63.0, 194.0, 122.0, 124.0, 23.0]
2025-09-14 03:29:47,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 20 minutes, 14 seconds)
2025-09-14 03:41:31,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:41:31,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:42:22,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 884.81122 ± 486.281
2025-09-14 03:42:22,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1200.6228, 987.7303, 800.6972, 1968.5548, 980.88, 554.14734, 1155.7288, 123.60981, 363.51413, 712.6273]
2025-09-14 03:42:22,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 206.0, 151.0, 388.0, 181.0, 115.0, 225.0, 24.0, 68.0, 129.0]
2025-09-14 03:42:22,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (884.81) for latency ExtremeSparseL4U32
2025-09-14 03:42:23,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 8 minutes, 8 seconds)
2025-09-14 03:54:07,715 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:54:07,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:54:46,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 663.94202 ± 234.648
2025-09-14 03:54:46,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [764.902, 916.13324, 586.7697, 633.1592, 945.54346, 114.65273, 425.75134, 671.5132, 785.2131, 795.78253]
2025-09-14 03:54:46,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 171.0, 107.0, 114.0, 182.0, 22.0, 83.0, 124.0, 157.0, 153.0]
2025-09-14 03:54:46,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 55 minutes, 54 seconds)
2025-09-14 04:06:35,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:06:35,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:07:04,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 502.86749 ± 396.753
2025-09-14 04:07:04,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [747.4546, 1276.8993, 95.09097, 143.52446, 155.94273, 703.9808, 125.36071, 160.04308, 739.34393, 881.03455]
2025-09-14 04:07:04,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 247.0, 19.0, 28.0, 30.0, 132.0, 25.0, 31.0, 137.0, 162.0]
2025-09-14 04:07:04,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 42 minutes, 50 seconds)
2025-09-14 04:18:46,034 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:18:46,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:19:30,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 742.44318 ± 480.104
2025-09-14 04:19:30,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1068.6771, 816.4781, 732.788, 1745.8005, 840.8889, 89.40797, 252.3132, 1017.7828, 84.44299, 775.8519]
2025-09-14 04:19:30,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 148.0, 155.0, 337.0, 156.0, 18.0, 47.0, 215.0, 17.0, 148.0]
2025-09-14 04:19:30,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 30 minutes, 38 seconds)
2025-09-14 04:31:02,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:31:02,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:31:54,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 902.66016 ± 224.854
2025-09-14 04:31:54,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1351.7476, 856.3451, 529.42725, 933.0882, 673.1272, 1005.81335, 830.6553, 1062.6084, 708.57074, 1075.2188]
2025-09-14 04:31:54,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 168.0, 99.0, 166.0, 122.0, 183.0, 151.0, 188.0, 129.0, 210.0]
2025-09-14 04:31:54,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (902.66) for latency ExtremeSparseL4U32
2025-09-14 04:31:54,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 18 minutes, 47 seconds)
2025-09-14 04:43:37,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:43:37,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:44:36,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 992.36292 ± 437.726
2025-09-14 04:44:36,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1292.2017, 876.1825, 656.6213, 848.213, 735.53125, 1057.5299, 983.66327, 258.7957, 2003.7284, 1211.1626]
2025-09-14 04:44:36,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 172.0, 128.0, 152.0, 133.0, 222.0, 195.0, 47.0, 391.0, 227.0]
2025-09-14 04:44:36,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (992.36) for latency ExtremeSparseL4U32
2025-09-14 04:44:36,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 6 minutes, 40 seconds)
2025-09-14 04:56:19,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:56:19,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:57:04,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 795.29987 ± 347.265
2025-09-14 04:57:04,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [903.821, 1350.3707, 548.79517, 588.1909, 425.28485, 743.88477, 1452.2896, 356.5755, 695.20703, 888.5793]
2025-09-14 04:57:04,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 256.0, 118.0, 112.0, 77.0, 140.0, 276.0, 68.0, 144.0, 160.0]
2025-09-14 04:57:04,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 54 minutes, 28 seconds)
2025-09-14 05:08:50,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:08:50,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:09:29,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 666.50922 ± 586.176
2025-09-14 05:09:29,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [742.0389, 2108.5742, 106.56366, 688.9587, 705.65106, 822.8962, 130.95634, 167.90788, 113.43132, 1078.1143]
2025-09-14 05:09:29,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 387.0, 21.0, 143.0, 148.0, 168.0, 25.0, 32.0, 22.0, 214.0]
2025-09-14 05:09:29,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 42 minutes, 18 seconds)
2025-09-14 05:21:12,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:21:12,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:22:03,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 880.07275 ± 263.869
2025-09-14 05:22:03,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [471.78055, 1210.8477, 1013.19305, 1081.966, 877.86835, 591.1634, 779.0527, 600.75244, 858.6942, 1315.4092]
2025-09-14 05:22:03,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 225.0, 210.0, 192.0, 186.0, 115.0, 139.0, 129.0, 152.0, 241.0]
2025-09-14 05:22:03,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 30 minutes, 7 seconds)
2025-09-14 05:33:36,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:33:36,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:34:15,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 635.92932 ± 363.382
2025-09-14 05:34:15,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [974.1601, 1192.6616, 625.6702, 1025.2716, 747.65564, 839.3566, 140.17322, 96.47778, 371.6484, 346.2177]
2025-09-14 05:34:15,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 245.0, 136.0, 209.0, 148.0, 174.0, 27.0, 19.0, 70.0, 63.0]
2025-09-14 05:34:15,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 17 minutes, 10 seconds)
2025-09-14 05:45:58,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:45:58,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:46:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 762.87970 ± 382.372
2025-09-14 05:46:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1311.6724, 932.15326, 1041.2013, 101.58365, 1152.5887, 903.37384, 200.22754, 406.00064, 805.10486, 774.8901]
2025-09-14 05:46:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 171.0, 187.0, 20.0, 207.0, 166.0, 39.0, 74.0, 148.0, 159.0]
2025-09-14 05:46:41,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 4 minutes, 8 seconds)
2025-09-14 05:58:18,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:58:18,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:59:05,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 801.75903 ± 299.332
2025-09-14 05:59:05,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [574.04724, 766.5743, 784.1999, 1444.8363, 1142.938, 867.77014, 334.5893, 567.108, 639.9639, 895.56335]
2025-09-14 05:59:05,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 138.0, 146.0, 269.0, 205.0, 185.0, 63.0, 123.0, 131.0, 164.0]
2025-09-14 05:59:05,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 51 minutes, 37 seconds)
2025-09-14 06:10:48,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:10:48,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:11:31,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 742.70386 ± 294.534
2025-09-14 06:11:31,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [550.82654, 1131.5228, 1014.6951, 517.4183, 153.34935, 914.27356, 838.06866, 993.70325, 869.98444, 443.19647]
2025-09-14 06:11:31,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 209.0, 195.0, 98.0, 32.0, 169.0, 168.0, 181.0, 162.0, 83.0]
2025-09-14 06:11:31,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 39 minutes, 15 seconds)
2025-09-14 06:23:10,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:23:10,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:24:11,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1043.06152 ± 339.158
2025-09-14 06:24:11,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1705.7723, 990.0759, 601.92706, 877.75275, 1503.9364, 609.9614, 1277.5742, 983.8499, 882.50073, 997.2644]
2025-09-14 06:24:11,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [317.0, 205.0, 116.0, 161.0, 295.0, 117.0, 256.0, 186.0, 160.0, 197.0]
2025-09-14 06:24:11,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1043.06) for latency ExtremeSparseL4U32
2025-09-14 06:24:11,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 26 minutes, 58 seconds)
2025-09-14 06:35:53,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:35:53,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:36:53,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1024.22375 ± 450.320
2025-09-14 06:36:53,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1952.5687, 351.40836, 552.1626, 840.3694, 1198.8428, 585.6078, 1214.9812, 1080.3396, 1001.501, 1464.4562]
2025-09-14 06:36:53,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [386.0, 65.0, 107.0, 175.0, 251.0, 104.0, 223.0, 183.0, 188.0, 284.0]
2025-09-14 06:36:53,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 15 minutes, 9 seconds)
2025-09-14 06:48:36,652 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:48:36,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:49:04,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 452.31781 ± 291.869
2025-09-14 06:49:04,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [678.81104, 454.8751, 124.22212, 155.33105, 148.09097, 89.05587, 876.443, 460.10754, 788.1637, 748.0783]
2025-09-14 06:49:04,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 99.0, 25.0, 30.0, 29.0, 18.0, 161.0, 102.0, 165.0, 155.0]
2025-09-14 06:49:04,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 2 minutes, 23 seconds)
2025-09-14 07:00:57,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:00:57,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:01:45,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 786.01752 ± 468.706
2025-09-14 07:01:45,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [295.12967, 463.39438, 1204.5637, 1383.4077, 983.5179, 487.07327, 106.95342, 1486.8584, 1064.8772, 384.39975]
2025-09-14 07:01:45,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 99.0, 227.0, 268.0, 194.0, 97.0, 21.0, 284.0, 208.0, 84.0]
2025-09-14 07:01:45,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 50 minutes, 7 seconds)
2025-09-14 07:13:25,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:13:25,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:14:16,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 890.77063 ± 420.323
2025-09-14 07:14:16,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1135.2155, 548.5725, 1125.3693, 1276.1133, 1107.5706, 1614.0023, 894.02264, 602.5789, 160.34303, 443.9176]
2025-09-14 07:14:16,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [238.0, 118.0, 204.0, 231.0, 218.0, 286.0, 167.0, 114.0, 33.0, 94.0]
2025-09-14 07:14:16,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 37 minutes, 38 seconds)
2025-09-14 07:25:48,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:25:48,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:26:42,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 944.02380 ± 201.944
2025-09-14 07:26:42,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [846.63116, 1442.154, 866.47705, 688.18506, 935.40936, 977.71326, 840.1901, 992.9054, 1103.6735, 746.89984]
2025-09-14 07:26:42,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 270.0, 168.0, 125.0, 168.0, 178.0, 154.0, 197.0, 201.0, 137.0]
2025-09-14 07:26:42,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes)
2025-09-14 07:38:38,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:38:38,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:39:21,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 750.65137 ± 432.165
2025-09-14 07:39:21,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1610.7698, 644.5151, 89.149086, 176.99771, 436.05832, 1050.3317, 637.5287, 1067.5173, 845.70825, 947.93726]
2025-09-14 07:39:21,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [315.0, 124.0, 18.0, 34.0, 81.0, 197.0, 119.0, 199.0, 153.0, 174.0]
2025-09-14 07:39:21,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 29 seconds)
2025-09-14 07:51:04,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:51:04,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:51:52,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 820.92859 ± 395.967
2025-09-14 07:51:52,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1069.6945, 1146.4232, 393.20294, 790.16034, 149.20035, 603.22314, 1079.2698, 471.23206, 975.5956, 1531.2844]
2025-09-14 07:51:52,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 223.0, 84.0, 144.0, 29.0, 112.0, 199.0, 101.0, 173.0, 294.0]
2025-09-14 07:51:52,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1251 [DEBUG]: Training session finished
