2025-09-16 12:22:49,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_15
2025-09-16 12:22:49,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_15
2025-09-16 12:22:49,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x14c7eb994890>}
2025-09-16 12:22:49,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:22:49,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:22:49,077 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=631, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:22:49,077 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:22:52,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:22:52,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:24:42,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:24:43,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 289.17703 ± 44.044
2025-09-16 12:24:43,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [313.23804, 247.50209, 253.70772, 306.3252, 244.79616, 284.7928, 295.8185, 250.71382, 398.57413, 296.30182]
2025-09-16 12:24:43,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 48.0, 49.0, 59.0, 47.0, 54.0, 56.0, 48.0, 79.0, 56.0]
2025-09-16 12:24:43,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (289.18) for latency 15
2025-09-16 12:24:43,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 3 minutes, 7 seconds)
2025-09-16 12:26:42,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:26:43,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 425.16681 ± 82.482
2025-09-16 12:26:43,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [354.628, 431.90894, 519.4922, 272.58456, 393.41022, 504.28076, 420.731, 374.67538, 569.3368, 410.62006]
2025-09-16 12:26:43,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 93.0, 101.0, 51.0, 74.0, 102.0, 80.0, 71.0, 114.0, 77.0]
2025-09-16 12:26:43,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (425.17) for latency 15
2025-09-16 12:26:43,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 9 minutes, 6 seconds)
2025-09-16 12:28:41,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:28:42,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 365.14349 ± 111.364
2025-09-16 12:28:42,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [524.05194, 399.38174, 451.799, 515.64825, 323.07993, 229.10313, 373.92935, 154.39558, 366.52304, 313.52307]
2025-09-16 12:28:42,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 76.0, 85.0, 99.0, 66.0, 44.0, 74.0, 30.0, 77.0, 65.0]
2025-09-16 12:28:42,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 9 minutes, 4 seconds)
2025-09-16 12:30:41,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:30:42,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 381.78616 ± 79.966
2025-09-16 12:30:42,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [307.02753, 271.7732, 476.67477, 476.1168, 325.16165, 276.98752, 394.9277, 430.7769, 488.75485, 369.6606]
2025-09-16 12:30:42,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 54.0, 95.0, 96.0, 65.0, 55.0, 77.0, 93.0, 99.0, 75.0]
2025-09-16 12:30:42,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 8 minutes, 15 seconds)
2025-09-16 12:32:42,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:32:43,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 419.66504 ± 46.318
2025-09-16 12:32:43,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [396.42682, 388.7399, 344.23557, 502.9126, 455.5244, 411.907, 452.00494, 471.08087, 378.06818, 395.75034]
2025-09-16 12:32:43,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 86.0, 64.0, 104.0, 84.0, 82.0, 85.0, 94.0, 72.0, 77.0]
2025-09-16 12:32:43,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 7 minutes, 25 seconds)
2025-09-16 12:34:42,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:34:44,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 479.72394 ± 78.451
2025-09-16 12:34:44,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [507.49942, 444.39578, 596.94904, 402.61597, 561.34686, 386.35593, 414.669, 575.83606, 384.34482, 523.22675]
2025-09-16 12:34:44,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 80.0, 123.0, 87.0, 117.0, 71.0, 89.0, 108.0, 73.0, 101.0]
2025-09-16 12:34:44,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (479.72) for latency 15
2025-09-16 12:34:44,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 8 minutes, 22 seconds)
2025-09-16 12:36:44,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:36:45,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 472.40387 ± 119.673
2025-09-16 12:36:45,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [698.15063, 462.92255, 389.37045, 670.6274, 388.59268, 522.6957, 340.9939, 496.7104, 363.6231, 390.35156]
2025-09-16 12:36:45,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 87.0, 73.0, 132.0, 74.0, 109.0, 73.0, 96.0, 70.0, 73.0]
2025-09-16 12:36:45,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 6 minutes, 40 seconds)
2025-09-16 12:38:46,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:38:47,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 431.96518 ± 67.096
2025-09-16 12:38:47,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [402.7924, 530.9722, 425.74527, 564.5971, 455.72827, 333.82352, 394.5837, 421.91623, 428.43414, 361.05902]
2025-09-16 12:38:47,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 113.0, 81.0, 116.0, 86.0, 72.0, 80.0, 78.0, 80.0, 78.0]
2025-09-16 12:38:47,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 5 minutes, 24 seconds)
2025-09-16 12:40:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:40:48,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 420.92035 ± 77.478
2025-09-16 12:40:48,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [545.32965, 394.65823, 511.5383, 356.35416, 296.75662, 386.81635, 508.364, 464.23837, 391.32858, 353.81912]
2025-09-16 12:40:48,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 76.0, 100.0, 68.0, 58.0, 74.0, 97.0, 89.0, 84.0, 76.0]
2025-09-16 12:40:48,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 3 minutes, 51 seconds)
2025-09-16 12:42:48,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:42:49,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 476.57169 ± 174.125
2025-09-16 12:42:49,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [822.6627, 348.61276, 664.0394, 418.73816, 577.2179, 460.53604, 515.8767, 349.27322, 447.76392, 160.99602]
2025-09-16 12:42:49,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 66.0, 129.0, 77.0, 121.0, 85.0, 94.0, 68.0, 86.0, 31.0]
2025-09-16 12:42:49,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 1 minute, 40 seconds)
2025-09-16 12:44:50,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:44:51,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 478.60269 ± 103.391
2025-09-16 12:44:51,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [496.35452, 429.188, 688.7479, 385.91107, 519.121, 451.20963, 460.80304, 329.98813, 622.612, 402.09198]
2025-09-16 12:44:51,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 80.0, 141.0, 74.0, 96.0, 85.0, 85.0, 62.0, 117.0, 74.0]
2025-09-16 12:44:51,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 10 seconds)
2025-09-16 12:46:50,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:46:52,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 497.39630 ± 65.270
2025-09-16 12:46:52,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [593.5873, 470.84244, 499.60278, 386.93835, 546.9044, 574.92224, 422.2637, 452.39526, 556.8647, 469.64163]
2025-09-16 12:46:52,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 91.0, 93.0, 74.0, 116.0, 109.0, 79.0, 97.0, 105.0, 88.0]
2025-09-16 12:46:52,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (497.40) for latency 15
2025-09-16 12:46:52,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 57 minutes, 54 seconds)
2025-09-16 12:48:52,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:48:53,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 433.87201 ± 65.546
2025-09-16 12:48:53,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [455.5933, 431.38672, 501.98132, 405.41754, 519.54565, 384.72003, 439.24573, 526.0204, 346.03, 328.77936]
2025-09-16 12:48:53,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 82.0, 98.0, 76.0, 98.0, 84.0, 82.0, 99.0, 66.0, 62.0]
2025-09-16 12:48:53,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 55 minutes, 45 seconds)
2025-09-16 12:50:54,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:50:55,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 495.59668 ± 67.300
2025-09-16 12:50:55,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [499.1077, 430.70834, 492.0334, 522.9722, 382.77576, 599.44183, 462.38196, 498.5017, 613.8054, 454.23843]
2025-09-16 12:50:55,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 79.0, 106.0, 99.0, 71.0, 126.0, 91.0, 105.0, 131.0, 84.0]
2025-09-16 12:50:55,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 53 minutes, 56 seconds)
2025-09-16 12:52:55,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:52:56,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 435.97794 ± 126.145
2025-09-16 12:52:56,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [282.14175, 588.7415, 436.93063, 263.01047, 373.7074, 492.39432, 635.309, 297.16858, 427.61923, 562.75616]
2025-09-16 12:52:56,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 112.0, 83.0, 50.0, 70.0, 94.0, 132.0, 56.0, 79.0, 103.0]
2025-09-16 12:52:56,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 51 minutes, 59 seconds)
2025-09-16 12:54:57,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:54:58,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 475.39301 ± 137.049
2025-09-16 12:54:58,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [380.36435, 542.75574, 175.18451, 413.06888, 410.248, 598.3612, 629.07715, 522.09686, 656.2157, 426.55774]
2025-09-16 12:54:58,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 104.0, 34.0, 88.0, 77.0, 112.0, 118.0, 97.0, 140.0, 77.0]
2025-09-16 12:54:58,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 59 seconds)
2025-09-16 12:56:59,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:57:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 525.77991 ± 121.470
2025-09-16 12:57:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [566.80536, 434.07245, 210.5791, 642.4972, 508.57227, 528.7484, 563.89557, 614.90906, 541.374, 646.34546]
2025-09-16 12:57:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 80.0, 41.0, 132.0, 97.0, 107.0, 115.0, 115.0, 100.0, 124.0]
2025-09-16 12:57:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (525.78) for latency 15
2025-09-16 12:57:00,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 48 minutes, 14 seconds)
2025-09-16 12:59:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:59:02,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 484.82477 ± 98.330
2025-09-16 12:59:02,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [552.9581, 455.49768, 330.92203, 603.89325, 373.6916, 419.6369, 399.1264, 507.80527, 577.63416, 627.082]
2025-09-16 12:59:02,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 88.0, 63.0, 115.0, 71.0, 81.0, 72.0, 98.0, 123.0, 115.0]
2025-09-16 12:59:02,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 46 minutes, 19 seconds)
2025-09-16 13:01:01,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:01:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 512.77826 ± 126.820
2025-09-16 13:01:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [458.40335, 584.3277, 439.415, 345.76315, 420.59564, 772.5634, 533.6041, 680.208, 390.63632, 502.2662]
2025-09-16 13:01:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 111.0, 97.0, 66.0, 80.0, 162.0, 101.0, 127.0, 78.0, 94.0]
2025-09-16 13:01:03,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 44 minutes, 3 seconds)
2025-09-16 13:03:03,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:03:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 467.36505 ± 133.525
2025-09-16 13:03:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [567.90125, 204.60132, 587.4783, 529.05865, 446.06583, 534.3181, 388.8938, 592.9061, 565.7162, 256.71075]
2025-09-16 13:03:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 39.0, 120.0, 111.0, 92.0, 99.0, 84.0, 110.0, 107.0, 52.0]
2025-09-16 13:03:04,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 42 minutes, 6 seconds)
2025-09-16 13:05:04,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:05:06,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 444.38477 ± 64.405
2025-09-16 13:05:06,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [456.0784, 536.49677, 473.2742, 323.58908, 405.3906, 507.88824, 466.51852, 434.99664, 348.77014, 490.84503]
2025-09-16 13:05:06,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 100.0, 88.0, 68.0, 75.0, 110.0, 101.0, 80.0, 74.0, 91.0]
2025-09-16 13:05:06,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 40 minutes)
2025-09-16 13:07:07,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:07:08,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 556.19934 ± 162.440
2025-09-16 13:07:08,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [379.36984, 932.4078, 503.0596, 732.68274, 374.11462, 488.63126, 507.02597, 615.6514, 443.9928, 585.0569]
2025-09-16 13:07:08,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 175.0, 95.0, 153.0, 69.0, 90.0, 92.0, 126.0, 91.0, 110.0]
2025-09-16 13:07:08,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (556.20) for latency 15
2025-09-16 13:07:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 38 minutes, 9 seconds)
2025-09-16 13:09:09,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:09:11,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 485.30209 ± 86.880
2025-09-16 13:09:11,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [628.9403, 500.08185, 385.90533, 472.0171, 383.1937, 587.403, 437.1721, 393.75452, 597.9014, 466.65198]
2025-09-16 13:09:11,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 92.0, 72.0, 91.0, 70.0, 124.0, 95.0, 73.0, 112.0, 86.0]
2025-09-16 13:09:11,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 36 minutes, 16 seconds)
2025-09-16 13:11:11,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:11:12,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 519.53595 ± 74.765
2025-09-16 13:11:12,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [651.1365, 469.75723, 521.80707, 420.9924, 550.0522, 614.8339, 460.0402, 421.43213, 575.36127, 509.94604]
2025-09-16 13:11:12,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 91.0, 96.0, 80.0, 104.0, 117.0, 85.0, 90.0, 110.0, 94.0]
2025-09-16 13:11:12,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 34 minutes, 18 seconds)
2025-09-16 13:13:13,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:13:15,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 549.97400 ± 89.961
2025-09-16 13:13:15,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [658.9688, 455.91238, 510.31995, 559.20825, 612.68854, 500.2939, 509.64716, 715.2837, 400.15665, 577.26074]
2025-09-16 13:13:15,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 98.0, 97.0, 105.0, 121.0, 90.0, 95.0, 130.0, 88.0, 122.0]
2025-09-16 13:13:15,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 32 minutes, 37 seconds)
2025-09-16 13:15:13,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:15:15,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 547.27454 ± 127.370
2025-09-16 13:15:15,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [734.98975, 663.29944, 393.84274, 780.9455, 554.1344, 448.69836, 452.8716, 521.6031, 436.3789, 485.9821]
2025-09-16 13:15:15,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 123.0, 80.0, 153.0, 108.0, 84.0, 85.0, 99.0, 81.0, 89.0]
2025-09-16 13:15:15,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 30 minutes, 11 seconds)
2025-09-16 13:17:15,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:17:17,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 568.69659 ± 265.102
2025-09-16 13:17:17,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [423.71924, 442.75934, 238.91985, 1224.604, 451.69366, 778.4894, 438.24908, 526.0685, 422.97012, 739.4928]
2025-09-16 13:17:17,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 92.0, 46.0, 234.0, 83.0, 150.0, 84.0, 97.0, 79.0, 155.0]
2025-09-16 13:17:17,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (568.70) for latency 15
2025-09-16 13:17:17,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 28 minutes, 1 second)
2025-09-16 13:19:18,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:19:20,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 569.79797 ± 90.523
2025-09-16 13:19:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [513.59393, 491.7735, 472.55353, 693.5922, 454.19226, 504.42178, 659.5606, 699.50916, 565.6469, 643.1351]
2025-09-16 13:19:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 101.0, 86.0, 131.0, 86.0, 96.0, 128.0, 134.0, 116.0, 128.0]
2025-09-16 13:19:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (569.80) for latency 15
2025-09-16 13:19:20,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 26 minutes, 15 seconds)
2025-09-16 13:21:20,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:21:21,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 559.60364 ± 134.620
2025-09-16 13:21:21,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [350.13647, 385.69043, 755.83453, 568.2321, 395.78696, 682.32074, 603.3693, 573.6214, 716.0853, 564.9589]
2025-09-16 13:21:21,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 74.0, 150.0, 119.0, 73.0, 130.0, 114.0, 105.0, 142.0, 106.0]
2025-09-16 13:21:21,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 24 minutes, 11 seconds)
2025-09-16 13:23:23,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:23:24,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 534.27911 ± 71.806
2025-09-16 13:23:24,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [546.22864, 410.03204, 571.89716, 556.70984, 613.9759, 480.26788, 501.5111, 660.9984, 552.5066, 448.66284]
2025-09-16 13:23:24,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 88.0, 106.0, 102.0, 113.0, 89.0, 99.0, 121.0, 104.0, 84.0]
2025-09-16 13:23:24,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 22 minutes, 14 seconds)
2025-09-16 13:25:24,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:25:26,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 552.88251 ± 69.414
2025-09-16 13:25:26,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [601.8425, 580.2002, 420.05295, 544.614, 602.8142, 589.575, 589.7748, 641.45685, 520.7062, 437.78873]
2025-09-16 13:25:26,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 107.0, 77.0, 111.0, 116.0, 111.0, 110.0, 121.0, 99.0, 81.0]
2025-09-16 13:25:26,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 20 minutes, 32 seconds)
2025-09-16 13:27:26,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:27:27,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 535.17010 ± 135.892
2025-09-16 13:27:27,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [405.6344, 749.141, 480.53806, 560.74164, 447.5126, 419.81125, 479.4732, 551.34717, 824.4494, 433.05276]
2025-09-16 13:27:27,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 138.0, 89.0, 102.0, 83.0, 78.0, 88.0, 103.0, 155.0, 81.0]
2025-09-16 13:27:27,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 18 minutes, 22 seconds)
2025-09-16 13:29:28,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:29:30,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 582.08020 ± 123.394
2025-09-16 13:29:30,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [453.23352, 802.00604, 528.38495, 547.3909, 719.4578, 518.6541, 637.64526, 717.9865, 485.50015, 410.54333]
2025-09-16 13:29:30,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 150.0, 97.0, 100.0, 132.0, 95.0, 119.0, 139.0, 98.0, 85.0]
2025-09-16 13:29:30,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (582.08) for latency 15
2025-09-16 13:29:30,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 8 seconds)
2025-09-16 13:31:31,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:31:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 668.46552 ± 163.536
2025-09-16 13:31:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [454.81003, 457.6353, 724.8373, 577.0825, 534.27997, 578.2818, 780.999, 952.1018, 761.44495, 863.18274]
2025-09-16 13:31:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 85.0, 137.0, 109.0, 99.0, 112.0, 147.0, 182.0, 147.0, 161.0]
2025-09-16 13:31:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (668.47) for latency 15
2025-09-16 13:31:32,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 14 minutes, 28 seconds)
2025-09-16 13:33:33,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:33:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 602.50189 ± 143.650
2025-09-16 13:33:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [490.44534, 686.0724, 516.2628, 690.4292, 437.45822, 949.69653, 453.35117, 620.4705, 568.0535, 612.7786]
2025-09-16 13:33:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 129.0, 111.0, 138.0, 80.0, 191.0, 84.0, 116.0, 105.0, 112.0]
2025-09-16 13:33:35,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 12 minutes, 17 seconds)
2025-09-16 13:35:36,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:35:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 719.43671 ± 243.772
2025-09-16 13:35:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [915.5086, 668.12177, 658.2578, 908.2045, 637.2746, 388.19824, 464.7495, 753.68365, 1264.8392, 535.5286]
2025-09-16 13:35:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 139.0, 118.0, 174.0, 135.0, 83.0, 85.0, 141.0, 262.0, 114.0]
2025-09-16 13:35:38,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (719.44) for latency 15
2025-09-16 13:35:38,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 10 minutes, 31 seconds)
2025-09-16 13:37:39,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:37:40,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 663.11194 ± 209.888
2025-09-16 13:37:40,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1000.0124, 463.30145, 732.979, 477.42554, 442.03146, 516.1858, 948.6208, 931.6279, 526.4634, 592.47107]
2025-09-16 13:37:40,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 85.0, 140.0, 102.0, 89.0, 96.0, 177.0, 176.0, 115.0, 109.0]
2025-09-16 13:37:40,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 8 minutes, 47 seconds)
2025-09-16 13:39:42,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:39:44,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 724.20441 ± 198.263
2025-09-16 13:39:44,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [903.8745, 424.0435, 1092.141, 557.1451, 624.116, 688.34863, 477.48703, 768.6472, 849.2249, 857.0159]
2025-09-16 13:39:44,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 77.0, 210.0, 111.0, 115.0, 131.0, 101.0, 140.0, 157.0, 158.0]
2025-09-16 13:39:44,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (724.20) for latency 15
2025-09-16 13:39:44,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 6 minutes, 53 seconds)
2025-09-16 13:41:45,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:41:47,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 768.18256 ± 239.332
2025-09-16 13:41:47,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [785.3165, 334.84778, 795.2427, 683.79987, 695.4643, 684.6287, 826.38385, 554.14087, 1102.7375, 1219.264]
2025-09-16 13:41:47,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 64.0, 155.0, 123.0, 136.0, 126.0, 160.0, 102.0, 206.0, 247.0]
2025-09-16 13:41:47,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (768.18) for latency 15
2025-09-16 13:41:47,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 4 minutes, 54 seconds)
2025-09-16 13:43:47,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:43:49,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 702.88147 ± 211.794
2025-09-16 13:43:49,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [557.74005, 1240.1755, 593.02216, 532.00836, 613.15607, 889.5634, 712.77637, 469.71555, 704.27924, 716.3785]
2025-09-16 13:43:49,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 237.0, 122.0, 96.0, 122.0, 172.0, 145.0, 88.0, 130.0, 135.0]
2025-09-16 13:43:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes, 51 seconds)
2025-09-16 13:45:51,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:45:53,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 593.77307 ± 168.188
2025-09-16 13:45:53,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [554.4649, 769.8061, 822.5008, 373.97226, 415.57507, 554.4253, 351.16696, 580.2852, 800.07434, 715.4597]
2025-09-16 13:45:53,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 137.0, 152.0, 81.0, 85.0, 110.0, 69.0, 125.0, 167.0, 140.0]
2025-09-16 13:45:53,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 1 minute, 2 seconds)
2025-09-16 13:47:53,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:47:55,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 749.59943 ± 244.543
2025-09-16 13:47:55,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [748.2979, 400.10226, 557.87787, 509.00974, 1097.3068, 508.8811, 1061.2649, 801.47363, 1069.7457, 742.0349]
2025-09-16 13:47:55,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 77.0, 120.0, 110.0, 213.0, 92.0, 190.0, 156.0, 198.0, 145.0]
2025-09-16 13:47:55,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 52 seconds)
2025-09-16 13:49:56,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:49:59,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 812.63922 ± 148.715
2025-09-16 13:49:59,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [709.0109, 872.14386, 995.00336, 948.25934, 1013.6696, 709.50024, 811.35236, 699.48505, 854.6763, 513.29095]
2025-09-16 13:49:59,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 159.0, 195.0, 184.0, 189.0, 133.0, 144.0, 127.0, 183.0, 93.0]
2025-09-16 13:49:59,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (812.64) for latency 15
2025-09-16 13:49:59,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 56 minutes, 50 seconds)
2025-09-16 13:52:00,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:52:02,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 851.31555 ± 203.453
2025-09-16 13:52:02,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1052.0685, 1073.7891, 1045.0305, 553.2345, 798.7213, 550.99866, 892.36676, 598.2077, 1040.586, 908.1523]
2025-09-16 13:52:02,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 221.0, 212.0, 120.0, 152.0, 115.0, 178.0, 129.0, 215.0, 185.0]
2025-09-16 13:52:02,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (851.32) for latency 15
2025-09-16 13:52:02,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 54 minutes, 54 seconds)
2025-09-16 13:54:04,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:54:06,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 774.79468 ± 100.893
2025-09-16 13:54:06,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [709.9538, 736.6718, 828.8983, 631.829, 653.5067, 976.32184, 831.0475, 701.59784, 857.2734, 820.84625]
2025-09-16 13:54:06,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 145.0, 174.0, 119.0, 128.0, 183.0, 160.0, 145.0, 176.0, 176.0]
2025-09-16 13:54:06,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 53 minutes, 8 seconds)
2025-09-16 13:56:08,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:56:11,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 820.30438 ± 250.419
2025-09-16 13:56:11,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [782.3517, 1244.6978, 884.16724, 1102.8989, 843.27277, 390.75366, 669.1448, 707.334, 1052.859, 525.56366]
2025-09-16 13:56:11,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 229.0, 183.0, 210.0, 158.0, 88.0, 144.0, 152.0, 210.0, 113.0]
2025-09-16 13:56:11,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 9 seconds)
2025-09-16 13:58:11,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:58:14,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1029.81421 ± 372.136
2025-09-16 13:58:14,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1137.813, 925.0097, 1549.2198, 1149.3914, 924.8243, 911.41376, 847.30273, 190.8668, 1590.3123, 1071.9886]
2025-09-16 13:58:14,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 191.0, 278.0, 231.0, 197.0, 182.0, 165.0, 37.0, 276.0, 207.0]
2025-09-16 13:58:14,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1029.81) for latency 15
2025-09-16 13:58:14,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 17 seconds)
2025-09-16 14:00:15,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:00:18,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1045.03442 ± 474.148
2025-09-16 14:00:18,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [149.78659, 994.5393, 1062.4058, 750.05115, 507.38373, 1175.8473, 1455.739, 1285.9248, 1949.736, 1118.9313]
2025-09-16 14:00:18,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 197.0, 203.0, 142.0, 100.0, 227.0, 263.0, 236.0, 370.0, 207.0]
2025-09-16 14:00:18,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1045.03) for latency 15
2025-09-16 14:00:18,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 26 seconds)
2025-09-16 14:02:21,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:02:24,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 822.04932 ± 247.884
2025-09-16 14:02:24,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1035.601, 1421.0842, 730.6675, 675.5295, 627.30597, 709.31305, 512.3658, 749.96924, 983.983, 774.6742]
2025-09-16 14:02:24,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 288.0, 139.0, 144.0, 128.0, 150.0, 104.0, 151.0, 184.0, 141.0]
2025-09-16 14:02:24,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 45 minutes, 37 seconds)
2025-09-16 14:04:24,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:04:26,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 976.38641 ± 357.551
2025-09-16 14:04:26,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [780.01154, 595.20715, 946.6614, 976.43774, 735.40393, 1754.6146, 782.1202, 707.5077, 1541.1301, 944.7699]
2025-09-16 14:04:26,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 113.0, 194.0, 191.0, 127.0, 333.0, 160.0, 132.0, 284.0, 181.0]
2025-09-16 14:04:26,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 43 minutes, 22 seconds)
2025-09-16 14:06:29,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:06:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1694.82776 ± 743.514
2025-09-16 14:06:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [761.7151, 2355.248, 2435.134, 1647.6348, 1824.2596, 710.0962, 1476.4006, 778.568, 3011.3728, 1947.8469]
2025-09-16 14:06:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 432.0, 447.0, 321.0, 358.0, 130.0, 281.0, 141.0, 558.0, 365.0]
2025-09-16 14:06:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1694.83) for latency 15
2025-09-16 14:06:33,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 41 minutes, 42 seconds)
2025-09-16 14:08:36,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:08:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1242.18433 ± 657.596
2025-09-16 14:08:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1157.1249, 544.2526, 2422.454, 552.59467, 1372.2354, 852.28864, 1401.3342, 1299.1744, 486.22064, 2334.1638]
2025-09-16 14:08:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 110.0, 461.0, 102.0, 247.0, 153.0, 266.0, 234.0, 86.0, 428.0]
2025-09-16 14:08:39,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 40 minutes, 3 seconds)
2025-09-16 14:10:39,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:10:43,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1212.05640 ± 747.502
2025-09-16 14:10:43,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [583.9294, 1004.3999, 730.11554, 1115.0511, 1060.1862, 1984.0217, 748.316, 476.36417, 3087.9563, 1330.2231]
2025-09-16 14:10:43,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 183.0, 134.0, 219.0, 205.0, 362.0, 136.0, 104.0, 583.0, 254.0]
2025-09-16 14:10:43,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 46 seconds)
2025-09-16 14:12:44,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:12:47,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1142.56531 ± 564.861
2025-09-16 14:12:47,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [729.29297, 974.3092, 352.93176, 1029.891, 910.5757, 1164.2437, 1759.1771, 533.35376, 1733.7864, 2238.092]
2025-09-16 14:12:47,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 170.0, 68.0, 194.0, 163.0, 202.0, 336.0, 113.0, 310.0, 429.0]
2025-09-16 14:12:47,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 35 minutes, 37 seconds)
2025-09-16 14:14:52,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:14:55,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1205.51794 ± 513.559
2025-09-16 14:14:55,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [839.2188, 2142.597, 1038.6973, 1615.9231, 495.01233, 539.4946, 987.9909, 1624.925, 1075.5378, 1695.7817]
2025-09-16 14:14:55,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 409.0, 191.0, 295.0, 109.0, 117.0, 184.0, 293.0, 201.0, 323.0]
2025-09-16 14:14:55,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 34 minutes, 16 seconds)
2025-09-16 14:16:53,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:16:57,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1221.68140 ± 697.084
2025-09-16 14:16:57,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [448.9124, 763.93756, 554.3095, 2216.1301, 1629.7667, 497.37878, 1304.8995, 747.7743, 2473.5005, 1580.204]
2025-09-16 14:16:57,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 154.0, 102.0, 391.0, 304.0, 87.0, 242.0, 131.0, 456.0, 307.0]
2025-09-16 14:16:57,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 25 seconds)
2025-09-16 14:19:02,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:19:07,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1623.34839 ± 1089.511
2025-09-16 14:19:07,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [381.8087, 1637.244, 1112.6509, 535.2733, 1244.7188, 1522.1451, 2534.904, 2901.4165, 501.9906, 3861.333]
2025-09-16 14:19:07,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 307.0, 206.0, 99.0, 223.0, 270.0, 450.0, 509.0, 91.0, 686.0]
2025-09-16 14:19:07,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 57 seconds)
2025-09-16 14:21:12,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:21:16,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1619.51550 ± 815.289
2025-09-16 14:21:16,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3703.8767, 2101.9724, 1192.3434, 1290.4554, 1217.8226, 1184.4751, 2317.2283, 993.62054, 1057.9595, 1135.4012]
2025-09-16 14:21:16,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [711.0, 400.0, 233.0, 230.0, 224.0, 239.0, 439.0, 201.0, 202.0, 222.0]
2025-09-16 14:21:16,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 28 minutes, 43 seconds)
2025-09-16 14:23:16,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:23:23,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2298.48071 ± 1213.925
2025-09-16 14:23:23,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [792.6181, 2719.7756, 5458.1333, 1951.7136, 1125.7908, 1582.8855, 2486.8481, 2543.69, 2450.6829, 1872.6671]
2025-09-16 14:23:23,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 498.0, 1000.0, 355.0, 198.0, 297.0, 473.0, 474.0, 452.0, 362.0]
2025-09-16 14:23:23,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2298.48) for latency 15
2025-09-16 14:23:23,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 26 minutes, 50 seconds)
2025-09-16 14:25:22,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:25:28,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2195.15869 ± 1265.337
2025-09-16 14:25:28,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3813.5974, 1959.3907, 2262.1084, 451.40417, 4451.9336, 1181.6716, 2159.1438, 3449.2834, 1463.5564, 759.49634]
2025-09-16 14:25:28,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [684.0, 344.0, 414.0, 82.0, 787.0, 230.0, 388.0, 614.0, 276.0, 141.0]
2025-09-16 14:25:28,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 24 minutes, 21 seconds)
2025-09-16 14:27:31,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:27:37,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2320.70776 ± 1291.726
2025-09-16 14:27:37,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2026.1703, 3127.0508, 486.21613, 949.03033, 3561.3984, 2224.7585, 2041.9836, 4256.081, 3898.7805, 635.6083]
2025-09-16 14:27:37,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [375.0, 589.0, 87.0, 189.0, 660.0, 398.0, 363.0, 775.0, 736.0, 126.0]
2025-09-16 14:27:37,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2320.71) for latency 15
2025-09-16 14:27:37,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2025-09-16 14:29:39,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:29:45,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2121.23462 ± 1149.279
2025-09-16 14:29:45,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2112.8674, 2506.7573, 5096.58, 666.2412, 2189.2495, 2518.3643, 1874.797, 1021.36285, 1347.7554, 1878.3702]
2025-09-16 14:29:45,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [395.0, 432.0, 943.0, 148.0, 391.0, 466.0, 368.0, 197.0, 268.0, 363.0]
2025-09-16 14:29:45,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 20 minutes, 47 seconds)
2025-09-16 14:31:55,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:32:00,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1617.67883 ± 1301.243
2025-09-16 14:32:00,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [723.9395, 938.5569, 1994.8983, 5143.4116, 1487.1907, 912.8408, 774.59875, 2466.1736, 746.255, 988.9227]
2025-09-16 14:32:00,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 195.0, 365.0, 1000.0, 301.0, 164.0, 164.0, 486.0, 130.0, 207.0]
2025-09-16 14:32:00,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 19 minutes, 22 seconds)
2025-09-16 14:33:53,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:33:58,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1889.75977 ± 1020.057
2025-09-16 14:33:58,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1243.5326, 1723.2966, 1586.5272, 1190.9347, 969.27045, 4289.7983, 2300.5103, 2033.9636, 648.8643, 2910.9001]
2025-09-16 14:33:58,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [239.0, 312.0, 295.0, 204.0, 189.0, 769.0, 418.0, 357.0, 131.0, 528.0]
2025-09-16 14:33:58,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 16 minutes, 15 seconds)
2025-09-16 14:36:01,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:36:10,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2857.16138 ± 1641.556
2025-09-16 14:36:10,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [450.1132, 5303.1753, 5382.541, 975.267, 2267.63, 4306.164, 2979.0142, 2274.1929, 3282.4697, 1351.0465]
2025-09-16 14:36:10,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 1000.0, 1000.0, 183.0, 417.0, 820.0, 556.0, 427.0, 614.0, 242.0]
2025-09-16 14:36:10,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2857.16) for latency 15
2025-09-16 14:36:10,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 14 minutes, 53 seconds)
2025-09-16 14:38:19,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:38:30,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3675.61792 ± 1877.738
2025-09-16 14:38:30,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [554.90186, 5411.2646, 5487.6255, 4765.5654, 983.43695, 5441.3843, 2121.4426, 2351.7783, 5485.004, 4153.7754]
2025-09-16 14:38:30,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 1000.0, 1000.0, 871.0, 199.0, 1000.0, 384.0, 439.0, 1000.0, 754.0]
2025-09-16 14:38:30,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3675.62) for latency 15
2025-09-16 14:38:30,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 13 minutes, 57 seconds)
2025-09-16 14:40:35,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:40:48,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4424.25293 ± 1364.245
2025-09-16 14:40:48,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5400.257, 5387.206, 5360.221, 5407.9907, 5380.416, 3969.8125, 5333.96, 4234.222, 1879.7693, 1888.6713]
2025-09-16 14:40:48,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 750.0, 1000.0, 779.0, 363.0, 344.0]
2025-09-16 14:40:48,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4424.25) for latency 15
2025-09-16 14:40:48,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 12 minutes, 57 seconds)
2025-09-16 14:42:47,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:42:58,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3764.02271 ± 1670.745
2025-09-16 14:42:58,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1211.5673, 2143.1199, 5323.6045, 3120.3247, 5301.9683, 3407.0142, 1239.5548, 5306.333, 5330.08, 5256.66]
2025-09-16 14:42:58,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 402.0, 1000.0, 607.0, 1000.0, 640.0, 250.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:42:58,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 10 minutes, 9 seconds)
2025-09-16 14:44:56,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:45:06,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3290.05127 ± 1418.842
2025-09-16 14:45:06,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2405.7688, 5405.342, 3503.552, 561.3769, 4991.6016, 1900.7, 2755.9983, 3381.998, 3201.9712, 4792.2046]
2025-09-16 14:45:06,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [453.0, 1000.0, 657.0, 104.0, 929.0, 381.0, 515.0, 592.0, 600.0, 885.0]
2025-09-16 14:45:06,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 8 minutes, 58 seconds)
2025-09-16 14:47:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:47:33,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4478.42920 ± 1655.176
2025-09-16 14:47:33,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5256.6245, 475.16043, 5295.757, 5243.639, 5329.184, 5320.1567, 5244.9272, 5277.3647, 5336.2603, 2005.2188]
2025-09-16 14:47:33,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 87.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 392.0]
2025-09-16 14:47:33,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4478.43) for latency 15
2025-09-16 14:47:33,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 8 minutes, 22 seconds)
2025-09-16 14:49:24,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:49:34,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3237.55078 ± 2113.540
2025-09-16 14:49:34,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5231.7417, 5288.782, 796.0158, 1315.0742, 5206.3984, 530.90155, 5259.252, 595.28766, 2820.6514, 5331.4023]
2025-09-16 14:49:34,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 152.0, 250.0, 1000.0, 109.0, 1000.0, 116.0, 548.0, 1000.0]
2025-09-16 14:49:34,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 4 minutes, 12 seconds)
2025-09-16 14:51:40,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:51:54,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4755.87500 ± 1536.136
2025-09-16 14:51:54,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5538.4824, 5390.2725, 5596.5356, 5384.5757, 5396.9243, 5291.2466, 3600.4092, 5584.514, 460.73468, 5315.055]
2025-09-16 14:51:54,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 640.0, 1000.0, 84.0, 1000.0]
2025-09-16 14:51:54,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4755.88) for latency 15
2025-09-16 14:51:54,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 2 minutes, 9 seconds)
2025-09-16 14:53:55,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:54:07,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3848.58862 ± 1907.186
2025-09-16 14:54:07,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5187.2095, 1076.0238, 5448.8555, 1134.0002, 5335.706, 4754.9517, 4521.4526, 709.60284, 5468.178, 4849.9067]
2025-09-16 14:54:07,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [972.0, 204.0, 1000.0, 226.0, 1000.0, 871.0, 850.0, 131.0, 1000.0, 900.0]
2025-09-16 14:54:07,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 12 seconds)
2025-09-16 14:56:16,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:56:31,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4727.16309 ± 1468.445
2025-09-16 14:56:31,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [556.1653, 5286.2935, 3792.0938, 5333.43, 5477.981, 5390.605, 5408.7935, 5304.449, 5418.015, 5303.804]
2025-09-16 14:56:31,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 1000.0, 719.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:56:31,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 59 minutes, 21 seconds)
2025-09-16 14:58:30,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:58:43,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4083.48389 ± 1361.205
2025-09-16 14:58:43,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1801.828, 2398.7195, 5241.2095, 2253.5369, 3709.6558, 5238.5806, 4376.203, 5263.844, 5274.2266, 5277.036]
2025-09-16 14:58:43,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [339.0, 465.0, 1000.0, 421.0, 705.0, 1000.0, 844.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:58:43,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 46 seconds)
2025-09-16 15:00:49,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:01:04,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4796.65088 ± 1347.320
2025-09-16 15:01:04,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5292.89, 5233.787, 5246.5225, 5195.1045, 758.54974, 5229.995, 5240.068, 5378.9653, 5137.916, 5252.7104]
2025-09-16 15:01:04,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 140.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:01:04,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4796.65) for latency 15
2025-09-16 15:01:04,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 55 minutes, 13 seconds)
2025-09-16 15:03:04,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:03:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4948.04980 ± 794.971
2025-09-16 15:03:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2566.825, 5145.301, 5152.161, 5193.379, 5245.022, 5226.3057, 5252.382, 5300.9165, 5198.799, 5199.4062]
2025-09-16 15:03:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [518.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:03:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4948.05) for latency 15
2025-09-16 15:03:19,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 52 minutes, 33 seconds)
2025-09-16 15:05:32,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:05:48,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5382.92041 ± 42.726
2025-09-16 15:05:48,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5353.081, 5424.5225, 5348.9355, 5344.6807, 5382.7207, 5393.33, 5431.9287, 5322.164, 5463.3813, 5364.455]
2025-09-16 15:05:48,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:05:48,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5382.92) for latency 15
2025-09-16 15:05:48,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 51 minutes, 26 seconds)
2025-09-16 15:07:51,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:08:02,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3764.52026 ± 1927.672
2025-09-16 15:08:02,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3122.1086, 714.0821, 5290.209, 5260.817, 5222.485, 5302.24, 1162.0203, 5217.878, 5275.2637, 1078.1003]
2025-09-16 15:08:02,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [592.0, 124.0, 1000.0, 1000.0, 1000.0, 1000.0, 244.0, 1000.0, 1000.0, 201.0]
2025-09-16 15:08:02,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 48 minutes, 24 seconds)
2025-09-16 15:10:00,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:10:15,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5146.19873 ± 612.768
2025-09-16 15:10:15,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5510.3022, 3820.7273, 5494.371, 5462.108, 5453.656, 5439.671, 4033.41, 5348.227, 5462.0664, 5437.454]
2025-09-16 15:10:15,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 666.0, 1000.0, 1000.0, 1000.0, 1000.0, 718.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:10:15,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 46 minutes, 7 seconds)
2025-09-16 15:12:25,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:12:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4495.40234 ± 1871.219
2025-09-16 15:12:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5398.6245, 5352.8022, 5456.8755, 5469.153, 5436.106, 5482.625, 5439.5103, 5411.049, 763.7917, 743.48846]
2025-09-16 15:12:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 136.0, 128.0]
2025-09-16 15:12:38,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 43 minutes, 55 seconds)
2025-09-16 15:14:35,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:14:47,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4522.12012 ± 2002.995
2025-09-16 15:14:47,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5516.4814, 5610.6533, 593.7449, 5567.0366, 5136.0327, 5561.9106, 457.16434, 5578.2466, 5573.3896, 5626.537]
2025-09-16 15:14:47,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 104.0, 1000.0, 894.0, 1000.0, 82.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:14:47,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 41 minutes, 16 seconds)
2025-09-16 15:16:53,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:17:09,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5448.84961 ± 77.243
2025-09-16 15:17:09,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5413.3325, 5423.5493, 5482.4956, 5557.934, 5372.2344, 5482.47, 5478.36, 5319.425, 5383.82, 5574.872]
2025-09-16 15:17:09,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:17:09,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5448.85) for latency 15
2025-09-16 15:17:09,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 36 seconds)
2025-09-16 15:19:10,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:19:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4065.99072 ± 1539.276
2025-09-16 15:19:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1281.9521, 974.3098, 5357.7915, 5250.7607, 4356.3745, 5251.8647, 3986.1306, 5298.021, 4452.4336, 4450.2676]
2025-09-16 15:19:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [259.0, 185.0, 1000.0, 1000.0, 812.0, 1000.0, 719.0, 1000.0, 840.0, 836.0]
2025-09-16 15:19:22,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 36 minutes, 16 seconds)
2025-09-16 15:21:28,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:21:41,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4439.12646 ± 1738.793
2025-09-16 15:21:41,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5280.2007, 5314.001, 5305.1094, 5311.4854, 1394.3937, 5315.2573, 5298.0596, 568.61633, 5348.4814, 5255.658]
2025-09-16 15:21:41,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 292.0, 1000.0, 1000.0, 102.0, 1000.0, 1000.0]
2025-09-16 15:21:41,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 20 seconds)
2025-09-16 15:23:48,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:24:04,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4847.33154 ± 1151.664
2025-09-16 15:24:04,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5283.232, 5282.2056, 5224.423, 5216.424, 5193.8477, 5206.438, 5210.7983, 5206.87, 5255.541, 1393.5365]
2025-09-16 15:24:04,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 257.0]
2025-09-16 15:24:04,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 32 minutes)
2025-09-16 15:26:12,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:26:26,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4520.20117 ± 1485.637
2025-09-16 15:26:26,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [522.7263, 5208.7065, 5207.8657, 5219.308, 5236.3813, 5169.9746, 5186.016, 3007.2195, 5180.924, 5262.8936]
2025-09-16 15:26:26,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 575.0, 1000.0, 1000.0]
2025-09-16 15:26:26,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 30 minutes, 17 seconds)
2025-09-16 15:28:21,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:28:36,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4984.32666 ± 1047.901
2025-09-16 15:28:36,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5354.179, 5291.276, 5476.2827, 5338.4385, 5321.935, 5295.6904, 5273.2095, 5299.609, 5347.845, 1844.8037]
2025-09-16 15:28:36,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 341.0]
2025-09-16 15:28:36,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 27 seconds)
2025-09-16 15:30:38,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:30:55,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5159.45020 ± 37.335
2025-09-16 15:30:55,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5105.92, 5145.448, 5128.6465, 5161.003, 5194.458, 5230.5923, 5139.906, 5147.482, 5132.651, 5208.399]
2025-09-16 15:30:55,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:30:55,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 22 seconds)
2025-09-16 15:32:58,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:33:12,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4847.79248 ± 1339.894
2025-09-16 15:33:12,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5336.368, 5350.393, 5417.986, 5453.474, 5466.803, 5418.0957, 4389.2603, 5425.311, 933.1183, 5287.118]
2025-09-16 15:33:12,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 828.0, 1000.0, 166.0, 1000.0]
2025-09-16 15:33:12,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 1 second)
2025-09-16 15:35:20,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:35:33,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4398.69092 ± 1807.448
2025-09-16 15:35:33,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5270.889, 5327.056, 5323.1855, 5328.4277, 5291.039, 926.4486, 5315.2, 5262.6987, 646.0056, 5295.9614]
2025-09-16 15:35:33,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 166.0, 1000.0, 1000.0, 133.0, 1000.0]
2025-09-16 15:35:33,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 41 seconds)
2025-09-16 15:37:32,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:37:46,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4350.63428 ± 1521.183
2025-09-16 15:37:46,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5151.795, 5173.0615, 5179.9756, 5173.4497, 5166.088, 5114.072, 5220.7476, 304.54233, 4091.0464, 2931.567]
2025-09-16 15:37:46,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 60.0, 783.0, 568.0]
2025-09-16 15:37:46,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 8 seconds)
2025-09-16 15:39:48,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:40:01,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4215.37158 ± 1236.794
2025-09-16 15:40:01,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3749.8105, 5329.189, 2929.964, 3480.0466, 1550.8662, 5242.4326, 5334.9023, 5212.4946, 5312.1143, 4011.8943]
2025-09-16 15:40:01,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [718.0, 1000.0, 567.0, 665.0, 290.0, 1000.0, 1000.0, 1000.0, 1000.0, 754.0]
2025-09-16 15:40:01,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 59 seconds)
2025-09-16 15:42:11,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:42:23,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3916.78076 ± 2187.524
2025-09-16 15:42:23,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5288.3384, 5352.7607, 5370.6978, 507.36398, 598.0571, 5440.2783, 5372.6465, 5265.618, 5348.747, 623.304]
2025-09-16 15:42:23,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 91.0, 108.0, 1000.0, 1000.0, 1000.0, 1000.0, 111.0]
2025-09-16 15:42:23,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 45 seconds)
2025-09-16 15:44:15,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:44:30,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4763.33887 ± 1348.336
2025-09-16 15:44:30,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5249.8853, 5219.3657, 5257.0527, 720.25116, 5267.8867, 5143.0703, 5213.0425, 5138.711, 5196.1387, 5227.982]
2025-09-16 15:44:30,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 132.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:44:30,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 17 seconds)
2025-09-16 15:46:27,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:46:43,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5434.73242 ± 93.069
2025-09-16 15:46:43,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5267.855, 5297.04, 5515.7417, 5486.801, 5472.374, 5326.661, 5471.274, 5508.1357, 5469.0264, 5532.413]
2025-09-16 15:46:43,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:46:43,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 55 seconds)
2025-09-16 15:48:44,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:49:00,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4872.98877 ± 1342.843
2025-09-16 15:49:00,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5288.5073, 5283.936, 5347.3403, 5228.8047, 5453.702, 5290.19, 5336.1445, 5294.6714, 5358.4985, 848.0913]
2025-09-16 15:49:00,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 160.0]
2025-09-16 15:49:00,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 43 seconds)
2025-09-16 15:51:14,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:51:26,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4126.98633 ± 1954.239
2025-09-16 15:51:26,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4512.894, 359.98825, 5457.0356, 5563.5845, 5449.9507, 2953.0684, 691.1575, 5450.378, 5340.911, 5490.894]
2025-09-16 15:51:26,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [830.0, 65.0, 1000.0, 1000.0, 1000.0, 533.0, 134.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:51:26,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 34 seconds)
2025-09-16 15:53:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:54:02,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5182.67822 ± 22.040
2025-09-16 15:54:02,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5165.7583, 5166.654, 5204.834, 5197.2896, 5201.1543, 5158.0107, 5152.1704, 5215.9556, 5164.841, 5200.1147]
2025-09-16 15:54:02,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:54:02,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 19 seconds)
2025-09-16 15:56:13,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:56:28,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4692.42969 ± 1457.088
2025-09-16 15:56:28,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5277.18, 5282.465, 5259.8794, 459.30588, 5240.8677, 5429.4956, 5282.12, 4089.699, 5267.7876, 5335.4937]
2025-09-16 15:56:28,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 84.0, 1000.0, 1000.0, 1000.0, 773.0, 1000.0, 1000.0]
2025-09-16 15:56:28,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
