2025-09-16 10:49:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.025-delay_3
2025-09-16 10:49:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.025-delay_3
2025-09-16 10:49:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x14587d254750>}
2025-09-16 10:49:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 10:49:23,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 10:49:23,072 baseline-bpql-noisepromille25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=427, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 10:49:23,072 baseline-bpql-noisepromille25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 10:49:24,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 10:49:24,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 10:51:10,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:51:11,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 327.01624 ± 14.990
2025-09-16 10:51:11,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [309.01968, 328.6614, 315.35303, 306.8809, 319.20905, 348.27515, 336.01904, 351.27777, 317.8524, 337.61398]
2025-09-16 10:51:11,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 60.0, 58.0, 56.0, 58.0, 63.0, 62.0, 65.0, 58.0, 62.0]
2025-09-16 10:51:11,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (327.02) for latency 3
2025-09-16 10:51:11,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 49 seconds)
2025-09-16 10:53:07,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:53:08,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 348.80313 ± 53.996
2025-09-16 10:53:08,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [319.08997, 316.92734, 376.4134, 314.26044, 374.0772, 353.0085, 280.40247, 300.3356, 374.4941, 479.02234]
2025-09-16 10:53:08,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 61.0, 72.0, 61.0, 76.0, 70.0, 55.0, 60.0, 77.0, 106.0]
2025-09-16 10:53:08,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (348.80) for latency 3
2025-09-16 10:53:08,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 2 minutes, 43 seconds)
2025-09-16 10:55:07,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:55:08,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 362.42841 ± 41.296
2025-09-16 10:55:08,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [340.52658, 467.5652, 387.66794, 335.8587, 351.23383, 353.6765, 372.8327, 362.55576, 301.28763, 351.07913]
2025-09-16 10:55:08,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 93.0, 77.0, 64.0, 67.0, 68.0, 73.0, 70.0, 59.0, 69.0]
2025-09-16 10:55:08,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (362.43) for latency 3
2025-09-16 10:55:08,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 5 minutes, 14 seconds)
2025-09-16 10:57:06,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:57:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 419.72519 ± 90.663
2025-09-16 10:57:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [406.114, 422.829, 492.37567, 316.6098, 449.33017, 369.7954, 439.03574, 635.9617, 318.2861, 346.91455]
2025-09-16 10:57:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 94.0, 106.0, 69.0, 84.0, 82.0, 90.0, 120.0, 69.0, 67.0]
2025-09-16 10:57:07,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (419.73) for latency 3
2025-09-16 10:57:07,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2025-09-16 10:59:03,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 10:59:05,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 436.57071 ± 75.923
2025-09-16 10:59:05,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [562.69556, 474.86282, 327.47443, 440.42996, 473.50238, 542.415, 373.95718, 437.98587, 334.91928, 397.4647]
2025-09-16 10:59:05,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 105.0, 72.0, 84.0, 88.0, 104.0, 76.0, 96.0, 74.0, 87.0]
2025-09-16 10:59:05,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (436.57) for latency 3
2025-09-16 10:59:05,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 3 minutes, 46 seconds)
2025-09-16 11:01:03,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:01:04,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 466.39484 ± 100.002
2025-09-16 11:01:04,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [373.08215, 501.5378, 689.91724, 356.29077, 394.90335, 401.09296, 554.3093, 515.639, 497.329, 379.84653]
2025-09-16 11:01:04,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 92.0, 134.0, 65.0, 80.0, 88.0, 106.0, 97.0, 97.0, 72.0]
2025-09-16 11:01:04,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (466.39) for latency 3
2025-09-16 11:01:04,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes, 51 seconds)
2025-09-16 11:03:01,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:03:02,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 521.63684 ± 99.988
2025-09-16 11:03:02,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [462.9392, 475.26984, 457.79434, 504.02335, 359.60446, 465.79428, 610.3162, 684.6627, 512.7707, 683.19385]
2025-09-16 11:03:02,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 95.0, 82.0, 100.0, 82.0, 105.0, 115.0, 132.0, 111.0, 133.0]
2025-09-16 11:03:02,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (521.64) for latency 3
2025-09-16 11:03:02,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 4 minutes, 17 seconds)
2025-09-16 11:04:57,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:04:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 435.91943 ± 85.794
2025-09-16 11:04:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [359.47968, 596.9151, 466.2047, 307.69034, 467.8562, 350.77103, 374.2988, 545.349, 433.78842, 456.84122]
2025-09-16 11:04:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 127.0, 93.0, 67.0, 88.0, 68.0, 78.0, 100.0, 82.0, 97.0]
2025-09-16 11:04:58,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 4 seconds)
2025-09-16 11:06:54,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:06:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 469.52448 ± 109.931
2025-09-16 11:06:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [453.76477, 752.6639, 391.89377, 451.8788, 456.82797, 583.5578, 435.9624, 374.51334, 408.96298, 385.21942]
2025-09-16 11:06:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 163.0, 77.0, 91.0, 102.0, 125.0, 84.0, 83.0, 90.0, 86.0]
2025-09-16 11:06:55,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 21 seconds)
2025-09-16 11:08:49,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:08:51,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 513.50708 ± 99.897
2025-09-16 11:08:51,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [649.8943, 622.97955, 397.48767, 683.7875, 468.18753, 489.56696, 390.3387, 489.15234, 422.39584, 521.28046]
2025-09-16 11:08:51,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 128.0, 87.0, 129.0, 103.0, 91.0, 87.0, 93.0, 92.0, 114.0]
2025-09-16 11:08:51,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 55 minutes, 48 seconds)
2025-09-16 11:10:45,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:10:46,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 634.72345 ± 193.212
2025-09-16 11:10:46,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [491.34988, 506.49454, 605.85077, 413.9086, 408.87094, 811.514, 751.60504, 552.7267, 1045.1184, 759.79584]
2025-09-16 11:10:46,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 97.0, 113.0, 83.0, 79.0, 160.0, 154.0, 106.0, 209.0, 137.0]
2025-09-16 11:10:46,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (634.72) for latency 3
2025-09-16 11:10:46,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 42 seconds)
2025-09-16 11:12:40,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:12:42,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 658.99426 ± 160.603
2025-09-16 11:12:42,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [826.8039, 729.12085, 1061.8237, 570.8111, 517.6403, 570.3154, 558.5126, 575.71564, 601.83215, 577.3666]
2025-09-16 11:12:42,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 138.0, 205.0, 123.0, 111.0, 107.0, 121.0, 123.0, 124.0, 121.0]
2025-09-16 11:12:42,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (658.99) for latency 3
2025-09-16 11:12:42,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 50 minutes)
2025-09-16 11:14:38,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:14:39,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 649.68964 ± 105.258
2025-09-16 11:14:39,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [571.0941, 796.519, 559.54803, 890.2869, 562.63983, 671.16907, 611.77625, 651.33203, 572.06635, 610.46484]
2025-09-16 11:14:39,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 152.0, 109.0, 177.0, 108.0, 127.0, 115.0, 135.0, 113.0, 117.0]
2025-09-16 11:14:39,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 48 minutes, 28 seconds)
2025-09-16 11:16:34,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:16:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 729.85120 ± 75.322
2025-09-16 11:16:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [677.4865, 786.9187, 789.2765, 804.34326, 762.47943, 738.6234, 529.9369, 749.8125, 721.1963, 738.439]
2025-09-16 11:16:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 150.0, 154.0, 164.0, 153.0, 141.0, 100.0, 143.0, 137.0, 158.0]
2025-09-16 11:16:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (729.85) for latency 3
2025-09-16 11:16:36,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 46 minutes, 28 seconds)
2025-09-16 11:18:32,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:18:34,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 670.09863 ± 100.804
2025-09-16 11:18:34,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [621.7953, 685.65045, 597.82544, 671.0154, 858.55396, 453.7784, 647.3492, 684.5175, 761.91, 718.5901]
2025-09-16 11:18:34,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 150.0, 128.0, 131.0, 164.0, 100.0, 124.0, 133.0, 152.0, 135.0]
2025-09-16 11:18:34,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 45 minutes, 17 seconds)
2025-09-16 11:20:27,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:20:30,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 813.37390 ± 194.283
2025-09-16 11:20:30,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1157.1378, 736.32, 993.78064, 833.8797, 723.45264, 1032.534, 638.8544, 860.52057, 475.25232, 682.00616]
2025-09-16 11:20:30,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 157.0, 210.0, 178.0, 155.0, 217.0, 121.0, 165.0, 103.0, 148.0]
2025-09-16 11:20:30,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (813.37) for latency 3
2025-09-16 11:20:30,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 43 minutes, 22 seconds)
2025-09-16 11:22:29,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:22:30,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 615.49207 ± 178.946
2025-09-16 11:22:30,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [461.39212, 513.29254, 544.8855, 637.89954, 632.0519, 427.5285, 649.92615, 1092.4283, 509.2953, 686.2208]
2025-09-16 11:22:30,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 106.0, 103.0, 123.0, 136.0, 83.0, 123.0, 213.0, 98.0, 144.0]
2025-09-16 11:22:30,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 42 minutes, 45 seconds)
2025-09-16 11:24:25,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:24:27,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 691.80096 ± 144.824
2025-09-16 11:24:27,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [551.7559, 580.10175, 689.2733, 604.53864, 628.382, 655.1797, 623.63837, 1081.5818, 772.934, 730.6238]
2025-09-16 11:24:27,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 111.0, 131.0, 113.0, 119.0, 123.0, 122.0, 225.0, 158.0, 141.0]
2025-09-16 11:24:27,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2025-09-16 11:26:21,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:26:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 533.34363 ± 161.225
2025-09-16 11:26:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [585.295, 399.93857, 567.6264, 736.06366, 866.1267, 332.69876, 383.14288, 436.41095, 594.88257, 431.2512]
2025-09-16 11:26:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 89.0, 115.0, 152.0, 166.0, 72.0, 85.0, 101.0, 118.0, 84.0]
2025-09-16 11:26:22,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 26 seconds)
2025-09-16 11:28:17,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:28:19,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 650.98138 ± 116.582
2025-09-16 11:28:19,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [458.71112, 719.53033, 550.85114, 802.2262, 693.7241, 812.4214, 551.3888, 550.00244, 611.4485, 759.51044]
2025-09-16 11:28:19,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 158.0, 103.0, 172.0, 132.0, 159.0, 105.0, 118.0, 128.0, 152.0]
2025-09-16 11:28:19,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes)
2025-09-16 11:30:15,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:30:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 848.85272 ± 282.054
2025-09-16 11:30:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [599.4137, 523.44385, 1511.3574, 839.5879, 607.2445, 584.6976, 904.4713, 975.44324, 896.9041, 1045.9642]
2025-09-16 11:30:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 108.0, 310.0, 177.0, 114.0, 110.0, 176.0, 189.0, 179.0, 219.0]
2025-09-16 11:30:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (848.85) for latency 3
2025-09-16 11:30:17,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 45 seconds)
2025-09-16 11:32:11,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:32:13,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 825.71643 ± 132.669
2025-09-16 11:32:13,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [661.71173, 1004.7171, 785.80896, 819.4454, 884.85345, 998.1526, 708.9931, 708.96173, 1007.1908, 677.32904]
2025-09-16 11:32:13,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 210.0, 163.0, 165.0, 167.0, 209.0, 145.0, 157.0, 196.0, 132.0]
2025-09-16 11:32:13,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 31 minutes, 33 seconds)
2025-09-16 11:34:07,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:34:10,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1024.76331 ± 250.440
2025-09-16 11:34:10,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1373.5469, 1267.4635, 893.3016, 978.32294, 907.97736, 791.06116, 1438.9272, 659.2844, 1123.1498, 814.5983]
2025-09-16 11:34:10,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [271.0, 254.0, 181.0, 185.0, 192.0, 155.0, 286.0, 135.0, 236.0, 157.0]
2025-09-16 11:34:10,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1024.76) for latency 3
2025-09-16 11:34:10,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 29 minutes, 39 seconds)
2025-09-16 11:36:03,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:36:05,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 790.21832 ± 285.644
2025-09-16 11:36:05,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [653.28064, 1070.0306, 596.8101, 574.0902, 700.3182, 590.39764, 624.63135, 637.74945, 1504.7129, 950.1617]
2025-09-16 11:36:05,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 221.0, 111.0, 108.0, 139.0, 124.0, 118.0, 119.0, 307.0, 188.0]
2025-09-16 11:36:05,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 27 minutes, 29 seconds)
2025-09-16 11:37:56,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:37:58,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1163.15942 ± 419.930
2025-09-16 11:37:58,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1282.6631, 729.79224, 919.15875, 716.8521, 973.1785, 2063.6836, 1672.6274, 1219.1415, 748.0309, 1306.4657]
2025-09-16 11:37:58,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [248.0, 140.0, 177.0, 138.0, 194.0, 431.0, 333.0, 233.0, 142.0, 267.0]
2025-09-16 11:37:58,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1163.16) for latency 3
2025-09-16 11:37:58,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 51 seconds)
2025-09-16 11:39:49,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:39:52,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1341.24902 ± 463.355
2025-09-16 11:39:52,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2024.666, 1588.0925, 831.4963, 983.305, 938.7981, 1234.3998, 1237.8534, 855.9072, 1487.9838, 2229.9888]
2025-09-16 11:39:52,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [393.0, 317.0, 161.0, 191.0, 182.0, 241.0, 238.0, 164.0, 287.0, 444.0]
2025-09-16 11:39:52,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1341.25) for latency 3
2025-09-16 11:39:52,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 21 minutes, 47 seconds)
2025-09-16 11:41:48,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:41:52,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1315.86279 ± 389.920
2025-09-16 11:41:52,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [993.0087, 1036.9857, 1354.1339, 1577.9054, 1082.8589, 2051.829, 953.7039, 925.0898, 1935.3779, 1247.7344]
2025-09-16 11:41:52,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 203.0, 280.0, 329.0, 236.0, 421.0, 181.0, 188.0, 384.0, 249.0]
2025-09-16 11:41:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 20 minutes, 48 seconds)
2025-09-16 11:43:37,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:43:41,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1609.98206 ± 391.943
2025-09-16 11:43:41,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1327.158, 1192.1624, 2184.645, 2474.5828, 1451.1305, 1483.661, 1664.3893, 1312.8264, 1661.4895, 1347.7767]
2025-09-16 11:43:41,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [261.0, 225.0, 439.0, 498.0, 283.0, 309.0, 340.0, 268.0, 334.0, 277.0]
2025-09-16 11:43:41,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1609.98) for latency 3
2025-09-16 11:43:41,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 17 minutes, 6 seconds)
2025-09-16 11:45:34,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:45:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1187.17847 ± 622.304
2025-09-16 11:45:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [755.6612, 1265.4752, 2389.7058, 660.34937, 782.4626, 954.2637, 693.7004, 654.9754, 2253.1616, 1462.0292]
2025-09-16 11:45:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 246.0, 482.0, 126.0, 151.0, 183.0, 135.0, 127.0, 449.0, 281.0]
2025-09-16 11:45:37,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 15 minutes, 25 seconds)
2025-09-16 11:47:26,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:47:29,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1165.24609 ± 424.047
2025-09-16 11:47:29,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [713.4771, 1963.006, 1155.9493, 953.4619, 525.4746, 1302.4221, 1725.28, 964.06384, 1429.8967, 919.4299]
2025-09-16 11:47:29,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 405.0, 233.0, 200.0, 100.0, 252.0, 338.0, 206.0, 281.0, 178.0]
2025-09-16 11:47:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 13 minutes, 7 seconds)
2025-09-16 11:49:24,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:49:28,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1556.29004 ± 782.660
2025-09-16 11:49:28,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1118.1462, 1139.3058, 2536.3716, 744.6435, 1337.55, 856.1431, 549.95306, 2791.514, 2485.6802, 2003.594]
2025-09-16 11:49:28,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [213.0, 227.0, 494.0, 145.0, 257.0, 160.0, 105.0, 536.0, 490.0, 402.0]
2025-09-16 11:49:28,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 22 seconds)
2025-09-16 11:51:18,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:51:21,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1166.26270 ± 244.013
2025-09-16 11:51:21,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [930.45325, 910.096, 1502.9716, 1466.7699, 885.16956, 1470.2894, 1135.1445, 1158.4777, 889.8655, 1313.39]
2025-09-16 11:51:21,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 177.0, 314.0, 288.0, 186.0, 305.0, 240.0, 243.0, 182.0, 265.0]
2025-09-16 11:51:21,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 8 minutes, 55 seconds)
2025-09-16 11:53:14,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:53:18,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1746.99060 ± 1184.485
2025-09-16 11:53:18,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1764.7667, 1147.1068, 1133.7202, 1327.082, 1513.3522, 893.62036, 5141.862, 1948.6621, 870.3513, 1729.3822]
2025-09-16 11:53:18,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [343.0, 222.0, 216.0, 253.0, 292.0, 166.0, 1000.0, 373.0, 163.0, 326.0]
2025-09-16 11:53:18,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1746.99) for latency 3
2025-09-16 11:53:18,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 8 minutes, 47 seconds)
2025-09-16 11:55:13,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:55:17,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1340.15405 ± 499.914
2025-09-16 11:55:17,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1146.1768, 709.87573, 2124.553, 1090.9314, 822.13385, 1421.0593, 846.3358, 1277.4836, 2125.6357, 1837.3541]
2025-09-16 11:55:17,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 149.0, 456.0, 232.0, 156.0, 278.0, 162.0, 264.0, 410.0, 386.0]
2025-09-16 11:55:17,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 37 seconds)
2025-09-16 11:57:05,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:57:07,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1063.12231 ± 285.158
2025-09-16 11:57:07,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1033.0374, 582.474, 1268.8375, 1618.3971, 1208.1143, 840.31384, 1192.2133, 1099.54, 1100.075, 688.22064]
2025-09-16 11:57:07,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 120.0, 237.0, 308.0, 227.0, 158.0, 228.0, 207.0, 208.0, 137.0]
2025-09-16 11:57:07,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 5 minutes, 16 seconds)
2025-09-16 11:58:59,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:59:03,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1353.43701 ± 873.462
2025-09-16 11:59:03,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1209.2694, 1080.0375, 1009.2521, 603.5982, 748.11365, 1299.1373, 3776.911, 772.211, 1187.3271, 1848.5131]
2025-09-16 11:59:03,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 210.0, 205.0, 124.0, 143.0, 255.0, 719.0, 168.0, 238.0, 357.0]
2025-09-16 11:59:03,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 41 seconds)
2025-09-16 12:00:54,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:00:59,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1978.69604 ± 700.698
2025-09-16 12:00:59,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2008.0099, 2059.5847, 1137.4604, 1229.0763, 2261.7515, 1456.3323, 3542.099, 2750.5723, 1829.865, 1512.2096]
2025-09-16 12:00:59,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [385.0, 413.0, 222.0, 232.0, 452.0, 272.0, 681.0, 529.0, 346.0, 285.0]
2025-09-16 12:00:59,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1978.70) for latency 3
2025-09-16 12:00:59,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 1 minute, 28 seconds)
2025-09-16 12:02:54,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:02:59,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1993.55725 ± 950.034
2025-09-16 12:02:59,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1541.703, 1981.2316, 1204.4828, 1706.0759, 930.146, 2992.4902, 1336.982, 4346.608, 2035.1719, 1860.6812]
2025-09-16 12:02:59,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [297.0, 376.0, 231.0, 355.0, 178.0, 566.0, 254.0, 848.0, 386.0, 359.0]
2025-09-16 12:02:59,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1993.56) for latency 3
2025-09-16 12:02:59,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 10 seconds)
2025-09-16 12:04:53,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:04:59,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2173.20508 ± 1081.694
2025-09-16 12:04:59,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1961.3727, 1739.8627, 1321.1844, 1890.3877, 4904.9272, 3497.3503, 1433.8113, 1864.8641, 1400.3031, 1717.9867]
2025-09-16 12:04:59,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [372.0, 358.0, 252.0, 388.0, 1000.0, 706.0, 280.0, 383.0, 284.0, 336.0]
2025-09-16 12:04:59,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2173.21) for latency 3
2025-09-16 12:04:59,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 58 minutes, 20 seconds)
2025-09-16 12:06:48,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:06:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2843.52075 ± 1613.457
2025-09-16 12:06:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2291.6362, 1739.3481, 4910.639, 5025.772, 1940.2052, 5176.8203, 3618.8184, 830.2641, 867.82434, 2033.8796]
2025-09-16 12:06:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [438.0, 342.0, 942.0, 958.0, 370.0, 1000.0, 697.0, 189.0, 174.0, 398.0]
2025-09-16 12:06:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2843.52) for latency 3
2025-09-16 12:06:56,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 41 seconds)
2025-09-16 12:08:51,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:08:59,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3202.09326 ± 1561.907
2025-09-16 12:08:59,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1869.057, 4345.846, 1247.3721, 4054.3328, 5184.124, 4733.889, 1372.5658, 2278.3777, 1714.5717, 5220.794]
2025-09-16 12:08:59,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [355.0, 843.0, 236.0, 792.0, 1000.0, 889.0, 283.0, 447.0, 335.0, 1000.0]
2025-09-16 12:08:59,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3202.09) for latency 3
2025-09-16 12:08:59,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 14 seconds)
2025-09-16 12:10:50,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:10:57,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2911.10156 ± 1496.553
2025-09-16 12:10:57,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2926.061, 5161.9116, 2132.9812, 4797.7407, 790.9951, 2190.477, 5130.751, 1846.6803, 1489.8759, 2643.544]
2025-09-16 12:10:57,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [567.0, 1000.0, 412.0, 942.0, 153.0, 415.0, 1000.0, 356.0, 283.0, 511.0]
2025-09-16 12:10:57,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 38 seconds)
2025-09-16 12:12:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:12:54,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2536.91211 ± 1251.598
2025-09-16 12:12:54,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1390.9177, 1941.2162, 5121.537, 1797.3683, 1726.6954, 1049.0729, 3592.4473, 4133.0884, 2050.4119, 2566.3652]
2025-09-16 12:12:54,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 367.0, 1000.0, 349.0, 333.0, 196.0, 697.0, 811.0, 395.0, 506.0]
2025-09-16 12:12:54,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 59 seconds)
2025-09-16 12:14:46,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:14:56,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4201.94189 ± 1285.808
2025-09-16 12:14:56,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5248.567, 5107.0874, 5245.6055, 5216.9507, 3163.3762, 2364.1775, 5196.5386, 3497.202, 5154.758, 1825.1581]
2025-09-16 12:14:56,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 595.0, 451.0, 1000.0, 662.0, 1000.0, 359.0]
2025-09-16 12:14:56,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4201.94) for latency 3
2025-09-16 12:14:56,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 29 seconds)
2025-09-16 12:16:49,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:16:55,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2582.03467 ± 1265.655
2025-09-16 12:16:55,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2188.3416, 3827.2026, 1864.628, 2556.5273, 5160.2593, 1339.9122, 2233.5242, 4055.0415, 1388.1779, 1206.733]
2025-09-16 12:16:55,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [426.0, 727.0, 357.0, 485.0, 1000.0, 255.0, 429.0, 772.0, 270.0, 247.0]
2025-09-16 12:16:55,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 57 seconds)
2025-09-16 12:18:53,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:19:03,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3670.03247 ± 1584.719
2025-09-16 12:19:03,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5197.856, 5195.06, 5099.207, 5225.953, 3423.6606, 5113.989, 1540.6931, 2070.6306, 1367.4163, 2465.8594]
2025-09-16 12:19:03,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 662.0, 1000.0, 299.0, 406.0, 263.0, 472.0]
2025-09-16 12:19:03,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 43 seconds)
2025-09-16 12:20:57,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:21:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4498.55664 ± 880.074
2025-09-16 12:21:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3233.3884, 4964.6816, 4973.85, 5014.6333, 4994.5073, 5044.5264, 2401.3076, 4383.1714, 5006.6045, 4968.894]
2025-09-16 12:21:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [667.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 497.0, 879.0, 1000.0, 1000.0]
2025-09-16 12:21:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4498.56) for latency 3
2025-09-16 12:21:10,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 12 seconds)
2025-09-16 12:22:58,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:23:10,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4274.26904 ± 1208.128
2025-09-16 12:23:10,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5127.2593, 1814.2295, 4996.4453, 5027.6997, 5012.6055, 5010.6807, 2558.3442, 5060.54, 5065.3228, 3069.5632]
2025-09-16 12:23:10,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 353.0, 1000.0, 1000.0, 1000.0, 1000.0, 529.0, 1000.0, 1000.0, 611.0]
2025-09-16 12:23:10,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 46 minutes, 46 seconds)
2025-09-16 12:25:03,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:25:15,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4711.74512 ± 984.581
2025-09-16 12:25:15,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5235.352, 5176.457, 5171.0522, 5181.649, 2445.612, 5191.9795, 3082.0588, 5216.687, 5175.5327, 5241.07]
2025-09-16 12:25:15,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 466.0, 1000.0, 604.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:25:15,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4711.75) for latency 3
2025-09-16 12:25:15,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 45 minutes, 14 seconds)
2025-09-16 12:27:14,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:27:27,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4995.29297 ± 509.114
2025-09-16 12:27:27,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5223.268, 3642.2283, 5237.3276, 5232.9893, 5234.2114, 5214.483, 4443.1343, 5260.2153, 5258.7197, 5206.355]
2025-09-16 12:27:27,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 700.0, 1000.0, 1000.0, 1000.0, 1000.0, 844.0, 996.0, 1000.0, 1000.0]
2025-09-16 12:27:27,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4995.29) for latency 3
2025-09-16 12:27:27,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 45 minutes, 18 seconds)
2025-09-16 12:29:16,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:29:29,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5148.48291 ± 36.532
2025-09-16 12:29:29,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5179.118, 5119.561, 5130.1255, 5151.177, 5107.6333, 5177.755, 5078.4907, 5173.1626, 5165.409, 5202.3955]
2025-09-16 12:29:29,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:29:29,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5148.48) for latency 3
2025-09-16 12:29:29,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 42 minutes, 20 seconds)
2025-09-16 12:31:21,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:31:35,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5071.74121 ± 161.301
2025-09-16 12:31:35,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5126.022, 5162.0054, 5154.169, 5131.6704, 5104.6963, 4602.405, 5139.565, 5119.9893, 5018.258, 5158.631]
2025-09-16 12:31:35,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 885.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:31:35,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 39 minutes, 59 seconds)
2025-09-16 12:33:26,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:33:39,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5138.93994 ± 15.270
2025-09-16 12:33:39,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5141.2407, 5128.174, 5151.9707, 5154.433, 5128.2935, 5104.5117, 5145.037, 5131.955, 5145.537, 5158.2417]
2025-09-16 12:33:39,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:33:39,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 38 minutes, 34 seconds)
2025-09-16 12:35:32,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:35:46,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4996.72852 ± 112.026
2025-09-16 12:35:46,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5062.564, 5075.8667, 5047.643, 4832.2627, 4729.522, 5008.3545, 5026.9604, 5068.1235, 5047.817, 5068.173]
2025-09-16 12:35:46,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 957.0, 930.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:35:46,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 36 minutes, 41 seconds)
2025-09-16 12:37:48,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:38:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4712.68848 ± 1298.321
2025-09-16 12:38:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5163.408, 5137.348, 5130.547, 5103.895, 5158.611, 818.11993, 5143.808, 5174.9272, 5145.093, 5151.1274]
2025-09-16 12:38:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 161.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:38:01,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 35 minutes, 6 seconds)
2025-09-16 12:39:45,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:39:50,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2036.30920 ± 651.870
2025-09-16 12:39:50,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1456.4265, 2227.667, 3120.4727, 889.71094, 2916.9937, 2077.8245, 2292.6807, 2288.0679, 1533.5099, 1559.7389]
2025-09-16 12:39:50,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 420.0, 603.0, 180.0, 551.0, 403.0, 439.0, 431.0, 303.0, 300.0]
2025-09-16 12:39:50,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 2 seconds)
2025-09-16 12:41:52,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:42:07,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5032.02051 ± 19.329
2025-09-16 12:42:07,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5036.4688, 5046.277, 5036.667, 5044.8794, 5050.6494, 5020.617, 5032.501, 5043.01, 4979.8613, 5029.2764]
2025-09-16 12:42:07,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:42:07,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 30 minutes, 38 seconds)
2025-09-16 12:43:56,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:44:10,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4620.19922 ± 680.269
2025-09-16 12:44:10,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [4916.4224, 4322.226, 4912.4473, 4891.8765, 2650.9185, 4928.4814, 4960.4316, 4790.201, 4935.959, 4893.023]
2025-09-16 12:44:10,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 887.0, 1000.0, 1000.0, 545.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:44:10,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 28 minutes, 17 seconds)
2025-09-16 12:46:11,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:46:22,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4055.26489 ± 1704.312
2025-09-16 12:46:22,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5165.254, 5137.837, 5160.6987, 5145.9194, 5154.8105, 1471.9791, 5059.3066, 652.2618, 2454.6348, 5149.945]
2025-09-16 12:46:22,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 291.0, 1000.0, 129.0, 475.0, 1000.0]
2025-09-16 12:46:22,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 26 minutes, 53 seconds)
2025-09-16 12:48:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:48:29,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5007.31104 ± 306.514
2025-09-16 12:48:29,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5097.3335, 4093.5374, 5047.0483, 5082.99, 5158.6255, 5160.4966, 5082.09, 5102.459, 5143.63, 5104.9]
2025-09-16 12:48:29,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 802.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:48:29,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 23 minutes, 40 seconds)
2025-09-16 12:50:26,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:50:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5126.43604 ± 20.000
2025-09-16 12:50:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5130.9775, 5125.341, 5108.352, 5130.5933, 5082.9316, 5143.162, 5116.9175, 5147.065, 5122.1353, 5156.8823]
2025-09-16 12:50:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:50:40,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 25 seconds)
2025-09-16 12:52:27,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:52:39,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4616.88525 ± 1320.724
2025-09-16 12:52:39,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5149.761, 5185.7, 810.99274, 5184.8716, 5177.113, 5198.9297, 5170.4805, 5186.606, 3945.5115, 5158.888]
2025-09-16 12:52:39,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 173.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 754.0, 1000.0]
2025-09-16 12:52:39,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 20 minutes, 3 seconds)
2025-09-16 12:54:32,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:54:44,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4404.84375 ± 1061.520
2025-09-16 12:54:44,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3881.5435, 5204.5693, 5248.147, 2545.7646, 2321.472, 5035.838, 4958.153, 5337.4883, 4648.442, 4867.0215]
2025-09-16 12:54:44,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [728.0, 1000.0, 1000.0, 511.0, 437.0, 1000.0, 1000.0, 1000.0, 890.0, 932.0]
2025-09-16 12:54:44,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 18 minutes, 12 seconds)
2025-09-16 12:56:44,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:56:58,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5104.74316 ± 22.008
2025-09-16 12:56:58,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5092.957, 5110.4097, 5084.642, 5094.212, 5089.1484, 5114.988, 5164.4263, 5107.2065, 5090.2354, 5099.2065]
2025-09-16 12:56:58,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:56:58,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 16 minutes, 20 seconds)
2025-09-16 12:58:52,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:59:06,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5116.08301 ± 8.074
2025-09-16 12:59:06,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5109.3906, 5112.2134, 5104.982, 5106.3286, 5110.61, 5120.5054, 5120.229, 5120.4014, 5128.5195, 5127.647]
2025-09-16 12:59:06,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:59:06,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 14 minutes, 19 seconds)
2025-09-16 13:00:59,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:01:13,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5129.67578 ± 24.800
2025-09-16 13:01:13,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5133.1387, 5159.3813, 5129.2812, 5150.5425, 5160.126, 5093.764, 5093.2, 5106.6475, 5153.7573, 5116.916]
2025-09-16 13:01:13,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:01:13,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 11 minutes, 45 seconds)
2025-09-16 13:03:07,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:03:20,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4831.69824 ± 1016.465
2025-09-16 13:03:20,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5154.868, 5245.44, 5217.0854, 5130.494, 5189.743, 1787.2687, 5207.661, 5232.4883, 5092.245, 5059.6865]
2025-09-16 13:03:20,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 340.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:03:20,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 10 minutes, 31 seconds)
2025-09-16 13:05:09,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:05:23,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5076.60010 ± 15.797
2025-09-16 13:05:23,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5064.152, 5101.1978, 5086.4585, 5101.1904, 5068.36, 5066.1616, 5061.1914, 5080.688, 5053.0557, 5083.549]
2025-09-16 13:05:23,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:05:23,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 8 minutes, 8 seconds)
2025-09-16 13:07:19,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:07:34,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5093.43750 ± 20.240
2025-09-16 13:07:34,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5078.8613, 5064.3496, 5108.636, 5091.6963, 5065.4717, 5110.3594, 5128.291, 5112.393, 5080.7397, 5093.5767]
2025-09-16 13:07:34,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:07:34,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 5 minutes, 42 seconds)
2025-09-16 13:09:24,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:09:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4934.47119 ± 1025.082
2025-09-16 13:09:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5275.942, 5288.0303, 5258.4463, 5306.1396, 5299.7847, 5253.427, 1859.634, 5274.9785, 5257.846, 5270.4795]
2025-09-16 13:09:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 352.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:09:37,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 5 seconds)
2025-09-16 13:11:31,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:11:46,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5038.07764 ± 11.973
2025-09-16 13:11:46,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5037.4062, 5043.8647, 5027.3726, 5043.2793, 5045.9756, 5041.9316, 5056.4814, 5027.368, 5012.03, 5045.068]
2025-09-16 13:11:46,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:11:46,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 1 minute, 10 seconds)
2025-09-16 13:13:39,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:13:52,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4788.77051 ± 679.116
2025-09-16 13:13:52,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5275.683, 4016.6511, 3914.1165, 5243.3506, 5250.858, 5174.1187, 3405.959, 5266.7275, 5252.536, 5087.7017]
2025-09-16 13:13:52,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 754.0, 745.0, 1000.0, 1000.0, 1000.0, 644.0, 1000.0, 1000.0, 969.0]
2025-09-16 13:13:52,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 58 minutes, 57 seconds)
2025-09-16 13:15:48,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:16:02,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5171.23145 ± 31.655
2025-09-16 13:16:02,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5211.588, 5163.6636, 5156.7666, 5239.4517, 5164.549, 5164.6865, 5161.0405, 5148.716, 5182.1377, 5119.7144]
2025-09-16 13:16:02,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:16:02,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5171.23) for latency 3
2025-09-16 13:16:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 32 seconds)
2025-09-16 13:17:58,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:18:12,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5117.41406 ± 15.958
2025-09-16 13:18:12,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5104.617, 5117.297, 5131.4253, 5153.7383, 5113.2207, 5126.107, 5122.0806, 5104.543, 5103.5537, 5097.554]
2025-09-16 13:18:12,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:18:12,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 55 minutes, 18 seconds)
2025-09-16 13:20:11,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:20:25,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5222.52246 ± 18.408
2025-09-16 13:20:25,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5215.0234, 5242.8867, 5204.1196, 5204.9854, 5205.623, 5240.56, 5209.7993, 5244.1387, 5250.7944, 5207.2944]
2025-09-16 13:20:25,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:20:25,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5222.52) for latency 3
2025-09-16 13:20:25,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes)
2025-09-16 13:22:23,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:22:37,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5292.30762 ± 96.101
2025-09-16 13:22:37,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5219.087, 5337.1743, 5310.8687, 5354.2627, 5322.3394, 5028.949, 5302.193, 5336.25, 5370.1333, 5341.8223]
2025-09-16 13:22:37,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 942.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:22:37,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5292.31) for latency 3
2025-09-16 13:22:37,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 4 seconds)
2025-09-16 13:24:28,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:24:42,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5146.45068 ± 12.345
2025-09-16 13:24:42,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5142.9688, 5162.1865, 5151.856, 5144.4917, 5156.979, 5119.775, 5152.475, 5159.2085, 5141.84, 5132.7256]
2025-09-16 13:24:42,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:24:42,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 48 seconds)
2025-09-16 13:26:38,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:26:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5179.05664 ± 10.135
2025-09-16 13:26:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5192.474, 5191.772, 5163.775, 5187.574, 5163.4507, 5178.544, 5175.699, 5180.8457, 5170.483, 5185.949]
2025-09-16 13:26:52,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:26:52,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 38 seconds)
2025-09-16 13:28:44,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:28:55,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3888.99072 ± 1308.233
2025-09-16 13:28:55,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5115.218, 3013.2283, 2739.2483, 5143.5547, 4142.033, 1549.9962, 5261.6523, 4380.187, 5231.9688, 2312.8186]
2025-09-16 13:28:55,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 603.0, 519.0, 1000.0, 785.0, 294.0, 1000.0, 831.0, 1000.0, 439.0]
2025-09-16 13:28:55,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 44 minutes, 59 seconds)
2025-09-16 13:30:50,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:31:03,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5272.33496 ± 14.032
2025-09-16 13:31:03,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5251.304, 5283.1377, 5284.539, 5290.377, 5284.6855, 5275.764, 5270.38, 5245.2744, 5271.4185, 5266.4688]
2025-09-16 13:31:03,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:31:03,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 42 minutes, 34 seconds)
2025-09-16 13:32:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:33:06,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5240.14941 ± 219.182
2025-09-16 13:33:06,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5324.481, 5380.429, 5353.437, 5064.4585, 4634.624, 5305.989, 5325.754, 5371.0347, 5293.243, 5348.0454]
2025-09-16 13:33:06,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 952.0, 864.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:33:06,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 49 seconds)
2025-09-16 13:34:57,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:35:08,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4248.84668 ± 923.930
2025-09-16 13:35:08,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5291.9717, 3289.6272, 3051.9727, 5347.8486, 3418.5679, 2996.9531, 4656.832, 4360.261, 4713.6235, 5360.8086]
2025-09-16 13:35:08,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 614.0, 572.0, 1000.0, 641.0, 562.0, 865.0, 815.0, 875.0, 1000.0]
2025-09-16 13:35:08,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 33 seconds)
2025-09-16 13:36:59,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:37:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4002.75391 ± 1330.175
2025-09-16 13:37:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2900.808, 2268.1343, 5213.937, 3998.3054, 5224.2534, 5181.228, 2769.6782, 1916.5232, 5259.2183, 5295.4546]
2025-09-16 13:37:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [538.0, 428.0, 1000.0, 769.0, 1000.0, 1000.0, 516.0, 351.0, 1000.0, 1000.0]
2025-09-16 13:37:09,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 59 seconds)
2025-09-16 13:39:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:39:13,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4934.70508 ± 638.065
2025-09-16 13:39:13,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5219.264, 5283.0967, 5232.362, 5229.6284, 3781.1245, 5310.715, 5302.475, 3548.1848, 5237.544, 5202.656]
2025-09-16 13:39:13,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 706.0, 1000.0, 1000.0, 651.0, 1000.0, 1000.0]
2025-09-16 13:39:13,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes)
2025-09-16 13:41:02,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:41:16,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5214.20605 ± 67.666
2025-09-16 13:41:16,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5316.8765, 5197.1807, 5127.528, 5193.9707, 5366.844, 5181.485, 5186.551, 5182.295, 5187.003, 5202.326]
2025-09-16 13:41:16,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:41:16,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 38 seconds)
2025-09-16 13:43:11,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:43:23,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4757.69189 ± 1015.030
2025-09-16 13:43:23,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5315.082, 3878.5415, 4233.989, 5297.213, 5329.1035, 5343.1294, 5346.255, 5349.006, 5358.8687, 2125.7346]
2025-09-16 13:43:23,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 724.0, 797.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 404.0]
2025-09-16 13:43:23,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 48 seconds)
2025-09-16 13:45:16,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:45:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5095.73096 ± 18.575
2025-09-16 13:45:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5103.57, 5088.0376, 5085.623, 5108.8887, 5121.0913, 5115.1426, 5090.0063, 5112.061, 5063.3125, 5069.5737]
2025-09-16 13:45:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:45:30,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 57 seconds)
2025-09-16 13:47:22,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:47:35,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5235.93262 ± 9.709
2025-09-16 13:47:35,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5232.041, 5224.7207, 5231.25, 5219.8115, 5243.3765, 5241.4165, 5241.893, 5243.6177, 5228.4253, 5252.7715]
2025-09-16 13:47:35,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:47:35,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 2 seconds)
2025-09-16 13:49:29,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:49:43,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5131.75098 ± 7.994
2025-09-16 13:49:43,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5137.8096, 5124.512, 5131.507, 5139.991, 5142.2847, 5124.1045, 5139.906, 5117.448, 5134.164, 5125.7847]
2025-09-16 13:49:43,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:49:43,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 4 seconds)
2025-09-16 13:51:35,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:51:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5067.07959 ± 25.601
2025-09-16 13:51:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5094.359, 5076.469, 5104.393, 5052.9956, 5044.419, 5042.684, 5098.377, 5082.2397, 5032.195, 5042.662]
2025-09-16 13:51:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:51:49,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 5 seconds)
2025-09-16 13:53:40,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:53:54,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5163.29346 ± 17.174
2025-09-16 13:53:54,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5192.5845, 5155.231, 5156.876, 5196.041, 5148.169, 5158.8125, 5165.216, 5168.8647, 5146.1934, 5144.942]
2025-09-16 13:53:54,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:53:54,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 55 seconds)
2025-09-16 13:55:47,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:56:00,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4774.21387 ± 1372.728
2025-09-16 13:56:00,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5208.6772, 656.2932, 5216.745, 5253.2817, 5250.9834, 5238.773, 5209.1455, 5241.242, 5224.322, 5242.679]
2025-09-16 13:56:00,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 127.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:56:00,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 47 seconds)
2025-09-16 13:57:53,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:58:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5156.19189 ± 11.107
2025-09-16 13:58:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5164.2793, 5160.9224, 5162.332, 5149.943, 5152.922, 5139.4995, 5152.2925, 5167.553, 5138.2627, 5173.9146]
2025-09-16 13:58:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:58:07,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 44 seconds)
2025-09-16 14:00:04,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:00:17,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5268.88135 ± 23.743
2025-09-16 14:00:17,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5271.3013, 5286.54, 5292.449, 5206.8125, 5256.177, 5276.7646, 5283.072, 5285.6694, 5255.6597, 5274.362]
2025-09-16 14:00:17,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 41 seconds)
2025-09-16 14:02:16,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:02:30,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5118.96826 ± 29.106
2025-09-16 14:02:30,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5105.801, 5095.556, 5089.2646, 5124.0913, 5112.484, 5126.6143, 5115.185, 5118.0815, 5199.443, 5103.1626]
2025-09-16 14:02:30,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:30,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 41 seconds)
2025-09-16 14:04:28,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:04:42,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5177.44580 ± 23.168
2025-09-16 14:04:42,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5168.56, 5168.665, 5171.5146, 5149.8364, 5201.398, 5145.933, 5189.557, 5163.1777, 5225.814, 5190.0024]
2025-09-16 14:04:42,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:04:42,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 38 seconds)
2025-09-16 14:06:40,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:06:55,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4983.50195 ± 29.179
2025-09-16 14:06:55,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [4989.8833, 4910.278, 4972.1978, 4996.9385, 4992.0815, 4987.738, 4985.2847, 5006.6455, 5027.0645, 4966.911]
2025-09-16 14:06:55,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:06:55,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 33 seconds)
2025-09-16 14:08:54,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:09:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5246.78174 ± 16.997
2025-09-16 14:09:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5260.644, 5262.7734, 5238.3726, 5244.891, 5221.1353, 5233.392, 5220.9365, 5253.9746, 5272.292, 5259.4053]
2025-09-16 14:09:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:09:07,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 24 seconds)
2025-09-16 14:10:59,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:11:11,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4782.84424 ± 1389.213
2025-09-16 14:11:11,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5270.915, 5274.667, 5227.6733, 5192.899, 5272.913, 5305.934, 5162.666, 5239.001, 5264.8076, 616.96027]
2025-09-16 14:11:11,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 132.0]
2025-09-16 14:11:11,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 10 seconds)
2025-09-16 14:13:02,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:13:16,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5271.46631 ± 17.297
2025-09-16 14:13:16,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5280.4136, 5259.7505, 5267.668, 5269.0596, 5306.4478, 5275.1226, 5270.825, 5250.2817, 5244.7705, 5290.3228]
2025-09-16 14:13:16,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:16,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1251 [DEBUG]: Training session finished
