2025-09-16 15:16:15,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_24
2025-09-16 15:16:15,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_24
2025-09-16 15:16:15,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x14ad2f6ac810>}
2025-09-16 15:16:15,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 15:16:15,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 15:16:15,417 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 15:16:15,417 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 15:16:17,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 15:16:17,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 15:18:08,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:18:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 133.62741 ± 45.008
2025-09-16 15:18:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.625404, 140.0891, 95.32893, 161.73016, 107.43465, 107.00922, 127.46607, 243.04826, 90.08568, 168.45662]
2025-09-16 15:18:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 28.0, 19.0, 35.0, 21.0, 21.0, 26.0, 47.0, 18.0, 33.0]
2025-09-16 15:18:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (133.63) for latency 24
2025-09-16 15:18:08,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 3 minutes, 40 seconds)
2025-09-16 15:20:07,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:20:08,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 193.45370 ± 141.840
2025-09-16 15:20:08,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [554.76556, 101.97597, 95.999695, 144.5128, 129.21294, 140.26697, 96.69994, 356.79236, 205.8695, 108.4412]
2025-09-16 15:20:08,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 20.0, 19.0, 28.0, 25.0, 27.0, 19.0, 67.0, 40.0, 21.0]
2025-09-16 15:20:08,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (193.45) for latency 24
2025-09-16 15:20:08,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 8 minutes, 31 seconds)
2025-09-16 15:22:07,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:22:07,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 190.95322 ± 119.188
2025-09-16 15:22:07,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.45253, 436.1905, 102.19265, 105.116135, 121.83932, 95.38607, 276.48764, 133.10727, 137.27652, 377.48367]
2025-09-16 15:22:07,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 95.0, 20.0, 21.0, 24.0, 19.0, 54.0, 26.0, 27.0, 70.0]
2025-09-16 15:22:07,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 8 minutes, 54 seconds)
2025-09-16 15:24:07,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:24:08,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 199.46172 ± 99.512
2025-09-16 15:24:08,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.599, 383.54263, 174.53918, 143.56339, 155.54665, 116.656425, 347.3715, 95.34143, 168.54048, 301.9165]
2025-09-16 15:24:08,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 83.0, 34.0, 28.0, 30.0, 23.0, 64.0, 19.0, 33.0, 62.0]
2025-09-16 15:24:08,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (199.46) for latency 24
2025-09-16 15:24:08,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 8 minutes, 22 seconds)
2025-09-16 15:26:06,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:26:07,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 203.28133 ± 99.959
2025-09-16 15:26:07,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [331.9903, 102.54512, 165.99226, 113.272095, 160.21637, 139.2749, 113.023735, 367.5709, 186.20808, 352.7194]
2025-09-16 15:26:07,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 20.0, 32.0, 22.0, 32.0, 27.0, 22.0, 67.0, 36.0, 66.0]
2025-09-16 15:26:07,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (203.28) for latency 24
2025-09-16 15:26:07,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 6 minutes, 47 seconds)
2025-09-16 15:28:06,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:28:07,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 223.87688 ± 152.693
2025-09-16 15:28:07,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.01805, 261.378, 95.71055, 113.29607, 447.05298, 102.643715, 320.11313, 112.18927, 533.69403, 156.67278]
2025-09-16 15:28:07,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 49.0, 19.0, 22.0, 90.0, 20.0, 60.0, 22.0, 100.0, 30.0]
2025-09-16 15:28:07,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (223.88) for latency 24
2025-09-16 15:28:07,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 7 minutes, 39 seconds)
2025-09-16 15:30:06,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:30:07,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 203.63644 ± 121.654
2025-09-16 15:30:07,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [415.9462, 116.37001, 253.25885, 143.69197, 438.04538, 217.03957, 124.5706, 101.46565, 90.65087, 135.3254]
2025-09-16 15:30:07,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 23.0, 59.0, 28.0, 80.0, 46.0, 24.0, 20.0, 18.0, 26.0]
2025-09-16 15:30:07,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 5 minutes, 44 seconds)
2025-09-16 15:32:06,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:32:06,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 145.11952 ± 76.572
2025-09-16 15:32:06,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [123.48107, 113.913666, 369.06866, 111.56626, 144.34343, 135.1622, 95.80939, 147.98018, 96.479416, 113.391]
2025-09-16 15:32:06,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 80.0, 22.0, 28.0, 26.0, 19.0, 29.0, 19.0, 22.0]
2025-09-16 15:32:06,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 3 minutes, 39 seconds)
2025-09-16 15:34:06,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:34:06,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 168.90918 ± 110.287
2025-09-16 15:34:06,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.483795, 105.3236, 143.15536, 95.99095, 144.9875, 447.1688, 126.91659, 95.8616, 107.19565, 309.00793]
2025-09-16 15:34:06,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 28.0, 19.0, 28.0, 85.0, 25.0, 19.0, 21.0, 60.0]
2025-09-16 15:34:06,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 1 minute, 34 seconds)
2025-09-16 15:36:05,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:36:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 211.32832 ± 124.283
2025-09-16 15:36:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.43312, 95.378494, 451.56763, 108.50759, 366.72668, 184.1527, 106.999626, 282.55676, 96.797066, 307.1635]
2025-09-16 15:36:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 85.0, 21.0, 68.0, 35.0, 21.0, 61.0, 19.0, 60.0]
2025-09-16 15:36:06,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 59 minutes, 48 seconds)
2025-09-16 15:38:06,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:38:06,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 224.47115 ± 125.840
2025-09-16 15:38:06,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [158.0068, 184.29701, 102.96329, 372.80786, 102.78371, 105.83939, 391.19522, 95.656654, 400.67352, 330.48813]
2025-09-16 15:38:06,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 35.0, 20.0, 77.0, 20.0, 21.0, 72.0, 19.0, 74.0, 65.0]
2025-09-16 15:38:06,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (224.47) for latency 24
2025-09-16 15:38:06,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 57 minutes, 50 seconds)
2025-09-16 15:40:06,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:40:07,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 238.91414 ± 164.735
2025-09-16 15:40:07,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [496.24893, 309.3226, 340.60712, 118.15735, 96.0767, 145.26991, 113.03299, 130.55615, 90.38547, 549.4841]
2025-09-16 15:40:07,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 56.0, 66.0, 23.0, 19.0, 29.0, 22.0, 26.0, 18.0, 111.0]
2025-09-16 15:40:07,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (238.91) for latency 24
2025-09-16 15:40:07,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 56 minutes, 5 seconds)
2025-09-16 15:42:05,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:42:06,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 210.07773 ± 115.143
2025-09-16 15:42:06,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [186.47046, 112.225525, 365.04404, 302.2198, 101.95738, 95.08107, 375.70898, 342.33688, 112.79523, 106.93805]
2025-09-16 15:42:06,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 22.0, 69.0, 59.0, 20.0, 19.0, 77.0, 69.0, 22.0, 21.0]
2025-09-16 15:42:06,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 54 seconds)
2025-09-16 15:44:06,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:44:07,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 208.61951 ± 150.181
2025-09-16 15:44:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [111.37701, 89.70109, 421.66943, 133.17531, 370.81863, 122.65359, 504.53983, 101.36614, 96.329384, 134.56476]
2025-09-16 15:44:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 82.0, 26.0, 71.0, 24.0, 104.0, 20.0, 19.0, 27.0]
2025-09-16 15:44:07,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 52 minutes, 6 seconds)
2025-09-16 15:46:03,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:46:04,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 167.44730 ± 110.241
2025-09-16 15:46:04,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [149.8133, 130.98085, 95.06122, 134.90813, 480.6774, 101.189224, 134.01608, 95.65014, 224.39305, 127.7835]
2025-09-16 15:46:04,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 19.0, 26.0, 99.0, 20.0, 26.0, 19.0, 46.0, 25.0]
2025-09-16 15:46:04,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 26 seconds)
2025-09-16 15:48:03,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:48:04,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 131.23242 ± 66.879
2025-09-16 15:48:04,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [330.3716, 118.410736, 119.51079, 96.372116, 110.77573, 119.97104, 106.85638, 109.020874, 104.69082, 96.34406]
2025-09-16 15:48:04,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 23.0, 23.0, 19.0, 22.0, 23.0, 21.0, 21.0, 21.0, 19.0]
2025-09-16 15:48:04,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 11 seconds)
2025-09-16 15:50:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:50:03,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 182.35065 ± 98.559
2025-09-16 15:50:03,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [388.04227, 306.07385, 133.07448, 245.69632, 101.72132, 89.93302, 228.32039, 95.79094, 116.20307, 118.65077]
2025-09-16 15:50:03,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 57.0, 26.0, 51.0, 20.0, 18.0, 49.0, 19.0, 23.0, 23.0]
2025-09-16 15:50:03,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 48 seconds)
2025-09-16 15:52:01,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:52:01,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 122.80107 ± 55.183
2025-09-16 15:52:01,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.90075, 114.54018, 105.52398, 108.305954, 285.08362, 118.693954, 107.8662, 84.267654, 117.659294, 90.16914]
2025-09-16 15:52:01,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 22.0, 21.0, 21.0, 54.0, 23.0, 21.0, 17.0, 23.0, 18.0]
2025-09-16 15:52:01,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 41 seconds)
2025-09-16 15:53:58,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:53:59,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 235.58008 ± 185.352
2025-09-16 15:53:59,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [94.937164, 672.54706, 120.36622, 89.959076, 259.11917, 130.89198, 371.34875, 414.36896, 89.15452, 113.107635]
2025-09-16 15:53:59,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 143.0, 23.0, 18.0, 53.0, 25.0, 70.0, 77.0, 18.0, 22.0]
2025-09-16 15:53:59,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 2 seconds)
2025-09-16 15:55:57,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:55:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 184.38382 ± 105.569
2025-09-16 15:55:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [117.32003, 89.607864, 111.699234, 346.2801, 107.83791, 336.3774, 165.17125, 119.644356, 103.16791, 346.73206]
2025-09-16 15:55:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 22.0, 79.0, 21.0, 68.0, 32.0, 23.0, 20.0, 77.0]
2025-09-16 15:55:58,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 22 seconds)
2025-09-16 15:57:57,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:57:57,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 198.84650 ± 116.249
2025-09-16 15:57:57,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [387.65256, 121.71676, 358.55078, 376.6159, 134.86104, 141.05505, 153.00934, 95.91763, 122.86755, 96.21837]
2025-09-16 15:57:57,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 24.0, 67.0, 70.0, 26.0, 28.0, 30.0, 19.0, 24.0, 19.0]
2025-09-16 15:57:57,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 23 seconds)
2025-09-16 15:59:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:59:57,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 226.79520 ± 154.321
2025-09-16 15:59:57,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [352.06525, 526.5883, 101.67001, 102.34795, 155.70314, 442.55338, 113.65134, 102.30705, 280.73746, 90.327805]
2025-09-16 15:59:57,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 100.0, 20.0, 20.0, 30.0, 86.0, 22.0, 20.0, 55.0, 18.0]
2025-09-16 15:59:57,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 28 seconds)
2025-09-16 16:01:56,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:01:57,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 185.31413 ± 106.419
2025-09-16 16:01:57,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [133.1381, 315.68344, 144.95976, 139.26772, 102.86313, 346.29794, 112.63723, 96.10637, 90.315384, 371.87234]
2025-09-16 16:01:57,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 68.0, 28.0, 27.0, 20.0, 65.0, 22.0, 19.0, 18.0, 70.0]
2025-09-16 16:01:57,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 51 seconds)
2025-09-16 16:03:54,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:03:55,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 186.80711 ± 112.695
2025-09-16 16:03:55,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.06745, 108.55841, 184.19762, 145.59152, 385.6256, 424.10388, 174.50758, 122.4073, 118.820816, 108.191055]
2025-09-16 16:03:55,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 35.0, 28.0, 73.0, 82.0, 34.0, 24.0, 23.0, 21.0]
2025-09-16 16:03:55,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 50 seconds)
2025-09-16 16:05:53,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:05:54,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 190.07864 ± 124.049
2025-09-16 16:05:54,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [362.04767, 102.61798, 284.1533, 457.37906, 157.4891, 89.088745, 118.89458, 123.75033, 89.33741, 116.02826]
2025-09-16 16:05:54,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 20.0, 54.0, 102.0, 31.0, 18.0, 23.0, 24.0, 18.0, 23.0]
2025-09-16 16:05:54,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 1 second)
2025-09-16 16:07:53,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:07:54,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 217.51399 ± 146.358
2025-09-16 16:07:54,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [97.028114, 101.232254, 95.66213, 386.25314, 90.3225, 471.35913, 112.34838, 332.33392, 379.49094, 109.109276]
2025-09-16 16:07:54,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 19.0, 74.0, 18.0, 89.0, 22.0, 61.0, 70.0, 21.0]
2025-09-16 16:07:54,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 2 seconds)
2025-09-16 16:09:52,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:09:52,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 169.36453 ± 109.130
2025-09-16 16:09:52,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [120.608795, 90.39756, 349.69156, 116.38265, 371.60028, 273.39218, 90.06305, 89.260826, 96.52514, 95.72322]
2025-09-16 16:09:52,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 70.0, 23.0, 75.0, 52.0, 18.0, 18.0, 19.0, 19.0]
2025-09-16 16:09:52,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 54 seconds)
2025-09-16 16:11:51,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:11:52,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 165.09290 ± 82.188
2025-09-16 16:11:52,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.52565, 342.12668, 134.09583, 107.83275, 161.73799, 159.14453, 119.70697, 124.44692, 90.29225, 304.01935]
2025-09-16 16:11:52,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 66.0, 26.0, 21.0, 32.0, 31.0, 23.0, 24.0, 18.0, 62.0]
2025-09-16 16:11:52,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 49 seconds)
2025-09-16 16:13:50,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:13:50,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 226.69199 ± 120.150
2025-09-16 16:13:50,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [362.39035, 96.10101, 393.85098, 376.5689, 162.76256, 149.15085, 354.49606, 122.899925, 111.472984, 137.22633]
2025-09-16 16:13:50,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 19.0, 75.0, 72.0, 31.0, 30.0, 67.0, 24.0, 22.0, 27.0]
2025-09-16 16:13:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 57 seconds)
2025-09-16 16:15:49,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:15:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 170.63051 ± 130.311
2025-09-16 16:15:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.13715, 105.91237, 508.92667, 124.85907, 96.01335, 89.14652, 95.84846, 130.8706, 323.84036, 123.7505]
2025-09-16 16:15:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 99.0, 24.0, 19.0, 18.0, 19.0, 26.0, 63.0, 24.0]
2025-09-16 16:15:50,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 18 minutes, 58 seconds)
2025-09-16 16:17:49,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:17:49,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 183.29378 ± 119.446
2025-09-16 16:17:49,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.70511, 116.42816, 96.3389, 340.69812, 107.71719, 103.05287, 327.50677, 416.7598, 139.26945, 95.46129]
2025-09-16 16:17:49,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 19.0, 61.0, 21.0, 20.0, 60.0, 80.0, 27.0, 19.0]
2025-09-16 16:17:49,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 59 seconds)
2025-09-16 16:19:48,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:19:48,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 179.00925 ± 113.486
2025-09-16 16:19:48,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.11473, 424.36148, 102.89394, 89.85912, 129.60445, 146.19618, 369.15344, 178.88487, 96.12602, 163.89821]
2025-09-16 16:19:48,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 82.0, 20.0, 18.0, 25.0, 28.0, 67.0, 34.0, 19.0, 32.0]
2025-09-16 16:19:48,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 6 seconds)
2025-09-16 16:21:47,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:21:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 196.71924 ± 118.773
2025-09-16 16:21:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [410.71725, 123.6418, 134.91716, 96.056854, 371.5487, 341.53897, 151.81912, 111.9009, 89.52916, 135.5224]
2025-09-16 16:21:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 24.0, 26.0, 19.0, 82.0, 64.0, 30.0, 22.0, 18.0, 26.0]
2025-09-16 16:21:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 2 seconds)
2025-09-16 16:23:46,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:23:46,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 207.84402 ± 125.642
2025-09-16 16:23:46,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.96364, 186.58391, 114.1156, 368.394, 177.90974, 121.98989, 110.12477, 322.73798, 101.93597, 471.68484]
2025-09-16 16:23:46,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 36.0, 22.0, 71.0, 34.0, 24.0, 22.0, 62.0, 20.0, 89.0]
2025-09-16 16:23:46,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 5 seconds)
2025-09-16 16:25:45,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:25:45,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 177.03143 ± 123.355
2025-09-16 16:25:45,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.338524, 472.41562, 356.52087, 101.752365, 112.4592, 100.912056, 178.25311, 113.12269, 102.3671, 119.17287]
2025-09-16 16:25:45,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 93.0, 66.0, 20.0, 22.0, 20.0, 35.0, 22.0, 20.0, 23.0]
2025-09-16 16:25:45,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 6 seconds)
2025-09-16 16:27:44,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:27:44,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 123.03623 ± 29.870
2025-09-16 16:27:44,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [97.19957, 186.86572, 102.709496, 99.13018, 112.42415, 90.87441, 130.62457, 112.82423, 132.21436, 165.49554]
2025-09-16 16:27:44,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 35.0, 20.0, 20.0, 22.0, 18.0, 26.0, 22.0, 26.0, 32.0]
2025-09-16 16:27:44,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 6 minutes, 59 seconds)
2025-09-16 16:29:43,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:29:43,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 183.09952 ± 93.690
2025-09-16 16:29:43,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [136.13666, 146.99986, 118.48941, 84.28058, 155.03088, 134.9389, 384.94098, 342.24542, 149.41962, 178.51268]
2025-09-16 16:29:43,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 23.0, 17.0, 30.0, 26.0, 71.0, 64.0, 29.0, 35.0]
2025-09-16 16:29:43,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 4 minutes, 55 seconds)
2025-09-16 16:31:42,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:31:42,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 189.57632 ± 128.842
2025-09-16 16:31:42,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [129.80084, 115.181915, 106.33277, 130.52858, 434.8538, 456.33124, 106.22781, 156.84338, 128.82431, 130.83853]
2025-09-16 16:31:42,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 23.0, 21.0, 25.0, 84.0, 85.0, 21.0, 30.0, 25.0, 25.0]
2025-09-16 16:31:42,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 58 seconds)
2025-09-16 16:33:41,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:33:41,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 191.76627 ± 147.887
2025-09-16 16:33:41,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [399.926, 152.74716, 159.46588, 95.13194, 551.68835, 123.11074, 84.21712, 142.0017, 106.1033, 103.27068]
2025-09-16 16:33:41,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 30.0, 31.0, 19.0, 117.0, 24.0, 17.0, 28.0, 21.0, 20.0]
2025-09-16 16:33:41,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute)
2025-09-16 16:35:40,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:35:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 138.15570 ± 48.691
2025-09-16 16:35:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.29554, 269.80627, 107.09959, 101.98231, 129.7261, 143.79076, 107.31316, 125.572556, 170.70284, 129.26784]
2025-09-16 16:35:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 56.0, 21.0, 20.0, 25.0, 28.0, 21.0, 24.0, 33.0, 26.0]
2025-09-16 16:35:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 56 seconds)
2025-09-16 16:37:39,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:37:40,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 226.20500 ± 137.141
2025-09-16 16:37:40,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [389.5069, 176.80865, 91.01236, 96.05624, 515.1199, 125.66362, 352.51883, 167.12958, 225.60217, 122.63195]
2025-09-16 16:37:40,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 34.0, 18.0, 19.0, 96.0, 24.0, 68.0, 32.0, 44.0, 24.0]
2025-09-16 16:37:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 6 seconds)
2025-09-16 16:39:39,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:39:40,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 241.33086 ± 154.746
2025-09-16 16:39:40,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [484.312, 122.25914, 286.9291, 101.54944, 107.10403, 156.7825, 432.51318, 473.54306, 96.49841, 151.81749]
2025-09-16 16:39:40,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 24.0, 63.0, 20.0, 21.0, 31.0, 82.0, 89.0, 19.0, 30.0]
2025-09-16 16:39:40,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (241.33) for latency 24
2025-09-16 16:39:40,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 19 seconds)
2025-09-16 16:41:38,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:41:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 169.84384 ± 94.675
2025-09-16 16:41:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [155.46072, 362.3749, 200.10736, 116.143814, 335.32336, 95.90793, 127.50211, 102.58776, 96.24873, 106.781784]
2025-09-16 16:41:39,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 72.0, 38.0, 23.0, 64.0, 19.0, 26.0, 20.0, 19.0, 21.0]
2025-09-16 16:41:39,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 16 seconds)
2025-09-16 16:43:37,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:43:38,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 176.99846 ± 86.292
2025-09-16 16:43:38,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.72431, 294.99118, 138.50891, 323.3764, 140.6557, 101.126434, 111.9708, 117.54324, 303.59286, 113.4948]
2025-09-16 16:43:38,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 57.0, 27.0, 59.0, 27.0, 20.0, 22.0, 23.0, 63.0, 22.0]
2025-09-16 16:43:38,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 20 seconds)
2025-09-16 16:45:37,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:45:37,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 193.52914 ± 132.347
2025-09-16 16:45:37,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.79961, 150.40964, 188.80676, 443.9252, 106.38267, 456.57056, 180.28041, 100.95711, 105.66657, 112.49287]
2025-09-16 16:45:37,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 29.0, 36.0, 84.0, 21.0, 85.0, 35.0, 20.0, 21.0, 22.0]
2025-09-16 16:45:37,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 29 seconds)
2025-09-16 16:47:36,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:47:37,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 210.83806 ± 155.744
2025-09-16 16:47:37,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [127.87856, 101.39206, 107.27446, 106.457954, 294.3702, 140.71295, 601.3456, 359.14227, 89.27195, 180.5346]
2025-09-16 16:47:37,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 21.0, 21.0, 57.0, 27.0, 131.0, 67.0, 18.0, 35.0]
2025-09-16 16:47:37,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 27 seconds)
2025-09-16 16:49:36,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:49:36,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 210.17569 ± 135.764
2025-09-16 16:49:36,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [129.6452, 172.58144, 461.60046, 292.31964, 456.95514, 151.96474, 136.3532, 115.62461, 89.3459, 95.3664]
2025-09-16 16:49:36,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 34.0, 86.0, 55.0, 87.0, 29.0, 27.0, 23.0, 18.0, 19.0]
2025-09-16 16:49:36,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 24 seconds)
2025-09-16 16:51:35,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:51:35,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 111.60298 ± 13.195
2025-09-16 16:51:35,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [116.64029, 119.052444, 125.80334, 102.81018, 135.83179, 101.99624, 90.684074, 101.38968, 101.99375, 119.82788]
2025-09-16 16:51:35,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 25.0, 20.0, 27.0, 20.0, 18.0, 20.0, 20.0, 23.0]
2025-09-16 16:51:35,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 24 seconds)
2025-09-16 16:53:34,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:53:34,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 187.32770 ± 137.411
2025-09-16 16:53:34,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [106.27819, 134.7942, 95.89013, 426.66568, 102.28536, 491.04443, 111.7412, 118.695755, 141.21109, 144.67104]
2025-09-16 16:53:34,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 26.0, 19.0, 91.0, 20.0, 93.0, 22.0, 23.0, 27.0, 28.0]
2025-09-16 16:53:34,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 24 seconds)
2025-09-16 16:55:33,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:55:33,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 130.54849 ± 27.881
2025-09-16 16:55:33,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.96319, 132.57549, 156.65675, 133.64258, 187.15012, 102.28071, 128.1262, 107.88766, 154.53702, 89.665146]
2025-09-16 16:55:33,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 30.0, 26.0, 36.0, 20.0, 25.0, 21.0, 32.0, 18.0]
2025-09-16 16:55:33,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 21 seconds)
2025-09-16 16:57:32,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:57:32,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 182.09688 ± 97.892
2025-09-16 16:57:32,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.63743, 89.25597, 317.20697, 114.3656, 366.0595, 111.72386, 161.60158, 130.6585, 117.52649, 298.9328]
2025-09-16 16:57:32,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 63.0, 22.0, 68.0, 22.0, 31.0, 25.0, 23.0, 54.0]
2025-09-16 16:57:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 16 seconds)
2025-09-16 16:59:32,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:59:33,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 267.66025 ± 167.681
2025-09-16 16:59:33,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [185.75331, 116.59192, 557.52185, 452.5542, 112.782074, 96.99767, 403.00632, 161.5372, 138.00024, 451.85754]
2025-09-16 16:59:33,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 23.0, 120.0, 98.0, 23.0, 19.0, 90.0, 31.0, 27.0, 94.0]
2025-09-16 16:59:33,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (267.66) for latency 24
2025-09-16 16:59:33,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 23 seconds)
2025-09-16 17:01:31,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:01:32,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 193.92673 ± 133.480
2025-09-16 17:01:32,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [154.1534, 428.16425, 476.68332, 101.19667, 129.77863, 207.3341, 95.26529, 119.484665, 94.95509, 132.25182]
2025-09-16 17:01:32,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 83.0, 92.0, 20.0, 26.0, 40.0, 19.0, 23.0, 19.0, 26.0]
2025-09-16 17:01:32,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 28 seconds)
2025-09-16 17:03:30,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:03:30,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 166.03465 ± 97.427
2025-09-16 17:03:30,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [90.04015, 112.58586, 158.62521, 95.74837, 103.15409, 103.034134, 369.84067, 127.7003, 340.16196, 159.45576]
2025-09-16 17:03:30,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 30.0, 19.0, 20.0, 20.0, 75.0, 25.0, 62.0, 31.0]
2025-09-16 17:03:30,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 21 seconds)
2025-09-16 17:05:28,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:05:29,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 173.83365 ± 93.407
2025-09-16 17:05:29,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [275.82217, 84.380394, 143.2422, 96.26303, 96.30443, 132.92435, 371.85504, 128.88023, 129.78621, 278.87833]
2025-09-16 17:05:29,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 17.0, 28.0, 19.0, 19.0, 26.0, 69.0, 25.0, 25.0, 54.0]
2025-09-16 17:05:29,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 20 seconds)
2025-09-16 17:07:27,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:07:27,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 148.58743 ± 87.093
2025-09-16 17:07:27,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.291145, 118.19717, 133.9189, 113.11308, 154.38135, 105.656586, 102.123474, 402.4204, 157.73076, 96.04154]
2025-09-16 17:07:27,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 23.0, 26.0, 22.0, 30.0, 21.0, 20.0, 74.0, 31.0, 19.0]
2025-09-16 17:07:27,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 16 seconds)
2025-09-16 17:09:26,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:09:26,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 145.55099 ± 50.540
2025-09-16 17:09:26,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.76236, 156.38515, 158.46178, 105.42973, 149.31958, 285.24753, 118.45964, 144.34245, 121.96648, 108.1354]
2025-09-16 17:09:26,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 30.0, 31.0, 21.0, 29.0, 55.0, 23.0, 28.0, 24.0, 21.0]
2025-09-16 17:09:26,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 6 seconds)
2025-09-16 17:11:26,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:11:27,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 203.63943 ± 106.174
2025-09-16 17:11:27,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [330.79248, 113.731255, 116.40765, 96.600296, 119.4801, 143.13237, 277.79013, 124.97659, 349.66696, 363.81638]
2025-09-16 17:11:27,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 22.0, 23.0, 19.0, 23.0, 27.0, 53.0, 24.0, 73.0, 77.0]
2025-09-16 17:11:27,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 16 seconds)
2025-09-16 17:13:25,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:13:26,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 206.09981 ± 192.804
2025-09-16 17:13:26,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [130.12848, 107.764595, 736.25793, 101.26228, 128.2073, 152.60571, 107.02895, 130.27782, 371.95856, 95.506256]
2025-09-16 17:13:26,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 143.0, 20.0, 25.0, 30.0, 21.0, 25.0, 72.0, 19.0]
2025-09-16 17:13:26,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 23 seconds)
2025-09-16 17:15:23,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:15:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 139.91104 ± 74.416
2025-09-16 17:15:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.51095, 95.6266, 111.23362, 96.60748, 356.60312, 151.87276, 108.1011, 136.1824, 134.47086, 106.901634]
2025-09-16 17:15:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 22.0, 19.0, 70.0, 29.0, 21.0, 26.0, 26.0, 21.0]
2025-09-16 17:15:24,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 15 seconds)
2025-09-16 17:17:22,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:17:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 127.91560 ± 24.224
2025-09-16 17:17:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [177.09941, 125.01095, 121.08684, 101.06808, 90.1684, 110.51241, 153.73628, 137.31746, 140.62358, 122.53259]
2025-09-16 17:17:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 24.0, 24.0, 20.0, 18.0, 22.0, 30.0, 27.0, 27.0, 24.0]
2025-09-16 17:17:23,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 22 seconds)
2025-09-16 17:19:23,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:19:23,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 204.42952 ± 125.824
2025-09-16 17:19:23,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [172.58942, 346.10828, 102.78746, 107.49081, 111.117226, 128.35306, 461.65454, 96.29059, 356.93933, 160.96454]
2025-09-16 17:19:23,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 77.0, 20.0, 21.0, 22.0, 25.0, 85.0, 19.0, 68.0, 31.0]
2025-09-16 17:19:23,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 36 seconds)
2025-09-16 17:21:22,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:21:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 174.57266 ± 106.785
2025-09-16 17:21:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.18379, 130.64558, 211.52197, 95.233444, 393.42822, 100.28183, 151.76732, 107.24056, 107.11399, 359.30988]
2025-09-16 17:21:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 42.0, 19.0, 86.0, 20.0, 29.0, 21.0, 21.0, 66.0]
2025-09-16 17:21:22,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 26 seconds)
2025-09-16 17:23:21,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:23:21,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 155.08882 ± 79.093
2025-09-16 17:23:21,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [90.47221, 124.0514, 154.22542, 319.8499, 90.36978, 291.6617, 141.06145, 101.62107, 148.28596, 89.28916]
2025-09-16 17:23:21,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 30.0, 61.0, 18.0, 56.0, 27.0, 20.0, 28.0, 18.0]
2025-09-16 17:23:21,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 28 seconds)
2025-09-16 17:25:20,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:25:20,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 148.28580 ± 68.908
2025-09-16 17:25:20,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [125.97022, 132.35724, 101.876175, 146.00618, 90.76622, 105.28896, 290.99695, 113.64836, 102.24672, 273.70105]
2025-09-16 17:25:20,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 26.0, 20.0, 29.0, 18.0, 21.0, 54.0, 22.0, 20.0, 50.0]
2025-09-16 17:25:20,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 37 seconds)
2025-09-16 17:27:18,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:27:19,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 175.43913 ± 142.340
2025-09-16 17:27:19,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [132.1918, 107.6517, 96.60061, 105.582306, 97.308754, 95.6251, 112.07874, 131.76031, 558.182, 317.41006]
2025-09-16 17:27:19,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 21.0, 19.0, 21.0, 19.0, 19.0, 22.0, 26.0, 113.0, 70.0]
2025-09-16 17:27:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 35 seconds)
2025-09-16 17:29:18,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:29:19,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 189.89218 ± 152.121
2025-09-16 17:29:19,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.18111, 96.33696, 120.565346, 107.983574, 316.3412, 604.7282, 101.686966, 131.88483, 118.739136, 193.47446]
2025-09-16 17:29:19,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 19.0, 23.0, 21.0, 66.0, 115.0, 20.0, 26.0, 23.0, 38.0]
2025-09-16 17:29:19,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 30 seconds)
2025-09-16 17:31:18,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:31:18,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 196.97691 ± 141.082
2025-09-16 17:31:18,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [117.29544, 113.147964, 100.53735, 89.621666, 306.9746, 134.99425, 362.92227, 96.3465, 522.6873, 125.241806]
2025-09-16 17:31:18,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 20.0, 18.0, 67.0, 26.0, 68.0, 19.0, 105.0, 24.0]
2025-09-16 17:31:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 36 seconds)
2025-09-16 17:33:17,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:33:17,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 145.22723 ± 65.568
2025-09-16 17:33:17,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [325.95966, 129.41743, 110.307434, 136.42781, 117.620094, 129.31296, 111.168335, 192.82999, 109.057755, 90.17092]
2025-09-16 17:33:17,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 25.0, 22.0, 27.0, 23.0, 25.0, 22.0, 37.0, 21.0, 18.0]
2025-09-16 17:33:17,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 34 seconds)
2025-09-16 17:35:16,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:35:17,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 202.27739 ± 106.309
2025-09-16 17:35:17,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [418.5574, 96.412094, 110.844406, 132.56754, 115.21463, 306.90244, 142.19699, 339.1183, 180.54964, 180.41057]
2025-09-16 17:35:17,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 19.0, 22.0, 26.0, 22.0, 59.0, 28.0, 65.0, 34.0, 35.0]
2025-09-16 17:35:17,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 38 seconds)
2025-09-16 17:37:15,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:37:15,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 168.33401 ± 109.691
2025-09-16 17:37:15,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [131.5379, 408.14273, 96.74642, 107.872986, 363.78955, 128.44336, 117.411934, 110.22464, 107.05685, 112.11376]
2025-09-16 17:37:15,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 78.0, 19.0, 21.0, 72.0, 25.0, 23.0, 22.0, 21.0, 22.0]
2025-09-16 17:37:16,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 39 seconds)
2025-09-16 17:39:15,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:39:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 225.26083 ± 139.696
2025-09-16 17:39:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [154.63135, 97.15085, 275.3067, 118.39545, 425.64197, 120.07857, 453.38992, 394.1445, 111.55222, 102.31683]
2025-09-16 17:39:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 55.0, 23.0, 84.0, 23.0, 103.0, 83.0, 22.0, 20.0]
2025-09-16 17:39:16,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 42 seconds)
2025-09-16 17:41:14,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:41:15,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 231.05075 ± 107.635
2025-09-16 17:41:15,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [154.0452, 108.40767, 132.52837, 127.53025, 331.45282, 288.44226, 355.70462, 106.8633, 368.46292, 337.07013]
2025-09-16 17:41:15,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 26.0, 25.0, 62.0, 55.0, 64.0, 21.0, 67.0, 74.0]
2025-09-16 17:41:15,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 41 seconds)
2025-09-16 17:43:15,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:43:15,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 145.22099 ± 72.538
2025-09-16 17:43:15,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.69018, 95.59345, 107.88573, 122.122314, 130.35294, 107.86383, 355.84225, 138.46584, 163.39658, 121.996666]
2025-09-16 17:43:15,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 19.0, 21.0, 24.0, 25.0, 21.0, 66.0, 27.0, 31.0, 24.0]
2025-09-16 17:43:15,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 48 seconds)
2025-09-16 17:45:12,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:45:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 198.23952 ± 127.092
2025-09-16 17:45:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [354.34238, 400.11234, 405.0254, 197.21349, 117.507645, 95.55212, 101.860916, 90.64374, 119.19203, 100.94509]
2025-09-16 17:45:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 74.0, 75.0, 38.0, 23.0, 19.0, 20.0, 18.0, 23.0, 20.0]
2025-09-16 17:45:13,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 41 seconds)
2025-09-16 17:47:12,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:47:12,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 128.19171 ± 68.230
2025-09-16 17:47:12,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [329.1167, 94.96131, 90.010605, 112.268005, 118.03396, 101.16523, 136.20062, 96.33429, 95.85858, 107.96792]
2025-09-16 17:47:12,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 19.0, 18.0, 22.0, 23.0, 20.0, 26.0, 19.0, 19.0, 21.0]
2025-09-16 17:47:12,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 45 seconds)
2025-09-16 17:49:12,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:49:12,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 118.17165 ± 14.737
2025-09-16 17:49:12,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.777, 151.9948, 107.887314, 125.83099, 113.911385, 113.519066, 107.53379, 114.28729, 101.58301, 136.3919]
2025-09-16 17:49:12,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 29.0, 21.0, 24.0, 22.0, 22.0, 21.0, 22.0, 20.0, 26.0]
2025-09-16 17:49:12,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 43 seconds)
2025-09-16 17:51:12,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:51:13,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 124.47162 ± 28.188
2025-09-16 17:51:13,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.52558, 90.49036, 182.73885, 129.77916, 130.57292, 114.420876, 102.752045, 128.52014, 113.915436, 162.00084]
2025-09-16 17:51:13,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 35.0, 26.0, 25.0, 22.0, 20.0, 25.0, 23.0, 31.0]
2025-09-16 17:51:13,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 49 seconds)
2025-09-16 17:53:11,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:53:11,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 249.03305 ± 151.084
2025-09-16 17:53:11,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.36317, 446.07358, 404.32144, 125.269936, 379.19107, 102.57255, 89.74483, 391.9555, 371.7461, 90.0926]
2025-09-16 17:53:11,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 82.0, 83.0, 25.0, 86.0, 20.0, 18.0, 77.0, 70.0, 18.0]
2025-09-16 17:53:11,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 45 seconds)
2025-09-16 17:55:10,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:55:11,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 155.33380 ± 92.681
2025-09-16 17:55:11,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [123.14834, 180.16914, 89.42853, 138.65547, 140.38824, 420.6189, 156.84744, 113.08155, 89.29729, 101.703156]
2025-09-16 17:55:11,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 34.0, 18.0, 27.0, 27.0, 78.0, 30.0, 22.0, 18.0, 20.0]
2025-09-16 17:55:11,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 50 seconds)
2025-09-16 17:57:09,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:57:10,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 224.14534 ± 156.087
2025-09-16 17:57:10,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [477.31256, 420.95795, 123.47998, 484.37305, 127.70546, 128.32578, 95.97947, 136.67915, 117.765785, 128.87402]
2025-09-16 17:57:10,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 79.0, 24.0, 91.0, 25.0, 25.0, 19.0, 27.0, 23.0, 25.0]
2025-09-16 17:57:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 50 seconds)
2025-09-16 17:59:10,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:59:10,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 180.23660 ± 126.749
2025-09-16 17:59:10,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.68304, 114.22638, 90.14249, 145.68214, 139.54683, 96.606, 116.77309, 173.34192, 513.4027, 310.96143]
2025-09-16 17:59:10,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 18.0, 28.0, 27.0, 19.0, 23.0, 34.0, 105.0, 58.0]
2025-09-16 17:59:10,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 53 seconds)
2025-09-16 18:01:08,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:01:09,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 194.10683 ± 165.270
2025-09-16 18:01:09,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [387.8277, 122.52021, 123.212204, 124.00625, 101.76366, 105.40203, 95.3554, 132.3727, 624.9524, 123.655716]
2025-09-16 18:01:09,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 24.0, 24.0, 24.0, 20.0, 21.0, 19.0, 26.0, 118.0, 24.0]
2025-09-16 18:01:09,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 47 seconds)
2025-09-16 18:03:08,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:03:09,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 185.28415 ± 112.628
2025-09-16 18:03:09,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [492.42117, 140.34705, 134.95137, 101.45857, 137.08736, 280.92148, 175.4673, 130.68166, 140.3137, 119.19189]
2025-09-16 18:03:09,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 27.0, 26.0, 20.0, 27.0, 54.0, 35.0, 25.0, 28.0, 23.0]
2025-09-16 18:03:09,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 51 seconds)
2025-09-16 18:05:08,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:05:09,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 172.40993 ± 94.635
2025-09-16 18:05:09,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.869965, 117.42101, 95.16761, 151.47876, 260.32578, 135.74644, 107.87483, 380.10425, 282.98672, 97.12388]
2025-09-16 18:05:09,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 19.0, 29.0, 60.0, 26.0, 21.0, 82.0, 55.0, 19.0]
2025-09-16 18:05:09,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 53 seconds)
2025-09-16 18:07:07,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:07:07,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 189.44913 ± 112.044
2025-09-16 18:07:07,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.469376, 109.35124, 101.6101, 316.36044, 142.29852, 392.58932, 101.27952, 95.94886, 149.4562, 361.12784]
2025-09-16 18:07:07,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 21.0, 20.0, 57.0, 28.0, 74.0, 20.0, 19.0, 29.0, 66.0]
2025-09-16 18:07:07,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 51 seconds)
2025-09-16 18:09:07,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:09:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 128.14725 ± 28.003
2025-09-16 18:09:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.39346, 158.81563, 153.04715, 99.955154, 123.36414, 158.45288, 165.68762, 129.12363, 90.72087, 112.91192]
2025-09-16 18:09:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 31.0, 30.0, 20.0, 24.0, 31.0, 32.0, 25.0, 18.0, 22.0]
2025-09-16 18:09:08,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 53 seconds)
2025-09-16 18:11:06,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:11:06,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 198.48413 ± 165.659
2025-09-16 18:11:06,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.770676, 150.9726, 152.78284, 107.69518, 620.9555, 107.8784, 122.23523, 96.65404, 404.87497, 131.0217]
2025-09-16 18:11:06,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 29.0, 29.0, 21.0, 123.0, 21.0, 24.0, 19.0, 77.0, 26.0]
2025-09-16 18:11:06,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 53 seconds)
2025-09-16 18:13:05,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:13:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 219.52995 ± 172.214
2025-09-16 18:13:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.657585, 136.6078, 122.16539, 356.95905, 118.34993, 366.45004, 646.9081, 96.00946, 108.64834, 129.54361]
2025-09-16 18:13:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 24.0, 69.0, 23.0, 71.0, 124.0, 19.0, 21.0, 25.0]
2025-09-16 18:13:06,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 53 seconds)
2025-09-16 18:15:04,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:15:05,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 159.93275 ± 81.293
2025-09-16 18:15:05,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [142.8992, 173.7653, 89.22007, 108.907715, 129.22942, 126.58415, 107.82658, 309.05942, 321.59314, 90.2426]
2025-09-16 18:15:05,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 33.0, 18.0, 22.0, 25.0, 25.0, 21.0, 58.0, 64.0, 18.0]
2025-09-16 18:15:05,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 52 seconds)
2025-09-16 18:17:04,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:17:04,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 198.65884 ± 107.432
2025-09-16 18:17:04,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [170.97841, 378.23514, 135.73462, 113.6485, 219.55605, 112.51113, 377.72208, 101.565414, 292.36957, 84.26765]
2025-09-16 18:17:04,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 74.0, 26.0, 22.0, 42.0, 22.0, 80.0, 20.0, 55.0, 17.0]
2025-09-16 18:17:04,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 55 seconds)
2025-09-16 18:19:04,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:19:04,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 199.93753 ± 122.415
2025-09-16 18:19:04,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [141.23674, 127.159676, 124.024345, 88.98947, 121.337166, 409.6547, 100.84076, 360.90408, 141.5909, 383.63733]
2025-09-16 18:19:04,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 25.0, 24.0, 18.0, 24.0, 78.0, 20.0, 70.0, 27.0, 73.0]
2025-09-16 18:19:04,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 54 seconds)
2025-09-16 18:21:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:21:04,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 193.00925 ± 105.528
2025-09-16 18:21:04,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.83023, 342.35367, 118.88316, 95.75232, 114.33207, 124.99945, 288.64975, 184.72826, 142.26105, 405.3027]
2025-09-16 18:21:04,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 66.0, 23.0, 19.0, 22.0, 24.0, 58.0, 36.0, 28.0, 84.0]
2025-09-16 18:21:04,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 56 seconds)
2025-09-16 18:23:02,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:23:03,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 184.50906 ± 116.776
2025-09-16 18:23:03,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [443.6247, 130.25159, 96.262566, 287.75488, 329.00092, 108.881485, 113.4799, 114.71748, 124.41603, 96.70102]
2025-09-16 18:23:03,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 25.0, 19.0, 58.0, 63.0, 21.0, 22.0, 22.0, 24.0, 19.0]
2025-09-16 18:23:03,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 56 seconds)
2025-09-16 18:25:01,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:25:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 216.90610 ± 121.449
2025-09-16 18:25:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [117.61635, 374.43094, 136.18495, 357.2394, 113.40068, 147.95108, 378.69275, 89.36563, 107.71567, 346.46353]
2025-09-16 18:25:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 80.0, 27.0, 65.0, 22.0, 29.0, 72.0, 18.0, 21.0, 67.0]
2025-09-16 18:25:02,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 57 seconds)
2025-09-16 18:27:01,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:27:02,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 293.79034 ± 138.545
2025-09-16 18:27:02,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [328.9306, 434.53195, 373.98138, 404.0662, 109.42219, 138.17921, 160.57191, 408.601, 112.11984, 467.49924]
2025-09-16 18:27:02,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 84.0, 83.0, 77.0, 21.0, 27.0, 31.0, 77.0, 22.0, 106.0]
2025-09-16 18:27:02,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (293.79) for latency 24
2025-09-16 18:27:02,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 58 seconds)
2025-09-16 18:29:01,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:29:02,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 171.23630 ± 113.499
2025-09-16 18:29:02,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.81113, 138.829, 422.60147, 106.82573, 107.211815, 367.58383, 135.91158, 118.10786, 123.32129, 89.159355]
2025-09-16 18:29:02,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 27.0, 77.0, 21.0, 21.0, 69.0, 27.0, 23.0, 24.0, 18.0]
2025-09-16 18:29:02,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 58 seconds)
2025-09-16 18:31:01,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:31:02,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 297.82346 ± 199.196
2025-09-16 18:31:02,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.11171, 367.23758, 263.49774, 678.0535, 158.8368, 117.41982, 101.670265, 555.97656, 476.68402, 145.74626]
2025-09-16 18:31:02,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 67.0, 57.0, 150.0, 31.0, 23.0, 20.0, 115.0, 90.0, 29.0]
2025-09-16 18:31:02,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (297.82) for latency 24
2025-09-16 18:31:02,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 59 seconds)
2025-09-16 18:33:00,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:33:00,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 205.08664 ± 117.010
2025-09-16 18:33:00,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [347.20486, 95.64933, 425.58057, 174.38681, 113.78068, 150.54079, 140.71101, 102.588585, 362.33865, 138.08525]
2025-09-16 18:33:00,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 19.0, 82.0, 33.0, 22.0, 29.0, 27.0, 20.0, 70.0, 27.0]
2025-09-16 18:33:00,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-09-16 18:34:59,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:35:00,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 229.42122 ± 112.826
2025-09-16 18:35:00,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [162.5036, 451.03394, 164.12341, 300.52783, 124.75034, 147.86166, 134.50883, 326.21686, 122.77411, 359.91156]
2025-09-16 18:35:00,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 86.0, 32.0, 56.0, 24.0, 29.0, 26.0, 60.0, 24.0, 69.0]
2025-09-16 18:35:00,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
