2025-09-16 14:03:24,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_18
2025-09-16 14:03:24,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_18
2025-09-16 14:03:24,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x151e3e8a0910>}
2025-09-16 14:03:24,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:03:24,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:03:24,501 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=682, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:03:24,501 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:03:26,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:03:26,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:05:14,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:05:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 158.26433 ± 49.112
2025-09-16 14:05:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [218.59004, 127.64444, 212.61787, 113.28769, 204.21056, 200.82312, 194.74675, 96.16763, 118.619934, 95.93526]
2025-09-16 14:05:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 25.0, 44.0, 22.0, 43.0, 42.0, 39.0, 19.0, 23.0, 19.0]
2025-09-16 14:05:15,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (158.26) for latency 18
2025-09-16 14:05:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 11 seconds)
2025-09-16 14:07:14,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:07:15,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 317.71191 ± 100.791
2025-09-16 14:07:15,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [338.01642, 127.47979, 382.05634, 382.83383, 304.53635, 364.9152, 421.277, 392.95184, 124.377625, 338.67468]
2025-09-16 14:07:15,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 25.0, 73.0, 73.0, 61.0, 68.0, 79.0, 75.0, 24.0, 65.0]
2025-09-16 14:07:15,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (317.71) for latency 18
2025-09-16 14:07:15,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 7 minutes, 14 seconds)
2025-09-16 14:09:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:09:13,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 251.13083 ± 128.385
2025-09-16 14:09:13,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [129.59506, 117.071465, 288.90317, 335.2214, 383.75626, 89.33378, 460.29007, 368.83136, 102.911194, 235.39467]
2025-09-16 14:09:13,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 23.0, 56.0, 66.0, 71.0, 18.0, 88.0, 70.0, 20.0, 44.0]
2025-09-16 14:09:13,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 7 minutes, 10 seconds)
2025-09-16 14:11:12,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:11:13,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 235.21806 ± 117.927
2025-09-16 14:11:13,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [139.78589, 311.02524, 294.68405, 100.6664, 145.87761, 368.31873, 364.68796, 405.48935, 101.546585, 120.09893]
2025-09-16 14:11:13,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 59.0, 55.0, 20.0, 28.0, 70.0, 67.0, 78.0, 20.0, 23.0]
2025-09-16 14:11:13,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 6 minutes, 45 seconds)
2025-09-16 14:13:10,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:13:11,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 279.95169 ± 119.302
2025-09-16 14:13:11,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [472.44186, 363.30157, 165.2495, 359.00043, 346.56854, 164.3803, 352.39505, 142.39908, 337.32324, 96.45725]
2025-09-16 14:13:11,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 81.0, 33.0, 69.0, 69.0, 32.0, 66.0, 28.0, 66.0, 19.0]
2025-09-16 14:13:11,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 5 minutes, 24 seconds)
2025-09-16 14:15:09,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:15:10,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 306.76105 ± 109.951
2025-09-16 14:15:10,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [350.44336, 328.3173, 330.85138, 119.72021, 463.6361, 434.3727, 348.31686, 165.4539, 354.29953, 172.19925]
2025-09-16 14:15:10,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 63.0, 67.0, 23.0, 86.0, 98.0, 68.0, 32.0, 66.0, 33.0]
2025-09-16 14:15:10,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 6 minutes, 29 seconds)
2025-09-16 14:17:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:17:10,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 278.29230 ± 135.464
2025-09-16 14:17:10,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [309.797, 139.7725, 379.46274, 307.35864, 114.95622, 413.20435, 102.01155, 507.1795, 361.30484, 147.87582]
2025-09-16 14:17:10,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 27.0, 74.0, 59.0, 22.0, 79.0, 20.0, 100.0, 66.0, 29.0]
2025-09-16 14:17:10,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 4 minutes, 34 seconds)
2025-09-16 14:19:09,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:19:10,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 283.20636 ± 122.927
2025-09-16 14:19:10,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.90577, 146.09613, 426.23157, 329.24237, 101.80121, 466.04663, 307.0153, 364.51678, 316.63104, 278.57675]
2025-09-16 14:19:10,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 28.0, 78.0, 63.0, 20.0, 91.0, 56.0, 69.0, 61.0, 53.0]
2025-09-16 14:19:10,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 2 minutes, 58 seconds)
2025-09-16 14:21:08,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:21:09,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 273.88593 ± 111.591
2025-09-16 14:21:09,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.291336, 347.4131, 113.60433, 342.2192, 95.96244, 398.48227, 354.75156, 360.2795, 310.99734, 300.8582]
2025-09-16 14:21:09,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 67.0, 22.0, 64.0, 19.0, 72.0, 67.0, 68.0, 60.0, 56.0]
2025-09-16 14:21:09,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 51 seconds)
2025-09-16 14:23:07,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:23:08,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 236.74947 ± 111.282
2025-09-16 14:23:08,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.31366, 90.84214, 358.3436, 307.5371, 423.50406, 140.03917, 276.44382, 333.72006, 185.75963, 136.99156]
2025-09-16 14:23:08,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 71.0, 57.0, 81.0, 27.0, 51.0, 61.0, 38.0, 26.0]
2025-09-16 14:23:08,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 59 minutes, 3 seconds)
2025-09-16 14:25:07,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:25:08,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 311.37082 ± 169.800
2025-09-16 14:25:08,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [315.06485, 344.2953, 746.29535, 102.55838, 109.261696, 325.6044, 367.93744, 241.14108, 239.50409, 322.04575]
2025-09-16 14:25:08,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 63.0, 144.0, 20.0, 21.0, 62.0, 69.0, 46.0, 46.0, 59.0]
2025-09-16 14:25:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 57 minutes, 18 seconds)
2025-09-16 14:27:07,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:27:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 283.44693 ± 125.062
2025-09-16 14:27:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.75153, 397.2661, 379.23874, 401.1138, 363.70346, 380.68677, 241.42921, 119.71696, 359.1812, 96.38152]
2025-09-16 14:27:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 88.0, 85.0, 79.0, 67.0, 72.0, 47.0, 23.0, 68.0, 19.0]
2025-09-16 14:27:08,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 55 minutes, 21 seconds)
2025-09-16 14:29:07,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:29:08,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 331.86148 ± 84.645
2025-09-16 14:29:08,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [414.94806, 114.630104, 306.00873, 299.78802, 383.34195, 399.26135, 349.7586, 338.19318, 415.94965, 296.7348]
2025-09-16 14:29:08,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 22.0, 58.0, 55.0, 72.0, 73.0, 64.0, 67.0, 78.0, 57.0]
2025-09-16 14:29:08,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (331.86) for latency 18
2025-09-16 14:29:08,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 33 seconds)
2025-09-16 14:31:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:31:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 270.02747 ± 107.275
2025-09-16 14:31:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [398.3118, 367.04504, 239.9405, 356.9889, 289.75473, 114.14722, 117.307175, 335.8425, 125.03057, 355.90622]
2025-09-16 14:31:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 69.0, 48.0, 67.0, 53.0, 22.0, 23.0, 62.0, 24.0, 67.0]
2025-09-16 14:31:07,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 26 seconds)
2025-09-16 14:33:06,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:33:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 372.86206 ± 77.194
2025-09-16 14:33:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [328.8704, 425.74643, 284.10947, 316.11746, 474.35275, 359.18295, 329.3458, 287.3098, 397.59653, 525.98914]
2025-09-16 14:33:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 79.0, 53.0, 59.0, 88.0, 67.0, 63.0, 55.0, 75.0, 100.0]
2025-09-16 14:33:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (372.86) for latency 18
2025-09-16 14:33:07,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 35 seconds)
2025-09-16 14:35:05,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:35:06,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 299.44467 ± 109.985
2025-09-16 14:35:06,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [357.67197, 276.20065, 101.90753, 264.5931, 123.47477, 300.5094, 416.50427, 306.71405, 449.23062, 397.6403]
2025-09-16 14:35:06,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 53.0, 20.0, 48.0, 24.0, 55.0, 80.0, 59.0, 84.0, 73.0]
2025-09-16 14:35:06,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 28 seconds)
2025-09-16 14:37:05,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:37:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 257.03070 ± 144.617
2025-09-16 14:37:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [107.660126, 386.00128, 356.72415, 362.06158, 125.230774, 511.6309, 114.13173, 140.40198, 106.98003, 359.48453]
2025-09-16 14:37:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 71.0, 70.0, 67.0, 24.0, 104.0, 22.0, 27.0, 21.0, 67.0]
2025-09-16 14:37:06,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 25 seconds)
2025-09-16 14:39:05,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:39:06,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 242.25322 ± 164.362
2025-09-16 14:39:06,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [134.94727, 113.93141, 352.33746, 397.46008, 95.85941, 614.0845, 155.56314, 113.389496, 325.84238, 119.11708]
2025-09-16 14:39:06,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 22.0, 66.0, 73.0, 19.0, 117.0, 30.0, 22.0, 60.0, 23.0]
2025-09-16 14:39:06,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 22 seconds)
2025-09-16 14:41:05,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:41:06,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 386.52402 ± 63.477
2025-09-16 14:41:06,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [454.19205, 517.56104, 351.45126, 353.25845, 329.7486, 384.1164, 438.77905, 369.38028, 286.9863, 379.76685]
2025-09-16 14:41:06,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 99.0, 64.0, 67.0, 62.0, 70.0, 82.0, 68.0, 54.0, 74.0]
2025-09-16 14:41:06,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (386.52) for latency 18
2025-09-16 14:41:06,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 42 seconds)
2025-09-16 14:43:05,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:43:05,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 298.82440 ± 163.989
2025-09-16 14:43:05,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [363.68347, 314.09506, 112.60083, 573.4195, 278.95334, 537.4305, 162.14636, 411.037, 116.59302, 118.284904]
2025-09-16 14:43:05,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 59.0, 22.0, 118.0, 61.0, 113.0, 31.0, 79.0, 23.0, 23.0]
2025-09-16 14:43:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 40 seconds)
2025-09-16 14:45:03,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:45:04,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 327.06720 ± 150.514
2025-09-16 14:45:04,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [442.12277, 120.199936, 496.09583, 504.69174, 132.88782, 176.35568, 178.36847, 477.4284, 397.21545, 345.3056]
2025-09-16 14:45:04,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 23.0, 92.0, 109.0, 26.0, 34.0, 34.0, 89.0, 85.0, 64.0]
2025-09-16 14:45:04,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 31 seconds)
2025-09-16 14:47:04,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:47:05,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 253.09160 ± 118.779
2025-09-16 14:47:05,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [349.75476, 130.59395, 96.466034, 125.069016, 101.46289, 332.1298, 347.81485, 428.17798, 293.39777, 326.04898]
2025-09-16 14:47:05,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 25.0, 19.0, 24.0, 20.0, 61.0, 64.0, 87.0, 55.0, 63.0]
2025-09-16 14:47:05,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 43 seconds)
2025-09-16 14:49:03,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:49:03,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 188.20125 ± 121.759
2025-09-16 14:49:03,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [117.149956, 112.317154, 125.5161, 96.2091, 109.30646, 376.9018, 276.42038, 101.90247, 125.089516, 441.1996]
2025-09-16 14:49:03,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 24.0, 19.0, 21.0, 74.0, 52.0, 20.0, 24.0, 82.0]
2025-09-16 14:49:03,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 19 seconds)
2025-09-16 14:51:03,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:51:03,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 211.93750 ± 101.262
2025-09-16 14:51:03,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [158.62578, 317.07803, 128.51392, 113.87459, 252.60883, 389.32196, 150.65024, 96.80809, 350.98032, 160.9133]
2025-09-16 14:51:03,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 62.0, 25.0, 22.0, 50.0, 71.0, 29.0, 19.0, 67.0, 31.0]
2025-09-16 14:51:03,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 18 seconds)
2025-09-16 14:53:02,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:53:02,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 186.99818 ± 114.849
2025-09-16 14:53:02,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [89.60999, 352.79504, 309.62875, 96.56296, 107.95547, 146.6277, 114.33078, 408.94012, 141.59935, 101.9318]
2025-09-16 14:53:02,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 65.0, 58.0, 19.0, 21.0, 28.0, 22.0, 76.0, 27.0, 20.0]
2025-09-16 14:53:02,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 14 seconds)
2025-09-16 14:55:01,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:55:01,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 279.42096 ± 120.028
2025-09-16 14:55:01,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [350.4373, 315.0452, 107.34047, 114.6786, 356.04672, 106.30944, 423.35052, 406.97208, 364.7073, 249.3221]
2025-09-16 14:55:01,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 58.0, 21.0, 22.0, 66.0, 21.0, 78.0, 75.0, 67.0, 46.0]
2025-09-16 14:55:01,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 17 seconds)
2025-09-16 14:57:01,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:57:02,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 296.53494 ± 111.131
2025-09-16 14:57:02,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.142685, 112.92579, 365.44458, 281.16864, 374.4718, 325.32925, 482.50818, 381.18732, 244.40024, 283.7711]
2025-09-16 14:57:02,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 22.0, 66.0, 52.0, 68.0, 60.0, 90.0, 70.0, 50.0, 53.0]
2025-09-16 14:57:02,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 7 seconds)
2025-09-16 14:59:01,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:59:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 307.97833 ± 151.727
2025-09-16 14:59:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [373.30014, 135.82791, 394.53876, 206.2744, 532.5432, 90.442795, 489.39642, 357.48376, 388.96875, 111.00703]
2025-09-16 14:59:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 26.0, 76.0, 40.0, 98.0, 18.0, 89.0, 67.0, 86.0, 22.0]
2025-09-16 14:59:02,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 34 seconds)
2025-09-16 15:01:00,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:01:01,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 222.63757 ± 131.835
2025-09-16 15:01:01,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [139.80667, 112.12713, 391.83853, 444.29825, 108.625, 107.533714, 314.87592, 365.07553, 146.6382, 95.55685]
2025-09-16 15:01:01,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 71.0, 83.0, 21.0, 21.0, 60.0, 68.0, 28.0, 19.0]
2025-09-16 15:01:01,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 21 minutes, 28 seconds)
2025-09-16 15:03:00,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:03:01,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 289.64273 ± 127.818
2025-09-16 15:03:01,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [123.76629, 499.30893, 326.78442, 250.2579, 101.62407, 126.15753, 365.3448, 324.1645, 392.6832, 386.3358]
2025-09-16 15:03:01,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 99.0, 59.0, 47.0, 20.0, 24.0, 67.0, 60.0, 77.0, 71.0]
2025-09-16 15:03:01,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 35 seconds)
2025-09-16 15:04:59,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:05:00,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 244.58659 ± 159.044
2025-09-16 15:05:00,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [457.49976, 124.81728, 139.84094, 124.585785, 107.84351, 101.75773, 417.36197, 469.5994, 406.57217, 95.98737]
2025-09-16 15:05:00,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 24.0, 27.0, 24.0, 21.0, 20.0, 77.0, 86.0, 74.0, 19.0]
2025-09-16 15:05:00,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 40 seconds)
2025-09-16 15:07:00,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:07:01,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 281.87408 ± 129.959
2025-09-16 15:07:01,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [349.4626, 385.04214, 173.02998, 95.77811, 401.6367, 114.42364, 422.22714, 119.87189, 387.30536, 369.96323]
2025-09-16 15:07:01,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 74.0, 33.0, 19.0, 75.0, 22.0, 79.0, 23.0, 71.0, 69.0]
2025-09-16 15:07:01,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 48 seconds)
2025-09-16 15:08:59,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:09:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 178.02316 ± 99.563
2025-09-16 15:09:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [334.84155, 107.82703, 142.84814, 340.49023, 118.954285, 114.40512, 101.52242, 310.9658, 106.5247, 101.85246]
2025-09-16 15:09:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 21.0, 27.0, 63.0, 23.0, 22.0, 20.0, 56.0, 21.0, 20.0]
2025-09-16 15:09:00,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 36 seconds)
2025-09-16 15:10:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:10:59,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 268.50589 ± 159.831
2025-09-16 15:10:59,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [140.58408, 369.84964, 399.7524, 102.464935, 585.8405, 113.38388, 151.04036, 335.65216, 102.388954, 384.10208]
2025-09-16 15:10:59,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 69.0, 74.0, 20.0, 113.0, 22.0, 29.0, 62.0, 20.0, 70.0]
2025-09-16 15:10:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 29 seconds)
2025-09-16 15:12:58,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:13:00,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 396.02460 ± 114.892
2025-09-16 15:13:00,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [438.86255, 102.2982, 411.55753, 502.14328, 342.04254, 442.3924, 461.41946, 419.7243, 523.212, 316.59344]
2025-09-16 15:13:00,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 20.0, 76.0, 92.0, 63.0, 80.0, 84.0, 77.0, 112.0, 58.0]
2025-09-16 15:13:00,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (396.02) for latency 18
2025-09-16 15:13:00,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 45 seconds)
2025-09-16 15:14:58,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:14:58,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 177.40549 ± 110.175
2025-09-16 15:14:58,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [155.66623, 108.31938, 341.35696, 108.325035, 113.41665, 89.682724, 108.41181, 102.3515, 422.65326, 223.8714]
2025-09-16 15:14:58,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 62.0, 21.0, 22.0, 18.0, 21.0, 20.0, 93.0, 42.0]
2025-09-16 15:14:58,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 40 seconds)
2025-09-16 15:16:59,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:17:00,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 275.96548 ± 221.007
2025-09-16 15:17:00,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [118.42907, 309.15625, 107.87128, 112.820946, 323.77924, 843.5971, 344.99756, 96.87501, 401.19385, 100.93447]
2025-09-16 15:17:00,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 59.0, 21.0, 22.0, 66.0, 164.0, 62.0, 19.0, 73.0, 20.0]
2025-09-16 15:17:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 5 minutes, 44 seconds)
2025-09-16 15:18:57,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:18:58,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 239.30779 ± 148.250
2025-09-16 15:18:58,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [123.80804, 167.4086, 318.15924, 512.47675, 477.7779, 113.61484, 118.593254, 110.87021, 315.4609, 134.90831]
2025-09-16 15:18:58,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 32.0, 70.0, 93.0, 87.0, 22.0, 23.0, 22.0, 60.0, 26.0]
2025-09-16 15:18:58,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 3 minutes, 34 seconds)
2025-09-16 15:20:57,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:20:57,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 239.31238 ± 121.326
2025-09-16 15:20:57,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [108.29041, 107.464134, 303.45996, 378.35443, 167.68172, 333.52634, 417.90692, 113.48992, 114.5481, 348.40182]
2025-09-16 15:20:57,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 59.0, 70.0, 32.0, 62.0, 77.0, 22.0, 22.0, 63.0]
2025-09-16 15:20:57,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 43 seconds)
2025-09-16 15:22:56,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:22:57,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 327.93280 ± 145.287
2025-09-16 15:22:57,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [355.37024, 326.2741, 367.51755, 479.10986, 135.9268, 450.12692, 129.93729, 416.45206, 101.48885, 517.1246]
2025-09-16 15:22:57,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 61.0, 69.0, 88.0, 26.0, 83.0, 25.0, 74.0, 20.0, 96.0]
2025-09-16 15:22:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 33 seconds)
2025-09-16 15:24:56,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:24:58,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 383.53268 ± 188.215
2025-09-16 15:24:58,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [434.877, 795.5489, 327.96082, 373.72226, 108.61777, 96.22508, 363.26196, 396.19516, 520.48883, 418.4292]
2025-09-16 15:24:58,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 171.0, 62.0, 73.0, 21.0, 19.0, 68.0, 72.0, 96.0, 76.0]
2025-09-16 15:24:58,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 50 seconds)
2025-09-16 15:26:58,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:26:58,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 327.57864 ± 136.803
2025-09-16 15:26:58,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [372.58704, 506.8115, 171.53035, 434.8932, 408.48334, 406.2514, 101.49207, 448.919, 134.85268, 289.96588]
2025-09-16 15:26:58,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 96.0, 33.0, 78.0, 75.0, 76.0, 20.0, 86.0, 26.0, 53.0]
2025-09-16 15:26:58,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 47 seconds)
2025-09-16 15:28:57,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:28:57,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 261.88324 ± 155.571
2025-09-16 15:28:57,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [478.19562, 129.60007, 112.76214, 353.70633, 96.60079, 118.7999, 458.1856, 435.2761, 333.60236, 102.1037]
2025-09-16 15:28:57,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 25.0, 22.0, 64.0, 19.0, 23.0, 87.0, 80.0, 63.0, 20.0]
2025-09-16 15:28:57,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 54 seconds)
2025-09-16 15:30:56,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:30:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 311.19171 ± 164.437
2025-09-16 15:30:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [458.5453, 118.74913, 151.12503, 603.83093, 427.49164, 345.0671, 336.538, 106.812004, 426.07443, 137.68353]
2025-09-16 15:30:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 23.0, 29.0, 132.0, 79.0, 64.0, 61.0, 21.0, 91.0, 27.0]
2025-09-16 15:30:57,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 59 seconds)
2025-09-16 15:32:56,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:32:57,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 379.08759 ± 133.408
2025-09-16 15:32:57,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [445.33704, 107.53435, 527.0051, 522.4626, 363.86383, 431.67123, 370.541, 384.9009, 472.1294, 165.43077]
2025-09-16 15:32:57,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 21.0, 94.0, 98.0, 64.0, 80.0, 68.0, 71.0, 93.0, 32.0]
2025-09-16 15:32:57,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes)
2025-09-16 15:34:56,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:34:57,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 209.06470 ± 155.424
2025-09-16 15:34:57,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.44753, 102.33712, 385.00403, 102.02857, 89.82056, 133.99347, 129.69344, 124.912254, 370.4255, 550.98456]
2025-09-16 15:34:57,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 72.0, 20.0, 18.0, 26.0, 25.0, 24.0, 72.0, 107.0]
2025-09-16 15:34:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 54 seconds)
2025-09-16 15:36:57,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:36:58,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 377.01596 ± 202.014
2025-09-16 15:36:58,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.769356, 494.95847, 101.75009, 476.53217, 527.50336, 140.31252, 714.76556, 556.797, 324.4009, 337.37033]
2025-09-16 15:36:58,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 90.0, 20.0, 87.0, 96.0, 27.0, 134.0, 119.0, 60.0, 63.0]
2025-09-16 15:36:58,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 51 seconds)
2025-09-16 15:38:57,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:38:58,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 321.06720 ± 179.894
2025-09-16 15:38:58,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [128.81836, 565.60114, 389.02618, 125.51159, 534.10254, 102.01808, 419.63992, 315.45142, 114.23063, 516.2721]
2025-09-16 15:38:58,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 114.0, 71.0, 24.0, 99.0, 20.0, 77.0, 59.0, 22.0, 95.0]
2025-09-16 15:38:58,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 10 seconds)
2025-09-16 15:40:57,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:40:57,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 279.28552 ± 172.275
2025-09-16 15:40:57,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [117.876434, 119.13772, 435.80496, 390.83157, 100.90075, 107.821434, 403.4337, 549.50116, 112.81205, 454.7356]
2025-09-16 15:40:57,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 81.0, 71.0, 20.0, 21.0, 75.0, 104.0, 22.0, 82.0]
2025-09-16 15:40:57,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 1 second)
2025-09-16 15:42:57,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:42:57,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 308.90738 ± 164.860
2025-09-16 15:42:57,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [431.71042, 542.9552, 369.0914, 119.89213, 480.05264, 347.59232, 102.85848, 125.20078, 119.88853, 449.83188]
2025-09-16 15:42:57,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 101.0, 71.0, 23.0, 89.0, 63.0, 20.0, 24.0, 23.0, 86.0]
2025-09-16 15:42:57,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes)
2025-09-16 15:44:57,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:44:58,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 330.51169 ± 173.865
2025-09-16 15:44:58,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [125.669556, 502.86993, 155.64415, 334.88785, 468.07108, 362.23563, 555.41394, 124.80622, 129.07687, 546.4416]
2025-09-16 15:44:58,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 90.0, 30.0, 62.0, 87.0, 70.0, 104.0, 24.0, 25.0, 101.0]
2025-09-16 15:44:58,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 6 seconds)
2025-09-16 15:46:57,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:46:58,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 251.22104 ± 159.325
2025-09-16 15:46:58,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.117905, 471.04373, 114.23442, 96.60154, 107.272194, 280.78305, 351.9026, 496.7242, 95.830734, 401.70004]
2025-09-16 15:46:58,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 87.0, 22.0, 19.0, 21.0, 61.0, 63.0, 93.0, 19.0, 73.0]
2025-09-16 15:46:58,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes)
2025-09-16 15:48:57,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:48:58,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 339.57379 ± 228.286
2025-09-16 15:48:58,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.53637, 177.18011, 108.33602, 898.48083, 435.19684, 119.30339, 373.44916, 396.68866, 367.18634, 423.38022]
2025-09-16 15:48:58,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 34.0, 21.0, 178.0, 78.0, 23.0, 69.0, 73.0, 67.0, 78.0]
2025-09-16 15:48:58,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 53 seconds)
2025-09-16 15:50:57,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:50:58,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 355.45294 ± 155.646
2025-09-16 15:50:58,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [510.2339, 108.44815, 421.21716, 545.588, 535.9073, 398.30972, 150.08896, 369.26697, 360.07925, 155.39006]
2025-09-16 15:50:58,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 21.0, 84.0, 112.0, 110.0, 73.0, 29.0, 68.0, 66.0, 30.0]
2025-09-16 15:50:58,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 9 seconds)
2025-09-16 15:52:57,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:52:58,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 338.11642 ± 169.328
2025-09-16 15:52:58,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [499.10486, 430.9868, 174.14937, 400.979, 170.5796, 114.34323, 409.83365, 125.892654, 424.75888, 630.5363]
2025-09-16 15:52:58,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 79.0, 33.0, 77.0, 33.0, 22.0, 77.0, 24.0, 77.0, 133.0]
2025-09-16 15:52:58,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 6 seconds)
2025-09-16 15:54:58,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:54:59,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 329.98825 ± 189.768
2025-09-16 15:54:59,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [616.6535, 344.07996, 369.13, 89.94544, 102.397736, 167.2467, 413.2253, 597.2816, 473.35333, 126.569084]
2025-09-16 15:54:59,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 64.0, 67.0, 18.0, 20.0, 32.0, 74.0, 112.0, 88.0, 24.0]
2025-09-16 15:54:59,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 9 seconds)
2025-09-16 15:56:59,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:57:00,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 348.47858 ± 180.237
2025-09-16 15:57:00,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [125.49491, 429.12708, 264.32306, 537.9529, 118.44375, 456.5194, 187.78564, 408.53687, 257.03104, 699.5711]
2025-09-16 15:57:00,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 80.0, 50.0, 99.0, 23.0, 85.0, 36.0, 74.0, 48.0, 135.0]
2025-09-16 15:57:00,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 17 seconds)
2025-09-16 15:58:59,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:59:00,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 322.93954 ± 183.387
2025-09-16 15:59:00,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [529.8542, 577.4243, 102.25107, 437.7071, 102.72999, 384.57574, 125.022064, 376.16772, 108.15426, 485.5091]
2025-09-16 15:59:00,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 104.0, 20.0, 81.0, 20.0, 82.0, 24.0, 68.0, 21.0, 90.0]
2025-09-16 15:59:00,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 21 seconds)
2025-09-16 16:00:59,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:01:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 273.32465 ± 164.328
2025-09-16 16:01:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [112.319016, 447.7225, 362.40863, 568.91095, 113.49985, 145.53427, 95.77144, 325.59015, 135.56967, 425.92004]
2025-09-16 16:01:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 83.0, 69.0, 102.0, 22.0, 28.0, 19.0, 61.0, 26.0, 77.0]
2025-09-16 16:01:00,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 10 seconds)
2025-09-16 16:02:59,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:03:00,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 297.61215 ± 162.832
2025-09-16 16:03:00,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [473.9261, 96.41231, 114.417816, 328.52805, 432.8992, 141.96025, 111.50478, 533.2768, 459.366, 283.83038]
2025-09-16 16:03:00,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 19.0, 22.0, 60.0, 80.0, 27.0, 22.0, 96.0, 85.0, 56.0]
2025-09-16 16:03:00,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 12 seconds)
2025-09-16 16:04:59,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:05:00,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 346.81177 ± 161.415
2025-09-16 16:05:00,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.76499, 396.6793, 416.04428, 618.85864, 163.31586, 269.61856, 523.99603, 402.44357, 425.72357, 149.67297]
2025-09-16 16:05:00,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 72.0, 82.0, 121.0, 32.0, 53.0, 95.0, 75.0, 87.0, 29.0]
2025-09-16 16:05:00,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 13 seconds)
2025-09-16 16:06:59,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:07:00,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 190.39828 ± 116.159
2025-09-16 16:07:00,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [173.6991, 101.40117, 112.96759, 321.84488, 358.49265, 407.59003, 102.37908, 106.06193, 118.92771, 100.61879]
2025-09-16 16:07:00,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 20.0, 22.0, 60.0, 63.0, 75.0, 20.0, 21.0, 23.0, 20.0]
2025-09-16 16:07:00,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 2 seconds)
2025-09-16 16:08:59,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:09:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 424.29303 ± 157.260
2025-09-16 16:09:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [682.7669, 120.207756, 657.3343, 388.96298, 366.07062, 441.58298, 350.56628, 424.07385, 512.11584, 299.24838]
2025-09-16 16:09:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 23.0, 136.0, 70.0, 76.0, 80.0, 66.0, 76.0, 94.0, 58.0]
2025-09-16 16:09:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (424.29) for latency 18
2025-09-16 16:09:00,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 2 seconds)
2025-09-16 16:11:00,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:11:02,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 414.51837 ± 135.661
2025-09-16 16:11:02,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [368.50378, 376.44565, 390.0665, 625.52924, 285.4637, 321.15997, 541.2772, 162.28302, 546.52734, 527.92725]
2025-09-16 16:11:02,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 68.0, 72.0, 127.0, 55.0, 59.0, 101.0, 31.0, 116.0, 96.0]
2025-09-16 16:11:02,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 14 seconds)
2025-09-16 16:13:01,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:13:01,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 277.54126 ± 168.684
2025-09-16 16:13:01,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [413.1594, 96.10003, 141.52063, 95.83651, 408.30276, 117.593864, 446.8723, 464.38632, 489.18195, 102.45881]
2025-09-16 16:13:01,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 19.0, 27.0, 19.0, 73.0, 23.0, 85.0, 85.0, 100.0, 20.0]
2025-09-16 16:13:01,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 11 seconds)
2025-09-16 16:15:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:15:02,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 326.71521 ± 186.118
2025-09-16 16:15:02,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [90.16049, 639.82074, 195.60278, 433.97073, 322.9243, 349.42596, 109.01182, 436.52646, 571.8695, 117.839584]
2025-09-16 16:15:02,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 121.0, 37.0, 79.0, 60.0, 64.0, 21.0, 91.0, 104.0, 23.0]
2025-09-16 16:15:02,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 7 seconds)
2025-09-16 16:17:01,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:17:02,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 318.41864 ± 131.887
2025-09-16 16:17:02,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [444.58093, 114.52032, 445.02612, 354.528, 402.01743, 119.45567, 415.9835, 367.4791, 130.17993, 390.41547]
2025-09-16 16:17:02,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 22.0, 80.0, 66.0, 83.0, 23.0, 75.0, 71.0, 25.0, 72.0]
2025-09-16 16:17:02,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 15 seconds)
2025-09-16 16:19:02,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:19:03,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 414.45175 ± 295.696
2025-09-16 16:19:03,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [411.5616, 1188.9575, 418.06516, 90.18727, 444.98438, 527.11523, 442.2439, 173.03897, 334.29147, 114.07223]
2025-09-16 16:19:03,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 227.0, 75.0, 18.0, 82.0, 102.0, 81.0, 33.0, 61.0, 22.0]
2025-09-16 16:19:03,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 18 seconds)
2025-09-16 16:21:03,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:21:04,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 315.23770 ± 200.945
2025-09-16 16:21:04,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [328.69104, 613.1232, 108.48661, 129.59035, 96.51191, 193.7956, 647.76117, 339.7526, 171.49559, 523.169]
2025-09-16 16:21:04,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 118.0, 21.0, 25.0, 19.0, 37.0, 118.0, 65.0, 33.0, 99.0]
2025-09-16 16:21:04,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 14 seconds)
2025-09-16 16:23:03,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:23:04,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 347.32169 ± 163.103
2025-09-16 16:23:04,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [397.18793, 328.9464, 434.7171, 172.58888, 429.2493, 413.67044, 447.76746, 640.1194, 113.41849, 95.55125]
2025-09-16 16:23:04,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 64.0, 83.0, 33.0, 82.0, 73.0, 83.0, 122.0, 22.0, 19.0]
2025-09-16 16:23:04,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 15 seconds)
2025-09-16 16:25:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:25:04,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 280.49432 ± 176.996
2025-09-16 16:25:04,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [494.61185, 90.882515, 139.55447, 413.1491, 111.8906, 323.07785, 119.49306, 95.866776, 513.68494, 502.73215]
2025-09-16 16:25:04,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 18.0, 27.0, 72.0, 22.0, 62.0, 23.0, 19.0, 100.0, 101.0]
2025-09-16 16:25:04,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 14 seconds)
2025-09-16 16:27:03,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:27:04,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 430.25189 ± 136.162
2025-09-16 16:27:04,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [453.42743, 594.09076, 323.90668, 141.61395, 577.1644, 419.52655, 507.4763, 317.79807, 576.9401, 390.57486]
2025-09-16 16:27:04,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 125.0, 66.0, 27.0, 106.0, 76.0, 91.0, 68.0, 119.0, 73.0]
2025-09-16 16:27:04,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (430.25) for latency 18
2025-09-16 16:27:04,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 10 seconds)
2025-09-16 16:29:04,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:29:05,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 322.85925 ± 230.933
2025-09-16 16:29:05,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [475.41962, 108.65571, 135.14078, 528.84064, 96.01945, 469.45386, 96.44016, 778.1445, 427.47137, 113.00643]
2025-09-16 16:29:05,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 21.0, 26.0, 99.0, 19.0, 82.0, 19.0, 143.0, 80.0, 22.0]
2025-09-16 16:29:05,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 6 seconds)
2025-09-16 16:31:06,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:31:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 359.26614 ± 221.764
2025-09-16 16:31:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [113.944145, 593.3092, 388.94095, 134.95517, 587.54376, 709.8988, 102.20567, 387.7152, 95.67102, 478.47736]
2025-09-16 16:31:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 118.0, 72.0, 26.0, 110.0, 141.0, 20.0, 70.0, 19.0, 90.0]
2025-09-16 16:31:07,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 15 seconds)
2025-09-16 16:33:05,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:33:06,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 399.30792 ± 142.869
2025-09-16 16:33:06,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [365.35242, 542.2406, 570.7151, 437.5009, 117.54097, 498.26437, 430.9352, 455.26715, 158.3382, 416.92432]
2025-09-16 16:33:06,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 100.0, 106.0, 81.0, 23.0, 91.0, 77.0, 80.0, 30.0, 75.0]
2025-09-16 16:33:06,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 11 seconds)
2025-09-16 16:35:05,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:35:06,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 322.91608 ± 166.771
2025-09-16 16:35:06,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [356.83664, 490.46432, 436.12463, 391.22104, 108.214966, 103.41585, 443.84497, 580.8687, 192.99849, 125.1708]
2025-09-16 16:35:06,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 89.0, 76.0, 71.0, 21.0, 20.0, 82.0, 104.0, 37.0, 24.0]
2025-09-16 16:35:06,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 10 seconds)
2025-09-16 16:37:06,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:37:07,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 407.91461 ± 207.564
2025-09-16 16:37:07,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [397.88257, 516.046, 390.9983, 136.09947, 148.05156, 460.3447, 644.0207, 108.158295, 532.9501, 744.5945]
2025-09-16 16:37:07,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 91.0, 85.0, 26.0, 28.0, 84.0, 129.0, 21.0, 103.0, 142.0]
2025-09-16 16:37:07,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 14 seconds)
2025-09-16 16:39:06,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:39:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 296.35577 ± 198.037
2025-09-16 16:39:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [482.99704, 622.0207, 550.85834, 131.41489, 479.32687, 102.817024, 124.18177, 155.80408, 147.55765, 166.57953]
2025-09-16 16:39:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 133.0, 103.0, 25.0, 89.0, 20.0, 24.0, 30.0, 29.0, 32.0]
2025-09-16 16:39:07,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 12 seconds)
2025-09-16 16:41:06,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:41:08,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 396.38788 ± 222.584
2025-09-16 16:41:08,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [97.126114, 432.38693, 372.03232, 118.44861, 97.16533, 399.48294, 645.55865, 686.43823, 417.9956, 697.2439]
2025-09-16 16:41:08,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 79.0, 78.0, 23.0, 19.0, 75.0, 131.0, 143.0, 77.0, 127.0]
2025-09-16 16:41:08,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 2 seconds)
2025-09-16 16:43:08,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:43:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 247.80014 ± 161.329
2025-09-16 16:43:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.392006, 333.5699, 113.90884, 529.8264, 418.0149, 101.73535, 102.16599, 193.98447, 459.0385, 111.36493]
2025-09-16 16:43:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 63.0, 22.0, 96.0, 76.0, 20.0, 20.0, 37.0, 99.0, 22.0]
2025-09-16 16:43:08,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 8 seconds)
2025-09-16 16:45:07,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:45:08,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 313.75171 ± 203.145
2025-09-16 16:45:08,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [527.85223, 118.23878, 478.7181, 357.65332, 122.47317, 101.50344, 174.7076, 624.92847, 90.26096, 541.18085]
2025-09-16 16:45:08,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 23.0, 87.0, 65.0, 24.0, 20.0, 33.0, 110.0, 18.0, 99.0]
2025-09-16 16:45:08,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 7 seconds)
2025-09-16 16:47:08,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:47:09,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 350.29352 ± 147.274
2025-09-16 16:47:09,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [346.3658, 284.02148, 343.47696, 129.83458, 367.95428, 261.4387, 642.49554, 441.30322, 518.89575, 167.14886]
2025-09-16 16:47:09,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 54.0, 63.0, 25.0, 69.0, 52.0, 119.0, 79.0, 93.0, 32.0]
2025-09-16 16:47:09,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 4 seconds)
2025-09-16 16:49:08,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:49:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 371.04828 ± 241.393
2025-09-16 16:49:09,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [453.59363, 101.48212, 776.5774, 114.220116, 659.53864, 533.82336, 123.20899, 102.10699, 532.3847, 313.54688]
2025-09-16 16:49:09,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 20.0, 142.0, 22.0, 124.0, 97.0, 24.0, 20.0, 98.0, 62.0]
2025-09-16 16:49:09,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 6 seconds)
2025-09-16 16:51:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:51:10,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 365.36383 ± 228.330
2025-09-16 16:51:10,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [89.3662, 661.35095, 353.14868, 687.77783, 102.34719, 540.6332, 129.93753, 114.429855, 435.25238, 539.3943]
2025-09-16 16:51:10,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 121.0, 63.0, 135.0, 20.0, 102.0, 25.0, 22.0, 79.0, 98.0]
2025-09-16 16:51:10,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 8 seconds)
2025-09-16 16:53:11,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:53:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 412.84369 ± 213.531
2025-09-16 16:53:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [563.6103, 527.3875, 409.2317, 135.93672, 780.195, 102.45797, 486.2933, 484.0791, 524.90076, 114.34478]
2025-09-16 16:53:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 96.0, 79.0, 26.0, 141.0, 20.0, 91.0, 97.0, 95.0, 22.0]
2025-09-16 16:53:12,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 10 seconds)
2025-09-16 16:55:12,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:55:13,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 486.00473 ± 250.717
2025-09-16 16:55:13,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [134.59454, 586.7045, 179.3904, 779.73425, 726.9008, 422.05838, 405.2041, 661.5834, 809.93207, 153.9447]
2025-09-16 16:55:13,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 110.0, 34.0, 146.0, 140.0, 76.0, 73.0, 117.0, 154.0, 30.0]
2025-09-16 16:55:13,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (486.00) for latency 18
2025-09-16 16:55:13,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 15 seconds)
2025-09-16 16:57:12,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:57:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 347.55264 ± 215.656
2025-09-16 16:57:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [398.5985, 803.13495, 120.05495, 118.83498, 362.2522, 562.2501, 172.19125, 408.21613, 96.957375, 433.03613]
2025-09-16 16:57:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 142.0, 23.0, 23.0, 71.0, 108.0, 33.0, 78.0, 19.0, 95.0]
2025-09-16 16:57:13,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 10 seconds)
2025-09-16 16:59:12,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:59:13,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 376.93536 ± 204.958
2025-09-16 16:59:13,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [515.55914, 108.942245, 341.35962, 499.32214, 640.1072, 101.98262, 372.7379, 690.0409, 386.54694, 112.75492]
2025-09-16 16:59:13,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 21.0, 61.0, 90.0, 118.0, 20.0, 68.0, 123.0, 80.0, 22.0]
2025-09-16 16:59:13,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 8 seconds)
2025-09-16 17:01:12,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:01:13,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 408.73395 ± 206.175
2025-09-16 17:01:13,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [522.567, 645.6096, 552.1436, 113.008835, 366.0764, 437.8044, 529.2244, 671.3572, 141.71207, 107.83605]
2025-09-16 17:01:13,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 123.0, 101.0, 22.0, 71.0, 78.0, 92.0, 117.0, 27.0, 21.0]
2025-09-16 17:01:13,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 5 seconds)
2025-09-16 17:03:11,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:03:12,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 311.77637 ± 225.082
2025-09-16 17:03:12,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [135.32213, 413.52866, 414.72287, 349.9797, 108.72344, 588.5227, 774.31177, 106.74824, 102.54789, 123.35648]
2025-09-16 17:03:12,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 74.0, 74.0, 65.0, 21.0, 109.0, 160.0, 21.0, 20.0, 24.0]
2025-09-16 17:03:12,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes)
2025-09-16 17:05:11,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:05:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 420.79608 ± 255.595
2025-09-16 17:05:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [382.5359, 662.3918, 95.36866, 630.8247, 97.61792, 108.31565, 554.87646, 463.7197, 882.5755, 329.73474]
2025-09-16 17:05:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 130.0, 19.0, 117.0, 19.0, 21.0, 113.0, 82.0, 174.0, 59.0]
2025-09-16 17:05:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 58 seconds)
2025-09-16 17:07:12,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:07:13,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 356.34094 ± 298.684
2025-09-16 17:07:13,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [97.01513, 124.67495, 872.8568, 730.30646, 506.95663, 149.96266, 116.54465, 120.25554, 124.66146, 720.17523]
2025-09-16 17:07:13,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 150.0, 145.0, 107.0, 29.0, 23.0, 23.0, 24.0, 147.0]
2025-09-16 17:07:13,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes)
2025-09-16 17:09:11,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:09:12,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 321.35538 ± 207.780
2025-09-16 17:09:12,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.06033, 123.121086, 102.72474, 461.81165, 146.12886, 141.0013, 477.5862, 481.21118, 456.72302, 709.1855]
2025-09-16 17:09:12,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 20.0, 84.0, 28.0, 27.0, 88.0, 91.0, 85.0, 154.0]
2025-09-16 17:09:12,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 58 seconds)
2025-09-16 17:11:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:11:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 525.99628 ± 235.224
2025-09-16 17:11:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [769.2757, 454.9597, 485.88916, 621.26044, 96.30281, 764.57025, 108.090904, 635.92993, 580.60956, 743.0749]
2025-09-16 17:11:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 81.0, 90.0, 109.0, 19.0, 140.0, 21.0, 122.0, 127.0, 129.0]
2025-09-16 17:11:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (526.00) for latency 18
2025-09-16 17:11:12,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 59 seconds)
2025-09-16 17:13:11,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:13:12,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 317.38953 ± 266.220
2025-09-16 17:13:12,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.63553, 573.9992, 102.54443, 107.767296, 101.935486, 114.71545, 498.66827, 676.5157, 114.32608, 781.78754]
2025-09-16 17:13:12,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 106.0, 20.0, 21.0, 20.0, 22.0, 91.0, 121.0, 22.0, 154.0]
2025-09-16 17:13:12,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes)
2025-09-16 17:15:10,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:15:11,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 399.22211 ± 207.112
2025-09-16 17:15:11,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [399.00183, 96.182556, 543.1994, 113.820755, 585.7821, 96.78051, 461.22604, 454.63403, 662.97504, 578.61847]
2025-09-16 17:15:11,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 19.0, 94.0, 22.0, 107.0, 19.0, 83.0, 84.0, 123.0, 113.0]
2025-09-16 17:15:11,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 58 seconds)
2025-09-16 17:17:09,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:17:10,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 417.58243 ± 263.312
2025-09-16 17:17:10,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [304.4253, 125.09819, 844.39844, 102.55156, 433.80817, 107.907745, 295.2981, 667.9649, 516.18536, 778.1863]
2025-09-16 17:17:10,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 24.0, 149.0, 20.0, 80.0, 21.0, 56.0, 137.0, 91.0, 139.0]
2025-09-16 17:17:10,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 58 seconds)
2025-09-16 17:19:09,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:19:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 348.50858 ± 168.303
2025-09-16 17:19:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [529.89923, 409.585, 305.28647, 114.16181, 416.18097, 453.72012, 623.14764, 378.2651, 135.11719, 119.722115]
2025-09-16 17:19:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 88.0, 62.0, 22.0, 82.0, 82.0, 128.0, 69.0, 26.0, 23.0]
2025-09-16 17:19:10,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 59 seconds)
2025-09-16 17:21:08,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:21:09,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 308.09210 ± 239.463
2025-09-16 17:21:09,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [157.34082, 599.0128, 108.63411, 660.36334, 102.26201, 525.40375, 97.13283, 113.748726, 108.2725, 608.75]
2025-09-16 17:21:09,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 117.0, 21.0, 117.0, 20.0, 115.0, 19.0, 22.0, 21.0, 113.0]
2025-09-16 17:21:09,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-09-16 17:23:07,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:23:09,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 408.44702 ± 272.744
2025-09-16 17:23:09,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [813.63074, 805.2505, 124.67193, 103.04302, 449.1399, 642.6016, 365.76166, 546.9152, 136.54314, 96.912476]
2025-09-16 17:23:09,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 148.0, 24.0, 20.0, 81.0, 114.0, 69.0, 101.0, 26.0, 19.0]
2025-09-16 17:23:09,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
