2025-09-16 11:19:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_3
2025-09-16 11:19:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_3
2025-09-16 11:19:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x14d619794590>}
2025-09-16 11:19:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:19:22,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:19:22,756 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=427, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:19:22,756 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:19:24,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:19:24,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:21:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:21:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 375.68134 ± 68.306
2025-09-16 11:21:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [432.1695, 354.1518, 530.8959, 347.0965, 331.2676, 346.48148, 377.1303, 283.8783, 435.33307, 318.40927]
2025-09-16 11:21:07,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 67.0, 113.0, 64.0, 61.0, 63.0, 75.0, 54.0, 83.0, 59.0]
2025-09-16 11:21:07,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (375.68) for latency 3
2025-09-16 11:21:07,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 50 minutes, 9 seconds)
2025-09-16 11:22:59,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:23:00,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 457.27362 ± 51.557
2025-09-16 11:23:00,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [397.42532, 407.58405, 432.2588, 432.3897, 478.38657, 547.4713, 391.15683, 465.21008, 528.02, 492.8333]
2025-09-16 11:23:00,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 78.0, 82.0, 88.0, 88.0, 103.0, 74.0, 99.0, 107.0, 92.0]
2025-09-16 11:23:00,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (457.27) for latency 3
2025-09-16 11:23:00,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 56 minutes, 49 seconds)
2025-09-16 11:24:53,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:24:54,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 424.78442 ± 114.521
2025-09-16 11:24:54,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [293.87836, 703.7367, 379.05585, 507.14484, 351.19406, 348.6819, 503.2946, 358.0083, 448.35162, 354.49796]
2025-09-16 11:24:54,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 135.0, 84.0, 98.0, 68.0, 65.0, 95.0, 67.0, 83.0, 67.0]
2025-09-16 11:24:54,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 57 minutes, 59 seconds)
2025-09-16 11:26:46,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:26:47,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 396.70258 ± 68.935
2025-09-16 11:26:47,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [337.63254, 487.55154, 336.63712, 453.58817, 480.0004, 314.7184, 335.29617, 322.80078, 433.78632, 465.01413]
2025-09-16 11:26:47,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 88.0, 62.0, 89.0, 89.0, 61.0, 62.0, 60.0, 83.0, 88.0]
2025-09-16 11:26:47,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 57 minutes, 9 seconds)
2025-09-16 11:28:39,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:28:40,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 372.35284 ± 45.903
2025-09-16 11:28:40,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [364.61246, 414.30896, 296.91742, 327.79147, 350.2767, 394.1401, 467.20547, 400.90665, 346.52344, 360.84595]
2025-09-16 11:28:40,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 87.0, 59.0, 61.0, 66.0, 86.0, 93.0, 82.0, 74.0, 71.0]
2025-09-16 11:28:40,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 56 minutes, 5 seconds)
2025-09-16 11:30:33,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:30:34,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 379.66043 ± 89.066
2025-09-16 11:30:34,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [547.4113, 254.09718, 367.88544, 318.63525, 322.65656, 539.31616, 343.58856, 342.3348, 372.50412, 388.1749]
2025-09-16 11:30:34,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 49.0, 71.0, 62.0, 61.0, 114.0, 66.0, 64.0, 69.0, 78.0]
2025-09-16 11:30:34,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 57 minutes, 37 seconds)
2025-09-16 11:32:25,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:32:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 463.77740 ± 127.087
2025-09-16 11:32:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [422.68637, 683.07043, 482.4794, 406.3127, 359.452, 683.6857, 491.75006, 488.65945, 314.41718, 305.26086]
2025-09-16 11:32:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 130.0, 99.0, 78.0, 69.0, 147.0, 95.0, 106.0, 58.0, 56.0]
2025-09-16 11:32:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (463.78) for latency 3
2025-09-16 11:32:27,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 55 minutes, 33 seconds)
2025-09-16 11:34:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:34:20,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 460.64014 ± 116.576
2025-09-16 11:34:20,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [571.8459, 453.53058, 376.7437, 734.38446, 306.58832, 359.30206, 444.57996, 520.88556, 409.3526, 429.188]
2025-09-16 11:34:20,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 93.0, 71.0, 141.0, 61.0, 67.0, 86.0, 97.0, 78.0, 81.0]
2025-09-16 11:34:20,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 53 minutes, 36 seconds)
2025-09-16 11:36:12,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:36:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 395.24310 ± 64.758
2025-09-16 11:36:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [512.9888, 304.77002, 349.60455, 400.78964, 461.0224, 417.62372, 370.95477, 295.0219, 396.35492, 443.30005]
2025-09-16 11:36:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 57.0, 65.0, 76.0, 87.0, 78.0, 71.0, 57.0, 77.0, 84.0]
2025-09-16 11:36:13,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 51 minutes, 46 seconds)
2025-09-16 11:38:07,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:38:08,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 399.40829 ± 96.964
2025-09-16 11:38:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [328.88174, 362.534, 363.40912, 376.12503, 417.46832, 349.30228, 669.8777, 438.91196, 376.2641, 311.30865]
2025-09-16 11:38:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 78.0, 67.0, 70.0, 80.0, 65.0, 127.0, 83.0, 72.0, 67.0]
2025-09-16 11:38:08,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 50 minutes, 20 seconds)
2025-09-16 11:40:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:40:01,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 572.67377 ± 153.381
2025-09-16 11:40:01,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [560.1179, 730.94635, 670.53326, 610.12555, 530.5736, 473.73685, 334.95367, 432.95755, 487.32776, 895.46533]
2025-09-16 11:40:01,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 153.0, 133.0, 115.0, 110.0, 89.0, 63.0, 80.0, 91.0, 169.0]
2025-09-16 11:40:01,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (572.67) for latency 3
2025-09-16 11:40:01,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 48 minutes, 22 seconds)
2025-09-16 11:41:53,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:41:54,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 512.97388 ± 119.573
2025-09-16 11:41:54,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [659.1092, 456.01102, 481.4777, 772.7304, 393.83478, 428.2756, 464.67624, 439.046, 619.0412, 415.53644]
2025-09-16 11:41:54,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 87.0, 89.0, 144.0, 80.0, 82.0, 87.0, 92.0, 128.0, 81.0]
2025-09-16 11:41:54,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 46 minutes, 29 seconds)
2025-09-16 11:43:47,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:43:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 596.41852 ± 126.044
2025-09-16 11:43:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [545.88385, 446.48688, 833.8009, 716.0303, 407.07535, 536.51825, 741.2812, 607.49445, 582.6147, 546.9992]
2025-09-16 11:43:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 84.0, 164.0, 135.0, 77.0, 101.0, 147.0, 119.0, 112.0, 102.0]
2025-09-16 11:43:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (596.42) for latency 3
2025-09-16 11:43:49,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 44 minutes, 55 seconds)
2025-09-16 11:45:42,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:45:43,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 610.17340 ± 115.949
2025-09-16 11:45:43,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [630.0459, 691.7891, 523.9783, 858.9951, 532.1967, 587.6766, 745.26746, 466.02884, 548.36804, 517.38806]
2025-09-16 11:45:43,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 131.0, 101.0, 162.0, 97.0, 111.0, 153.0, 90.0, 108.0, 111.0]
2025-09-16 11:45:43,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (610.17) for latency 3
2025-09-16 11:45:43,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 43 minutes, 27 seconds)
2025-09-16 11:47:36,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:47:38,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 570.38953 ± 190.502
2025-09-16 11:47:38,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [658.046, 550.47064, 1078.1207, 341.82092, 634.449, 423.21146, 492.7415, 507.01688, 482.09283, 535.92596]
2025-09-16 11:47:38,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 105.0, 228.0, 70.0, 120.0, 77.0, 96.0, 96.0, 93.0, 119.0]
2025-09-16 11:47:38,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 41 minutes, 30 seconds)
2025-09-16 11:49:31,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:49:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 564.23938 ± 134.912
2025-09-16 11:49:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [412.5972, 448.70895, 553.98895, 836.9768, 628.2145, 670.649, 544.7539, 637.82086, 567.949, 340.73462]
2025-09-16 11:49:32,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 85.0, 106.0, 174.0, 117.0, 135.0, 102.0, 121.0, 111.0, 75.0]
2025-09-16 11:49:32,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 39 minutes, 46 seconds)
2025-09-16 11:51:24,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:51:26,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 576.29138 ± 122.916
2025-09-16 11:51:26,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [536.55524, 633.8773, 471.9533, 646.7413, 723.6565, 440.8659, 500.21338, 586.2914, 408.98026, 813.77905]
2025-09-16 11:51:26,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 124.0, 90.0, 123.0, 139.0, 86.0, 93.0, 113.0, 88.0, 164.0]
2025-09-16 11:51:26,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 38 minutes, 4 seconds)
2025-09-16 11:53:19,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:53:20,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 702.31873 ± 277.324
2025-09-16 11:53:20,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [546.94055, 581.1942, 670.4254, 630.8253, 899.0122, 611.7005, 710.7709, 607.8749, 1431.5503, 332.8933]
2025-09-16 11:53:20,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 120.0, 127.0, 117.0, 172.0, 128.0, 133.0, 115.0, 282.0, 63.0]
2025-09-16 11:53:20,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (702.32) for latency 3
2025-09-16 11:53:21,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 36 minutes, 13 seconds)
2025-09-16 11:55:14,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:55:15,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 628.32050 ± 140.974
2025-09-16 11:55:15,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [727.7427, 1003.5754, 487.28174, 601.8073, 562.9698, 517.25366, 634.37164, 555.2166, 637.8851, 555.1009]
2025-09-16 11:55:15,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 211.0, 89.0, 113.0, 103.0, 106.0, 124.0, 122.0, 136.0, 109.0]
2025-09-16 11:55:15,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 34 minutes, 28 seconds)
2025-09-16 11:57:08,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:57:10,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 797.43213 ± 151.457
2025-09-16 11:57:10,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [799.46655, 736.2061, 635.0562, 958.9394, 676.62555, 618.3429, 645.7266, 877.53815, 1057.7704, 968.6496]
2025-09-16 11:57:10,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 144.0, 117.0, 204.0, 131.0, 114.0, 135.0, 172.0, 208.0, 196.0]
2025-09-16 11:57:10,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (797.43) for latency 3
2025-09-16 11:57:10,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 32 minutes, 36 seconds)
2025-09-16 11:59:03,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:59:05,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 730.55347 ± 164.150
2025-09-16 11:59:05,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [981.73804, 539.5004, 626.1367, 640.3048, 727.1569, 679.9449, 1071.9036, 705.8464, 773.5101, 559.49207]
2025-09-16 11:59:05,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 102.0, 116.0, 118.0, 139.0, 138.0, 204.0, 135.0, 153.0, 108.0]
2025-09-16 11:59:05,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 30 minutes, 53 seconds)
2025-09-16 12:00:59,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:01:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 733.80310 ± 145.426
2025-09-16 12:01:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [827.30786, 826.94904, 726.8021, 695.2143, 996.5148, 486.30783, 572.9827, 586.2578, 848.97, 770.7255]
2025-09-16 12:01:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 170.0, 145.0, 149.0, 191.0, 91.0, 110.0, 112.0, 166.0, 148.0]
2025-09-16 12:01:01,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 29 minutes, 36 seconds)
2025-09-16 12:02:54,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:02:55,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 699.21307 ± 161.050
2025-09-16 12:02:55,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [819.567, 512.342, 627.19257, 568.20166, 876.9291, 600.5655, 897.452, 558.3261, 567.9255, 963.6293]
2025-09-16 12:02:55,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 100.0, 119.0, 110.0, 177.0, 114.0, 173.0, 108.0, 107.0, 187.0]
2025-09-16 12:02:55,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 27 minutes, 34 seconds)
2025-09-16 12:04:49,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:04:51,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 806.93988 ± 198.718
2025-09-16 12:04:51,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1067.8226, 542.6723, 848.9436, 656.2104, 1142.5978, 911.97845, 505.77332, 693.30646, 822.8528, 877.2417]
2025-09-16 12:04:51,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 102.0, 165.0, 123.0, 227.0, 173.0, 94.0, 132.0, 156.0, 169.0]
2025-09-16 12:04:51,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (806.94) for latency 3
2025-09-16 12:04:51,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 42 seconds)
2025-09-16 12:06:43,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:06:45,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 614.75360 ± 138.335
2025-09-16 12:06:45,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [497.2081, 562.37695, 604.05383, 629.07904, 487.77496, 418.98242, 579.52246, 923.8476, 715.12225, 729.569]
2025-09-16 12:06:45,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 103.0, 113.0, 118.0, 90.0, 78.0, 109.0, 174.0, 134.0, 144.0]
2025-09-16 12:06:45,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 23 minutes, 42 seconds)
2025-09-16 12:08:39,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:08:41,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 869.91077 ± 221.235
2025-09-16 12:08:41,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1261.8927, 613.6317, 915.4013, 746.58374, 916.5576, 1281.7888, 655.94495, 717.6645, 814.6173, 775.02527]
2025-09-16 12:08:41,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 116.0, 175.0, 143.0, 176.0, 253.0, 125.0, 139.0, 155.0, 154.0]
2025-09-16 12:08:41,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (869.91) for latency 3
2025-09-16 12:08:41,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 3 seconds)
2025-09-16 12:10:34,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:10:36,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 715.85333 ± 181.041
2025-09-16 12:10:36,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [849.75653, 497.71368, 1082.1179, 787.1838, 544.9561, 625.7742, 473.40634, 699.881, 718.6672, 879.0764]
2025-09-16 12:10:36,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 108.0, 207.0, 148.0, 102.0, 117.0, 100.0, 129.0, 143.0, 176.0]
2025-09-16 12:10:36,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 52 seconds)
2025-09-16 12:12:30,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:12:31,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 682.41772 ± 109.435
2025-09-16 12:12:31,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [596.4405, 555.7472, 606.23303, 640.98834, 618.16046, 953.95996, 730.1664, 753.48444, 726.92267, 642.07385]
2025-09-16 12:12:31,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 103.0, 117.0, 123.0, 115.0, 182.0, 137.0, 139.0, 139.0, 121.0]
2025-09-16 12:12:31,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 18 minutes, 10 seconds)
2025-09-16 12:14:25,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:14:27,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1036.69690 ± 478.015
2025-09-16 12:14:27,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [424.708, 768.0654, 1255.2617, 1049.9265, 990.0845, 1872.01, 876.9521, 1860.9736, 810.4669, 458.51932]
2025-09-16 12:14:27,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 149.0, 245.0, 204.0, 195.0, 363.0, 184.0, 368.0, 161.0, 86.0]
2025-09-16 12:14:27,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1036.70) for latency 3
2025-09-16 12:14:27,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 16 minutes, 29 seconds)
2025-09-16 12:16:20,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:16:23,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 960.64111 ± 277.934
2025-09-16 12:16:23,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [864.063, 1271.5149, 760.3884, 804.3576, 772.7535, 1328.0577, 1183.14, 1362.9102, 654.0664, 605.15875]
2025-09-16 12:16:23,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 260.0, 143.0, 155.0, 145.0, 263.0, 238.0, 271.0, 124.0, 131.0]
2025-09-16 12:16:23,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 49 seconds)
2025-09-16 12:18:18,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:18:19,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 761.99200 ± 190.435
2025-09-16 12:18:19,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [843.23737, 1129.1643, 519.2994, 497.3816, 1016.1219, 645.29895, 706.4243, 684.95294, 746.3748, 831.6641]
2025-09-16 12:18:19,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 215.0, 98.0, 96.0, 196.0, 123.0, 132.0, 133.0, 141.0, 162.0]
2025-09-16 12:18:19,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 2 seconds)
2025-09-16 12:20:13,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:20:15,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 793.58417 ± 242.242
2025-09-16 12:20:15,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [724.95013, 1159.2175, 1096.8029, 929.5351, 971.67017, 563.015, 573.7738, 927.18207, 429.81683, 559.8782]
2025-09-16 12:20:15,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 232.0, 206.0, 175.0, 183.0, 104.0, 108.0, 175.0, 79.0, 104.0]
2025-09-16 12:20:15,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 11 seconds)
2025-09-16 12:22:10,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:22:12,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 812.93585 ± 251.368
2025-09-16 12:22:12,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [789.8897, 986.1119, 854.4524, 417.98532, 385.58655, 796.8963, 1079.2571, 892.7212, 1226.1678, 700.29016]
2025-09-16 12:22:12,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 194.0, 162.0, 78.0, 73.0, 163.0, 211.0, 171.0, 230.0, 132.0]
2025-09-16 12:22:12,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 42 seconds)
2025-09-16 12:24:08,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:24:11,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 935.37878 ± 316.899
2025-09-16 12:24:11,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [966.2666, 1266.4639, 542.5713, 472.05545, 997.2952, 1260.7036, 1163.348, 1237.1566, 421.7946, 1026.1332]
2025-09-16 12:24:11,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 240.0, 100.0, 87.0, 188.0, 256.0, 231.0, 234.0, 77.0, 215.0]
2025-09-16 12:24:11,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 19 seconds)
2025-09-16 12:26:04,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:26:07,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1109.89282 ± 379.030
2025-09-16 12:26:07,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [781.0307, 1467.8116, 625.5608, 794.1431, 1821.7256, 739.59125, 1561.9309, 1022.45953, 1078.366, 1206.3098]
2025-09-16 12:26:07,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 312.0, 119.0, 146.0, 358.0, 155.0, 316.0, 212.0, 207.0, 247.0]
2025-09-16 12:26:07,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1109.89) for latency 3
2025-09-16 12:26:07,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 32 seconds)
2025-09-16 12:28:02,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:28:06,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1278.71497 ± 587.991
2025-09-16 12:28:06,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1242.7739, 1515.9598, 2189.2642, 851.41327, 849.4413, 1682.3068, 1147.0247, 2234.0603, 460.18863, 614.7172]
2025-09-16 12:28:06,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 298.0, 419.0, 158.0, 161.0, 326.0, 216.0, 449.0, 87.0, 120.0]
2025-09-16 12:28:06,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1278.71) for latency 3
2025-09-16 12:28:06,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 3 seconds)
2025-09-16 12:30:00,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:30:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 922.58264 ± 226.249
2025-09-16 12:30:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1444.7343, 1004.8755, 763.4252, 895.0494, 682.3567, 876.74426, 625.7951, 810.0641, 1121.7786, 1001.0033]
2025-09-16 12:30:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [296.0, 194.0, 148.0, 202.0, 130.0, 163.0, 121.0, 150.0, 230.0, 191.0]
2025-09-16 12:30:02,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 22 seconds)
2025-09-16 12:31:57,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:32:01,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1585.26538 ± 665.336
2025-09-16 12:32:01,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2185.1594, 758.426, 1205.9884, 979.2389, 1657.2135, 600.47815, 1648.199, 2790.858, 2204.3643, 1822.7291]
2025-09-16 12:32:01,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [442.0, 162.0, 235.0, 189.0, 331.0, 115.0, 332.0, 553.0, 461.0, 375.0]
2025-09-16 12:32:01,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1585.27) for latency 3
2025-09-16 12:32:01,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 49 seconds)
2025-09-16 12:34:01,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:34:06,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1654.70239 ± 1004.916
2025-09-16 12:34:06,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1102.6799, 1124.3191, 1235.1647, 467.6943, 1007.4447, 3052.8936, 1223.0164, 3632.7212, 1020.47626, 2680.6128]
2025-09-16 12:34:06,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 233.0, 244.0, 101.0, 195.0, 591.0, 248.0, 733.0, 218.0, 525.0]
2025-09-16 12:34:06,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1654.70) for latency 3
2025-09-16 12:34:06,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 4 seconds)
2025-09-16 12:35:59,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:36:03,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1437.88684 ± 569.571
2025-09-16 12:36:03,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [860.3944, 1177.2883, 870.6073, 1028.8744, 2696.2214, 1014.4637, 1783.7738, 1427.8469, 2123.0515, 1396.3457]
2025-09-16 12:36:03,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 239.0, 170.0, 206.0, 541.0, 201.0, 354.0, 270.0, 424.0, 266.0]
2025-09-16 12:36:03,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 11 seconds)
2025-09-16 12:37:56,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:38:00,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1483.47070 ± 926.141
2025-09-16 12:38:00,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2596.3022, 598.57294, 755.7137, 3751.696, 1101.6912, 1561.8383, 1458.353, 1118.734, 851.2107, 1040.5947]
2025-09-16 12:38:00,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [520.0, 117.0, 155.0, 736.0, 212.0, 310.0, 288.0, 215.0, 160.0, 199.0]
2025-09-16 12:38:00,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 56 minutes, 58 seconds)
2025-09-16 12:39:59,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:40:03,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1604.35852 ± 896.362
2025-09-16 12:40:03,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [707.56995, 1897.9984, 747.75946, 429.40836, 1392.9945, 3197.689, 2484.5833, 823.9818, 2627.0056, 1734.5945]
2025-09-16 12:40:03,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 371.0, 141.0, 83.0, 265.0, 646.0, 511.0, 163.0, 535.0, 346.0]
2025-09-16 12:40:03,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 12 seconds)
2025-09-16 12:41:59,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:42:03,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1564.31360 ± 866.499
2025-09-16 12:42:03,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1269.1241, 720.8289, 970.8709, 1418.2501, 1704.9347, 823.7841, 3263.3796, 491.27692, 2575.917, 2404.7693]
2025-09-16 12:42:03,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [253.0, 140.0, 179.0, 272.0, 341.0, 160.0, 641.0, 91.0, 518.0, 499.0]
2025-09-16 12:42:03,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 54 minutes, 21 seconds)
2025-09-16 12:44:01,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:44:06,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1891.01855 ± 900.124
2025-09-16 12:44:06,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2187.3174, 2923.7368, 1123.1694, 1718.3868, 3998.1602, 1727.6741, 1643.3235, 1299.3354, 1554.3676, 734.71484]
2025-09-16 12:44:06,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [434.0, 559.0, 233.0, 333.0, 797.0, 332.0, 327.0, 269.0, 303.0, 143.0]
2025-09-16 12:44:06,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1891.02) for latency 3
2025-09-16 12:44:06,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 2 seconds)
2025-09-16 12:46:02,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:46:06,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1431.55579 ± 1034.542
2025-09-16 12:46:06,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1431.4426, 702.96295, 4365.003, 911.92126, 1831.7798, 1003.85754, 1259.1869, 769.9431, 748.586, 1290.8741]
2025-09-16 12:46:06,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [277.0, 133.0, 874.0, 200.0, 373.0, 201.0, 237.0, 151.0, 163.0, 244.0]
2025-09-16 12:46:06,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 39 seconds)
2025-09-16 12:48:05,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:48:14,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3035.21729 ± 1374.260
2025-09-16 12:48:14,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1420.9996, 5009.674, 3346.1614, 2250.9438, 3686.546, 1557.3951, 4289.935, 2489.4648, 5043.012, 1258.0405]
2025-09-16 12:48:14,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [273.0, 1000.0, 658.0, 445.0, 743.0, 310.0, 857.0, 495.0, 1000.0, 246.0]
2025-09-16 12:48:14,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (3035.22) for latency 3
2025-09-16 12:48:14,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 50 minutes, 27 seconds)
2025-09-16 12:50:12,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:50:15,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1118.34644 ± 594.448
2025-09-16 12:50:15,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [633.51105, 639.6153, 1138.608, 1860.7194, 722.7686, 1162.4275, 532.6882, 663.1941, 1395.2958, 2434.6365]
2025-09-16 12:50:15,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 141.0, 215.0, 358.0, 137.0, 226.0, 100.0, 128.0, 264.0, 494.0]
2025-09-16 12:50:15,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 7 seconds)
2025-09-16 12:52:12,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:52:22,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3284.80859 ± 1695.743
2025-09-16 12:52:22,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1793.1807, 1735.051, 4992.245, 1899.0825, 798.70276, 5001.9443, 4830.7793, 4903.535, 1849.739, 5043.8286]
2025-09-16 12:52:22,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [360.0, 350.0, 1000.0, 391.0, 157.0, 1000.0, 963.0, 1000.0, 371.0, 1000.0]
2025-09-16 12:52:22,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (3284.81) for latency 3
2025-09-16 12:52:22,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 14 seconds)
2025-09-16 12:54:23,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:54:31,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2978.21729 ± 1494.519
2025-09-16 12:54:31,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [4043.0857, 1112.5094, 3716.6619, 5142.7754, 958.49457, 4809.2466, 2996.7034, 3658.4558, 2441.422, 902.8162]
2025-09-16 12:54:31,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [820.0, 235.0, 734.0, 1000.0, 214.0, 944.0, 580.0, 705.0, 471.0, 172.0]
2025-09-16 12:54:31,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 46 minutes, 14 seconds)
2025-09-16 12:56:26,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:56:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2905.02905 ± 1329.723
2025-09-16 12:56:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2995.556, 2618.0928, 5005.219, 1554.9282, 1449.2937, 1149.5865, 2796.9636, 4100.121, 2370.4773, 5010.053]
2025-09-16 12:56:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [599.0, 526.0, 1000.0, 303.0, 275.0, 233.0, 584.0, 804.0, 472.0, 1000.0]
2025-09-16 12:56:34,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 44 minutes, 38 seconds)
2025-09-16 12:58:39,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:58:46,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2420.68774 ± 1161.024
2025-09-16 12:58:46,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1776.0475, 3983.435, 1862.1727, 2090.036, 2047.5247, 2488.4536, 1593.0938, 978.3166, 5116.823, 2270.9749]
2025-09-16 12:58:46,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [344.0, 806.0, 353.0, 414.0, 393.0, 464.0, 300.0, 185.0, 1000.0, 441.0]
2025-09-16 12:58:46,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 43 minutes, 9 seconds)
2025-09-16 13:00:42,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:00:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3166.22314 ± 1665.192
2025-09-16 13:00:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1276.1588, 5031.3584, 5054.6353, 3993.8623, 1770.9362, 1411.0178, 1122.8976, 5133.564, 4700.7827, 2167.0159]
2025-09-16 13:00:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 1000.0, 1000.0, 790.0, 346.0, 286.0, 213.0, 1000.0, 899.0, 419.0]
2025-09-16 13:00:52,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 41 minutes, 49 seconds)
2025-09-16 13:02:54,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:03:02,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2871.87378 ± 1446.332
2025-09-16 13:03:02,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1378.5303, 4897.04, 3648.8767, 2666.7668, 1225.9827, 5106.7373, 3191.9563, 3970.4568, 1105.9606, 1526.4291]
2025-09-16 13:03:02,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [256.0, 1000.0, 728.0, 524.0, 243.0, 1000.0, 612.0, 799.0, 224.0, 293.0]
2025-09-16 13:03:02,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 40 minutes, 17 seconds)
2025-09-16 13:04:56,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:05:07,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3547.74731 ± 1865.156
2025-09-16 13:05:07,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5073.979, 4638.168, 5087.185, 4650.209, 680.3685, 5145.3, 967.54645, 3566.416, 4982.8853, 685.41565]
2025-09-16 13:05:07,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 940.0, 1000.0, 928.0, 128.0, 1000.0, 186.0, 693.0, 1000.0, 136.0]
2025-09-16 13:05:07,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (3547.75) for latency 3
2025-09-16 13:05:07,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 37 minutes, 27 seconds)
2025-09-16 13:07:08,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:07:21,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4133.05566 ± 1294.299
2025-09-16 13:07:21,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3762.0674, 4621.942, 710.50275, 3768.6814, 5023.117, 5090.9077, 5079.3706, 5084.3726, 4807.426, 3382.1711]
2025-09-16 13:07:21,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [752.0, 901.0, 136.0, 746.0, 1000.0, 1000.0, 1000.0, 1000.0, 957.0, 675.0]
2025-09-16 13:07:21,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (4133.06) for latency 3
2025-09-16 13:07:21,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 36 minutes, 56 seconds)
2025-09-16 13:09:14,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:09:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4325.82227 ± 1205.074
2025-09-16 13:09:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5112.2017, 5041.8223, 2257.45, 3134.7473, 5164.4307, 5135.4194, 4948.3506, 2178.131, 5130.245, 5155.423]
2025-09-16 13:09:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 442.0, 617.0, 1000.0, 1000.0, 1000.0, 420.0, 1000.0, 1000.0]
2025-09-16 13:09:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (4325.82) for latency 3
2025-09-16 13:09:27,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 34 minutes, 2 seconds)
2025-09-16 13:11:27,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:11:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4925.38867 ± 459.562
2025-09-16 13:11:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5229.8213, 5056.139, 3609.0652, 4786.203, 5098.823, 5148.673, 5197.8857, 4962.356, 4932.089, 5232.8276]
2025-09-16 13:11:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 707.0, 939.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:11:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (4925.39) for latency 3
2025-09-16 13:11:42,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 33 minutes, 9 seconds)
2025-09-16 13:13:41,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:13:52,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3585.75122 ± 1601.682
2025-09-16 13:13:52,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3141.9236, 5045.7573, 513.8034, 5050.487, 2513.9412, 4982.096, 5146.4185, 5085.7964, 2354.0657, 2023.2225]
2025-09-16 13:13:52,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [638.0, 1000.0, 99.0, 1000.0, 514.0, 1000.0, 1000.0, 1000.0, 456.0, 406.0]
2025-09-16 13:13:52,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 30 minutes, 59 seconds)
2025-09-16 13:15:44,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:15:53,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3328.52539 ± 1766.535
2025-09-16 13:15:53,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1165.134, 1488.2463, 4413.6045, 2210.74, 5166.915, 5121.199, 5070.2876, 5149.591, 2929.6836, 569.8527]
2025-09-16 13:15:53,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 296.0, 845.0, 430.0, 1000.0, 1000.0, 1000.0, 1000.0, 578.0, 109.0]
2025-09-16 13:15:53,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 28 minutes, 20 seconds)
2025-09-16 13:18:00,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:18:13,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4514.39600 ± 1255.462
2025-09-16 13:18:13,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5116.683, 3515.9138, 4933.7134, 5094.868, 5167.023, 5103.9863, 5080.87, 1021.06793, 4994.642, 5115.1934]
2025-09-16 13:18:13,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 704.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 210.0, 1000.0, 1000.0]
2025-09-16 13:18:13,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 27 minutes, 2 seconds)
2025-09-16 13:20:07,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:20:18,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3527.44531 ± 1483.685
2025-09-16 13:20:18,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1356.1016, 1750.1671, 4493.1646, 2758.8943, 5152.215, 1330.709, 4852.023, 4676.3457, 4879.8022, 4025.029]
2025-09-16 13:20:18,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [288.0, 366.0, 875.0, 559.0, 1000.0, 288.0, 1000.0, 926.0, 1000.0, 805.0]
2025-09-16 13:20:18,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 35 seconds)
2025-09-16 13:22:20,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:22:33,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4460.85059 ± 1354.584
2025-09-16 13:22:33,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5097.9746, 5083.9336, 1472.2549, 5176.9478, 5229.394, 5204.698, 2060.506, 5067.941, 5106.5576, 5108.302]
2025-09-16 13:22:33,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 284.0, 1000.0, 1000.0, 1000.0, 375.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:22:33,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 22 minutes, 32 seconds)
2025-09-16 13:24:27,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:24:41,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4635.37402 ± 1047.216
2025-09-16 13:24:41,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5201.986, 4779.1475, 5205.8916, 5152.2734, 5117.891, 5189.732, 5165.782, 3389.8157, 5220.7827, 1930.446]
2025-09-16 13:24:41,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 928.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 648.0, 1000.0, 377.0]
2025-09-16 13:24:41,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 20 minutes)
2025-09-16 13:26:48,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:27:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4655.32959 ± 1270.457
2025-09-16 13:27:02,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5139.798, 5244.6875, 955.3183, 5227.087, 5164.3047, 5242.841, 4972.6094, 5192.7495, 5227.7812, 4186.1226]
2025-09-16 13:27:02,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 183.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 821.0]
2025-09-16 13:27:02,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 20 minutes, 17 seconds)
2025-09-16 13:28:54,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:29:08,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5064.09229 ± 377.839
2025-09-16 13:29:08,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5000.9604, 3958.811, 5188.161, 5256.1245, 5250.271, 5234.8926, 5221.333, 5065.053, 5279.2085, 5186.1074]
2025-09-16 13:29:08,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 752.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 967.0, 1000.0, 1000.0]
2025-09-16 13:29:08,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5064.09) for latency 3
2025-09-16 13:29:08,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 16 minutes, 24 seconds)
2025-09-16 13:31:08,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:31:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4788.49316 ± 858.908
2025-09-16 13:31:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5126.5337, 5247.864, 5167.2627, 5039.774, 5215.3823, 5127.0894, 5185.4224, 5172.6504, 2347.3076, 4255.647]
2025-09-16 13:31:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 482.0, 826.0]
2025-09-16 13:31:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 15 minutes, 15 seconds)
2025-09-16 13:33:24,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:33:40,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5145.53467 ± 46.174
2025-09-16 13:33:40,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5072.096, 5063.2495, 5187.8896, 5112.372, 5207.4434, 5164.537, 5136.3027, 5177.9272, 5170.8125, 5162.7144]
2025-09-16 13:33:40,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:33:40,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5145.53) for latency 3
2025-09-16 13:33:40,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 13 minutes, 17 seconds)
2025-09-16 13:35:31,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:35:43,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4555.78125 ± 1519.804
2025-09-16 13:35:43,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5337.53, 5278.934, 1063.2903, 5302.895, 5267.768, 5353.752, 5325.095, 2032.0435, 5276.25, 5320.257]
2025-09-16 13:35:43,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 203.0, 1000.0, 1000.0, 1000.0, 1000.0, 365.0, 1000.0, 1000.0]
2025-09-16 13:35:43,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 10 minutes, 39 seconds)
2025-09-16 13:37:45,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:37:58,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4480.69531 ± 1418.910
2025-09-16 13:37:58,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2996.1926, 5312.967, 5265.319, 5307.153, 794.4703, 4208.372, 5214.59, 5113.698, 5337.366, 5256.824]
2025-09-16 13:37:58,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [580.0, 1000.0, 1000.0, 1000.0, 149.0, 790.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:37:58,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 7 minutes, 46 seconds)
2025-09-16 13:39:59,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:40:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4911.33936 ± 411.077
2025-09-16 13:40:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5085.456, 4944.6597, 5150.771, 5083.648, 3688.9463, 5017.7163, 5045.9014, 5059.5225, 4980.606, 5056.168]
2025-09-16 13:40:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 730.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:40:14,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 6 minutes, 37 seconds)
2025-09-16 13:42:07,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:42:21,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4889.26172 ± 1100.335
2025-09-16 13:42:21,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5200.032, 5219.366, 5318.4976, 1590.3156, 5272.3843, 5248.225, 5270.898, 5320.3535, 5237.688, 5214.8604]
2025-09-16 13:42:21,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 298.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:42:21,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 3 minutes, 43 seconds)
2025-09-16 13:44:21,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:44:31,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3510.54370 ± 1859.762
2025-09-16 13:44:31,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5183.99, 5298.266, 3005.291, 3279.694, 978.82855, 5281.0835, 1022.47314, 804.2917, 5179.033, 5072.4854]
2025-09-16 13:44:31,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 590.0, 627.0, 187.0, 1000.0, 188.0, 157.0, 1000.0, 1000.0]
2025-09-16 13:44:31,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 45 seconds)
2025-09-16 13:46:30,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:46:41,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3773.51489 ± 2014.661
2025-09-16 13:46:41,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [4912.743, 5175.2773, 5232.448, 1022.6357, 5176.289, 5026.55, 5242.861, 549.7655, 4846.356, 550.2238]
2025-09-16 13:46:41,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 201.0, 1000.0, 1000.0, 1000.0, 113.0, 935.0, 118.0]
2025-09-16 13:46:41,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 13 seconds)
2025-09-16 13:48:44,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:48:58,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4900.29590 ± 917.338
2025-09-16 13:48:58,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5204.7026, 5226.6914, 5207.7847, 5163.3022, 5224.7095, 5121.5225, 5225.5537, 2150.5571, 5267.3, 5210.8364]
2025-09-16 13:48:58,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 431.0, 1000.0, 1000.0]
2025-09-16 13:48:58,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 57 minutes, 14 seconds)
2025-09-16 13:50:49,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:51:03,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4889.36768 ± 1001.572
2025-09-16 13:51:03,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1889.5544, 5252.57, 5232.628, 5065.2803, 5251.589, 5223.823, 5229.6353, 5278.53, 5271.2637, 5198.8022]
2025-09-16 13:51:03,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [355.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:51:03,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 2 seconds)
2025-09-16 13:53:11,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:53:26,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5149.25244 ± 297.810
2025-09-16 13:53:26,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5292.5273, 4275.6255, 5148.0283, 5263.2954, 5295.9434, 5139.0635, 5323.7812, 5295.874, 5272.7266, 5185.655]
2025-09-16 13:53:26,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 803.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:53:26,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5149.25) for latency 3
2025-09-16 13:53:26,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 10 seconds)
2025-09-16 13:55:17,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:55:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5216.26025 ± 52.103
2025-09-16 13:55:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5249.1465, 5230.6646, 5293.45, 5164.8755, 5097.9336, 5181.897, 5228.804, 5233.9814, 5254.8706, 5226.9814]
2025-09-16 13:55:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:55:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5216.26) for latency 3
2025-09-16 13:55:33,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 44 seconds)
2025-09-16 13:57:33,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:57:48,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5189.04199 ± 252.704
2025-09-16 13:57:48,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5255.7017, 5322.44, 5231.1714, 5255.982, 4437.042, 5282.4844, 5286.9854, 5222.02, 5323.879, 5272.713]
2025-09-16 13:57:48,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 835.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:57:48,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 48 minutes, 52 seconds)
2025-09-16 13:59:49,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:00:03,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4952.82129 ± 829.334
2025-09-16 14:00:03,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5213.05, 5225.0396, 5196.8877, 5304.8506, 5274.939, 5126.166, 5260.2554, 2469.0476, 5192.6636, 5265.313]
2025-09-16 14:00:03,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 477.0, 1000.0, 1000.0]
2025-09-16 14:00:03,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 30 seconds)
2025-09-16 14:02:00,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:02:14,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4880.03320 ± 1070.836
2025-09-16 14:02:14,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5160.4937, 5246.4644, 1670.7604, 5259.8945, 5288.108, 5130.022, 5260.5493, 5273.7017, 5254.087, 5256.253]
2025-09-16 14:02:14,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 316.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:14,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 42 seconds)
2025-09-16 14:04:06,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:04:20,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5050.92969 ± 650.054
2025-09-16 14:04:20,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5250.9565, 5263.125, 5241.9927, 5277.2886, 5287.0493, 5259.4316, 3102.2764, 5332.304, 5252.292, 5242.583]
2025-09-16 14:04:20,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 583.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:04:20,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 26 seconds)
2025-09-16 14:06:27,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:06:41,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4552.39551 ± 1145.794
2025-09-16 14:06:41,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1917.1522, 5163.8403, 5197.8125, 5075.8906, 5169.9136, 5090.1313, 2658.8176, 4979.605, 5158.326, 5112.464]
2025-09-16 14:06:41,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [402.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 550.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:06:41,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 5 seconds)
2025-09-16 14:08:35,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:08:50,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5184.58496 ± 34.160
2025-09-16 14:08:50,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5216.987, 5164.787, 5248.3384, 5156.748, 5220.4316, 5161.5234, 5198.9663, 5128.3613, 5177.4697, 5172.2397]
2025-09-16 14:08:50,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:08:50,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 30 seconds)
2025-09-16 14:10:47,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:11:02,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5210.45215 ± 326.777
2025-09-16 14:11:02,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5278.853, 5291.5054, 4233.0923, 5317.101, 5323.9014, 5338.6567, 5286.1763, 5332.8433, 5364.1753, 5338.2173]
2025-09-16 14:11:02,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 782.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:11:02,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 9 seconds)
2025-09-16 14:13:05,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:13:20,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5221.71094 ± 34.304
2025-09-16 14:13:20,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5250.8105, 5245.6006, 5193.74, 5234.518, 5249.964, 5249.0986, 5233.0786, 5135.016, 5217.4434, 5207.843]
2025-09-16 14:13:20,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:20,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5221.71) for latency 3
2025-09-16 14:13:20,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 18 seconds)
2025-09-16 14:15:23,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:15:36,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4570.18457 ± 1420.552
2025-09-16 14:15:36,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5304.894, 5283.0454, 5212.828, 5298.26, 1765.4607, 5278.76, 1694.4664, 5264.5903, 5257.8306, 5341.712]
2025-09-16 14:15:36,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 341.0, 1000.0, 328.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:15:36,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 33 seconds)
2025-09-16 14:17:35,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:17:48,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4387.05713 ± 1504.998
2025-09-16 14:17:48,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5234.802, 5283.48, 1989.1823, 5307.8555, 5289.943, 5260.041, 5317.0664, 3846.7224, 1068.6879, 5272.788]
2025-09-16 14:17:48,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 372.0, 1000.0, 1000.0, 1000.0, 1000.0, 724.0, 215.0, 1000.0]
2025-09-16 14:17:48,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 53 seconds)
2025-09-16 14:19:43,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:19:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5099.96826 ± 171.159
2025-09-16 14:19:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5214.587, 5126.1465, 4640.2847, 5230.154, 5197.5615, 4958.349, 5096.1533, 5145.944, 5172.7964, 5217.708]
2025-09-16 14:19:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 897.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:19:58,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 43 seconds)
2025-09-16 14:21:57,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:22:11,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5017.36572 ± 710.645
2025-09-16 14:22:11,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5314.07, 5278.9907, 5263.8525, 5328.0244, 5239.9565, 5103.962, 5241.9116, 5222.3867, 5287.709, 2892.7878]
2025-09-16 14:22:11,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 551.0]
2025-09-16 14:22:11,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 32 seconds)
2025-09-16 14:24:03,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:24:18,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5249.34277 ± 39.180
2025-09-16 14:24:18,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5174.363, 5286.049, 5232.0527, 5305.145, 5267.3696, 5226.4106, 5238.1895, 5305.3667, 5224.419, 5234.0664]
2025-09-16 14:24:18,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:24:18,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5249.34) for latency 3
2025-09-16 14:24:18,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 56 seconds)
2025-09-16 14:26:18,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:26:32,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4957.22363 ± 991.426
2025-09-16 14:26:32,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5215.0693, 5300.245, 5303.698, 5298.637, 5281.284, 5316.345, 5297.167, 5244.6064, 1984.5289, 5330.6543]
2025-09-16 14:26:32,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 383.0, 1000.0]
2025-09-16 14:26:32,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 40 seconds)
2025-09-16 14:28:39,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:28:53,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5085.77588 ± 642.786
2025-09-16 14:28:53,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3158.0105, 5299.66, 5301.058, 5310.2744, 5270.0347, 5320.7197, 5314.0625, 5318.001, 5282.8755, 5283.0625]
2025-09-16 14:28:53,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [598.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:28:53,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 44 seconds)
2025-09-16 14:30:50,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:31:02,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4619.92285 ± 1458.305
2025-09-16 14:31:02,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5299.7217, 5254.826, 5369.962, 5293.171, 5287.2524, 5354.6934, 5384.267, 2687.3232, 5332.3286, 935.6872]
2025-09-16 14:31:02,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 498.0, 1000.0, 170.0]
2025-09-16 14:31:02,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 30 seconds)
2025-09-16 14:33:00,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:33:15,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4986.67969 ± 515.849
2025-09-16 14:33:15,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5167.0405, 3455.4473, 5071.105, 5001.699, 5273.1763, 5216.107, 5181.984, 5191.9414, 5205.933, 5102.3643]
2025-09-16 14:33:15,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 662.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:33:15,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 16 seconds)
2025-09-16 14:35:10,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:35:25,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5299.04736 ± 51.976
2025-09-16 14:35:25,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5220.696, 5334.336, 5200.6846, 5304.124, 5276.2065, 5357.8926, 5336.6226, 5344.857, 5341.374, 5273.681]
2025-09-16 14:35:25,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 988.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:35:25,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5299.05) for latency 3
2025-09-16 14:35:25,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 7 seconds)
2025-09-16 14:37:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:37:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5317.08447 ± 29.382
2025-09-16 14:37:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5298.6504, 5338.0093, 5340.47, 5245.9004, 5319.0327, 5321.1064, 5348.9404, 5307.5947, 5303.521, 5347.6206]
2025-09-16 14:37:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:37:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5317.08) for latency 3
2025-09-16 14:37:36,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 51 seconds)
2025-09-16 14:39:26,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:39:40,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4931.45605 ± 1117.118
2025-09-16 14:39:40,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5319.3193, 5301.219, 5282.1226, 5287.329, 1580.6431, 5335.964, 5335.558, 5308.45, 5279.475, 5284.4814]
2025-09-16 14:39:40,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 297.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:39:40,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 28 seconds)
2025-09-16 14:41:44,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:41:58,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4799.03125 ± 1431.780
2025-09-16 14:41:58,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5315.119, 5212.183, 5338.6855, 506.6628, 5272.11, 5241.3164, 5296.1045, 5163.135, 5314.5347, 5330.467]
2025-09-16 14:41:58,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 110.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:58,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 22 seconds)
2025-09-16 14:43:54,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:44:09,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4951.35791 ± 844.753
2025-09-16 14:44:09,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5252.881, 5254.918, 5256.7256, 5195.8433, 5268.9707, 5245.7188, 5182.9614, 2418.4402, 5205.2373, 5231.8833]
2025-09-16 14:44:09,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 451.0, 1000.0, 1000.0]
2025-09-16 14:44:09,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 10 seconds)
2025-09-16 14:46:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:46:20,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5204.89551 ± 24.415
2025-09-16 14:46:20,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5143.3555, 5184.421, 5218.8447, 5235.7983, 5209.0005, 5211.499, 5206.8228, 5199.906, 5213.3154, 5225.987]
2025-09-16 14:46:20,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:46:20,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
