2025-09-16 12:20:46,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.025-delay_15
2025-09-16 12:20:46,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.025-delay_15
2025-09-16 12:20:46,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x14e644d04890>}
2025-09-16 12:20:46,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:20:46,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:20:46,624 baseline-bpql-noisepromille25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=631, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:20:46,624 baseline-bpql-noisepromille25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:20:48,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:20:48,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:22:36,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:22:37,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 301.00995 ± 19.801
2025-09-16 12:22:37,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [279.83087, 309.13635, 290.43207, 316.39636, 331.9553, 303.24423, 265.12738, 300.03513, 326.1439, 287.7978]
2025-09-16 12:22:37,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 59.0, 56.0, 60.0, 64.0, 59.0, 51.0, 58.0, 62.0, 55.0]
2025-09-16 12:22:37,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (301.01) for latency 15
2025-09-16 12:22:37,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 7 seconds)
2025-09-16 12:24:34,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:24:35,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 397.20105 ± 127.520
2025-09-16 12:24:35,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [565.99866, 388.81528, 252.69989, 363.69135, 298.46472, 641.2601, 211.47972, 347.8893, 455.95187, 445.75983]
2025-09-16 12:24:35,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 75.0, 52.0, 70.0, 61.0, 124.0, 43.0, 67.0, 90.0, 97.0]
2025-09-16 12:24:35,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (397.20) for latency 15
2025-09-16 12:24:35,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 43 seconds)
2025-09-16 12:26:31,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:26:33,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 436.40131 ± 114.109
2025-09-16 12:26:33,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [352.3215, 423.50583, 350.87354, 340.49368, 395.1382, 454.39767, 515.4782, 441.20053, 352.37936, 738.2246]
2025-09-16 12:26:33,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 90.0, 75.0, 67.0, 84.0, 85.0, 100.0, 89.0, 75.0, 151.0]
2025-09-16 12:26:33,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (436.40) for latency 15
2025-09-16 12:26:33,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 5 minutes, 53 seconds)
2025-09-16 12:28:29,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:28:31,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 423.80722 ± 87.331
2025-09-16 12:28:31,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [381.37808, 512.5154, 439.77725, 593.7996, 312.13043, 418.03488, 441.61273, 381.7458, 281.70084, 475.3774]
2025-09-16 12:28:31,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 109.0, 86.0, 121.0, 60.0, 81.0, 83.0, 71.0, 54.0, 92.0]
2025-09-16 12:28:31,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2025-09-16 12:30:28,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:30:29,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 516.92554 ± 156.786
2025-09-16 12:30:29,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [622.2838, 516.30206, 882.08154, 317.06125, 413.60007, 361.43774, 584.5187, 376.9938, 530.72644, 564.2501]
2025-09-16 12:30:29,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 98.0, 176.0, 64.0, 78.0, 69.0, 117.0, 71.0, 104.0, 110.0]
2025-09-16 12:30:29,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (516.93) for latency 15
2025-09-16 12:30:29,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 4 minutes, 10 seconds)
2025-09-16 12:32:27,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:32:28,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 440.84741 ± 55.222
2025-09-16 12:32:28,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [463.58893, 447.3685, 379.05386, 564.22723, 494.02615, 450.8715, 401.5963, 368.69238, 431.9661, 407.08316]
2025-09-16 12:32:28,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 83.0, 80.0, 118.0, 95.0, 95.0, 79.0, 81.0, 80.0, 76.0]
2025-09-16 12:32:28,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes, 8 seconds)
2025-09-16 12:34:25,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:34:27,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 533.44556 ± 163.984
2025-09-16 12:34:27,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [449.3191, 573.7481, 859.4601, 566.8332, 364.9403, 245.87672, 711.221, 498.05417, 599.0484, 465.95462]
2025-09-16 12:34:27,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 116.0, 185.0, 111.0, 71.0, 48.0, 143.0, 102.0, 124.0, 99.0]
2025-09-16 12:34:27,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (533.45) for latency 15
2025-09-16 12:34:27,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 19 seconds)
2025-09-16 12:36:24,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:36:25,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 426.87872 ± 125.017
2025-09-16 12:36:25,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [384.41257, 413.71768, 618.339, 577.2666, 396.4669, 346.09003, 502.85712, 181.68077, 528.091, 319.86533]
2025-09-16 12:36:25,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 82.0, 119.0, 109.0, 76.0, 66.0, 97.0, 35.0, 102.0, 61.0]
2025-09-16 12:36:25,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 32 seconds)
2025-09-16 12:38:22,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:38:23,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 456.32812 ± 136.868
2025-09-16 12:38:23,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [328.00595, 535.0099, 656.866, 297.7433, 438.87888, 685.26385, 249.80771, 476.15454, 406.54358, 489.00732]
2025-09-16 12:38:23,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 117.0, 126.0, 63.0, 94.0, 130.0, 49.0, 87.0, 87.0, 92.0]
2025-09-16 12:38:23,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 43 seconds)
2025-09-16 12:40:21,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:40:22,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 466.28387 ± 74.045
2025-09-16 12:40:22,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [432.75125, 467.42593, 397.8701, 658.8409, 494.66528, 447.81595, 414.57108, 519.0161, 420.79956, 409.08292]
2025-09-16 12:40:22,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 87.0, 75.0, 125.0, 94.0, 83.0, 77.0, 101.0, 79.0, 78.0]
2025-09-16 12:40:22,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 57 minutes, 52 seconds)
2025-09-16 12:42:19,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:42:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 563.28357 ± 114.515
2025-09-16 12:42:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [433.65414, 477.8402, 578.37366, 613.28424, 481.33316, 719.52686, 428.7383, 571.4699, 532.78564, 795.8296]
2025-09-16 12:42:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 88.0, 112.0, 123.0, 90.0, 136.0, 90.0, 111.0, 101.0, 162.0]
2025-09-16 12:42:21,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (563.28) for latency 15
2025-09-16 12:42:21,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 54 seconds)
2025-09-16 12:44:18,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:44:20,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 502.26358 ± 84.270
2025-09-16 12:44:20,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [394.30222, 416.49832, 508.13162, 623.38336, 377.9896, 503.04077, 623.428, 543.696, 465.7141, 566.4523]
2025-09-16 12:44:20,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 77.0, 96.0, 117.0, 71.0, 94.0, 122.0, 102.0, 86.0, 107.0]
2025-09-16 12:44:20,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes)
2025-09-16 12:46:18,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:46:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 576.94037 ± 104.645
2025-09-16 12:46:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [453.9291, 496.18433, 434.68176, 589.06525, 630.28577, 599.6267, 566.66656, 503.2998, 757.099, 738.56525]
2025-09-16 12:46:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 104.0, 85.0, 110.0, 118.0, 113.0, 107.0, 96.0, 151.0, 158.0]
2025-09-16 12:46:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (576.94) for latency 15
2025-09-16 12:46:19,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 52 minutes, 26 seconds)
2025-09-16 12:48:18,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:48:19,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 428.25049 ± 88.249
2025-09-16 12:48:19,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [519.8086, 441.1225, 419.92322, 541.2898, 406.64197, 470.4893, 222.53242, 438.84778, 485.7579, 336.09158]
2025-09-16 12:48:19,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 92.0, 80.0, 111.0, 77.0, 89.0, 43.0, 95.0, 91.0, 70.0]
2025-09-16 12:48:19,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 50 minutes, 45 seconds)
2025-09-16 12:50:17,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:50:18,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 542.11041 ± 108.873
2025-09-16 12:50:18,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [577.8177, 569.81616, 512.1179, 503.70193, 817.65234, 375.8188, 584.8859, 469.13837, 522.99646, 487.15842]
2025-09-16 12:50:18,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 106.0, 96.0, 105.0, 157.0, 84.0, 107.0, 87.0, 98.0, 90.0]
2025-09-16 12:50:18,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 54 seconds)
2025-09-16 12:52:17,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:52:18,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 547.41541 ± 169.595
2025-09-16 12:52:18,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [468.4151, 579.9465, 228.40439, 637.11005, 460.02893, 771.6948, 406.4597, 677.34576, 442.66922, 802.07965]
2025-09-16 12:52:18,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 110.0, 44.0, 122.0, 89.0, 147.0, 76.0, 127.0, 95.0, 149.0]
2025-09-16 12:52:18,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 18 seconds)
2025-09-16 12:54:18,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:54:19,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 546.24255 ± 88.392
2025-09-16 12:54:19,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [424.63428, 368.09937, 559.4229, 537.5276, 597.4135, 598.01227, 604.35565, 487.90756, 623.31866, 661.7332]
2025-09-16 12:54:19,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 81.0, 102.0, 98.0, 114.0, 115.0, 114.0, 91.0, 132.0, 123.0]
2025-09-16 12:54:20,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 54 seconds)
2025-09-16 12:56:18,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:56:20,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 591.55078 ± 173.904
2025-09-16 12:56:20,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [511.94067, 952.5657, 373.9524, 532.81995, 864.7847, 599.99744, 442.95587, 618.72125, 465.62192, 552.1477]
2025-09-16 12:56:20,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 183.0, 70.0, 118.0, 171.0, 114.0, 82.0, 117.0, 86.0, 122.0]
2025-09-16 12:56:20,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (591.55) for latency 15
2025-09-16 12:56:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 44 minutes, 7 seconds)
2025-09-16 12:58:18,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:58:19,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 546.19269 ± 67.882
2025-09-16 12:58:19,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [655.1796, 524.82294, 510.21228, 604.0071, 570.5886, 393.76642, 613.08075, 527.2869, 528.7068, 534.2756]
2025-09-16 12:58:19,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 110.0, 94.0, 130.0, 107.0, 83.0, 134.0, 98.0, 98.0, 116.0]
2025-09-16 12:58:19,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 9 seconds)
2025-09-16 13:00:18,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:00:19,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 530.27545 ± 130.242
2025-09-16 13:00:19,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [542.4246, 692.2262, 593.6641, 594.3994, 416.06982, 365.05215, 563.6019, 488.92206, 308.88373, 737.5103]
2025-09-16 13:00:19,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 147.0, 114.0, 127.0, 86.0, 79.0, 109.0, 106.0, 62.0, 138.0]
2025-09-16 13:00:19,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 14 seconds)
2025-09-16 13:02:18,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:02:19,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 573.33923 ± 143.715
2025-09-16 13:02:19,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [459.66425, 632.03845, 672.41187, 578.59625, 443.3756, 685.10693, 893.556, 490.01144, 491.95096, 386.6809]
2025-09-16 13:02:19,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 127.0, 129.0, 120.0, 82.0, 129.0, 171.0, 106.0, 105.0, 80.0]
2025-09-16 13:02:19,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 15 seconds)
2025-09-16 13:04:18,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:04:20,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 593.45453 ± 190.860
2025-09-16 13:04:20,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [467.32007, 433.2626, 559.8038, 471.78064, 520.0254, 810.8948, 720.5308, 492.47714, 420.28998, 1038.1603]
2025-09-16 13:04:20,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 81.0, 106.0, 89.0, 97.0, 158.0, 135.0, 91.0, 79.0, 215.0]
2025-09-16 13:04:20,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (593.45) for latency 15
2025-09-16 13:04:20,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 2 seconds)
2025-09-16 13:06:18,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:06:20,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 614.12427 ± 163.267
2025-09-16 13:06:20,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [514.9325, 789.775, 518.52216, 1011.5717, 623.2821, 500.55927, 675.53864, 449.02502, 540.619, 517.4176]
2025-09-16 13:06:20,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 158.0, 111.0, 195.0, 120.0, 98.0, 141.0, 84.0, 103.0, 109.0]
2025-09-16 13:06:20,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (614.12) for latency 15
2025-09-16 13:06:20,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 2 seconds)
2025-09-16 13:08:18,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:08:20,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 648.18799 ± 106.495
2025-09-16 13:08:20,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [612.02374, 749.885, 525.7874, 658.6707, 574.742, 595.6457, 547.0614, 667.5964, 908.7079, 641.75964]
2025-09-16 13:08:20,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 142.0, 115.0, 141.0, 125.0, 112.0, 103.0, 128.0, 189.0, 119.0]
2025-09-16 13:08:20,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (648.19) for latency 15
2025-09-16 13:08:20,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 13 seconds)
2025-09-16 13:10:19,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:10:21,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 508.79590 ± 95.832
2025-09-16 13:10:21,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [519.4854, 342.5666, 464.33438, 463.3017, 469.6985, 475.05298, 565.90076, 501.87436, 738.0461, 547.69836]
2025-09-16 13:10:21,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 67.0, 102.0, 90.0, 90.0, 88.0, 106.0, 110.0, 151.0, 109.0]
2025-09-16 13:10:21,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 19 seconds)
2025-09-16 13:12:20,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:12:21,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 553.90710 ± 93.846
2025-09-16 13:12:21,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [731.7093, 514.79504, 546.82336, 413.25766, 440.55164, 618.76447, 668.6294, 533.1285, 492.20795, 579.2035]
2025-09-16 13:12:21,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 103.0, 106.0, 76.0, 83.0, 115.0, 129.0, 119.0, 111.0, 108.0]
2025-09-16 13:12:21,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 30 seconds)
2025-09-16 13:14:20,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:14:22,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 648.49658 ± 144.006
2025-09-16 13:14:22,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [713.618, 552.0776, 901.47723, 497.8548, 537.35455, 592.8133, 502.36765, 598.46405, 906.7672, 682.1712]
2025-09-16 13:14:22,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 113.0, 192.0, 104.0, 115.0, 114.0, 108.0, 127.0, 194.0, 133.0]
2025-09-16 13:14:22,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (648.50) for latency 15
2025-09-16 13:14:22,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 28 seconds)
2025-09-16 13:16:21,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:16:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 562.24567 ± 149.068
2025-09-16 13:16:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [579.9439, 929.9655, 359.49075, 637.92084, 580.53815, 547.938, 615.3329, 421.99136, 501.84836, 447.4875]
2025-09-16 13:16:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 183.0, 74.0, 132.0, 106.0, 101.0, 114.0, 91.0, 93.0, 93.0]
2025-09-16 13:16:23,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 37 seconds)
2025-09-16 13:18:21,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:18:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 624.06091 ± 133.710
2025-09-16 13:18:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [719.33386, 519.0649, 513.8807, 541.9765, 626.39636, 556.6098, 542.0017, 917.898, 509.16788, 794.28]
2025-09-16 13:18:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 117.0, 95.0, 100.0, 118.0, 118.0, 100.0, 194.0, 95.0, 151.0]
2025-09-16 13:18:23,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 34 seconds)
2025-09-16 13:20:22,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:20:24,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 577.99054 ± 111.079
2025-09-16 13:20:24,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [471.7184, 561.23785, 695.8805, 523.4232, 493.67334, 590.9923, 722.85675, 774.91736, 419.321, 525.88403]
2025-09-16 13:20:24,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 104.0, 132.0, 113.0, 105.0, 107.0, 152.0, 155.0, 78.0, 111.0]
2025-09-16 13:20:24,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 40 seconds)
2025-09-16 13:22:21,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:22:23,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 684.42340 ± 159.281
2025-09-16 13:22:23,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [719.61597, 669.97894, 485.03632, 1073.6744, 554.9777, 820.0987, 533.2129, 686.1196, 669.3044, 632.21484]
2025-09-16 13:22:23,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 127.0, 90.0, 202.0, 121.0, 165.0, 98.0, 143.0, 137.0, 121.0]
2025-09-16 13:22:23,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (684.42) for latency 15
2025-09-16 13:22:23,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 27 seconds)
2025-09-16 13:24:23,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:24:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 634.14832 ± 153.237
2025-09-16 13:24:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [480.81775, 473.41525, 525.14154, 540.1802, 522.29865, 908.5786, 644.3852, 901.1896, 637.10065, 708.37585]
2025-09-16 13:24:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 87.0, 98.0, 114.0, 97.0, 176.0, 127.0, 169.0, 118.0, 136.0]
2025-09-16 13:24:25,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 42 seconds)
2025-09-16 13:26:23,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:26:24,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 560.12946 ± 91.723
2025-09-16 13:26:24,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [456.57812, 587.2912, 506.4417, 482.3779, 592.78015, 553.04175, 473.05792, 520.13477, 767.72266, 661.86804]
2025-09-16 13:26:24,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 111.0, 95.0, 98.0, 121.0, 105.0, 87.0, 111.0, 147.0, 133.0]
2025-09-16 13:26:24,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 23 seconds)
2025-09-16 13:28:24,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:28:26,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 616.07849 ± 113.371
2025-09-16 13:28:26,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [476.04898, 693.9379, 482.8682, 460.99954, 802.9307, 626.3819, 734.78906, 581.1872, 586.32135, 715.32043]
2025-09-16 13:28:26,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 132.0, 91.0, 86.0, 162.0, 118.0, 141.0, 111.0, 109.0, 129.0]
2025-09-16 13:28:26,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 45 seconds)
2025-09-16 13:30:24,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:30:26,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 614.98621 ± 141.928
2025-09-16 13:30:26,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [585.37085, 775.4309, 454.27444, 583.66895, 466.3245, 919.4518, 550.6708, 669.52966, 466.50787, 678.63293]
2025-09-16 13:30:26,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 149.0, 84.0, 109.0, 86.0, 173.0, 102.0, 137.0, 86.0, 148.0]
2025-09-16 13:30:26,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 31 seconds)
2025-09-16 13:32:25,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:32:27,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 655.85565 ± 281.764
2025-09-16 13:32:27,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [501.79233, 388.56543, 602.0244, 569.4692, 1445.1852, 572.28345, 429.5638, 638.2567, 675.2539, 736.162]
2025-09-16 13:32:27,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 71.0, 113.0, 122.0, 279.0, 107.0, 91.0, 118.0, 135.0, 138.0]
2025-09-16 13:32:27,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 47 seconds)
2025-09-16 13:34:27,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:34:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 735.04065 ± 136.290
2025-09-16 13:34:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [510.66608, 631.01776, 843.25183, 567.2631, 776.7095, 908.27924, 605.51483, 897.26154, 801.05096, 809.39197]
2025-09-16 13:34:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 119.0, 175.0, 106.0, 148.0, 174.0, 114.0, 178.0, 151.0, 151.0]
2025-09-16 13:34:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (735.04) for latency 15
2025-09-16 13:34:29,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 50 seconds)
2025-09-16 13:36:26,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:36:29,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 764.82202 ± 188.597
2025-09-16 13:36:29,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [673.62463, 721.5551, 1107.8833, 594.8069, 911.7077, 1040.3394, 612.1361, 695.65607, 794.8054, 495.7057]
2025-09-16 13:36:29,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 157.0, 220.0, 125.0, 168.0, 213.0, 131.0, 139.0, 160.0, 92.0]
2025-09-16 13:36:29,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (764.82) for latency 15
2025-09-16 13:36:29,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 52 seconds)
2025-09-16 13:38:28,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:38:30,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 726.18854 ± 152.474
2025-09-16 13:38:30,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [608.65765, 644.03955, 712.7344, 928.8672, 672.7671, 558.0872, 1007.02936, 780.3383, 512.75806, 836.60675]
2025-09-16 13:38:30,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 120.0, 136.0, 178.0, 125.0, 106.0, 188.0, 149.0, 96.0, 177.0]
2025-09-16 13:38:30,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 46 seconds)
2025-09-16 13:40:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:40:31,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 578.01453 ± 126.927
2025-09-16 13:40:31,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [503.89328, 621.16187, 746.967, 657.638, 684.05365, 296.0595, 437.36124, 574.4406, 590.1148, 668.45514]
2025-09-16 13:40:31,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 114.0, 138.0, 126.0, 137.0, 57.0, 95.0, 125.0, 109.0, 126.0]
2025-09-16 13:40:31,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 5 seconds)
2025-09-16 13:42:29,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:42:31,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 750.25696 ± 262.724
2025-09-16 13:42:31,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [832.99646, 631.0899, 448.67764, 615.9864, 669.121, 665.6203, 531.7964, 1364.8134, 657.0002, 1085.4675]
2025-09-16 13:42:31,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 116.0, 82.0, 128.0, 127.0, 131.0, 97.0, 269.0, 125.0, 228.0]
2025-09-16 13:42:31,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 49 seconds)
2025-09-16 13:44:29,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:44:32,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 806.99548 ± 215.915
2025-09-16 13:44:32,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1187.6079, 511.24493, 851.8927, 907.8387, 728.7855, 611.1561, 957.51556, 1077.8617, 548.1677, 687.88336]
2025-09-16 13:44:32,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 94.0, 179.0, 191.0, 133.0, 128.0, 204.0, 210.0, 106.0, 132.0]
2025-09-16 13:44:32,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (807.00) for latency 15
2025-09-16 13:44:32,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 36 seconds)
2025-09-16 13:46:34,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:46:36,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 738.92621 ± 186.916
2025-09-16 13:46:36,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [706.6725, 496.26776, 898.84314, 620.6509, 811.9802, 954.5515, 420.88327, 1044.7375, 752.37384, 682.3012]
2025-09-16 13:46:36,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 107.0, 171.0, 118.0, 154.0, 190.0, 80.0, 219.0, 139.0, 132.0]
2025-09-16 13:46:36,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 22 seconds)
2025-09-16 13:48:34,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:48:36,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 684.03888 ± 181.336
2025-09-16 13:48:36,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [698.46155, 540.4597, 559.83673, 541.1935, 558.4907, 827.43634, 866.36804, 1104.589, 555.4139, 588.1389]
2025-09-16 13:48:36,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 116.0, 121.0, 113.0, 120.0, 153.0, 190.0, 225.0, 117.0, 124.0]
2025-09-16 13:48:36,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 4 seconds)
2025-09-16 13:50:34,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:50:36,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 740.17139 ± 258.764
2025-09-16 13:50:36,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [469.796, 621.0657, 669.89874, 761.57996, 553.38403, 1390.0693, 763.81793, 994.9062, 524.384, 652.8127]
2025-09-16 13:50:36,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 119.0, 146.0, 153.0, 106.0, 280.0, 144.0, 190.0, 99.0, 143.0]
2025-09-16 13:50:36,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 50 seconds)
2025-09-16 13:52:36,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:52:38,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 729.50214 ± 225.812
2025-09-16 13:52:38,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [573.41656, 756.7008, 1315.2588, 549.81165, 699.25195, 909.91785, 739.62665, 557.3755, 522.6419, 671.01953]
2025-09-16 13:52:38,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 151.0, 279.0, 103.0, 133.0, 184.0, 155.0, 106.0, 101.0, 126.0]
2025-09-16 13:52:38,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 8 seconds)
2025-09-16 13:54:37,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:54:39,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 881.25098 ± 287.827
2025-09-16 13:54:39,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1122.9042, 735.56665, 628.4936, 957.07544, 1493.91, 595.892, 481.65396, 1110.3685, 903.98486, 782.6602]
2025-09-16 13:54:39,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [218.0, 140.0, 131.0, 204.0, 285.0, 108.0, 91.0, 221.0, 195.0, 148.0]
2025-09-16 13:54:39,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (881.25) for latency 15
2025-09-16 13:54:39,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 20 seconds)
2025-09-16 13:56:38,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:56:40,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 765.55493 ± 204.231
2025-09-16 13:56:40,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [517.87476, 903.323, 615.1322, 747.3754, 1262.3583, 779.2593, 854.00964, 771.66077, 655.3809, 549.1755]
2025-09-16 13:56:40,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 165.0, 123.0, 160.0, 250.0, 144.0, 155.0, 144.0, 139.0, 108.0]
2025-09-16 13:56:40,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 47 seconds)
2025-09-16 13:58:40,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:58:42,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 876.50098 ± 214.234
2025-09-16 13:58:42,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1166.5068, 887.46204, 959.1942, 734.79865, 583.824, 681.272, 591.5881, 955.38556, 971.13586, 1233.8431]
2025-09-16 13:58:42,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 169.0, 181.0, 136.0, 112.0, 124.0, 123.0, 189.0, 188.0, 232.0]
2025-09-16 13:58:42,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 6 seconds)
2025-09-16 14:00:41,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:00:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 830.48358 ± 213.503
2025-09-16 14:00:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [917.39307, 1324.7549, 559.5595, 728.0172, 926.73236, 765.96204, 657.89075, 587.0766, 950.7001, 886.749]
2025-09-16 14:00:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 258.0, 103.0, 138.0, 196.0, 164.0, 125.0, 118.0, 198.0, 174.0]
2025-09-16 14:00:43,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 41 minutes, 11 seconds)
2025-09-16 14:02:44,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:02:47,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 780.97363 ± 245.275
2025-09-16 14:02:47,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [674.9468, 911.56476, 655.06274, 1028.1193, 485.98938, 631.53284, 672.50037, 1372.9016, 750.9787, 626.1404]
2025-09-16 14:02:47,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 175.0, 124.0, 198.0, 100.0, 116.0, 148.0, 263.0, 143.0, 117.0]
2025-09-16 14:02:47,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 39 minutes, 27 seconds)
2025-09-16 14:04:44,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:04:47,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 956.18298 ± 324.755
2025-09-16 14:04:47,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [919.68665, 560.2741, 1057.5442, 1405.2186, 653.64136, 1153.1263, 1216.6913, 511.34833, 672.1272, 1412.1729]
2025-09-16 14:04:47,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 108.0, 212.0, 270.0, 119.0, 222.0, 232.0, 95.0, 122.0, 301.0]
2025-09-16 14:04:47,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (956.18) for latency 15
2025-09-16 14:04:47,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes, 9 seconds)
2025-09-16 14:06:46,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:06:49,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 977.54211 ± 361.651
2025-09-16 14:06:49,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1540.3708, 1523.4058, 813.53015, 621.0657, 1010.7707, 1164.6393, 1196.6296, 599.96204, 427.52893, 877.51764]
2025-09-16 14:06:49,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [317.0, 289.0, 173.0, 127.0, 199.0, 234.0, 235.0, 132.0, 94.0, 196.0]
2025-09-16 14:06:49,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (977.54) for latency 15
2025-09-16 14:06:49,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 35 minutes, 21 seconds)
2025-09-16 14:08:48,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:08:51,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 905.06750 ± 314.380
2025-09-16 14:08:51,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1489.9789, 650.49927, 770.235, 1477.2218, 787.00116, 668.84406, 983.1001, 511.32776, 845.00574, 867.46185]
2025-09-16 14:08:51,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 141.0, 145.0, 288.0, 150.0, 128.0, 180.0, 94.0, 165.0, 165.0]
2025-09-16 14:08:51,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 33 minutes, 19 seconds)
2025-09-16 14:10:49,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:10:52,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 934.16290 ± 284.776
2025-09-16 14:10:52,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [955.1335, 1004.59875, 713.12616, 528.4022, 1006.6294, 1225.3464, 1204.2701, 731.4331, 549.0053, 1423.6841]
2025-09-16 14:10:52,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 192.0, 142.0, 102.0, 186.0, 224.0, 229.0, 140.0, 118.0, 274.0]
2025-09-16 14:10:52,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 31 minutes, 20 seconds)
2025-09-16 14:12:52,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:12:55,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 939.07678 ± 254.439
2025-09-16 14:12:55,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [648.57104, 1233.484, 505.29584, 925.72186, 869.9705, 1151.3062, 1160.2544, 614.29553, 1107.2858, 1174.5824]
2025-09-16 14:12:55,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 243.0, 105.0, 166.0, 177.0, 219.0, 219.0, 111.0, 235.0, 234.0]
2025-09-16 14:12:55,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 29 minutes, 12 seconds)
2025-09-16 14:14:53,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:14:55,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1004.89929 ± 335.198
2025-09-16 14:14:55,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [711.0344, 1546.8971, 462.33337, 1120.7716, 1222.9471, 692.3153, 1369.2961, 1293.2245, 849.9192, 780.2537]
2025-09-16 14:14:55,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 307.0, 89.0, 221.0, 242.0, 148.0, 255.0, 251.0, 162.0, 164.0]
2025-09-16 14:14:55,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1004.90) for latency 15
2025-09-16 14:14:55,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 15 seconds)
2025-09-16 14:16:59,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:17:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1205.76685 ± 559.077
2025-09-16 14:17:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [728.70013, 793.07434, 749.2209, 1800.6965, 1106.5792, 688.8941, 2071.1926, 536.362, 1842.5895, 1740.357]
2025-09-16 14:17:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 161.0, 156.0, 340.0, 211.0, 140.0, 428.0, 112.0, 371.0, 338.0]
2025-09-16 14:17:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1205.77) for latency 15
2025-09-16 14:17:02,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes, 52 seconds)
2025-09-16 14:19:00,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:19:03,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1146.54041 ± 448.434
2025-09-16 14:19:03,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2270.9138, 1041.5549, 585.4337, 1239.6052, 781.9716, 862.81555, 924.592, 1495.0475, 1014.2011, 1249.2689]
2025-09-16 14:19:03,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [425.0, 193.0, 108.0, 237.0, 160.0, 162.0, 186.0, 285.0, 216.0, 250.0]
2025-09-16 14:19:03,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 23 minutes, 39 seconds)
2025-09-16 14:21:03,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:21:06,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1022.78693 ± 306.470
2025-09-16 14:21:06,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1245.7594, 1562.8184, 1025.9427, 765.2275, 1248.0935, 799.5363, 1371.2495, 537.50775, 790.3717, 881.3622]
2025-09-16 14:21:06,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [263.0, 303.0, 196.0, 165.0, 229.0, 159.0, 266.0, 109.0, 146.0, 160.0]
2025-09-16 14:21:06,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 21 minutes, 49 seconds)
2025-09-16 14:23:04,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:23:07,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1034.18860 ± 291.054
2025-09-16 14:23:07,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1126.312, 1385.5912, 1174.8077, 667.8391, 927.95825, 667.37506, 1603.5076, 763.4612, 920.3926, 1104.6409]
2025-09-16 14:23:07,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 260.0, 252.0, 124.0, 179.0, 137.0, 333.0, 157.0, 185.0, 218.0]
2025-09-16 14:23:07,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 19 minutes, 36 seconds)
2025-09-16 14:25:08,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:25:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1699.18945 ± 517.731
2025-09-16 14:25:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1409.2877, 2455.315, 2529.4636, 2065.0396, 1399.0641, 2133.2964, 1480.7478, 1118.7189, 1057.5219, 1343.441]
2025-09-16 14:25:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [294.0, 482.0, 488.0, 415.0, 273.0, 421.0, 273.0, 213.0, 200.0, 275.0]
2025-09-16 14:25:13,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1699.19) for latency 15
2025-09-16 14:25:13,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 18 minutes, 12 seconds)
2025-09-16 14:27:12,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:27:15,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1010.53906 ± 275.834
2025-09-16 14:27:15,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [663.9139, 877.89734, 875.25916, 957.91486, 778.4852, 857.7153, 1253.6104, 1252.8542, 942.81696, 1644.9224]
2025-09-16 14:27:15,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 168.0, 163.0, 194.0, 149.0, 178.0, 240.0, 243.0, 178.0, 306.0]
2025-09-16 14:27:15,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 33 seconds)
2025-09-16 14:29:17,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:29:20,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1263.39771 ± 364.726
2025-09-16 14:29:20,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1409.3843, 917.4517, 1515.0496, 1816.3337, 784.4141, 1233.674, 859.93756, 981.42126, 1263.3416, 1852.9689]
2025-09-16 14:29:20,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [268.0, 165.0, 302.0, 351.0, 143.0, 241.0, 168.0, 187.0, 237.0, 334.0]
2025-09-16 14:29:20,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 14 minutes, 5 seconds)
2025-09-16 14:31:19,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:31:22,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1109.00378 ± 349.319
2025-09-16 14:31:22,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [644.40753, 1320.6044, 914.01666, 1197.039, 597.57294, 1055.031, 1270.8832, 1342.0756, 1836.901, 911.5066]
2025-09-16 14:31:22,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 234.0, 166.0, 236.0, 108.0, 193.0, 243.0, 259.0, 347.0, 177.0]
2025-09-16 14:31:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 11 minutes, 51 seconds)
2025-09-16 14:33:22,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:33:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1629.60986 ± 626.384
2025-09-16 14:33:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1089.2852, 1185.9279, 1361.4694, 2884.7314, 1274.3348, 2238.395, 1484.7428, 675.39514, 2005.0913, 2096.7258]
2025-09-16 14:33:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 220.0, 258.0, 556.0, 244.0, 425.0, 299.0, 146.0, 421.0, 406.0]
2025-09-16 14:33:27,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 10 minutes, 11 seconds)
2025-09-16 14:35:27,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:35:31,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1150.81189 ± 211.176
2025-09-16 14:35:31,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1337.1168, 1263.6215, 1340.265, 1196.1909, 1515.098, 838.9433, 1140.7532, 1062.7384, 948.85767, 864.5337]
2025-09-16 14:35:31,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [253.0, 247.0, 254.0, 220.0, 283.0, 166.0, 208.0, 211.0, 178.0, 171.0]
2025-09-16 14:35:31,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 minutes, 57 seconds)
2025-09-16 14:37:33,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:37:39,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2022.17480 ± 911.357
2025-09-16 14:37:39,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1281.0631, 2221.1448, 1369.8143, 1296.9407, 1678.4966, 3407.465, 1774.3462, 1108.0527, 3977.6826, 2106.7407]
2025-09-16 14:37:39,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 412.0, 266.0, 231.0, 309.0, 656.0, 328.0, 210.0, 759.0, 424.0]
2025-09-16 14:37:39,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2022.17) for latency 15
2025-09-16 14:37:39,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 6 minutes, 32 seconds)
2025-09-16 14:39:37,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:39:41,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1496.26477 ± 599.533
2025-09-16 14:39:41,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1903.4072, 1099.08, 1565.8586, 467.09372, 1363.3116, 1303.2024, 2629.8796, 1745.6244, 2074.6018, 810.5888]
2025-09-16 14:39:41,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 196.0, 291.0, 88.0, 245.0, 239.0, 480.0, 319.0, 406.0, 151.0]
2025-09-16 14:39:41,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 4 minutes, 11 seconds)
2025-09-16 14:41:41,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:41:47,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1897.47498 ± 764.267
2025-09-16 14:41:47,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1284.8307, 1423.7599, 1890.0513, 2030.2577, 3023.625, 2603.0527, 3146.8972, 1680.9834, 1082.9092, 808.383]
2025-09-16 14:41:47,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [252.0, 269.0, 395.0, 378.0, 574.0, 488.0, 605.0, 344.0, 197.0, 168.0]
2025-09-16 14:41:47,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 2 minutes, 31 seconds)
2025-09-16 14:43:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:43:56,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1912.19360 ± 696.910
2025-09-16 14:43:56,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [894.50244, 1927.7054, 1091.1539, 1882.2855, 2924.896, 1151.0339, 1636.1611, 2900.9546, 2142.04, 2571.2017]
2025-09-16 14:43:56,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 390.0, 207.0, 366.0, 546.0, 222.0, 328.0, 571.0, 408.0, 502.0]
2025-09-16 14:43:56,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 52 seconds)
2025-09-16 14:45:55,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:46:02,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2263.07666 ± 1711.366
2025-09-16 14:46:02,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3036.448, 1924.7026, 661.3284, 647.58856, 5249.609, 3530.521, 602.1569, 4871.4707, 614.56085, 1492.381]
2025-09-16 14:46:02,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [574.0, 367.0, 121.0, 116.0, 1000.0, 666.0, 111.0, 924.0, 111.0, 305.0]
2025-09-16 14:46:02,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2263.08) for latency 15
2025-09-16 14:46:02,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 58 minutes, 55 seconds)
2025-09-16 14:48:09,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:48:18,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3067.03076 ± 1862.467
2025-09-16 14:48:18,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1399.2874, 5214.5054, 5373.58, 1785.5029, 5189.12, 3032.6462, 882.96844, 5221.3193, 1098.0212, 1473.3558]
2025-09-16 14:48:18,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [261.0, 1000.0, 1000.0, 341.0, 1000.0, 572.0, 181.0, 1000.0, 216.0, 307.0]
2025-09-16 14:48:18,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3067.03) for latency 15
2025-09-16 14:48:18,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 31 seconds)
2025-09-16 14:50:14,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:50:20,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2107.03735 ± 1347.226
2025-09-16 14:50:20,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5232.3364, 2895.664, 961.7202, 2271.4766, 551.08826, 1234.9126, 2847.4, 2625.2783, 566.249, 1884.2474]
2025-09-16 14:50:20,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 544.0, 175.0, 421.0, 101.0, 244.0, 513.0, 472.0, 106.0, 373.0]
2025-09-16 14:50:20,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 55 minutes, 21 seconds)
2025-09-16 14:52:24,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:52:33,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2996.79688 ± 1870.461
2025-09-16 14:52:33,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1031.2871, 5214.6865, 5033.04, 844.5199, 1679.6293, 2896.2415, 1895.4237, 5156.9653, 897.1807, 5318.9946]
2025-09-16 14:52:33,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [205.0, 1000.0, 954.0, 159.0, 338.0, 560.0, 370.0, 1000.0, 174.0, 1000.0]
2025-09-16 14:52:33,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 53 minutes, 51 seconds)
2025-09-16 14:54:41,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:54:49,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2583.00415 ± 1286.701
2025-09-16 14:54:49,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3350.9526, 2081.2107, 3671.1462, 1538.1519, 3210.9036, 935.9907, 3039.9656, 1948.5747, 5186.0806, 867.06683]
2025-09-16 14:54:49,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [634.0, 419.0, 685.0, 298.0, 618.0, 179.0, 580.0, 379.0, 1000.0, 157.0]
2025-09-16 14:54:49,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 11 seconds)
2025-09-16 14:56:41,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:56:50,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3053.17090 ± 1437.556
2025-09-16 14:56:50,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2084.4036, 1409.5581, 2466.5664, 2724.3086, 1870.1549, 4321.6216, 1192.3293, 5170.2817, 5167.9243, 4124.561]
2025-09-16 14:56:50,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [400.0, 262.0, 475.0, 518.0, 344.0, 850.0, 213.0, 1000.0, 1000.0, 809.0]
2025-09-16 14:56:50,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 41 seconds)
2025-09-16 14:58:51,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:59:03,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3884.57568 ± 1424.087
2025-09-16 14:59:03,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [4627.2427, 1544.9375, 5120.7017, 5211.0063, 2928.1204, 4734.818, 4330.417, 5171.1343, 1153.9436, 4023.437]
2025-09-16 14:59:03,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [882.0, 286.0, 1000.0, 1000.0, 559.0, 893.0, 824.0, 1000.0, 218.0, 751.0]
2025-09-16 14:59:03,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3884.58) for latency 15
2025-09-16 14:59:03,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 17 seconds)
2025-09-16 15:01:05,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:01:18,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3967.58789 ± 1562.648
2025-09-16 15:01:18,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2414.9836, 699.5337, 5128.9907, 5141.3687, 5152.64, 5178.2837, 5173.878, 3191.8513, 5141.8325, 2452.5173]
2025-09-16 15:01:18,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [464.0, 126.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 613.0, 1000.0, 464.0]
2025-09-16 15:01:18,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3967.59) for latency 15
2025-09-16 15:01:18,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 2 seconds)
2025-09-16 15:03:19,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:03:32,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4353.56006 ± 1661.485
2025-09-16 15:03:32,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5429.6978, 5422.819, 2835.8071, 5397.4883, 2175.8862, 5417.4033, 754.4921, 5371.7695, 5364.2188, 5366.0195]
2025-09-16 15:03:32,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 519.0, 1000.0, 403.0, 1000.0, 141.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:03:32,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4353.56) for latency 15
2025-09-16 15:03:32,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 56 seconds)
2025-09-16 15:05:33,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:05:40,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2151.74854 ± 864.787
2025-09-16 15:05:40,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2502.4607, 1736.1079, 2301.537, 972.75653, 1645.5482, 4189.5557, 2655.2092, 1962.0864, 1147.7006, 2404.5208]
2025-09-16 15:05:40,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [469.0, 330.0, 446.0, 188.0, 303.0, 793.0, 509.0, 367.0, 210.0, 455.0]
2025-09-16 15:05:40,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 13 seconds)
2025-09-16 15:07:45,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:07:57,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3581.12500 ± 1746.670
2025-09-16 15:07:57,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1267.7545, 4230.041, 5157.6377, 1338.633, 1372.5076, 5053.6724, 5160.466, 5158.762, 5146.4575, 1925.3156]
2025-09-16 15:07:57,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 831.0, 1000.0, 283.0, 274.0, 1000.0, 1000.0, 1000.0, 1000.0, 371.0]
2025-09-16 15:07:57,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 59 seconds)
2025-09-16 15:10:02,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:10:10,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2501.88232 ± 776.632
2025-09-16 15:10:10,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2939.8052, 4199.9756, 2518.94, 3041.0403, 2653.5483, 2149.0276, 1977.1644, 1053.1604, 2312.7603, 2173.4004]
2025-09-16 15:10:10,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [534.0, 785.0, 478.0, 569.0, 489.0, 383.0, 386.0, 184.0, 430.0, 398.0]
2025-09-16 15:10:10,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 46 seconds)
2025-09-16 15:12:07,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:12:22,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4775.05420 ± 912.185
2025-09-16 15:12:22,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5143.911, 5175.065, 5195.279, 2402.0825, 5245.7334, 5226.341, 5214.2163, 5245.2983, 5215.6196, 3686.9932]
2025-09-16 15:12:22,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 453.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 711.0]
2025-09-16 15:12:22,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4775.05) for latency 15
2025-09-16 15:12:22,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 26 seconds)
2025-09-16 15:14:25,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:14:38,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4011.46338 ± 1594.548
2025-09-16 15:14:38,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2875.4539, 860.7384, 1931.1738, 5217.4355, 5197.668, 5291.586, 5239.262, 5255.315, 5195.601, 3050.3967]
2025-09-16 15:14:38,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [543.0, 164.0, 354.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 571.0]
2025-09-16 15:14:38,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 17 seconds)
2025-09-16 15:16:39,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:16:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4846.22314 ± 1239.915
2025-09-16 15:16:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5241.226, 5288.079, 5282.408, 5211.0103, 1128.22, 5312.296, 5243.3105, 5184.0864, 5274.7896, 5296.809]
2025-09-16 15:16:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 212.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:16:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4846.22) for latency 15
2025-09-16 15:16:54,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 28 seconds)
2025-09-16 15:18:57,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:19:10,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4069.13525 ± 1359.713
2025-09-16 15:19:10,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3013.4722, 2149.6533, 4388.545, 5208.0146, 5248.1074, 5261.2563, 3431.2605, 5197.423, 1540.035, 5253.5806]
2025-09-16 15:19:10,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [610.0, 433.0, 849.0, 1000.0, 1000.0, 1000.0, 640.0, 1000.0, 297.0, 1000.0]
2025-09-16 15:19:10,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 11 seconds)
2025-09-16 15:21:17,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:21:30,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4112.68164 ± 1779.338
2025-09-16 15:21:30,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5203.4946, 5327.503, 1765.5078, 5254.8994, 1128.4742, 1319.406, 5294.8022, 5279.225, 5228.5127, 5324.9873]
2025-09-16 15:21:30,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 337.0, 1000.0, 210.0, 257.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:21:30,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 13 seconds)
2025-09-16 15:23:35,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:23:50,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4927.59668 ± 1367.104
2025-09-16 15:23:50,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5359.435, 827.05896, 5452.484, 5384.543, 5374.068, 5385.1665, 5381.532, 5363.3633, 5351.1953, 5397.126]
2025-09-16 15:23:50,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 161.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:23:50,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4927.60) for latency 15
2025-09-16 15:23:50,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 12 seconds)
2025-09-16 15:25:50,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:26:06,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5181.49707 ± 162.864
2025-09-16 15:26:06,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5269.1587, 5230.0044, 5250.1445, 5234.932, 5229.4976, 5248.25, 5201.994, 5224.127, 5231.332, 4695.5327]
2025-09-16 15:26:06,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 893.0]
2025-09-16 15:26:06,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5181.50) for latency 15
2025-09-16 15:26:06,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 56 seconds)
2025-09-16 15:28:06,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:28:21,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4832.37842 ± 967.521
2025-09-16 15:28:21,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5275.7383, 5326.8887, 5274.4526, 5375.342, 5311.831, 4622.882, 5261.37, 4597.818, 2047.0004, 5230.464]
2025-09-16 15:28:21,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 870.0, 1000.0, 879.0, 426.0, 1000.0]
2025-09-16 15:28:21,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 37 seconds)
2025-09-16 15:30:28,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:30:45,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5245.98096 ± 50.099
2025-09-16 15:30:45,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5174.718, 5227.818, 5254.532, 5252.7627, 5298.472, 5283.655, 5276.7114, 5143.2446, 5239.785, 5308.1084]
2025-09-16 15:30:45,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:30:45,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5245.98) for latency 15
2025-09-16 15:30:45,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 31 seconds)
2025-09-16 15:32:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:32:58,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5287.47900 ± 55.883
2025-09-16 15:32:58,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5288.5264, 5307.876, 5384.9814, 5194.547, 5306.247, 5325.5244, 5291.1694, 5294.848, 5297.079, 5183.9927]
2025-09-16 15:32:58,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:32:58,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5287.48) for latency 15
2025-09-16 15:32:58,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 2 seconds)
2025-09-16 15:35:09,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:35:27,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5294.83740 ± 35.396
2025-09-16 15:35:27,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5334.5225, 5291.934, 5218.3315, 5327.535, 5267.6226, 5319.543, 5294.8594, 5261.377, 5298.595, 5334.056]
2025-09-16 15:35:27,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:35:27,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5294.84) for latency 15
2025-09-16 15:35:27,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 55 seconds)
2025-09-16 15:37:27,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:37:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4170.86426 ± 1496.464
2025-09-16 15:37:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2930.4683, 5286.956, 5322.7656, 2836.388, 5280.9424, 5254.3936, 910.94336, 5293.8765, 5339.085, 3252.8193]
2025-09-16 15:37:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [593.0, 1000.0, 1000.0, 535.0, 1000.0, 1000.0, 163.0, 1000.0, 1000.0, 620.0]
2025-09-16 15:37:40,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 34 seconds)
2025-09-16 15:39:43,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:39:59,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5196.69922 ± 105.920
2025-09-16 15:39:59,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5202.93, 5256.815, 5220.3247, 5166.715, 4902.049, 5164.638, 5235.1226, 5264.257, 5288.797, 5265.344]
2025-09-16 15:39:59,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 928.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:39:59,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 18 seconds)
2025-09-16 15:41:57,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:42:11,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4296.91699 ± 1659.636
2025-09-16 15:42:11,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5334.4917, 5332.4453, 2212.6624, 642.7836, 2775.3772, 5323.2456, 5368.5986, 5348.017, 5303.015, 5328.537]
2025-09-16 15:42:11,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 431.0, 139.0, 518.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:42:11,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 51 seconds)
2025-09-16 15:44:15,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:44:31,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5145.80957 ± 582.237
2025-09-16 15:44:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5318.037, 5378.539, 5325.273, 5311.808, 5338.251, 3400.282, 5351.3716, 5369.1025, 5316.5044, 5348.926]
2025-09-16 15:44:31,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 624.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:44:31,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 37 seconds)
2025-09-16 15:46:28,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:46:39,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3390.05151 ± 1753.566
2025-09-16 15:46:39,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5279.3003, 1607.2917, 1255.6377, 1740.2183, 4314.9404, 4906.9985, 914.81537, 5314.529, 5318.2144, 3248.5713]
2025-09-16 15:46:39,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 305.0, 237.0, 328.0, 805.0, 940.0, 183.0, 1000.0, 1000.0, 605.0]
2025-09-16 15:46:39,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 14 seconds)
2025-09-16 15:48:44,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:49:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5456.67725 ± 18.844
2025-09-16 15:49:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5462.4487, 5465.867, 5439.406, 5499.9204, 5436.8564, 5469.598, 5448.7617, 5436.6157, 5464.1807, 5443.1177]
2025-09-16 15:49:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:49:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5456.68) for latency 15
2025-09-16 15:49:01,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1251 [DEBUG]: Training session finished
