2025-09-16 11:48:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_6
2025-09-16 11:48:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_6
2025-09-16 11:48:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x1503adbe8710>}
2025-09-16 11:48:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:48:23,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:48:23,667 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=478, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:48:23,667 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:48:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:48:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:50:08,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:50:09,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 274.83807 ± 64.405
2025-09-16 11:50:09,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [220.38461, 229.2095, 364.08347, 217.66531, 277.3262, 363.27505, 217.64247, 271.61838, 375.4016, 211.77385]
2025-09-16 11:50:09,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 47.0, 79.0, 47.0, 56.0, 74.0, 46.0, 54.0, 79.0, 46.0]
2025-09-16 11:50:09,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (274.84) for latency 6
2025-09-16 11:50:09,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 52 minutes, 16 seconds)
2025-09-16 11:52:02,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:52:03,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 368.58136 ± 84.513
2025-09-16 11:52:03,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [350.36526, 264.58582, 530.80975, 305.4329, 323.34363, 306.53366, 361.82355, 517.8659, 333.61896, 391.43423]
2025-09-16 11:52:03,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 53.0, 103.0, 57.0, 62.0, 62.0, 69.0, 96.0, 65.0, 74.0]
2025-09-16 11:52:03,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (368.58) for latency 6
2025-09-16 11:52:03,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 58 minutes, 10 seconds)
2025-09-16 11:53:56,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:53:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 369.92133 ± 49.334
2025-09-16 11:53:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [315.0463, 478.2236, 385.08566, 327.45157, 403.5104, 390.62698, 302.7394, 342.21735, 359.52136, 394.7902]
2025-09-16 11:53:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 90.0, 73.0, 62.0, 87.0, 75.0, 58.0, 69.0, 71.0, 73.0]
2025-09-16 11:53:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (369.92) for latency 6
2025-09-16 11:53:57,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 59 minutes, 5 seconds)
2025-09-16 11:55:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:55:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 412.78476 ± 75.225
2025-09-16 11:55:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [559.0893, 356.06232, 351.93533, 407.9608, 427.8649, 321.68353, 486.9857, 408.8918, 487.43292, 319.94122]
2025-09-16 11:55:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 68.0, 66.0, 78.0, 90.0, 69.0, 102.0, 77.0, 94.0, 72.0]
2025-09-16 11:55:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (412.78) for latency 6
2025-09-16 11:55:51,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 58 minutes, 28 seconds)
2025-09-16 11:57:45,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:57:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 460.67667 ± 103.150
2025-09-16 11:57:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [592.6783, 600.15546, 423.774, 498.67746, 398.21762, 379.01974, 359.06366, 624.51324, 385.30765, 345.35934]
2025-09-16 11:57:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 118.0, 93.0, 94.0, 80.0, 76.0, 68.0, 119.0, 72.0, 69.0]
2025-09-16 11:57:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (460.68) for latency 6
2025-09-16 11:57:46,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 57 minutes, 41 seconds)
2025-09-16 11:59:41,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:59:42,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 372.52975 ± 59.687
2025-09-16 11:59:42,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [301.9503, 347.60822, 399.33368, 364.6375, 395.5958, 330.79736, 435.5438, 421.1423, 263.1022, 465.5864]
2025-09-16 11:59:42,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 77.0, 78.0, 73.0, 75.0, 74.0, 98.0, 89.0, 56.0, 88.0]
2025-09-16 11:59:42,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 59 minutes, 19 seconds)
2025-09-16 12:01:35,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:01:36,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 438.52191 ± 61.603
2025-09-16 12:01:36,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [521.43304, 389.57492, 408.76474, 483.69113, 453.0023, 478.28006, 523.40985, 321.03854, 417.73212, 388.29263]
2025-09-16 12:01:36,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 78.0, 76.0, 92.0, 85.0, 88.0, 104.0, 63.0, 80.0, 73.0]
2025-09-16 12:01:36,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 57 minutes, 38 seconds)
2025-09-16 12:03:29,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:03:30,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 408.82391 ± 104.738
2025-09-16 12:03:30,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [331.53113, 439.94852, 432.877, 417.54004, 349.35004, 677.00165, 246.75713, 397.87, 400.2917, 395.07205]
2025-09-16 12:03:30,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 80.0, 85.0, 78.0, 67.0, 147.0, 51.0, 76.0, 78.0, 74.0]
2025-09-16 12:03:30,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 55 minutes, 37 seconds)
2025-09-16 12:05:24,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:05:25,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 422.25995 ± 90.233
2025-09-16 12:05:25,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [292.45316, 413.58167, 518.26483, 303.45337, 412.61, 487.34967, 326.5888, 509.05646, 561.97833, 397.26324]
2025-09-16 12:05:25,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 91.0, 95.0, 66.0, 84.0, 102.0, 74.0, 105.0, 107.0, 76.0]
2025-09-16 12:05:25,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 54 minutes, 5 seconds)
2025-09-16 12:07:18,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:07:19,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 523.48425 ± 85.698
2025-09-16 12:07:19,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [437.1416, 444.10712, 541.7427, 531.26605, 595.0947, 506.542, 563.6882, 393.6407, 711.1837, 510.43542]
2025-09-16 12:07:19,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 84.0, 113.0, 102.0, 110.0, 100.0, 117.0, 73.0, 140.0, 93.0]
2025-09-16 12:07:19,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (523.48) for latency 6
2025-09-16 12:07:19,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 51 minutes, 56 seconds)
2025-09-16 12:09:11,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:09:12,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 504.11041 ± 122.230
2025-09-16 12:09:12,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [466.76474, 430.4335, 783.43933, 375.0753, 369.7746, 537.2093, 522.5764, 649.323, 486.13928, 420.36862]
2025-09-16 12:09:12,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 80.0, 165.0, 80.0, 70.0, 102.0, 100.0, 139.0, 92.0, 94.0]
2025-09-16 12:09:12,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 49 minutes, 15 seconds)
2025-09-16 12:11:04,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:11:06,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 575.29749 ± 71.180
2025-09-16 12:11:06,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [530.20233, 543.6093, 587.09204, 531.97485, 713.32324, 560.2165, 522.67487, 624.6666, 466.98715, 672.22784]
2025-09-16 12:11:06,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 102.0, 111.0, 110.0, 135.0, 103.0, 99.0, 119.0, 87.0, 126.0]
2025-09-16 12:11:06,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (575.30) for latency 6
2025-09-16 12:11:06,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 47 minutes, 6 seconds)
2025-09-16 12:12:58,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:12:59,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 473.80234 ± 112.188
2025-09-16 12:12:59,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [462.32858, 518.4707, 444.61172, 528.9691, 699.5012, 565.1582, 475.7223, 255.59584, 402.32285, 385.34302]
2025-09-16 12:12:59,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 96.0, 82.0, 98.0, 134.0, 107.0, 101.0, 48.0, 76.0, 71.0]
2025-09-16 12:12:59,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 45 minutes, 4 seconds)
2025-09-16 12:14:51,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:14:53,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 599.09381 ± 140.549
2025-09-16 12:14:53,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [353.53204, 538.71265, 682.3302, 712.3792, 439.79376, 678.709, 573.73474, 860.1069, 497.51706, 654.1224]
2025-09-16 12:14:53,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 106.0, 144.0, 152.0, 84.0, 147.0, 123.0, 173.0, 93.0, 141.0]
2025-09-16 12:14:53,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (599.09) for latency 6
2025-09-16 12:14:53,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 42 minutes, 48 seconds)
2025-09-16 12:16:48,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:16:50,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 539.61121 ± 145.971
2025-09-16 12:16:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [785.47125, 461.50366, 521.3764, 378.51385, 463.44888, 575.49884, 333.0668, 785.0313, 617.4361, 474.76465]
2025-09-16 12:16:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 99.0, 103.0, 71.0, 100.0, 109.0, 62.0, 146.0, 114.0, 102.0]
2025-09-16 12:16:50,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 41 minutes, 39 seconds)
2025-09-16 12:18:43,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:18:45,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 596.18774 ± 129.922
2025-09-16 12:18:45,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [729.1514, 387.47226, 841.6335, 569.7314, 612.0464, 537.9083, 573.5293, 464.2155, 514.4978, 731.69104]
2025-09-16 12:18:45,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 82.0, 175.0, 104.0, 116.0, 101.0, 111.0, 88.0, 94.0, 150.0]
2025-09-16 12:18:45,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 40 minutes, 17 seconds)
2025-09-16 12:20:37,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:20:39,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 676.90912 ± 153.545
2025-09-16 12:20:39,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [646.0516, 821.8542, 597.26636, 483.71613, 897.1156, 632.6922, 943.3023, 703.20966, 521.4337, 522.4495]
2025-09-16 12:20:39,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 161.0, 123.0, 104.0, 173.0, 132.0, 196.0, 152.0, 95.0, 112.0]
2025-09-16 12:20:39,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (676.91) for latency 6
2025-09-16 12:20:39,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 38 minutes, 40 seconds)
2025-09-16 12:22:34,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:22:36,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 668.77985 ± 175.530
2025-09-16 12:22:36,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [579.8921, 758.34155, 475.93564, 881.5706, 590.7207, 407.46286, 912.7273, 494.84317, 707.60406, 878.7005]
2025-09-16 12:22:36,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 141.0, 88.0, 162.0, 114.0, 79.0, 167.0, 95.0, 147.0, 167.0]
2025-09-16 12:22:36,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 37 minutes, 42 seconds)
2025-09-16 12:24:29,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:24:30,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 647.52100 ± 141.009
2025-09-16 12:24:30,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [698.07367, 577.486, 451.26834, 491.9457, 671.6729, 967.52277, 685.7142, 698.5371, 517.28345, 715.7059]
2025-09-16 12:24:30,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 109.0, 96.0, 100.0, 133.0, 180.0, 130.0, 132.0, 97.0, 133.0]
2025-09-16 12:24:30,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 35 minutes, 52 seconds)
2025-09-16 12:26:24,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:26:25,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 649.82507 ± 142.664
2025-09-16 12:26:25,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [777.2114, 618.5395, 721.8035, 496.57733, 886.8165, 624.70044, 528.87683, 583.6716, 831.3629, 428.6909]
2025-09-16 12:26:25,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 127.0, 138.0, 107.0, 173.0, 119.0, 113.0, 111.0, 152.0, 81.0]
2025-09-16 12:26:25,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 33 minutes, 28 seconds)
2025-09-16 12:28:21,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:28:23,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 812.27698 ± 327.426
2025-09-16 12:28:23,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [442.59076, 1338.093, 785.99365, 1243.8691, 434.40097, 639.94684, 483.7701, 1025.3829, 1129.6155, 599.10645]
2025-09-16 12:28:23,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 259.0, 160.0, 246.0, 91.0, 120.0, 91.0, 186.0, 220.0, 116.0]
2025-09-16 12:28:23,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (812.28) for latency 6
2025-09-16 12:28:23,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 32 minutes, 22 seconds)
2025-09-16 12:30:16,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:30:18,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 829.61798 ± 207.751
2025-09-16 12:30:18,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [901.98724, 545.36017, 548.1326, 745.3849, 674.0974, 1001.54755, 998.44867, 1001.3645, 691.7829, 1188.0741]
2025-09-16 12:30:18,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 105.0, 108.0, 144.0, 138.0, 202.0, 202.0, 202.0, 131.0, 234.0]
2025-09-16 12:30:18,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (829.62) for latency 6
2025-09-16 12:30:18,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 35 seconds)
2025-09-16 12:32:11,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:32:12,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 559.27551 ± 123.800
2025-09-16 12:32:12,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [507.40805, 620.866, 743.49054, 685.57166, 589.4678, 422.5148, 492.27664, 653.06775, 306.60165, 571.4902]
2025-09-16 12:32:12,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 129.0, 139.0, 147.0, 125.0, 89.0, 93.0, 123.0, 57.0, 124.0]
2025-09-16 12:32:12,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 27 minutes, 54 seconds)
2025-09-16 12:34:06,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:34:08,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 675.99243 ± 196.245
2025-09-16 12:34:08,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [470.33948, 719.10614, 836.6275, 479.15967, 1155.4781, 650.85266, 637.0019, 741.82996, 526.43994, 543.08875]
2025-09-16 12:34:08,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 130.0, 165.0, 89.0, 233.0, 127.0, 131.0, 138.0, 97.0, 102.0]
2025-09-16 12:34:08,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 23 seconds)
2025-09-16 12:36:02,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:36:03,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 651.44849 ± 135.635
2025-09-16 12:36:03,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [878.7994, 482.4431, 529.29944, 633.1218, 743.97974, 633.3855, 828.67584, 542.6563, 489.76166, 752.3621]
2025-09-16 12:36:03,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 88.0, 98.0, 121.0, 137.0, 118.0, 166.0, 116.0, 91.0, 147.0]
2025-09-16 12:36:03,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 33 seconds)
2025-09-16 12:37:57,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:37:59,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 647.43909 ± 122.770
2025-09-16 12:37:59,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [693.7585, 714.35205, 786.84735, 560.49805, 613.3424, 525.0563, 584.78534, 604.8677, 908.3062, 482.57684]
2025-09-16 12:37:59,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 137.0, 167.0, 111.0, 118.0, 103.0, 117.0, 116.0, 174.0, 89.0]
2025-09-16 12:37:59,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 4 seconds)
2025-09-16 12:39:54,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:39:56,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 958.39661 ± 351.104
2025-09-16 12:39:56,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [437.4121, 1055.6864, 622.63086, 861.4263, 1768.8971, 945.1538, 627.28766, 1165.867, 1009.59845, 1090.0071]
2025-09-16 12:39:56,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 203.0, 121.0, 165.0, 369.0, 176.0, 120.0, 241.0, 186.0, 209.0]
2025-09-16 12:39:56,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (958.40) for latency 6
2025-09-16 12:39:56,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 20 minutes, 38 seconds)
2025-09-16 12:41:50,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:41:52,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 806.88440 ± 204.594
2025-09-16 12:41:52,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [867.29596, 584.9514, 944.9192, 675.15485, 523.1616, 1034.4602, 1091.9816, 1039.8164, 568.61865, 738.4842]
2025-09-16 12:41:52,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 110.0, 197.0, 125.0, 101.0, 199.0, 211.0, 213.0, 103.0, 133.0]
2025-09-16 12:41:52,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 1 second)
2025-09-16 12:43:46,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:43:48,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 755.78668 ± 192.842
2025-09-16 12:43:48,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [559.06683, 588.73553, 726.90845, 764.215, 672.7617, 626.8594, 933.0277, 1241.7805, 796.41876, 648.0928]
2025-09-16 12:43:48,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 111.0, 153.0, 159.0, 129.0, 124.0, 182.0, 243.0, 146.0, 134.0]
2025-09-16 12:43:48,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 14 seconds)
2025-09-16 12:45:42,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:45:45,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1134.84741 ± 462.313
2025-09-16 12:45:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1520.0605, 925.64886, 507.3755, 1527.722, 1753.5834, 646.52, 1764.9913, 523.3472, 1108.3219, 1070.9012]
2025-09-16 12:45:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [301.0, 172.0, 94.0, 291.0, 320.0, 123.0, 346.0, 99.0, 212.0, 207.0]
2025-09-16 12:45:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1134.85) for latency 6
2025-09-16 12:45:45,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 15 minutes, 39 seconds)
2025-09-16 12:47:39,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:47:42,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 898.84314 ± 252.911
2025-09-16 12:47:42,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [691.0163, 910.9016, 1206.2098, 988.46356, 671.9998, 1469.172, 896.15125, 727.65533, 615.97864, 810.88354]
2025-09-16 12:47:42,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 170.0, 222.0, 189.0, 127.0, 281.0, 167.0, 138.0, 113.0, 160.0]
2025-09-16 12:47:42,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 57 seconds)
2025-09-16 12:49:35,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:49:38,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1037.12146 ± 244.195
2025-09-16 12:49:38,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [943.8704, 1414.3557, 946.2506, 957.346, 900.45667, 1423.6278, 570.3595, 1142.0457, 906.55225, 1166.3495]
2025-09-16 12:49:38,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 270.0, 175.0, 177.0, 171.0, 263.0, 103.0, 215.0, 172.0, 226.0]
2025-09-16 12:49:38,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 45 seconds)
2025-09-16 12:51:34,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:51:37,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1149.66992 ± 391.669
2025-09-16 12:51:37,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1161.0872, 1273.3702, 986.8061, 1810.3185, 675.945, 1913.3888, 958.13086, 980.10724, 979.9087, 757.6369]
2025-09-16 12:51:37,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 248.0, 196.0, 355.0, 131.0, 378.0, 185.0, 188.0, 197.0, 147.0]
2025-09-16 12:51:37,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1149.67) for latency 6
2025-09-16 12:51:37,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 10 minutes, 38 seconds)
2025-09-16 12:53:30,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:53:33,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1247.65527 ± 823.699
2025-09-16 12:53:33,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [744.5153, 1587.635, 3320.7932, 750.2431, 852.83435, 753.6535, 1163.7532, 642.1447, 594.4815, 2066.4988]
2025-09-16 12:53:33,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 305.0, 618.0, 139.0, 156.0, 137.0, 217.0, 126.0, 112.0, 386.0]
2025-09-16 12:53:33,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1247.66) for latency 6
2025-09-16 12:53:33,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 42 seconds)
2025-09-16 12:55:26,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:55:29,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1221.69861 ± 651.815
2025-09-16 12:55:29,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [725.2501, 930.0535, 1227.8427, 3041.9285, 886.6903, 1229.9602, 1527.2603, 700.25653, 932.47784, 1015.26636]
2025-09-16 12:55:29,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 175.0, 223.0, 583.0, 171.0, 230.0, 311.0, 142.0, 176.0, 188.0]
2025-09-16 12:55:29,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-09-16 12:57:24,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:57:27,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1268.19702 ± 803.574
2025-09-16 12:57:27,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1091.1361, 2084.8982, 865.4302, 755.4532, 1292.2827, 876.8072, 832.3519, 994.48285, 3358.8206, 530.3078]
2025-09-16 12:57:27,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 395.0, 161.0, 139.0, 235.0, 160.0, 159.0, 180.0, 658.0, 111.0]
2025-09-16 12:57:27,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1268.20) for latency 6
2025-09-16 12:57:27,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes, 54 seconds)
2025-09-16 12:59:24,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:59:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1242.24609 ± 331.118
2025-09-16 12:59:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1013.36127, 1890.3698, 1097.3447, 1031.9879, 1574.8901, 1305.4727, 1561.1863, 1202.454, 688.7853, 1056.6093]
2025-09-16 12:59:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 369.0, 221.0, 189.0, 291.0, 246.0, 295.0, 213.0, 134.0, 189.0]
2025-09-16 12:59:27,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 51 seconds)
2025-09-16 13:01:19,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:01:22,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1059.49536 ± 326.569
2025-09-16 13:01:22,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1351.1953, 816.95844, 1117.0287, 1074.2955, 951.8343, 1739.9285, 1042.419, 574.8225, 1272.037, 654.4341]
2025-09-16 13:01:22,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 157.0, 218.0, 198.0, 173.0, 323.0, 201.0, 108.0, 247.0, 138.0]
2025-09-16 13:01:22,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute)
2025-09-16 13:03:16,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:03:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1357.17163 ± 243.576
2025-09-16 13:03:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1412.0353, 1531.1797, 1252.7264, 1597.3069, 1559.5148, 766.81635, 1573.0931, 1346.0363, 1127.256, 1405.7511]
2025-09-16 13:03:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 285.0, 236.0, 299.0, 301.0, 148.0, 292.0, 249.0, 217.0, 262.0]
2025-09-16 13:03:20,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1357.17) for latency 6
2025-09-16 13:03:20,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 17 seconds)
2025-09-16 13:05:14,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:05:18,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1617.11353 ± 869.389
2025-09-16 13:05:18,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2345.4534, 881.8779, 3372.9531, 967.33374, 1512.4279, 1556.913, 1588.506, 2655.198, 642.5209, 647.9508]
2025-09-16 13:05:18,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [432.0, 162.0, 656.0, 181.0, 284.0, 305.0, 300.0, 510.0, 118.0, 117.0]
2025-09-16 13:05:18,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1617.11) for latency 6
2025-09-16 13:05:18,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 49 seconds)
2025-09-16 13:07:14,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:07:18,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1493.58167 ± 519.365
2025-09-16 13:07:18,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1288.0702, 1442.7954, 2871.9534, 1053.7295, 1895.5863, 1496.7758, 1030.2075, 1287.4742, 1121.7429, 1447.4805]
2025-09-16 13:07:18,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [238.0, 271.0, 531.0, 211.0, 338.0, 288.0, 184.0, 255.0, 231.0, 262.0]
2025-09-16 13:07:18,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 56 minutes, 10 seconds)
2025-09-16 13:09:12,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:09:17,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2127.67871 ± 1281.640
2025-09-16 13:09:17,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4650.2573, 616.73193, 2061.961, 1497.0769, 1635.2369, 3822.6707, 980.15454, 3290.8225, 904.50055, 1817.3752]
2025-09-16 13:09:17,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [859.0, 113.0, 378.0, 275.0, 292.0, 698.0, 181.0, 622.0, 158.0, 359.0]
2025-09-16 13:09:17,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (2127.68) for latency 6
2025-09-16 13:09:17,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-09-16 13:11:15,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:11:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3072.61108 ± 1933.154
2025-09-16 13:11:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5430.192, 5383.702, 1420.8505, 1340.4183, 5460.136, 636.68365, 1562.9146, 5275.721, 2346.3855, 1869.1096]
2025-09-16 13:11:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 263.0, 242.0, 1000.0, 118.0, 288.0, 1000.0, 412.0, 353.0]
2025-09-16 13:11:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (3072.61) for latency 6
2025-09-16 13:11:22,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 54 minutes, 4 seconds)
2025-09-16 13:13:15,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:13:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2837.08130 ± 1746.716
2025-09-16 13:13:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5345.946, 5344.927, 1255.265, 2665.0598, 5305.4326, 810.2655, 904.9377, 2021.0194, 2769.8904, 1948.0707]
2025-09-16 13:13:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 246.0, 480.0, 1000.0, 155.0, 161.0, 370.0, 519.0, 367.0]
2025-09-16 13:13:22,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 28 seconds)
2025-09-16 13:15:18,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:15:27,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3491.66406 ± 1548.634
2025-09-16 13:15:27,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5234.056, 5254.7554, 2856.9993, 3139.1055, 5334.078, 1704.6407, 5332.6094, 1988.3024, 1389.365, 2682.7278]
2025-09-16 13:15:27,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 535.0, 607.0, 1000.0, 322.0, 1000.0, 392.0, 278.0, 525.0]
2025-09-16 13:15:27,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (3491.66) for latency 6
2025-09-16 13:15:27,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 39 seconds)
2025-09-16 13:17:23,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:17:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3440.13745 ± 1594.630
2025-09-16 13:17:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5413.134, 1923.4918, 1715.857, 3255.359, 1946.5156, 5427.548, 5050.7656, 1781.8799, 5392.0327, 2494.7915]
2025-09-16 13:17:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 353.0, 313.0, 606.0, 372.0, 1000.0, 939.0, 342.0, 1000.0, 503.0]
2025-09-16 13:17:32,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 50 minutes, 35 seconds)
2025-09-16 13:19:27,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:19:38,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4155.91211 ± 1343.446
2025-09-16 13:19:38,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5302.6665, 5237.47, 3981.7947, 3639.5317, 5247.8584, 1146.5823, 2887.1775, 3413.6692, 5363.5806, 5338.79]
2025-09-16 13:19:38,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 754.0, 712.0, 1000.0, 216.0, 557.0, 643.0, 1000.0, 1000.0]
2025-09-16 13:19:38,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4155.91) for latency 6
2025-09-16 13:19:38,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 38 seconds)
2025-09-16 13:21:33,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:21:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3388.70752 ± 1684.748
2025-09-16 13:21:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5278.897, 4277.8755, 4424.564, 5257.1606, 1632.6156, 3425.7393, 811.1149, 5355.671, 1370.0604, 2053.3774]
2025-09-16 13:21:42,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 800.0, 803.0, 1000.0, 306.0, 647.0, 151.0, 1000.0, 252.0, 378.0]
2025-09-16 13:21:42,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 18 seconds)
2025-09-16 13:23:39,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:23:47,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3258.26807 ± 1604.948
2025-09-16 13:23:47,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5531.846, 1315.9331, 1598.0156, 2667.982, 5430.9844, 1716.0884, 2889.6936, 2093.2568, 5423.411, 3915.4688]
2025-09-16 13:23:47,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 237.0, 295.0, 495.0, 1000.0, 319.0, 531.0, 387.0, 1000.0, 707.0]
2025-09-16 13:23:47,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 46 minutes, 10 seconds)
2025-09-16 13:25:41,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:25:51,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3983.51221 ± 1381.601
2025-09-16 13:25:51,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5303.203, 1342.1716, 2749.5767, 5427.8135, 4508.1646, 2732.2095, 4178.715, 2901.6357, 5342.6587, 5348.9727]
2025-09-16 13:25:51,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 263.0, 510.0, 1000.0, 855.0, 526.0, 778.0, 535.0, 1000.0, 1000.0]
2025-09-16 13:25:51,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 43 minutes, 55 seconds)
2025-09-16 13:27:57,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:28:08,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4290.31152 ± 1826.366
2025-09-16 13:28:08,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4593.004, 5341.276, 5325.4897, 5373.949, 768.329, 584.55707, 4797.228, 5364.578, 5414.738, 5339.9683]
2025-09-16 13:28:08,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [870.0, 1000.0, 1000.0, 1000.0, 158.0, 125.0, 913.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:28:08,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4290.31) for latency 6
2025-09-16 13:28:08,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 43 minutes, 52 seconds)
2025-09-16 13:29:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:30:09,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4749.90918 ± 1299.335
2025-09-16 13:30:09,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5439.1895, 5327.975, 1569.3435, 5400.359, 5335.2715, 5290.8813, 5568.853, 5388.2573, 2875.2751, 5303.686]
2025-09-16 13:30:09,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 298.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 540.0, 1000.0]
2025-09-16 13:30:09,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4749.91) for latency 6
2025-09-16 13:30:09,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 41 minutes, 3 seconds)
2025-09-16 13:32:11,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:32:22,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4223.22754 ± 1373.109
2025-09-16 13:32:22,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5338.0156, 5401.361, 4169.1074, 2979.1777, 1328.6256, 3526.1458, 5475.6055, 3114.3198, 5434.9917, 5464.9233]
2025-09-16 13:32:22,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 764.0, 540.0, 257.0, 647.0, 1000.0, 595.0, 1000.0, 1000.0]
2025-09-16 13:32:22,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 40 minutes, 20 seconds)
2025-09-16 13:34:19,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:34:29,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3752.85620 ± 2024.883
2025-09-16 13:34:29,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5363.5674, 5378.6006, 798.4512, 789.0569, 1128.2854, 5350.2964, 5366.428, 5362.8516, 5317.9697, 2673.0532]
2025-09-16 13:34:29,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 146.0, 147.0, 222.0, 1000.0, 1000.0, 1000.0, 1000.0, 489.0]
2025-09-16 13:34:29,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 38 minutes, 24 seconds)
2025-09-16 13:36:18,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:36:26,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3095.94092 ± 1935.605
2025-09-16 13:36:26,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5506.564, 5447.2686, 1151.0088, 2227.9563, 5430.846, 1356.632, 1651.6754, 5375.501, 1162.6013, 1649.3556]
2025-09-16 13:36:26,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 218.0, 411.0, 1000.0, 261.0, 297.0, 1000.0, 220.0, 306.0]
2025-09-16 13:36:26,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 35 minutes, 13 seconds)
2025-09-16 13:38:27,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:38:38,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4094.38428 ± 1591.249
2025-09-16 13:38:38,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1519.5311, 5424.041, 3356.8633, 5337.2007, 4738.765, 3582.2837, 5333.8633, 5217.3877, 1001.1388, 5432.764]
2025-09-16 13:38:38,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [281.0, 1000.0, 631.0, 1000.0, 923.0, 673.0, 1000.0, 1000.0, 187.0, 1000.0]
2025-09-16 13:38:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 32 minutes, 23 seconds)
2025-09-16 13:40:30,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:40:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5366.45264 ± 32.162
2025-09-16 13:40:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5387.5327, 5363.8125, 5431.82, 5343.2217, 5384.5435, 5356.637, 5370.8096, 5298.6147, 5360.52, 5367.014]
2025-09-16 13:40:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:40:44,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5366.45) for latency 6
2025-09-16 13:40:44,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 30 minutes, 57 seconds)
2025-09-16 13:42:34,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:42:47,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5079.68848 ± 735.511
2025-09-16 13:42:47,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2874.3894, 5296.532, 5361.4546, 5333.381, 5329.276, 5334.9053, 5317.1553, 5268.6973, 5342.0996, 5338.9917]
2025-09-16 13:42:47,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [565.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:42:47,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 27 minutes, 28 seconds)
2025-09-16 13:44:47,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:44:59,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4964.23682 ± 1326.568
2025-09-16 13:44:59,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5457.8037, 989.3011, 5470.13, 5440.5107, 5445.384, 5363.486, 5264.4185, 5433.179, 5457.826, 5320.326]
2025-09-16 13:44:59,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 198.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:44:59,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 26 minutes, 13 seconds)
2025-09-16 13:46:57,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:47:11,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5299.57812 ± 65.877
2025-09-16 13:47:11,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5327.8604, 5361.24, 5135.3975, 5255.3516, 5275.849, 5354.954, 5358.3726, 5292.3823, 5284.586, 5349.7896]
2025-09-16 13:47:11,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:47:11,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 26 minutes, 1 second)
2025-09-16 13:49:07,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:49:17,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3901.74854 ± 1985.812
2025-09-16 13:49:17,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5458.8096, 5372.232, 3229.334, 5493.2656, 1787.9799, 5475.8203, 745.90533, 5406.3716, 662.48315, 5385.282]
2025-09-16 13:49:17,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 600.0, 1000.0, 339.0, 1000.0, 151.0, 1000.0, 119.0, 1000.0]
2025-09-16 13:49:17,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 23 minutes, 5 seconds)
2025-09-16 13:51:16,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:51:28,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5134.42480 ± 1285.958
2025-09-16 13:51:28,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5537.917, 5558.031, 5574.211, 5606.179, 5553.366, 5562.195, 1277.0159, 5554.3105, 5536.3306, 5584.6924]
2025-09-16 13:51:28,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 232.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:51:28,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 21 minutes, 39 seconds)
2025-09-16 13:53:21,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:53:35,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5457.21582 ± 29.373
2025-09-16 13:53:35,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5498.9995, 5423.5312, 5451.22, 5455.1064, 5473.6714, 5433.6416, 5482.2817, 5430.122, 5420.1025, 5503.479]
2025-09-16 13:53:35,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:53:35,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5457.22) for latency 6
2025-09-16 13:53:35,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 19 minutes, 54 seconds)
2025-09-16 13:55:33,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:55:45,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4831.48438 ± 1129.529
2025-09-16 13:55:45,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5566.218, 5486.631, 5429.543, 5455.2417, 1958.2584, 5483.462, 4035.3599, 5407.2437, 3915.098, 5577.7886]
2025-09-16 13:55:45,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 367.0, 1000.0, 752.0, 1000.0, 717.0, 1000.0]
2025-09-16 13:55:45,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 17 minutes, 27 seconds)
2025-09-16 13:57:42,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:57:55,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5113.66260 ± 1037.724
2025-09-16 13:57:55,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5456.0166, 5445.2656, 5466.6826, 5478.9434, 5445.1475, 5503.6626, 5463.9697, 2001.0753, 5425.7466, 5450.113]
2025-09-16 13:57:55,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 421.0, 1000.0, 1000.0]
2025-09-16 13:57:55,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 15 minutes, 9 seconds)
2025-09-16 13:59:42,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:59:56,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5420.71484 ± 30.683
2025-09-16 13:59:56,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5451.4463, 5383.794, 5430.472, 5431.1763, 5481.297, 5422.8247, 5408.732, 5419.293, 5412.2666, 5365.8457]
2025-09-16 13:59:56,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:59:56,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 12 minutes, 20 seconds)
2025-09-16 14:02:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:02:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5260.12793 ± 447.642
2025-09-16 14:02:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5556.649, 5541.5986, 4885.3896, 5567.603, 5444.8945, 4713.3994, 5495.8164, 5580.434, 5564.324, 4251.1753]
2025-09-16 14:02:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 899.0, 1000.0, 1000.0, 859.0, 1000.0, 1000.0, 1000.0, 780.0]
2025-09-16 14:02:15,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 11 minutes, 5 seconds)
2025-09-16 14:04:06,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:04:20,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5316.45068 ± 37.511
2025-09-16 14:04:20,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5284.9375, 5291.633, 5313.8325, 5327.197, 5345.323, 5393.081, 5327.9766, 5318.705, 5319.4453, 5242.377]
2025-09-16 14:04:20,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:04:20,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 8 minutes, 51 seconds)
2025-09-16 14:06:19,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:06:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5542.03320 ± 44.863
2025-09-16 14:06:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5520.89, 5542.946, 5513.285, 5537.2773, 5567.1313, 5504.822, 5642.189, 5475.68, 5588.616, 5527.4976]
2025-09-16 14:06:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:06:33,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5542.03) for latency 6
2025-09-16 14:06:33,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 6 minutes, 59 seconds)
2025-09-16 14:08:25,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:08:30,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2034.99146 ± 1919.979
2025-09-16 14:08:30,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [795.70557, 456.33813, 5307.4644, 423.70145, 5355.5605, 169.52544, 613.2504, 3293.939, 1128.0891, 2806.3394]
2025-09-16 14:08:30,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 93.0, 1000.0, 88.0, 1000.0, 33.0, 122.0, 625.0, 207.0, 526.0]
2025-09-16 14:08:30,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 31 seconds)
2025-09-16 14:10:18,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:10:21,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 904.60583 ± 743.960
2025-09-16 14:10:21,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1189.1711, 2852.0188, 680.6245, 395.39005, 626.692, 476.773, 449.57407, 443.59088, 389.71414, 1542.5101]
2025-09-16 14:10:21,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [256.0, 526.0, 149.0, 83.0, 115.0, 104.0, 87.0, 84.0, 81.0, 280.0]
2025-09-16 14:10:21,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 25 seconds)
2025-09-16 14:12:17,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:12:31,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5395.67529 ± 31.114
2025-09-16 14:12:31,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5386.3926, 5382.6934, 5406.0786, 5423.2974, 5390.9326, 5354.4717, 5336.319, 5412.1333, 5445.8203, 5418.617]
2025-09-16 14:12:31,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:12:31,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 57 minutes, 29 seconds)
2025-09-16 14:14:27,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:14:40,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5177.37793 ± 604.533
2025-09-16 14:14:40,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5395.127, 3367.8503, 5420.8647, 5343.9634, 5369.33, 5361.644, 5432.0137, 5317.291, 5327.603, 5438.0933]
2025-09-16 14:14:40,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 646.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:14:40,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 55 minutes, 48 seconds)
2025-09-16 14:16:45,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:16:58,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5100.48096 ± 1287.918
2025-09-16 14:16:58,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5508.626, 5538.51, 5535.1807, 5524.0864, 5503.3203, 5529.5415, 5582.7617, 5527.0854, 5518.4805, 1237.2188]
2025-09-16 14:16:58,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 241.0]
2025-09-16 14:16:58,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 54 minutes, 9 seconds)
2025-09-16 14:18:47,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:19:00,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4969.40479 ± 1389.342
2025-09-16 14:19:00,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [802.2409, 5461.297, 5411.212, 5460.574, 5449.3003, 5481.8584, 5405.0054, 5418.2046, 5387.9927, 5416.365]
2025-09-16 14:19:00,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:19:00,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 52 minutes, 29 seconds)
2025-09-16 14:20:57,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:21:09,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4982.14160 ± 1204.836
2025-09-16 14:21:09,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5621.799, 5578.626, 5578.834, 5560.0396, 2148.6924, 5616.963, 5589.498, 5578.6704, 5478.859, 3069.4353]
2025-09-16 14:21:09,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 386.0, 1000.0, 1000.0, 1000.0, 1000.0, 552.0]
2025-09-16 14:21:09,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 51 minutes, 54 seconds)
2025-09-16 14:23:05,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:23:18,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5003.07422 ± 1141.150
2025-09-16 14:23:18,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1580.667, 5353.056, 5404.1274, 5393.598, 5378.497, 5401.8286, 5376.22, 5344.221, 5354.347, 5444.174]
2025-09-16 14:23:18,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [336.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:23:18,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 39 seconds)
2025-09-16 14:25:16,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:25:30,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5477.60010 ± 127.185
2025-09-16 14:25:30,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5474.9126, 5494.3696, 5549.976, 5534.018, 5557.575, 5601.7744, 5459.6396, 5469.233, 5516.522, 5117.9785]
2025-09-16 14:25:30,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 927.0]
2025-09-16 14:25:30,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 37 seconds)
2025-09-16 14:27:27,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:27:41,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5332.60498 ± 43.587
2025-09-16 14:27:41,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5314.8574, 5412.9595, 5336.1904, 5382.7583, 5254.8403, 5338.7583, 5281.176, 5309.8105, 5344.7827, 5349.919]
2025-09-16 14:27:41,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:27:41,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes)
2025-09-16 14:29:40,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:29:52,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5118.23291 ± 1311.461
2025-09-16 14:29:52,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1184.2262, 5542.968, 5564.9644, 5537.604, 5577.788, 5523.432, 5567.545, 5569.691, 5536.6025, 5577.5093]
2025-09-16 14:29:52,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:29:52,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 28 seconds)
2025-09-16 14:31:43,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:31:56,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4776.76123 ± 1345.813
2025-09-16 14:31:56,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5555.049, 5519.629, 5505.479, 1168.7474, 3840.065, 5538.152, 5543.4014, 5391.6196, 4155.849, 5549.6245]
2025-09-16 14:31:56,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 243.0, 695.0, 1000.0, 1000.0, 1000.0, 753.0, 1000.0]
2025-09-16 14:31:56,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 55 seconds)
2025-09-16 14:33:47,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:34:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5281.57227 ± 809.568
2025-09-16 14:34:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5557.8784, 5547.626, 5589.382, 2863.3726, 5593.2056, 5342.8984, 5577.2544, 5529.0894, 5642.144, 5572.8755]
2025-09-16 14:34:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 519.0, 1000.0, 959.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:34:00,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 30 seconds)
2025-09-16 14:36:02,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:36:16,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5625.54980 ± 30.047
2025-09-16 14:36:16,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5638.0093, 5644.363, 5632.0312, 5599.228, 5567.1167, 5648.895, 5654.147, 5601.845, 5601.2466, 5668.6147]
2025-09-16 14:36:16,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:16,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5625.55) for latency 6
2025-09-16 14:36:16,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 37 seconds)
2025-09-16 14:38:12,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:38:25,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5010.56348 ± 1023.405
2025-09-16 14:38:25,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5417.345, 5448.853, 2214.2478, 5550.5874, 4069.3267, 5573.647, 5476.0684, 5463.017, 5467.64, 5424.902]
2025-09-16 14:38:25,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 403.0, 1000.0, 739.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:38:25,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 20 seconds)
2025-09-16 14:40:18,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:40:32,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5241.84473 ± 768.025
2025-09-16 14:40:32,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5454.526, 5485.796, 5473.803, 5520.9224, 5242.1567, 5597.654, 2955.7104, 5545.1694, 5567.832, 5574.88]
2025-09-16 14:40:32,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 549.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:40:32,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 58 seconds)
2025-09-16 14:42:29,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:42:43,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5503.09814 ± 56.585
2025-09-16 14:42:43,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5490.2974, 5496.664, 5548.504, 5517.0537, 5470.848, 5424.381, 5642.2783, 5461.51, 5468.9683, 5510.476]
2025-09-16 14:42:43,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:42:43,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 11 seconds)
2025-09-16 14:44:37,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:44:51,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5314.10693 ± 54.225
2025-09-16 14:44:51,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5240.148, 5359.8276, 5364.0977, 5329.187, 5246.0913, 5356.8447, 5350.425, 5221.3965, 5366.056, 5306.994]
2025-09-16 14:44:51,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:44:51,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 11 seconds)
2025-09-16 14:46:47,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:47:00,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5230.36914 ± 1201.916
2025-09-16 14:47:00,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5685.8984, 5653.1123, 5647.912, 5666.8027, 5671.8867, 5630.7188, 5433.919, 1630.3824, 5654.0483, 5629.011]
2025-09-16 14:47:00,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 296.0, 1000.0, 1000.0]
2025-09-16 14:47:00,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 44 seconds)
2025-09-16 14:48:55,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:49:08,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5160.43457 ± 1322.136
2025-09-16 14:49:08,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5593.7744, 5573.852, 5627.3237, 5624.1025, 5685.4326, 5663.512, 5550.568, 5510.612, 5578.3833, 1196.7799]
2025-09-16 14:49:08,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 245.0]
2025-09-16 14:49:08,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 33 seconds)
2025-09-16 14:51:04,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:51:18,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5326.82715 ± 29.395
2025-09-16 14:51:18,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5266.6045, 5313.1763, 5344.8477, 5319.7197, 5332.757, 5332.9414, 5355.88, 5289.7207, 5341.8154, 5370.803]
2025-09-16 14:51:18,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:51:18,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 32 seconds)
2025-09-16 14:53:11,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:53:25,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5271.93457 ± 766.353
2025-09-16 14:53:25,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5542.958, 5515.788, 5352.6875, 5504.419, 5519.234, 5555.7637, 2981.392, 5610.9604, 5588.634, 5547.511]
2025-09-16 14:53:25,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 971.0, 1000.0, 1000.0, 1000.0, 537.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:53:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 15 seconds)
2025-09-16 14:55:28,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:55:42,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5547.89355 ± 82.116
2025-09-16 14:55:42,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5540.8486, 5307.061, 5552.432, 5589.9307, 5590.918, 5597.1284, 5588.954, 5572.2104, 5563.1064, 5576.343]
2025-09-16 14:55:42,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:55:42,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 22 seconds)
2025-09-16 14:57:38,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:57:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5531.86621 ± 32.039
2025-09-16 14:57:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5567.6177, 5520.569, 5566.301, 5509.653, 5540.5063, 5560.0728, 5553.2373, 5537.781, 5500.657, 5462.2686]
2025-09-16 14:57:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:57:52,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 13 seconds)
2025-09-16 14:59:46,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:59:58,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4542.02637 ± 1527.278
2025-09-16 14:59:58,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5219.2197, 5253.0303, 5327.2393, 5318.864, 5276.4834, 877.32214, 5289.8267, 2217.9175, 5305.597, 5334.7603]
2025-09-16 14:59:58,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 171.0, 1000.0, 448.0, 1000.0, 1000.0]
2025-09-16 14:59:58,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes)
2025-09-16 15:01:50,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:02:04,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5538.53564 ± 28.077
2025-09-16 15:02:04,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5521.5283, 5479.067, 5591.2686, 5547.9185, 5548.3613, 5523.731, 5541.513, 5560.5703, 5522.2393, 5549.1562]
2025-09-16 15:02:04,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:02:04,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 46 seconds)
2025-09-16 15:04:04,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:04:17,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5171.07910 ± 1019.104
2025-09-16 15:04:17,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5657.8213, 2824.418, 5747.231, 5604.6704, 5688.1772, 5669.8315, 3487.2512, 5678.4287, 5710.794, 5642.172]
2025-09-16 15:04:17,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 503.0, 1000.0, 987.0, 1000.0, 1000.0, 625.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:04:17,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 41 seconds)
2025-09-16 15:06:05,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:06:17,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5115.35791 ± 1165.507
2025-09-16 15:06:17,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5580.4087, 5545.849, 5597.0933, 5607.7085, 5673.8457, 5592.9688, 1772.7542, 5640.453, 5669.3164, 4473.1846]
2025-09-16 15:06:17,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 309.0, 1000.0, 1000.0, 811.0]
2025-09-16 15:06:17,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 20 seconds)
2025-09-16 15:08:17,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:08:30,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4907.57715 ± 1321.152
2025-09-16 15:08:30,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5364.355, 5348.451, 945.02856, 5388.304, 5305.3486, 5348.3984, 5345.2603, 5291.314, 5377.387, 5361.923]
2025-09-16 15:08:30,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 196.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:08:30,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 15 seconds)
2025-09-16 15:10:26,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:10:39,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4963.12988 ± 1379.558
2025-09-16 15:10:39,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5501.813, 894.9318, 5500.0474, 5511.3374, 5506.607, 5523.3853, 5537.3354, 5514.222, 5481.122, 4660.5]
2025-09-16 15:10:39,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 183.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 837.0]
2025-09-16 15:10:39,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 8 seconds)
2025-09-16 15:12:35,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:12:46,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4587.19873 ± 1977.676
2025-09-16 15:12:46,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [669.6984, 5546.955, 5539.228, 5562.012, 5589.4683, 5638.6733, 5582.718, 5588.6323, 594.9847, 5559.616]
2025-09-16 15:12:46,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 106.0, 1000.0]
2025-09-16 15:12:46,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
