2025-09-16 12:06:10,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_9
2025-09-16 12:06:10,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_9
2025-09-16 12:06:10,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x154ca16c4490>}
2025-09-16 12:06:10,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:06:10,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:06:10,212 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=529, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:06:10,212 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:06:13,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:06:13,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:07:58,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:08:00,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 415.24200 ± 46.405
2025-09-16 12:08:00,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [413.50485, 464.35846, 390.9984, 371.43878, 372.76447, 503.27158, 442.04666, 439.47882, 415.16052, 339.39728]
2025-09-16 12:08:00,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 89.0, 72.0, 69.0, 69.0, 95.0, 86.0, 83.0, 78.0, 63.0]
2025-09-16 12:08:00,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (415.24) for latency 9
2025-09-16 12:08:00,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 56 minutes, 32 seconds)
2025-09-16 12:09:55,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:09:55,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 343.27206 ± 33.130
2025-09-16 12:09:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [338.4495, 331.47208, 350.5808, 330.97604, 423.89233, 301.73068, 331.07086, 303.08426, 362.94016, 358.52377]
2025-09-16 12:09:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 62.0, 66.0, 69.0, 83.0, 57.0, 60.0, 57.0, 68.0, 67.0]
2025-09-16 12:09:55,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 2 minutes, 1 second)
2025-09-16 12:11:51,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:11:52,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 299.53107 ± 74.298
2025-09-16 12:11:52,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [309.71802, 351.38123, 336.7392, 148.71637, 359.72327, 333.2299, 314.1627, 338.01898, 344.74127, 158.8798]
2025-09-16 12:11:52,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 77.0, 62.0, 29.0, 72.0, 61.0, 60.0, 64.0, 68.0, 33.0]
2025-09-16 12:11:52,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 2 minutes, 41 seconds)
2025-09-16 12:13:49,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:13:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 464.05536 ± 73.531
2025-09-16 12:13:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [588.498, 378.04208, 445.658, 593.3173, 372.53943, 498.74207, 466.27756, 425.6449, 408.57553, 463.25882]
2025-09-16 12:13:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 74.0, 82.0, 110.0, 74.0, 100.0, 89.0, 79.0, 81.0, 90.0]
2025-09-16 12:13:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (464.06) for latency 9
2025-09-16 12:13:50,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 4 seconds)
2025-09-16 12:15:47,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:15:48,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 400.83856 ± 63.346
2025-09-16 12:15:48,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [438.62595, 386.34415, 445.11832, 350.037, 377.2654, 352.9089, 345.14975, 414.5869, 556.72015, 341.6291]
2025-09-16 12:15:48,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 86.0, 90.0, 75.0, 71.0, 68.0, 71.0, 84.0, 109.0, 65.0]
2025-09-16 12:15:48,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 6 seconds)
2025-09-16 12:17:45,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:17:46,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 436.30005 ± 62.492
2025-09-16 12:17:46,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [516.7944, 331.40152, 486.752, 545.785, 396.8362, 384.47815, 403.59534, 454.86035, 442.771, 399.72638]
2025-09-16 12:17:46,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 63.0, 91.0, 105.0, 76.0, 72.0, 77.0, 87.0, 84.0, 78.0]
2025-09-16 12:17:46,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 46 seconds)
2025-09-16 12:19:43,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:19:44,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 391.01254 ± 87.618
2025-09-16 12:19:44,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [426.66956, 608.7852, 450.50818, 311.55078, 385.48138, 288.36856, 337.60815, 406.5907, 366.4305, 328.1325]
2025-09-16 12:19:44,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 130.0, 91.0, 57.0, 84.0, 53.0, 62.0, 82.0, 68.0, 62.0]
2025-09-16 12:19:44,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 2 minutes, 26 seconds)
2025-09-16 12:21:41,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:21:42,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 380.50269 ± 76.249
2025-09-16 12:21:42,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [347.62643, 396.2361, 260.95615, 402.2512, 388.73752, 284.2189, 348.2962, 528.4905, 369.48553, 478.72824]
2025-09-16 12:21:42,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 86.0, 52.0, 80.0, 80.0, 57.0, 65.0, 115.0, 70.0, 90.0]
2025-09-16 12:21:42,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 13 seconds)
2025-09-16 12:23:40,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:23:41,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 435.37900 ± 58.647
2025-09-16 12:23:41,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [470.61987, 460.3622, 569.0255, 390.9728, 468.66675, 405.943, 372.63577, 445.77316, 412.99542, 356.79556]
2025-09-16 12:23:41,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 96.0, 123.0, 72.0, 103.0, 77.0, 70.0, 95.0, 76.0, 67.0]
2025-09-16 12:23:41,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 9 seconds)
2025-09-16 12:25:37,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:25:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 485.73120 ± 118.228
2025-09-16 12:25:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [353.75464, 385.24878, 461.95636, 743.04004, 399.0247, 549.29724, 485.23608, 388.19742, 453.02032, 638.5366]
2025-09-16 12:25:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 78.0, 85.0, 139.0, 80.0, 104.0, 105.0, 74.0, 91.0, 138.0]
2025-09-16 12:25:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (485.73) for latency 9
2025-09-16 12:25:39,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 57 minutes, 19 seconds)
2025-09-16 12:27:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:27:38,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 494.50082 ± 76.749
2025-09-16 12:27:38,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [573.2745, 536.98114, 350.66663, 441.10275, 497.0944, 547.02313, 541.7677, 399.07922, 455.295, 602.72375]
2025-09-16 12:27:38,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 99.0, 66.0, 84.0, 93.0, 106.0, 104.0, 73.0, 83.0, 116.0]
2025-09-16 12:27:38,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (494.50) for latency 9
2025-09-16 12:27:38,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 28 seconds)
2025-09-16 12:29:35,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:29:36,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 536.83185 ± 145.448
2025-09-16 12:29:36,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [529.55316, 771.50195, 468.15216, 492.82037, 357.0877, 783.51776, 616.6133, 556.20514, 324.15604, 468.71027]
2025-09-16 12:29:36,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 152.0, 89.0, 92.0, 70.0, 149.0, 115.0, 101.0, 66.0, 91.0]
2025-09-16 12:29:36,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (536.83) for latency 9
2025-09-16 12:29:36,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 42 seconds)
2025-09-16 12:31:34,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:31:35,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 541.79749 ± 128.683
2025-09-16 12:31:35,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [465.94116, 647.20605, 479.77655, 500.96652, 398.86765, 823.426, 665.78595, 410.232, 445.37234, 580.40045]
2025-09-16 12:31:35,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 125.0, 90.0, 97.0, 73.0, 156.0, 126.0, 76.0, 97.0, 108.0]
2025-09-16 12:31:35,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (541.80) for latency 9
2025-09-16 12:31:35,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 48 seconds)
2025-09-16 12:33:33,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:33:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 612.23566 ± 217.371
2025-09-16 12:33:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [646.3379, 631.78864, 907.9758, 363.2568, 429.54123, 465.45126, 1011.64496, 386.70404, 805.1524, 474.5032]
2025-09-16 12:33:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 110.0, 170.0, 67.0, 79.0, 96.0, 213.0, 73.0, 150.0, 85.0]
2025-09-16 12:33:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (612.24) for latency 9
2025-09-16 12:33:35,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 50 minutes, 15 seconds)
2025-09-16 12:35:34,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:35:35,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 548.83386 ± 106.508
2025-09-16 12:35:35,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [457.28644, 666.97736, 574.1264, 673.30786, 427.18146, 532.33545, 692.74713, 624.83026, 412.88306, 426.66248]
2025-09-16 12:35:35,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 138.0, 121.0, 135.0, 91.0, 109.0, 132.0, 113.0, 83.0, 93.0]
2025-09-16 12:35:35,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes)
2025-09-16 12:37:33,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:37:34,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 528.38934 ± 70.840
2025-09-16 12:37:34,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [512.0577, 335.97852, 527.1555, 600.28143, 570.69684, 514.1867, 553.8755, 523.1554, 550.5575, 595.9484]
2025-09-16 12:37:34,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 71.0, 94.0, 110.0, 108.0, 94.0, 105.0, 100.0, 98.0, 110.0]
2025-09-16 12:37:34,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 5 seconds)
2025-09-16 12:39:33,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:39:35,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 597.82275 ± 91.506
2025-09-16 12:39:35,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [494.24545, 644.55786, 615.705, 584.5484, 762.03937, 537.0673, 583.52704, 733.42346, 455.9223, 567.19135]
2025-09-16 12:39:35,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 121.0, 111.0, 112.0, 148.0, 99.0, 104.0, 142.0, 97.0, 107.0]
2025-09-16 12:39:35,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 39 seconds)
2025-09-16 12:41:34,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:41:35,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 555.51135 ± 144.033
2025-09-16 12:41:35,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [654.3949, 479.2182, 506.06937, 587.7182, 428.42648, 676.66187, 665.6018, 434.8019, 305.38342, 816.8379]
2025-09-16 12:41:35,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 90.0, 94.0, 107.0, 89.0, 119.0, 135.0, 83.0, 59.0, 160.0]
2025-09-16 12:41:35,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 44 minutes, 6 seconds)
2025-09-16 12:43:34,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:43:35,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 564.99353 ± 130.729
2025-09-16 12:43:35,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [657.2293, 554.4992, 431.62933, 437.69934, 580.6922, 513.451, 903.5203, 462.45093, 549.3666, 559.39685]
2025-09-16 12:43:35,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 103.0, 77.0, 81.0, 111.0, 111.0, 193.0, 88.0, 104.0, 114.0]
2025-09-16 12:43:36,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 11 seconds)
2025-09-16 12:45:35,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:45:37,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 573.82867 ± 109.749
2025-09-16 12:45:37,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [686.894, 472.4444, 427.62003, 552.4263, 583.34186, 450.115, 622.345, 763.5436, 692.97845, 486.57797]
2025-09-16 12:45:37,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 86.0, 82.0, 104.0, 109.0, 99.0, 114.0, 146.0, 134.0, 87.0]
2025-09-16 12:45:37,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 28 seconds)
2025-09-16 12:47:36,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:47:38,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 621.31610 ± 110.154
2025-09-16 12:47:38,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [621.89087, 590.6735, 580.5347, 673.153, 614.3254, 543.29474, 709.62823, 884.39307, 463.6461, 531.62177]
2025-09-16 12:47:38,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 110.0, 110.0, 130.0, 133.0, 99.0, 134.0, 184.0, 84.0, 110.0]
2025-09-16 12:47:38,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (621.32) for latency 9
2025-09-16 12:47:38,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 52 seconds)
2025-09-16 12:49:37,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:49:38,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 570.06580 ± 203.713
2025-09-16 12:49:38,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1046.4252, 343.1907, 347.41788, 552.9851, 623.7903, 792.69995, 454.7725, 598.5998, 458.04346, 482.7332]
2025-09-16 12:49:38,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 71.0, 73.0, 106.0, 120.0, 151.0, 87.0, 121.0, 83.0, 94.0]
2025-09-16 12:49:38,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 52 seconds)
2025-09-16 12:51:36,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:51:37,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 542.19049 ± 136.525
2025-09-16 12:51:37,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [634.67487, 433.4134, 696.9071, 658.69995, 501.89636, 299.19653, 338.50314, 611.39166, 559.24255, 687.979]
2025-09-16 12:51:37,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 82.0, 141.0, 127.0, 95.0, 60.0, 67.0, 112.0, 102.0, 131.0]
2025-09-16 12:51:37,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 30 seconds)
2025-09-16 12:53:37,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:53:39,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 663.21051 ± 159.266
2025-09-16 12:53:39,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [942.9503, 551.95013, 730.3156, 391.6614, 766.7281, 773.5084, 554.29004, 816.8768, 580.60547, 523.21875]
2025-09-16 12:53:39,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 106.0, 132.0, 73.0, 147.0, 158.0, 104.0, 159.0, 106.0, 95.0]
2025-09-16 12:53:39,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (663.21) for latency 9
2025-09-16 12:53:39,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 53 seconds)
2025-09-16 12:55:38,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:55:39,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 536.58215 ± 79.673
2025-09-16 12:55:39,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [648.7293, 526.3188, 637.42365, 409.0368, 484.96176, 587.13727, 629.6948, 473.58902, 461.36148, 507.56873]
2025-09-16 12:55:39,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 111.0, 119.0, 73.0, 87.0, 109.0, 127.0, 87.0, 96.0, 96.0]
2025-09-16 12:55:39,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2025-09-16 12:57:37,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:57:39,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 598.18018 ± 44.161
2025-09-16 12:57:39,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [614.72577, 623.9159, 510.90002, 623.60315, 648.7337, 662.3354, 575.6729, 557.4603, 604.0993, 560.3554]
2025-09-16 12:57:39,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 115.0, 97.0, 114.0, 124.0, 124.0, 106.0, 118.0, 110.0, 109.0]
2025-09-16 12:57:39,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 20 seconds)
2025-09-16 12:59:39,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:59:40,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 658.08405 ± 95.663
2025-09-16 12:59:40,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [543.58875, 541.6913, 673.4054, 696.4507, 797.83093, 718.699, 794.0819, 535.7155, 585.8311, 693.5459]
2025-09-16 12:59:40,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 117.0, 123.0, 141.0, 152.0, 140.0, 150.0, 101.0, 110.0, 138.0]
2025-09-16 12:59:40,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 32 seconds)
2025-09-16 13:01:39,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:01:41,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 670.60217 ± 143.133
2025-09-16 13:01:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [671.7558, 976.85913, 642.87, 479.96457, 613.97, 652.445, 865.5648, 582.5871, 516.9262, 703.0792]
2025-09-16 13:01:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 187.0, 120.0, 91.0, 116.0, 121.0, 186.0, 114.0, 95.0, 127.0]
2025-09-16 13:01:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (670.60) for latency 9
2025-09-16 13:01:41,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 48 seconds)
2025-09-16 13:03:41,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:03:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 694.47809 ± 166.894
2025-09-16 13:03:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [806.9268, 586.7014, 548.70685, 625.7317, 904.1193, 624.3411, 493.81754, 610.76794, 685.757, 1057.9115]
2025-09-16 13:03:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 109.0, 116.0, 127.0, 170.0, 128.0, 100.0, 119.0, 131.0, 199.0]
2025-09-16 13:03:43,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (694.48) for latency 9
2025-09-16 13:03:43,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 59 seconds)
2025-09-16 13:05:41,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:05:43,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 741.69153 ± 156.305
2025-09-16 13:05:43,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [860.2721, 796.6465, 922.1526, 1037.8208, 672.40424, 677.9613, 524.621, 621.54083, 550.8179, 752.67816]
2025-09-16 13:05:43,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 153.0, 173.0, 200.0, 126.0, 128.0, 111.0, 116.0, 108.0, 133.0]
2025-09-16 13:05:43,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (741.69) for latency 9
2025-09-16 13:05:43,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 56 seconds)
2025-09-16 13:07:43,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:07:45,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 656.36444 ± 103.093
2025-09-16 13:07:45,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [631.58105, 656.86926, 632.7409, 688.486, 572.6975, 576.67444, 822.8009, 827.13513, 473.96112, 680.6981]
2025-09-16 13:07:45,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 122.0, 112.0, 126.0, 104.0, 115.0, 156.0, 159.0, 90.0, 126.0]
2025-09-16 13:07:45,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 19 minutes, 17 seconds)
2025-09-16 13:09:44,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:09:46,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 873.29382 ± 315.777
2025-09-16 13:09:46,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [589.1516, 1158.6345, 692.9925, 680.0104, 1365.3079, 1407.3173, 691.6975, 613.26, 1006.86456, 527.7026]
2025-09-16 13:09:46,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 210.0, 129.0, 120.0, 269.0, 285.0, 129.0, 112.0, 204.0, 101.0]
2025-09-16 13:09:46,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (873.29) for latency 9
2025-09-16 13:09:46,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 17 minutes, 20 seconds)
2025-09-16 13:11:47,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:11:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 769.13354 ± 176.815
2025-09-16 13:11:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [845.63416, 1165.8651, 789.84265, 588.37994, 836.2242, 749.2188, 801.66394, 508.6889, 573.04877, 832.7691]
2025-09-16 13:11:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 216.0, 151.0, 113.0, 165.0, 139.0, 153.0, 97.0, 107.0, 155.0]
2025-09-16 13:11:49,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 50 seconds)
2025-09-16 13:13:48,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:13:50,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 951.96570 ± 235.438
2025-09-16 13:13:50,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [994.8278, 807.0491, 1078.9313, 851.27704, 1121.2372, 907.986, 1313.9257, 1252.1083, 583.3068, 609.0092]
2025-09-16 13:13:50,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 152.0, 207.0, 175.0, 213.0, 182.0, 242.0, 239.0, 124.0, 114.0]
2025-09-16 13:13:50,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (951.97) for latency 9
2025-09-16 13:13:50,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 32 seconds)
2025-09-16 13:15:49,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:15:51,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 799.82678 ± 215.466
2025-09-16 13:15:51,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1115.127, 515.4767, 847.1032, 997.4024, 810.891, 548.48834, 447.56342, 813.8733, 1019.4585, 882.88306]
2025-09-16 13:15:51,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 104.0, 160.0, 187.0, 145.0, 113.0, 92.0, 151.0, 195.0, 161.0]
2025-09-16 13:15:52,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 51 seconds)
2025-09-16 13:17:52,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:17:54,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 784.94293 ± 179.350
2025-09-16 13:17:54,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [765.23865, 749.5032, 564.82056, 1109.6841, 794.50305, 1031.5331, 771.1911, 515.36414, 900.4852, 647.1059]
2025-09-16 13:17:54,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 139.0, 123.0, 207.0, 147.0, 186.0, 159.0, 103.0, 186.0, 137.0]
2025-09-16 13:17:54,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 10 minutes, 6 seconds)
2025-09-16 13:19:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:19:56,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 813.62610 ± 256.298
2025-09-16 13:19:56,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [675.35803, 614.9934, 827.0474, 920.26556, 733.0309, 569.484, 1437.1344, 1079.401, 659.95233, 619.5943]
2025-09-16 13:19:56,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 114.0, 170.0, 183.0, 137.0, 118.0, 264.0, 211.0, 119.0, 119.0]
2025-09-16 13:19:56,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 8 minutes, 3 seconds)
2025-09-16 13:21:56,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:21:59,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1003.32666 ± 330.249
2025-09-16 13:21:59,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1879.0087, 1062.0435, 949.0534, 772.4221, 995.7855, 914.22974, 714.70355, 750.72876, 1231.0386, 764.2526]
2025-09-16 13:21:59,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [365.0, 198.0, 176.0, 160.0, 197.0, 175.0, 132.0, 151.0, 246.0, 139.0]
2025-09-16 13:21:59,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1003.33) for latency 9
2025-09-16 13:21:59,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 6 minutes, 2 seconds)
2025-09-16 13:23:58,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:24:00,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 968.77020 ± 293.417
2025-09-16 13:24:00,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1553.7264, 1085.9785, 606.92487, 1384.3483, 779.411, 996.59064, 640.0203, 745.16364, 1001.6113, 893.9271]
2025-09-16 13:24:00,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [310.0, 197.0, 114.0, 257.0, 144.0, 187.0, 121.0, 152.0, 191.0, 165.0]
2025-09-16 13:24:00,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 4 minutes)
2025-09-16 13:26:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:26:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 976.36249 ± 318.959
2025-09-16 13:26:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [703.32806, 751.2007, 1273.0254, 811.05817, 1295.4702, 663.0153, 1177.7383, 818.3386, 1615.2, 655.25024]
2025-09-16 13:26:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 152.0, 245.0, 147.0, 244.0, 124.0, 219.0, 150.0, 308.0, 122.0]
2025-09-16 13:26:03,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes, 16 seconds)
2025-09-16 13:28:05,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:28:08,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1206.77502 ± 278.535
2025-09-16 13:28:08,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1001.0824, 1702.8229, 981.6477, 1167.6467, 1209.0356, 1503.4955, 1575.9126, 1026.765, 1101.8434, 797.4984]
2025-09-16 13:28:08,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 307.0, 186.0, 220.0, 229.0, 280.0, 322.0, 197.0, 221.0, 146.0]
2025-09-16 13:28:08,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1206.78) for latency 9
2025-09-16 13:28:08,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 40 seconds)
2025-09-16 13:30:06,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:30:09,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1180.27966 ± 365.580
2025-09-16 13:30:09,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1126.0402, 1266.8203, 811.04846, 930.73694, 576.60754, 1593.8267, 884.57574, 1785.6489, 1293.1545, 1534.338]
2025-09-16 13:30:09,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 241.0, 160.0, 183.0, 107.0, 304.0, 169.0, 352.0, 245.0, 315.0]
2025-09-16 13:30:09,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 31 seconds)
2025-09-16 13:32:11,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:32:15,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1345.15674 ± 545.308
2025-09-16 13:32:15,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [803.3147, 1708.8864, 881.27386, 2276.4492, 585.87683, 1218.3105, 1169.1096, 1949.7942, 1909.8445, 948.7068]
2025-09-16 13:32:15,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 343.0, 183.0, 445.0, 112.0, 228.0, 225.0, 379.0, 366.0, 197.0]
2025-09-16 13:32:15,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1345.16) for latency 9
2025-09-16 13:32:15,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 56 minutes, 58 seconds)
2025-09-16 13:34:15,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:34:18,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1067.58081 ± 606.104
2025-09-16 13:34:18,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1174.385, 625.0947, 651.1326, 1007.2685, 1354.1194, 716.3654, 2722.9656, 999.77325, 920.47205, 504.231]
2025-09-16 13:34:18,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [238.0, 121.0, 122.0, 207.0, 247.0, 136.0, 519.0, 204.0, 188.0, 100.0]
2025-09-16 13:34:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 55 minutes, 17 seconds)
2025-09-16 13:36:19,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:36:23,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1527.63904 ± 398.965
2025-09-16 13:36:23,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1642.2336, 996.1273, 1272.9568, 1993.6873, 956.4965, 1400.6881, 2259.8882, 1602.7046, 1325.5553, 1826.0532]
2025-09-16 13:36:23,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [319.0, 182.0, 247.0, 374.0, 178.0, 265.0, 415.0, 299.0, 251.0, 348.0]
2025-09-16 13:36:23,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1527.64) for latency 9
2025-09-16 13:36:23,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 53 minutes, 36 seconds)
2025-09-16 13:38:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:38:25,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1445.93530 ± 652.286
2025-09-16 13:38:25,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [815.2383, 720.68835, 1438.6475, 1291.8893, 2278.0862, 2867.4414, 1573.0394, 1438.477, 1343.0946, 692.74994]
2025-09-16 13:38:25,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 132.0, 286.0, 234.0, 423.0, 549.0, 296.0, 266.0, 251.0, 141.0]
2025-09-16 13:38:25,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 6 seconds)
2025-09-16 13:40:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:40:28,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1292.97412 ± 441.897
2025-09-16 13:40:28,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1182.8546, 1504.6808, 706.2407, 963.6901, 1413.1744, 1628.6445, 821.15796, 1270.386, 1115.6848, 2323.2273]
2025-09-16 13:40:28,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 288.0, 128.0, 176.0, 274.0, 315.0, 156.0, 245.0, 217.0, 441.0]
2025-09-16 13:40:28,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 24 seconds)
2025-09-16 13:42:29,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:42:32,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1257.50525 ± 296.756
2025-09-16 13:42:32,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1555.149, 1708.7147, 1359.6293, 1447.2568, 1503.1322, 1273.5234, 1043.5144, 862.41626, 770.8254, 1050.8921]
2025-09-16 13:42:32,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [293.0, 319.0, 249.0, 268.0, 290.0, 258.0, 197.0, 162.0, 154.0, 202.0]
2025-09-16 13:42:32,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 6 seconds)
2025-09-16 13:44:35,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:44:40,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1762.00317 ± 834.014
2025-09-16 13:44:40,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1723.918, 1654.0585, 599.03345, 3063.3152, 824.806, 1990.6595, 1036.1898, 1818.1484, 3313.9736, 1595.9291]
2025-09-16 13:44:40,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [318.0, 325.0, 111.0, 589.0, 163.0, 393.0, 191.0, 351.0, 655.0, 312.0]
2025-09-16 13:44:40,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1762.00) for latency 9
2025-09-16 13:44:40,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 45 minutes, 42 seconds)
2025-09-16 13:46:37,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:46:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1409.72339 ± 485.170
2025-09-16 13:46:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2415.9639, 1371.7284, 1090.7051, 1059.5792, 1275.564, 1215.6669, 1618.7612, 2132.077, 749.577, 1167.6104]
2025-09-16 13:46:41,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [486.0, 275.0, 204.0, 208.0, 249.0, 255.0, 311.0, 414.0, 140.0, 228.0]
2025-09-16 13:46:41,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 43 minutes, 5 seconds)
2025-09-16 13:48:43,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:48:50,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2439.85645 ± 1154.700
2025-09-16 13:48:50,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2001.3352, 5157.2373, 3630.8906, 1129.6467, 1633.4408, 1989.659, 1921.4231, 3023.246, 2543.6392, 1368.0471]
2025-09-16 13:48:50,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [379.0, 1000.0, 698.0, 216.0, 311.0, 384.0, 366.0, 593.0, 494.0, 263.0]
2025-09-16 13:48:50,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (2439.86) for latency 9
2025-09-16 13:48:50,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 41 minutes, 58 seconds)
2025-09-16 13:50:58,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:51:02,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1800.32910 ± 1035.068
2025-09-16 13:51:02,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2302.2524, 938.98865, 1162.3988, 1104.436, 1014.9322, 2255.0916, 4476.735, 2265.42, 1219.7262, 1263.3099]
2025-09-16 13:51:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [426.0, 172.0, 222.0, 215.0, 222.0, 425.0, 833.0, 450.0, 223.0, 234.0]
2025-09-16 13:51:02,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 41 minutes, 25 seconds)
2025-09-16 13:53:03,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:53:07,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1523.90088 ± 904.953
2025-09-16 13:53:07,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1133.1052, 1272.6963, 4081.9546, 1234.1442, 1791.5068, 1097.8633, 1400.2734, 785.6784, 803.17535, 1638.6105]
2025-09-16 13:53:07,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 262.0, 788.0, 250.0, 361.0, 214.0, 254.0, 160.0, 167.0, 327.0]
2025-09-16 13:53:07,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 39 minutes, 24 seconds)
2025-09-16 13:55:03,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:55:11,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2853.99707 ± 1423.579
2025-09-16 13:55:11,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [795.6204, 3064.1313, 3230.0256, 1292.6427, 3652.6755, 1110.206, 4022.3699, 1869.5956, 4440.913, 5061.7905]
2025-09-16 13:55:11,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 583.0, 642.0, 260.0, 722.0, 217.0, 787.0, 365.0, 873.0, 1000.0]
2025-09-16 13:55:11,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (2854.00) for latency 9
2025-09-16 13:55:11,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 36 minutes, 52 seconds)
2025-09-16 13:57:20,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:57:27,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2426.24609 ± 1634.044
2025-09-16 13:57:27,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1485.2202, 854.7761, 958.55347, 1694.4813, 1324.5476, 5201.284, 2753.5583, 3810.3647, 5191.9976, 987.67786]
2025-09-16 13:57:27,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [282.0, 175.0, 196.0, 344.0, 253.0, 1000.0, 527.0, 748.0, 1000.0, 202.0]
2025-09-16 13:57:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 36 minutes, 54 seconds)
2025-09-16 13:59:25,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:59:34,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3357.20068 ± 1074.084
2025-09-16 13:59:34,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3957.1885, 1578.2354, 3648.1567, 1802.3179, 3705.52, 2379.3325, 3346.8425, 3598.4717, 4367.7764, 5188.162]
2025-09-16 13:59:34,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [772.0, 309.0, 689.0, 335.0, 707.0, 456.0, 645.0, 709.0, 826.0, 987.0]
2025-09-16 13:59:34,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (3357.20) for latency 9
2025-09-16 13:59:34,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 34 minutes, 31 seconds)
2025-09-16 14:01:38,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:01:48,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3625.46484 ± 1626.801
2025-09-16 14:01:48,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5088.8833, 764.7412, 5138.709, 4384.871, 4341.6577, 5089.1826, 5146.996, 1706.6882, 3034.0942, 1558.8243]
2025-09-16 14:01:48,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 146.0, 1000.0, 874.0, 848.0, 1000.0, 1000.0, 324.0, 574.0, 288.0]
2025-09-16 14:01:48,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (3625.46) for latency 9
2025-09-16 14:01:48,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 32 minutes, 35 seconds)
2025-09-16 14:03:49,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:03:59,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3634.21948 ± 1803.758
2025-09-16 14:03:59,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5174.471, 4658.3486, 1959.0784, 4680.4644, 5281.016, 820.52844, 5203.563, 5242.473, 2596.3196, 725.92957]
2025-09-16 14:03:59,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 885.0, 373.0, 884.0, 1000.0, 161.0, 1000.0, 1000.0, 489.0, 140.0]
2025-09-16 14:03:59,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (3634.22) for latency 9
2025-09-16 14:03:59,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 31 minutes, 19 seconds)
2025-09-16 14:06:05,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:06:14,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3459.83594 ± 1533.833
2025-09-16 14:06:14,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1965.9215, 955.9174, 5276.8027, 3805.873, 5283.7354, 2978.0134, 3588.3132, 4016.3267, 1401.5216, 5325.9365]
2025-09-16 14:06:14,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [355.0, 176.0, 1000.0, 701.0, 1000.0, 564.0, 669.0, 750.0, 257.0, 1000.0]
2025-09-16 14:06:14,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 30 minutes, 37 seconds)
2025-09-16 14:08:12,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:08:23,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3721.38672 ± 1758.585
2025-09-16 14:08:23,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5140.5317, 4872.242, 5097.8535, 2997.9956, 5090.0947, 682.76886, 851.6909, 5108.487, 2289.3977, 5082.8037]
2025-09-16 14:08:23,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 951.0, 1000.0, 561.0, 1000.0, 134.0, 172.0, 1000.0, 431.0, 1000.0]
2025-09-16 14:08:23,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (3721.39) for latency 9
2025-09-16 14:08:23,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 27 minutes, 25 seconds)
2025-09-16 14:10:17,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:10:27,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3368.22217 ± 1852.640
2025-09-16 14:10:27,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5078.564, 4133.314, 589.76843, 2112.654, 5134.9526, 1416.1924, 5127.94, 5131.851, 635.41504, 4321.572]
2025-09-16 14:10:27,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 812.0, 126.0, 429.0, 1000.0, 290.0, 1000.0, 1000.0, 135.0, 848.0]
2025-09-16 14:10:27,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 54 seconds)
2025-09-16 14:12:32,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:12:46,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4611.91797 ± 1272.658
2025-09-16 14:12:46,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5012.993, 5024.2896, 5076.994, 5032.094, 5030.869, 794.2429, 5037.7314, 5038.165, 5027.759, 5044.0444]
2025-09-16 14:12:46,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 143.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:12:46,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4611.92) for latency 9
2025-09-16 14:12:46,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 23 minutes, 18 seconds)
2025-09-16 14:14:51,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:15:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3410.46216 ± 1962.198
2025-09-16 14:15:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [824.9647, 5035.8564, 5021.313, 643.82153, 5043.5703, 2064.5776, 4698.556, 681.89496, 5043.1006, 5046.9644]
2025-09-16 14:15:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 1000.0, 1000.0, 138.0, 1000.0, 401.0, 943.0, 145.0, 1000.0, 1000.0]
2025-09-16 14:15:02,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 21 minutes, 40 seconds)
2025-09-16 14:16:59,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:17:11,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4091.33154 ± 1811.161
2025-09-16 14:17:11,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5189.663, 5175.7993, 5175.5127, 3359.5996, 5218.7314, 5184.3115, 750.28326, 5157.8403, 5181.688, 519.88855]
2025-09-16 14:17:11,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 640.0, 1000.0, 1000.0, 151.0, 1000.0, 1000.0, 101.0]
2025-09-16 14:17:11,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 18 minutes, 47 seconds)
2025-09-16 14:19:16,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:19:32,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5150.91309 ± 19.055
2025-09-16 14:19:32,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5183.436, 5124.0884, 5152.4106, 5143.2056, 5168.7666, 5124.745, 5176.253, 5140.854, 5151.1787, 5144.1943]
2025-09-16 14:19:32,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:19:32,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5150.91) for latency 9
2025-09-16 14:19:32,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 18 minutes, 3 seconds)
2025-09-16 14:21:29,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:21:40,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3867.97803 ± 1614.756
2025-09-16 14:21:40,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4672.8696, 5040.055, 945.70636, 1557.5897, 4372.201, 5077.751, 5116.6064, 5033.4766, 1825.6217, 5037.9014]
2025-09-16 14:21:40,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [936.0, 1000.0, 201.0, 319.0, 855.0, 1000.0, 1000.0, 1000.0, 370.0, 1000.0]
2025-09-16 14:21:40,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 16 minutes, 16 seconds)
2025-09-16 14:23:47,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:24:01,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4967.11865 ± 541.192
2025-09-16 14:24:01,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3516.0002, 5266.596, 5184.7886, 5222.688, 5231.7524, 4414.782, 5283.884, 5150.962, 5160.0366, 5239.6978]
2025-09-16 14:24:01,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [668.0, 1000.0, 1000.0, 1000.0, 1000.0, 837.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:24:01,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 14 minutes, 14 seconds)
2025-09-16 14:26:02,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:26:16,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4872.75098 ± 1188.016
2025-09-16 14:26:16,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5364.09, 1389.9968, 5383.622, 5394.9478, 5125.3057, 5351.7075, 4543.5786, 5391.0435, 5408.1846, 5375.0347]
2025-09-16 14:26:16,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 293.0, 1000.0, 1000.0, 951.0, 1000.0, 859.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:26:16,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 11 minutes, 54 seconds)
2025-09-16 14:28:13,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:28:26,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4586.43555 ± 1521.669
2025-09-16 14:28:26,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5339.3525, 5318.7607, 5357.6987, 5314.7754, 756.94244, 2542.9707, 5308.962, 5307.8027, 5298.685, 5318.4053]
2025-09-16 14:28:26,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 146.0, 464.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:28:26,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 9 minutes, 45 seconds)
2025-09-16 14:30:32,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:30:47,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5058.77979 ± 26.288
2025-09-16 14:30:47,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5023.5103, 5061.4736, 5008.47, 5043.1357, 5063.4224, 5078.4937, 5062.6973, 5101.467, 5083.8765, 5061.252]
2025-09-16 14:30:47,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:30:47,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 7 minutes, 30 seconds)
2025-09-16 14:32:41,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:32:54,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4769.63916 ± 980.505
2025-09-16 14:32:54,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5356.104, 5373.9873, 5399.912, 5358.477, 3179.8381, 5345.0264, 5366.808, 2675.8816, 4292.6055, 5347.749]
2025-09-16 14:32:54,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 597.0, 1000.0, 1000.0, 498.0, 790.0, 1000.0]
2025-09-16 14:32:54,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 5 minutes, 7 seconds)
2025-09-16 14:35:01,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:35:14,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4710.95215 ± 1420.849
2025-09-16 14:35:14,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [521.4127, 5269.6025, 5319.1914, 5332.129, 4402.9565, 5224.9307, 5295.587, 5281.3535, 5247.279, 5215.0835]
2025-09-16 14:35:14,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 1000.0, 1000.0, 1000.0, 821.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:35:14,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 2 minutes, 47 seconds)
2025-09-16 14:37:17,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:37:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4246.95166 ± 1561.058
2025-09-16 14:37:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5175.0244, 5176.417, 3993.262, 1311.9323, 5186.332, 5193.543, 5200.31, 4943.743, 1099.8123, 5189.1426]
2025-09-16 14:37:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 777.0, 275.0, 1000.0, 1000.0, 1000.0, 935.0, 215.0, 1000.0]
2025-09-16 14:37:30,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 38 seconds)
2025-09-16 14:39:30,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:39:40,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3621.78711 ± 2038.029
2025-09-16 14:39:40,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5303.9214, 736.89136, 5267.836, 1560.3844, 5268.166, 5225.929, 5235.935, 5262.95, 1847.6046, 508.25082]
2025-09-16 14:39:40,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 154.0, 1000.0, 275.0, 1000.0, 1000.0, 1000.0, 1000.0, 343.0, 112.0]
2025-09-16 14:39:40,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 58 minutes, 22 seconds)
2025-09-16 14:41:37,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:41:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3596.44995 ± 1698.475
2025-09-16 14:41:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2828.173, 5090.955, 5103.842, 956.30884, 826.4006, 5083.3477, 3776.7908, 5114.5947, 2082.0054, 5102.082]
2025-09-16 14:41:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [574.0, 1000.0, 1000.0, 191.0, 181.0, 1000.0, 742.0, 1000.0, 420.0, 1000.0]
2025-09-16 14:41:47,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 1 second)
2025-09-16 14:43:48,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:43:59,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3917.46875 ± 1649.646
2025-09-16 14:43:59,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4767.6323, 5325.237, 1950.4088, 1568.5156, 5361.397, 5422.5312, 5307.8223, 2141.1667, 5331.6997, 1998.2754]
2025-09-16 14:43:59,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [879.0, 1000.0, 371.0, 287.0, 1000.0, 1000.0, 1000.0, 400.0, 1000.0, 379.0]
2025-09-16 14:43:59,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 9 seconds)
2025-09-16 14:45:56,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:46:09,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4719.59180 ± 1324.417
2025-09-16 14:46:09,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5131.5117, 5121.0435, 5163.882, 5143.2397, 5148.3906, 5178.556, 5262.9336, 5166.262, 747.98376, 5132.1123]
2025-09-16 14:46:09,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 142.0, 1000.0]
2025-09-16 14:46:09,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 16 seconds)
2025-09-16 14:48:10,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:48:23,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4638.96973 ± 1338.299
2025-09-16 14:48:23,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5255.131, 5217.906, 5316.9897, 5325.0854, 5298.865, 5356.62, 2506.497, 1496.9629, 5320.0376, 5295.5996]
2025-09-16 14:48:23,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 475.0, 280.0, 1000.0, 1000.0]
2025-09-16 14:48:23,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 53 seconds)
2025-09-16 14:50:24,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:50:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5155.78857 ± 22.906
2025-09-16 14:50:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5158.432, 5152.379, 5180.3228, 5179.5215, 5132.5317, 5115.087, 5139.84, 5138.1133, 5185.5566, 5176.1055]
2025-09-16 14:50:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:50:39,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5155.79) for latency 9
2025-09-16 14:50:39,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 10 seconds)
2025-09-16 14:52:40,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:52:55,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5155.88232 ± 119.306
2025-09-16 14:52:55,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5218.76, 5168.5024, 5205.401, 5187.136, 5217.032, 5171.748, 5170.566, 4802.393, 5214.478, 5202.808]
2025-09-16 14:52:55,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 921.0, 1000.0, 1000.0]
2025-09-16 14:52:55,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5155.88) for latency 9
2025-09-16 14:52:55,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 29 seconds)
2025-09-16 14:54:55,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:55:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5202.35645 ± 13.435
2025-09-16 14:55:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5177.865, 5199.204, 5218.2983, 5222.263, 5204.195, 5194.7, 5203.1265, 5219.1455, 5191.5776, 5193.1895]
2025-09-16 14:55:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:55:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5202.36) for latency 9
2025-09-16 14:55:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 31 seconds)
2025-09-16 14:57:10,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:57:25,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5154.39160 ± 19.472
2025-09-16 14:57:25,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5132.8276, 5157.0093, 5145.195, 5136.063, 5180.447, 5149.111, 5197.49, 5136.185, 5156.028, 5153.5605]
2025-09-16 14:57:25,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:57:25,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 33 seconds)
2025-09-16 14:59:18,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:59:32,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4652.32324 ± 1375.673
2025-09-16 14:59:32,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5101.5454, 5118.4004, 5122.966, 5103.4546, 5109.48, 5135.573, 5104.619, 5094.2915, 525.4451, 5107.457]
2025-09-16 14:59:32,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 115.0, 1000.0]
2025-09-16 14:59:32,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 54 seconds)
2025-09-16 15:01:43,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:01:57,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4581.44043 ± 1215.726
2025-09-16 15:01:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4951.7437, 5004.7017, 4982.382, 4970.7603, 5009.755, 4978.8467, 4996.137, 4996.3213, 934.58374, 4989.1724]
2025-09-16 15:01:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 199.0, 1000.0]
2025-09-16 15:01:57,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 36 minutes, 9 seconds)
2025-09-16 15:03:51,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:04:06,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5165.95850 ± 20.176
2025-09-16 15:04:06,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5169.7583, 5109.268, 5178.522, 5183.0435, 5167.214, 5163.368, 5182.2417, 5164.749, 5176.735, 5164.688]
2025-09-16 15:04:06,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:04:06,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 32 seconds)
2025-09-16 15:06:06,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:06:20,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4883.04541 ± 770.637
2025-09-16 15:06:20,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5138.0933, 2572.0994, 5137.8164, 5118.965, 5164.462, 5153.8853, 5115.695, 5107.0728, 5185.05, 5137.3154]
2025-09-16 15:06:20,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 480.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:06:20,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 15 seconds)
2025-09-16 15:08:18,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:08:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5264.70410 ± 30.017
2025-09-16 15:08:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5307.2036, 5262.231, 5216.741, 5249.1636, 5289.9893, 5271.0366, 5212.143, 5261.668, 5294.6963, 5282.168]
2025-09-16 15:08:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:08:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5264.70) for latency 9
2025-09-16 15:08:33,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 56 seconds)
2025-09-16 15:10:33,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:10:48,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5175.32568 ± 11.656
2025-09-16 15:10:48,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5173.521, 5164.9272, 5186.184, 5162.295, 5172.953, 5165.832, 5163.745, 5199.1006, 5188.078, 5176.6245]
2025-09-16 15:10:48,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:10:48,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 3 seconds)
2025-09-16 15:12:49,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:13:02,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4467.19482 ± 1535.227
2025-09-16 15:13:02,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5206.512, 5167.3228, 5183.8047, 1794.75, 1037.8809, 5277.797, 5250.11, 5268.2554, 5274.811, 5210.7026]
2025-09-16 15:13:02,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 352.0, 211.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:13:02,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 22 seconds)
2025-09-16 15:15:02,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:15:16,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4791.76123 ± 1375.757
2025-09-16 15:15:16,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5265.056, 5174.017, 5291.8594, 5216.371, 5266.3325, 5257.0073, 666.5922, 5340.802, 5222.8438, 5216.729]
2025-09-16 15:15:16,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 145.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:15:16,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 20 seconds)
2025-09-16 15:17:13,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:17:29,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5125.59131 ± 16.807
2025-09-16 15:17:29,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5136.7563, 5125.9653, 5100.6147, 5140.06, 5154.261, 5128.9087, 5123.92, 5111.648, 5135.3975, 5098.382]
2025-09-16 15:17:29,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:17:29,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 3 seconds)
2025-09-16 15:19:29,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:19:44,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5271.79736 ± 24.040
2025-09-16 15:19:44,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5265.296, 5271.9287, 5279.1553, 5269.202, 5271.4756, 5338.076, 5248.9146, 5267.512, 5248.2275, 5258.184]
2025-09-16 15:19:44,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:19:44,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5271.80) for latency 9
2025-09-16 15:19:44,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 52 seconds)
2025-09-16 15:21:44,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:21:59,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5176.36572 ± 10.244
2025-09-16 15:21:59,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5165.169, 5172.446, 5163.6255, 5175.656, 5171.265, 5178.65, 5185.858, 5177.9297, 5200.9414, 5172.1123]
2025-09-16 15:21:59,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:21:59,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 39 seconds)
2025-09-16 15:24:02,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:24:17,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5127.30029 ± 41.234
2025-09-16 15:24:17,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5130.868, 5132.421, 5071.653, 5151.6133, 5064.1113, 5112.284, 5084.2, 5187.7847, 5174.4395, 5163.628]
2025-09-16 15:24:17,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:24:17,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 30 seconds)
2025-09-16 15:26:20,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:26:34,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4630.97217 ± 1410.006
2025-09-16 15:26:34,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5276.635, 5168.4644, 5171.7305, 4330.2485, 5157.819, 5205.4956, 5229.915, 472.94012, 5118.204, 5178.2705]
2025-09-16 15:26:34,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 854.0, 1000.0, 1000.0, 1000.0, 93.0, 1000.0, 1000.0]
2025-09-16 15:26:34,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 17 seconds)
2025-09-16 15:28:38,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:28:53,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5228.17285 ± 17.675
2025-09-16 15:28:53,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5212.558, 5201.154, 5236.2354, 5214.947, 5206.4556, 5224.0737, 5249.4634, 5242.4346, 5242.6934, 5251.7153]
2025-09-16 15:28:53,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:28:53,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 7 seconds)
2025-09-16 15:30:48,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:31:01,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4816.62158 ± 1393.593
2025-09-16 15:31:01,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5311.676, 5292.269, 5304.8057, 5239.2607, 5272.358, 5251.9126, 5283.109, 5290.742, 636.3162, 5283.764]
2025-09-16 15:31:01,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 115.0, 1000.0]
2025-09-16 15:31:01,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 46 seconds)
2025-09-16 15:33:02,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:33:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4243.43896 ± 1759.156
2025-09-16 15:33:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5319.615, 5260.0044, 4041.8252, 5265.637, 5275.4873, 5134.056, 745.4653, 5296.435, 5239.713, 856.1546]
2025-09-16 15:33:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 768.0, 1000.0, 1000.0, 1000.0, 165.0, 1000.0, 1000.0, 153.0]
2025-09-16 15:33:14,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 29 seconds)
2025-09-16 15:35:14,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:35:29,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5242.37207 ± 30.589
2025-09-16 15:35:29,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5225.884, 5222.1523, 5265.867, 5264.0903, 5228.2065, 5262.2476, 5217.7705, 5308.4565, 5230.8594, 5198.1914]
2025-09-16 15:35:29,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:35:29,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 14 seconds)
2025-09-16 15:37:31,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:37:44,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4442.91211 ± 1449.935
2025-09-16 15:37:44,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5180.53, 5157.0464, 5151.6187, 887.9113, 5120.4155, 5127.2573, 5115.0996, 5190.0786, 5151.346, 2347.816]
2025-09-16 15:37:44,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 189.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 465.0]
2025-09-16 15:37:44,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
