2025-09-16 12:02:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.025-delay_9
2025-09-16 12:02:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.025-delay_9
2025-09-16 12:02:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x148f3508c710>}
2025-09-16 12:02:05,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:02:05,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:02:05,744 baseline-bpql-noisepromille25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=529, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:02:05,744 baseline-bpql-noisepromille25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:02:07,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:02:07,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:03:54,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:03:56,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 446.68872 ± 106.463
2025-09-16 12:03:56,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [524.40375, 311.31024, 378.08737, 387.92587, 444.19098, 622.03625, 451.7296, 627.6698, 326.7888, 392.74417]
2025-09-16 12:03:56,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 64.0, 80.0, 82.0, 92.0, 118.0, 93.0, 119.0, 69.0, 81.0]
2025-09-16 12:03:56,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (446.69) for latency 9
2025-09-16 12:03:56,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 18 seconds)
2025-09-16 12:05:51,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:05:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 359.37921 ± 59.295
2025-09-16 12:05:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [300.49454, 468.5497, 370.7091, 329.08414, 347.40247, 406.58853, 424.14645, 266.64517, 377.61127, 302.56064]
2025-09-16 12:05:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 90.0, 70.0, 65.0, 66.0, 79.0, 81.0, 54.0, 73.0, 59.0]
2025-09-16 12:05:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 32 seconds)
2025-09-16 12:07:48,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:07:49,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 384.10699 ± 46.320
2025-09-16 12:07:49,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [348.09628, 474.74115, 355.82785, 371.82013, 336.54352, 405.1386, 454.07303, 358.46317, 334.7998, 401.56674]
2025-09-16 12:07:49,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 97.0, 75.0, 78.0, 75.0, 88.0, 86.0, 78.0, 73.0, 89.0]
2025-09-16 12:07:49,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 35 seconds)
2025-09-16 12:09:45,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:09:46,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 464.04745 ± 85.768
2025-09-16 12:09:46,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [392.89703, 431.57498, 537.2293, 558.2019, 386.72095, 471.19336, 349.96835, 543.5418, 368.3481, 600.7989]
2025-09-16 12:09:46,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 92.0, 103.0, 107.0, 74.0, 91.0, 66.0, 111.0, 71.0, 118.0]
2025-09-16 12:09:46,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (464.05) for latency 9
2025-09-16 12:09:47,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 51 seconds)
2025-09-16 12:11:43,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:11:44,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 482.44794 ± 115.517
2025-09-16 12:11:44,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [458.77078, 434.73514, 495.621, 341.81134, 376.37302, 442.5936, 765.8095, 400.72903, 571.2936, 536.74243]
2025-09-16 12:11:44,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 79.0, 91.0, 65.0, 80.0, 83.0, 154.0, 76.0, 107.0, 100.0]
2025-09-16 12:11:44,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (482.45) for latency 9
2025-09-16 12:11:44,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 42 seconds)
2025-09-16 12:13:40,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:13:41,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 409.02219 ± 83.193
2025-09-16 12:13:41,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [317.9379, 309.23712, 535.89496, 489.78915, 513.77155, 429.58096, 355.23032, 348.8048, 324.17407, 465.80096]
2025-09-16 12:13:41,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 67.0, 110.0, 91.0, 104.0, 90.0, 75.0, 75.0, 70.0, 87.0]
2025-09-16 12:13:41,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 29 seconds)
2025-09-16 12:15:37,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:15:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 467.64374 ± 118.325
2025-09-16 12:15:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [674.21735, 353.99664, 498.78967, 528.2705, 391.5228, 612.249, 338.84802, 558.7139, 310.0413, 409.78787]
2025-09-16 12:15:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 79.0, 91.0, 101.0, 85.0, 116.0, 74.0, 107.0, 70.0, 86.0]
2025-09-16 12:15:39,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 58 seconds)
2025-09-16 12:17:35,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:17:36,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 488.48331 ± 63.265
2025-09-16 12:17:36,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [422.59348, 537.62726, 382.90308, 599.97363, 517.7836, 485.73456, 529.10205, 410.46185, 478.6766, 519.97723]
2025-09-16 12:17:36,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 103.0, 77.0, 114.0, 111.0, 91.0, 101.0, 79.0, 103.0, 99.0]
2025-09-16 12:17:36,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (488.48) for latency 9
2025-09-16 12:17:36,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 51 seconds)
2025-09-16 12:19:32,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:19:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 561.54388 ± 91.549
2025-09-16 12:19:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [663.4925, 676.56573, 537.2248, 429.84137, 594.9294, 671.6957, 489.62943, 419.26035, 608.4768, 524.3229]
2025-09-16 12:19:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 139.0, 103.0, 80.0, 114.0, 138.0, 94.0, 77.0, 133.0, 114.0]
2025-09-16 12:19:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (561.54) for latency 9
2025-09-16 12:19:34,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 6 seconds)
2025-09-16 12:21:30,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:21:31,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 499.51825 ± 58.441
2025-09-16 12:21:31,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [489.187, 446.28754, 547.4261, 437.88535, 514.86786, 519.5351, 606.6093, 547.5144, 396.52722, 489.34256]
2025-09-16 12:21:31,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 84.0, 103.0, 81.0, 109.0, 106.0, 122.0, 102.0, 74.0, 90.0]
2025-09-16 12:21:31,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 9 seconds)
2025-09-16 12:23:28,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:23:29,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 464.97665 ± 150.327
2025-09-16 12:23:29,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [377.0204, 521.11993, 418.07257, 378.71814, 274.32407, 400.647, 686.1436, 791.2765, 428.1574, 374.28687]
2025-09-16 12:23:29,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 98.0, 79.0, 72.0, 54.0, 79.0, 149.0, 164.0, 79.0, 71.0]
2025-09-16 12:23:29,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 54 minutes, 23 seconds)
2025-09-16 12:25:26,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:25:28,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 531.17883 ± 83.073
2025-09-16 12:25:28,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [577.5702, 442.6748, 423.0395, 720.4625, 489.40887, 547.7143, 592.5374, 476.01544, 556.3851, 485.98053]
2025-09-16 12:25:28,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 82.0, 78.0, 142.0, 110.0, 103.0, 113.0, 88.0, 105.0, 91.0]
2025-09-16 12:25:28,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 52 minutes, 49 seconds)
2025-09-16 12:27:24,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:27:25,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 528.63855 ± 107.114
2025-09-16 12:27:25,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [436.20166, 601.49774, 514.0263, 600.70636, 388.09305, 510.6208, 525.195, 446.8944, 785.7034, 477.44702]
2025-09-16 12:27:25,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 116.0, 112.0, 114.0, 73.0, 112.0, 98.0, 86.0, 151.0, 91.0]
2025-09-16 12:27:25,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 50 minutes, 53 seconds)
2025-09-16 12:29:23,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:29:24,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 528.91333 ± 89.372
2025-09-16 12:29:24,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [485.5338, 438.01993, 528.4707, 486.85477, 511.64795, 559.6956, 773.60455, 475.7666, 472.24203, 557.2973]
2025-09-16 12:29:24,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 81.0, 97.0, 91.0, 96.0, 104.0, 144.0, 88.0, 87.0, 103.0]
2025-09-16 12:29:24,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 10 seconds)
2025-09-16 12:31:20,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:31:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 679.16974 ± 117.190
2025-09-16 12:31:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [610.2393, 681.0944, 665.04803, 756.04645, 423.4591, 673.0334, 639.52814, 876.11316, 647.63495, 819.50055]
2025-09-16 12:31:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 146.0, 128.0, 145.0, 79.0, 131.0, 136.0, 168.0, 135.0, 157.0]
2025-09-16 12:31:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (679.17) for latency 9
2025-09-16 12:31:22,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 20 seconds)
2025-09-16 12:33:19,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:33:20,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 616.22479 ± 118.876
2025-09-16 12:33:20,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [572.7308, 617.3933, 587.3858, 897.14197, 443.9709, 635.79114, 487.8697, 707.7485, 653.8838, 558.3318]
2025-09-16 12:33:20,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 116.0, 113.0, 171.0, 85.0, 120.0, 92.0, 133.0, 141.0, 121.0]
2025-09-16 12:33:20,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes, 33 seconds)
2025-09-16 12:35:18,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:35:19,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 622.73669 ± 131.504
2025-09-16 12:35:19,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [939.34985, 527.127, 570.7676, 504.23618, 566.14343, 600.80975, 790.999, 508.04337, 584.17676, 635.7137]
2025-09-16 12:35:19,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 113.0, 117.0, 106.0, 104.0, 117.0, 162.0, 105.0, 108.0, 122.0]
2025-09-16 12:35:19,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 41 seconds)
2025-09-16 12:37:16,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:37:18,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 676.02667 ± 191.125
2025-09-16 12:37:18,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [559.9505, 615.5187, 765.9069, 449.41452, 628.7141, 856.61084, 908.5356, 1017.4033, 532.5844, 425.62747]
2025-09-16 12:37:18,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 130.0, 142.0, 101.0, 133.0, 164.0, 176.0, 215.0, 115.0, 92.0]
2025-09-16 12:37:18,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 41 minutes, 56 seconds)
2025-09-16 12:39:15,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:39:16,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 657.51892 ± 113.519
2025-09-16 12:39:16,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [659.29535, 759.51166, 513.3737, 605.0184, 619.5751, 439.95407, 692.7101, 762.00714, 840.02405, 683.71906]
2025-09-16 12:39:16,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 154.0, 108.0, 116.0, 129.0, 94.0, 132.0, 145.0, 179.0, 143.0]
2025-09-16 12:39:16,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 58 seconds)
2025-09-16 12:41:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:41:16,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 579.83679 ± 114.924
2025-09-16 12:41:16,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [593.6075, 494.93338, 449.00342, 838.73584, 569.6044, 685.2264, 429.87344, 543.54724, 646.56964, 547.2659]
2025-09-16 12:41:16,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 110.0, 96.0, 181.0, 123.0, 149.0, 96.0, 121.0, 126.0, 122.0]
2025-09-16 12:41:16,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 23 seconds)
2025-09-16 12:43:13,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:43:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 673.01135 ± 113.708
2025-09-16 12:43:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [656.0485, 753.4099, 571.8942, 553.96783, 773.61993, 542.3409, 580.0644, 921.37317, 696.9063, 680.4885]
2025-09-16 12:43:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 142.0, 108.0, 103.0, 144.0, 102.0, 110.0, 182.0, 132.0, 128.0]
2025-09-16 12:43:15,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 37 seconds)
2025-09-16 12:45:12,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:45:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 773.55579 ± 154.401
2025-09-16 12:45:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [877.1448, 853.9959, 580.26953, 966.5623, 934.93384, 695.5124, 644.99536, 513.60455, 727.0884, 941.4511]
2025-09-16 12:45:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 160.0, 107.0, 182.0, 173.0, 145.0, 120.0, 108.0, 140.0, 190.0]
2025-09-16 12:45:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (773.56) for latency 9
2025-09-16 12:45:14,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 31 seconds)
2025-09-16 12:47:12,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:47:14,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 731.00244 ± 132.063
2025-09-16 12:47:14,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [821.9937, 909.41064, 690.67944, 756.5042, 839.7146, 536.6966, 773.56036, 521.5306, 865.53534, 594.3993]
2025-09-16 12:47:14,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 173.0, 131.0, 141.0, 157.0, 102.0, 150.0, 97.0, 161.0, 112.0]
2025-09-16 12:47:14,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 3 seconds)
2025-09-16 12:49:12,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:49:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 689.97626 ± 267.855
2025-09-16 12:49:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [519.43604, 1296.9996, 537.7415, 1108.1068, 586.28033, 578.2041, 605.8387, 516.819, 442.4596, 707.8771]
2025-09-16 12:49:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 245.0, 100.0, 221.0, 110.0, 107.0, 112.0, 113.0, 98.0, 136.0]
2025-09-16 12:49:14,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 24 seconds)
2025-09-16 12:51:11,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:51:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 468.78647 ± 58.166
2025-09-16 12:51:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [521.1026, 503.5286, 439.1235, 352.22885, 411.50262, 548.30505, 438.05164, 505.0406, 523.6432, 445.33798]
2025-09-16 12:51:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 99.0, 88.0, 69.0, 87.0, 102.0, 90.0, 98.0, 110.0, 83.0]
2025-09-16 12:51:12,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 5 seconds)
2025-09-16 12:53:09,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:53:11,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 660.35229 ± 75.333
2025-09-16 12:53:11,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [698.1773, 593.96106, 747.77594, 603.3511, 666.9398, 701.4673, 713.79395, 744.207, 492.95963, 640.88983]
2025-09-16 12:53:11,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 127.0, 159.0, 119.0, 129.0, 133.0, 143.0, 142.0, 93.0, 121.0]
2025-09-16 12:53:11,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 56 seconds)
2025-09-16 12:55:08,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:55:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 849.80927 ± 238.651
2025-09-16 12:55:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [759.2466, 876.83105, 707.6499, 702.68915, 1137.4987, 1254.1428, 668.4128, 1178.3132, 688.6275, 524.68134]
2025-09-16 12:55:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 172.0, 131.0, 133.0, 226.0, 247.0, 128.0, 230.0, 134.0, 109.0]
2025-09-16 12:55:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (849.81) for latency 9
2025-09-16 12:55:10,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 9 seconds)
2025-09-16 12:57:09,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:57:11,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 720.78290 ± 83.771
2025-09-16 12:57:11,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [733.4082, 769.05145, 790.5083, 803.2761, 691.73846, 800.0254, 608.371, 753.3144, 724.9307, 533.2052]
2025-09-16 12:57:11,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 142.0, 150.0, 170.0, 135.0, 150.0, 131.0, 142.0, 137.0, 115.0]
2025-09-16 12:57:11,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 19 seconds)
2025-09-16 12:59:07,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:59:09,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 841.12708 ± 139.459
2025-09-16 12:59:09,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [712.2292, 1013.0443, 772.88336, 978.7074, 806.69934, 1056.5133, 747.5046, 868.70026, 582.91595, 872.07306]
2025-09-16 12:59:09,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 187.0, 149.0, 190.0, 152.0, 198.0, 156.0, 163.0, 125.0, 163.0]
2025-09-16 12:59:09,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 55 seconds)
2025-09-16 13:01:08,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:01:10,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 741.07495 ± 172.244
2025-09-16 13:01:10,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [783.991, 859.30994, 500.50125, 794.3228, 697.4517, 628.99396, 781.2708, 524.8841, 1135.9897, 704.0343]
2025-09-16 13:01:10,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 161.0, 95.0, 154.0, 129.0, 120.0, 153.0, 105.0, 216.0, 130.0]
2025-09-16 13:01:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 31 seconds)
2025-09-16 13:03:06,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:03:08,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 721.08234 ± 101.674
2025-09-16 13:03:08,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [732.1612, 667.05304, 783.6894, 756.7689, 504.2163, 841.1975, 588.8925, 720.4626, 830.9139, 785.4679]
2025-09-16 13:03:08,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 128.0, 147.0, 164.0, 101.0, 173.0, 112.0, 147.0, 176.0, 155.0]
2025-09-16 13:03:08,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 23 seconds)
2025-09-16 13:05:07,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:05:09,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 897.85272 ± 174.049
2025-09-16 13:05:09,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1007.62445, 769.7519, 702.0183, 952.53595, 751.898, 1318.0906, 931.07935, 980.51843, 764.0967, 800.91327]
2025-09-16 13:05:09,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 154.0, 138.0, 176.0, 143.0, 259.0, 175.0, 197.0, 141.0, 156.0]
2025-09-16 13:05:09,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (897.85) for latency 9
2025-09-16 13:05:09,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 43 seconds)
2025-09-16 13:07:08,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:07:10,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 896.31805 ± 185.163
2025-09-16 13:07:10,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [828.2874, 781.6992, 1110.439, 895.0844, 1009.44806, 895.4219, 720.1817, 1301.8413, 680.17804, 740.60004]
2025-09-16 13:07:10,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 164.0, 224.0, 177.0, 186.0, 167.0, 136.0, 269.0, 125.0, 159.0]
2025-09-16 13:07:10,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 45 seconds)
2025-09-16 13:09:07,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:09:09,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 741.54138 ± 197.709
2025-09-16 13:09:09,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [558.76086, 789.7882, 864.621, 545.9317, 1202.5233, 888.9662, 775.8048, 561.2105, 589.15454, 638.6529]
2025-09-16 13:09:09,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 143.0, 157.0, 109.0, 244.0, 160.0, 161.0, 110.0, 120.0, 116.0]
2025-09-16 13:09:09,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-09-16 13:11:08,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:11:11,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1112.77307 ± 323.683
2025-09-16 13:11:11,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [914.0346, 954.2336, 1258.1648, 1306.73, 1149.0352, 1922.4928, 840.60504, 987.0429, 1103.7947, 691.59717]
2025-09-16 13:11:11,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 183.0, 236.0, 241.0, 219.0, 380.0, 176.0, 203.0, 205.0, 131.0]
2025-09-16 13:11:11,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1112.77) for latency 9
2025-09-16 13:11:11,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 11 seconds)
2025-09-16 13:13:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:13:08,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 833.48633 ± 181.971
2025-09-16 13:13:08,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [908.5949, 766.8008, 1185.7429, 659.8826, 1123.1167, 639.5458, 713.3654, 700.77014, 898.63025, 738.41364]
2025-09-16 13:13:08,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 163.0, 236.0, 130.0, 222.0, 134.0, 156.0, 126.0, 185.0, 149.0]
2025-09-16 13:13:08,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 56 seconds)
2025-09-16 13:15:05,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:15:08,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 810.11389 ± 179.595
2025-09-16 13:15:08,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [717.0312, 943.0414, 661.5079, 1203.1436, 606.4167, 872.3732, 892.42377, 612.2088, 678.2302, 914.76184]
2025-09-16 13:15:08,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 193.0, 134.0, 223.0, 111.0, 170.0, 172.0, 137.0, 129.0, 171.0]
2025-09-16 13:15:08,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 5 minutes, 41 seconds)
2025-09-16 13:17:07,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:17:09,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 852.34534 ± 186.734
2025-09-16 13:17:09,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1133.0632, 1235.8265, 666.9861, 940.23065, 675.81177, 798.8506, 866.88403, 786.06726, 682.2744, 737.45874]
2025-09-16 13:17:09,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 241.0, 125.0, 182.0, 127.0, 149.0, 163.0, 152.0, 124.0, 136.0]
2025-09-16 13:17:09,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 3 minutes, 48 seconds)
2025-09-16 13:19:07,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:19:10,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1173.20142 ± 250.797
2025-09-16 13:19:10,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1275.9369, 1194.8984, 1081.9006, 1218.642, 1262.0171, 1301.089, 1657.7009, 1221.1508, 716.21204, 802.46576]
2025-09-16 13:19:10,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 228.0, 218.0, 230.0, 248.0, 268.0, 312.0, 251.0, 150.0, 152.0]
2025-09-16 13:19:10,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1173.20) for latency 9
2025-09-16 13:19:10,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 20 seconds)
2025-09-16 13:21:08,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:21:11,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1097.88855 ± 330.663
2025-09-16 13:21:11,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [913.2988, 1177.603, 1066.7004, 977.7151, 889.4868, 1244.7561, 649.3991, 1806.7847, 761.0493, 1492.0927]
2025-09-16 13:21:11,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 238.0, 209.0, 188.0, 186.0, 230.0, 119.0, 343.0, 141.0, 276.0]
2025-09-16 13:21:11,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 4 seconds)
2025-09-16 13:23:11,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:23:13,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 745.28088 ± 266.744
2025-09-16 13:23:13,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [839.3227, 413.6146, 938.5364, 642.33057, 1351.457, 826.91425, 389.35132, 605.35974, 614.89264, 831.02985]
2025-09-16 13:23:13,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 90.0, 180.0, 119.0, 258.0, 155.0, 76.0, 112.0, 114.0, 158.0]
2025-09-16 13:23:13,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 5 seconds)
2025-09-16 13:25:12,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:25:14,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 863.62665 ± 305.616
2025-09-16 13:25:14,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [936.38275, 952.64825, 614.9457, 764.397, 666.8446, 1657.0242, 571.74896, 1001.40857, 587.35065, 883.5164]
2025-09-16 13:25:14,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 177.0, 114.0, 143.0, 122.0, 311.0, 107.0, 188.0, 105.0, 161.0]
2025-09-16 13:25:14,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes, 16 seconds)
2025-09-16 13:27:09,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:27:15,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2026.86169 ± 1041.118
2025-09-16 13:27:15,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1755.819, 4093.6143, 1172.0094, 663.36584, 1671.0145, 3370.4202, 1345.024, 1971.7249, 1249.6947, 2975.9287]
2025-09-16 13:27:15,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 801.0, 222.0, 128.0, 331.0, 642.0, 276.0, 401.0, 249.0, 577.0]
2025-09-16 13:27:15,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2026.86) for latency 9
2025-09-16 13:27:15,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 7 seconds)
2025-09-16 13:29:13,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:29:16,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1362.65344 ± 356.398
2025-09-16 13:29:16,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [940.9044, 799.433, 1685.2485, 1854.4933, 1367.2358, 1249.9379, 1217.6232, 1118.4098, 1459.2979, 1933.9502]
2025-09-16 13:29:16,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [213.0, 173.0, 322.0, 345.0, 258.0, 234.0, 251.0, 209.0, 276.0, 387.0]
2025-09-16 13:29:16,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 7 seconds)
2025-09-16 13:31:15,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:31:18,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1088.70740 ± 372.371
2025-09-16 13:31:18,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [708.13226, 1755.0453, 579.4695, 1376.3071, 1133.3339, 1136.8171, 606.4488, 979.1237, 1543.0803, 1069.3169]
2025-09-16 13:31:18,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 337.0, 128.0, 276.0, 237.0, 234.0, 127.0, 206.0, 294.0, 216.0]
2025-09-16 13:31:18,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 16 seconds)
2025-09-16 13:33:20,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:33:25,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1754.35974 ± 660.871
2025-09-16 13:33:25,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1223.8, 1854.9353, 1674.783, 3494.022, 942.7759, 1314.4816, 1529.5652, 1747.5333, 1648.0493, 2113.652]
2025-09-16 13:33:25,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 356.0, 338.0, 673.0, 179.0, 252.0, 293.0, 340.0, 325.0, 429.0]
2025-09-16 13:33:25,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 50 minutes, 5 seconds)
2025-09-16 13:35:24,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:35:31,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2628.79932 ± 883.265
2025-09-16 13:35:31,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [858.7147, 3743.0588, 3029.7097, 2712.847, 3619.9568, 1931.5721, 2144.5537, 1911.9882, 3647.975, 2687.6172]
2025-09-16 13:35:31,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 706.0, 569.0, 502.0, 678.0, 361.0, 412.0, 358.0, 689.0, 503.0]
2025-09-16 13:35:31,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2628.80) for latency 9
2025-09-16 13:35:31,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 1 second)
2025-09-16 13:37:31,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:37:35,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1431.00098 ± 733.371
2025-09-16 13:37:35,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [860.3116, 2371.1409, 1315.7382, 2943.7744, 987.5381, 803.52795, 679.40656, 2137.6047, 1089.3237, 1121.6445]
2025-09-16 13:37:35,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 475.0, 271.0, 572.0, 201.0, 169.0, 148.0, 431.0, 221.0, 212.0]
2025-09-16 13:37:35,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 25 seconds)
2025-09-16 13:39:39,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:39:54,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4884.19922 ± 961.014
2025-09-16 13:39:54,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5200.3643, 5213.947, 5157.9336, 5240.824, 2002.0852, 5209.303, 5239.1978, 5211.077, 5173.487, 5193.7715]
2025-09-16 13:39:54,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 405.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:39:54,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4884.20) for latency 9
2025-09-16 13:39:54,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 48 minutes, 27 seconds)
2025-09-16 13:41:53,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:42:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3516.82690 ± 2036.760
2025-09-16 13:42:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5157.3115, 744.8662, 1342.3866, 5193.717, 929.7265, 5214.348, 1096.3405, 5166.0786, 5117.9014, 5205.5947]
2025-09-16 13:42:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 137.0, 279.0, 1000.0, 168.0, 1000.0, 234.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:42:04,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 47 minutes, 35 seconds)
2025-09-16 13:44:07,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:44:14,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2503.06592 ± 823.302
2025-09-16 13:44:14,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2760.0115, 3194.917, 1895.8523, 1203.0613, 2335.9548, 3702.0513, 3392.863, 1335.0159, 2089.5615, 3121.3704]
2025-09-16 13:44:14,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [514.0, 586.0, 348.0, 221.0, 445.0, 680.0, 630.0, 243.0, 388.0, 579.0]
2025-09-16 13:44:14,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 45 minutes, 59 seconds)
2025-09-16 13:46:12,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:46:27,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4863.28369 ± 1091.901
2025-09-16 13:46:27,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5221.529, 5234.097, 5241.5083, 5229.341, 5127.2085, 5226.5713, 1589.3673, 5254.35, 5270.2085, 5238.656]
2025-09-16 13:46:27,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 968.0, 1000.0, 327.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:46:27,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 44 minutes, 50 seconds)
2025-09-16 13:48:27,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:48:40,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4266.72412 ± 1519.566
2025-09-16 13:48:40,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3680.0444, 5284.0195, 1276.5787, 5275.9985, 5276.199, 5228.0107, 5263.2773, 5294.0137, 1499.6953, 4589.405]
2025-09-16 13:48:40,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [689.0, 1000.0, 283.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 274.0, 877.0]
2025-09-16 13:48:40,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 44 minutes, 9 seconds)
2025-09-16 13:50:48,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:51:02,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4820.64990 ± 1230.600
2025-09-16 13:51:02,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5219.0464, 5233.609, 5224.8125, 5240.2676, 5218.7993, 5233.0356, 5233.7993, 5239.6245, 1128.9132, 5234.592]
2025-09-16 13:51:02,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 237.0, 1000.0]
2025-09-16 13:51:02,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 42 minutes, 22 seconds)
2025-09-16 13:53:01,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:53:16,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4817.95215 ± 1321.018
2025-09-16 13:53:16,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5283.633, 5258.9478, 5275.5015, 5255.9014, 5296.5396, 5216.493, 5207.0977, 5343.681, 857.1138, 5184.6123]
2025-09-16 13:53:16,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 170.0, 1000.0]
2025-09-16 13:53:16,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 40 minutes, 45 seconds)
2025-09-16 13:55:21,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:55:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5246.61133 ± 15.382
2025-09-16 13:55:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5251.0996, 5217.774, 5240.061, 5260.609, 5228.4946, 5232.523, 5252.6455, 5256.3584, 5257.7095, 5268.8403]
2025-09-16 13:55:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:55:37,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5246.61) for latency 9
2025-09-16 13:55:37,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 40 minutes, 9 seconds)
2025-09-16 13:57:39,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:57:51,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4123.72607 ± 1618.476
2025-09-16 13:57:51,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5138.924, 5105.765, 5151.3726, 5150.5, 1145.9417, 5081.1094, 1145.1241, 5166.8755, 5157.647, 2993.9985]
2025-09-16 13:57:51,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 242.0, 1000.0, 232.0, 1000.0, 1000.0, 594.0]
2025-09-16 13:57:51,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 38 minutes, 6 seconds)
2025-09-16 13:59:46,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:00:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4906.16895 ± 1101.481
2025-09-16 14:00:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5296.1074, 5256.37, 5296.7427, 5125.0605, 5300.449, 5308.4707, 1605.3042, 5294.9614, 5290.32, 5287.8984]
2025-09-16 14:00:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 334.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:00,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 35 minutes, 12 seconds)
2025-09-16 14:01:57,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:02:12,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5187.66748 ± 427.721
2025-09-16 14:02:12,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3905.2546, 5337.254, 5347.344, 5305.768, 5305.7114, 5332.294, 5338.7134, 5324.879, 5352.292, 5327.163]
2025-09-16 14:02:12,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [726.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:12,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 31 minutes, 31 seconds)
2025-09-16 14:04:23,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:04:39,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5283.04395 ± 29.382
2025-09-16 14:04:39,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5250.632, 5265.0347, 5233.4004, 5267.2495, 5264.667, 5290.0903, 5319.509, 5311.608, 5320.426, 5307.824]
2025-09-16 14:04:39,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:04:39,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5283.04) for latency 9
2025-09-16 14:04:39,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 31 minutes, 6 seconds)
2025-09-16 14:06:30,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:06:45,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5330.28418 ± 16.625
2025-09-16 14:06:45,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5332.42, 5311.784, 5343.417, 5342.112, 5316.9604, 5336.9873, 5361.275, 5300.584, 5324.5, 5332.7993]
2025-09-16 14:06:45,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:06:45,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5330.28) for latency 9
2025-09-16 14:06:45,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 26 minutes, 53 seconds)
2025-09-16 14:08:53,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:09:07,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4868.00684 ± 1419.568
2025-09-16 14:09:07,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5351.415, 5343.3164, 5332.111, 609.4212, 5358.4033, 5334.888, 5329.7783, 5354.585, 5341.4575, 5324.692]
2025-09-16 14:09:07,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 126.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:09:07,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 25 minutes, 35 seconds)
2025-09-16 14:11:06,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:11:19,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4617.38867 ± 1457.041
2025-09-16 14:11:19,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5333.1143, 5351.399, 5316.89, 5293.1606, 5358.3125, 1175.8225, 5347.785, 5338.9116, 5336.3804, 2322.11]
2025-09-16 14:11:19,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 256.0, 1000.0, 1000.0, 1000.0, 430.0]
2025-09-16 14:11:19,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 23 minutes, 46 seconds)
2025-09-16 14:13:19,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:13:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5224.93896 ± 24.111
2025-09-16 14:13:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5224.075, 5232.798, 5158.596, 5238.826, 5216.263, 5233.4565, 5241.3354, 5247.3013, 5217.969, 5238.773]
2025-09-16 14:13:35,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:35,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 21 minutes, 55 seconds)
2025-09-16 14:15:41,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:15:57,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5327.54150 ± 45.734
2025-09-16 14:15:57,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5333.413, 5338.048, 5341.092, 5375.9365, 5300.283, 5216.186, 5362.1934, 5364.766, 5356.7393, 5286.761]
2025-09-16 14:15:57,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:15:57,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 19 minutes, 4 seconds)
2025-09-16 14:18:00,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:18:16,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5294.16260 ± 16.249
2025-09-16 14:18:16,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5307.3105, 5263.4243, 5294.523, 5294.2817, 5268.604, 5310.312, 5290.3276, 5316.4033, 5303.963, 5292.4785]
2025-09-16 14:18:16,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:18:16,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 18 minutes, 14 seconds)
2025-09-16 14:20:17,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:20:33,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5312.72168 ± 10.578
2025-09-16 14:20:33,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5307.7124, 5301.3784, 5304.4917, 5322.157, 5310.167, 5328.041, 5309.6924, 5320.1694, 5327.523, 5295.885]
2025-09-16 14:20:33,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:20:33,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 15 minutes, 30 seconds)
2025-09-16 14:22:33,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:22:49,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5331.02686 ± 10.201
2025-09-16 14:22:49,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5350.791, 5339.996, 5316.4355, 5322.7744, 5337.5127, 5331.822, 5331.445, 5335.8755, 5316.7017, 5326.9194]
2025-09-16 14:22:49,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:22:49,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5331.03) for latency 9
2025-09-16 14:22:49,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 13 minutes, 35 seconds)
2025-09-16 14:24:49,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:25:05,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5231.13379 ± 56.385
2025-09-16 14:25:05,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5250.387, 5127.0723, 5246.18, 5255.6953, 5244.4966, 5259.2905, 5281.331, 5114.2007, 5258.6055, 5274.082]
2025-09-16 14:25:05,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:05,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 11 minutes, 23 seconds)
2025-09-16 14:27:05,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:27:21,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5025.25635 ± 269.674
2025-09-16 14:27:21,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5118.712, 4837.2373, 5156.1313, 5188.185, 5153.437, 5137.533, 5161.929, 5133.5903, 4267.199, 5098.6035]
2025-09-16 14:27:21,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 951.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 839.0, 1000.0]
2025-09-16 14:27:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 8 minutes, 26 seconds)
2025-09-16 14:29:18,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:29:32,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4904.69482 ± 1324.904
2025-09-16 14:29:32,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5349.512, 5358.084, 5349.188, 930.30536, 5342.246, 5382.073, 5318.5894, 5321.515, 5349.8257, 5345.6104]
2025-09-16 14:29:32,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 200.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:29:32,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 5 minutes, 23 seconds)
2025-09-16 14:31:34,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:31:48,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4919.34668 ± 1220.202
2025-09-16 14:31:48,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5334.758, 5322.85, 1258.8843, 5339.576, 5313.798, 5312.8574, 5315.864, 5317.6016, 5344.9443, 5332.3306]
2025-09-16 14:31:48,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 251.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:31:48,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 3 minutes, 1 second)
2025-09-16 14:33:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:34:09,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4729.89307 ± 1188.099
2025-09-16 14:34:09,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5325.805, 2111.0723, 2618.4163, 5292.7095, 5313.3784, 5322.1855, 5353.3726, 5332.0674, 5323.736, 5306.1875]
2025-09-16 14:34:09,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 389.0, 489.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:34:09,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 1 minute, 9 seconds)
2025-09-16 14:36:14,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:36:28,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4792.98193 ± 1389.422
2025-09-16 14:36:28,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5250.5503, 5253.7563, 5251.471, 5266.43, 5279.682, 624.89, 5244.2676, 5240.9507, 5243.0103, 5274.8076]
2025-09-16 14:36:28,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 129.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:28,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 59 minutes, 9 seconds)
2025-09-16 14:38:32,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:38:47,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4903.47510 ± 1130.446
2025-09-16 14:38:47,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5270.5327, 5281.1455, 1512.2379, 5279.7007, 5263.785, 5291.995, 5272.5312, 5283.0195, 5292.185, 5287.6157]
2025-09-16 14:38:47,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 312.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:38:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 57 minutes, 7 seconds)
2025-09-16 14:40:48,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:41:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4822.74902 ± 1381.502
2025-09-16 14:41:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5264.318, 679.5351, 5291.782, 5192.412, 5309.146, 5272.9907, 5317.616, 5296.351, 5287.2847, 5316.06]
2025-09-16 14:41:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 136.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:02,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 55 minutes, 11 seconds)
2025-09-16 14:43:00,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:43:16,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5361.81543 ± 56.763
2025-09-16 14:43:16,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5372.898, 5378.4375, 5384.8794, 5378.985, 5385.777, 5377.0156, 5383.611, 5390.257, 5374.058, 5192.24]
2025-09-16 14:43:16,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:43:16,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5361.82) for latency 9
2025-09-16 14:43:16,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 52 minutes, 41 seconds)
2025-09-16 14:45:30,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:45:44,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4844.21484 ± 1263.502
2025-09-16 14:45:44,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5331.4404, 5312.814, 5296.928, 5323.159, 5316.3613, 5289.116, 5293.3555, 5281.3804, 1069.002, 4928.5933]
2025-09-16 14:45:44,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 229.0, 1000.0]
2025-09-16 14:45:44,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 51 minutes, 1 second)
2025-09-16 14:47:43,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:47:58,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5354.56445 ± 13.350
2025-09-16 14:47:58,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5347.3315, 5347.225, 5332.3135, 5366.647, 5345.6226, 5358.967, 5375.233, 5367.4526, 5365.33, 5339.5205]
2025-09-16 14:47:58,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:47:58,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 48 minutes, 19 seconds)
2025-09-16 14:49:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:50:12,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5254.15332 ± 14.052
2025-09-16 14:50:12,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5257.348, 5247.7876, 5252.8438, 5253.8687, 5245.9116, 5257.23, 5230.5186, 5247.5176, 5288.985, 5259.524]
2025-09-16 14:50:12,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:50:12,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 45 minutes, 40 seconds)
2025-09-16 14:52:23,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:52:39,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5303.69287 ± 51.921
2025-09-16 14:52:39,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5333.83, 5332.992, 5164.597, 5343.716, 5269.7637, 5326.865, 5323.0146, 5334.872, 5328.2173, 5279.064]
2025-09-16 14:52:39,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:52:39,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 44 minutes, 7 seconds)
2025-09-16 14:54:37,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:54:54,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5245.69824 ± 16.759
2025-09-16 14:54:54,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5243.5137, 5280.8916, 5225.668, 5249.747, 5243.832, 5243.484, 5225.193, 5229.6885, 5267.0933, 5247.8726]
2025-09-16 14:54:54,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:54:54,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 41 minutes, 52 seconds)
2025-09-16 14:56:54,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:57:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5289.11523 ± 10.772
2025-09-16 14:57:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5301.514, 5260.1094, 5285.2017, 5288.7964, 5287.0713, 5288.277, 5293.718, 5296.3877, 5294.438, 5295.6396]
2025-09-16 14:57:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:57:10,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 52 seconds)
2025-09-16 14:59:12,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:59:26,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4794.24170 ± 1243.616
2025-09-16 14:59:26,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5157.2573, 5217.5464, 5227.366, 5214.747, 5232.872, 5185.3423, 5213.5176, 1063.924, 5220.1475, 5209.6953]
2025-09-16 14:59:26,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 229.0, 1000.0, 1000.0]
2025-09-16 14:59:27,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 36 minutes, 41 seconds)
2025-09-16 15:01:26,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:01:42,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5336.42627 ± 16.091
2025-09-16 15:01:42,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5314.4453, 5308.0815, 5341.1636, 5339.1245, 5320.6577, 5354.2314, 5347.595, 5344.253, 5359.238, 5335.468]
2025-09-16 15:01:42,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:01:42,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 31 seconds)
2025-09-16 15:03:39,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:03:54,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4893.48242 ± 1331.609
2025-09-16 15:03:54,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5348.4653, 898.69556, 5336.172, 5334.0454, 5341.1, 5346.8677, 5333.0054, 5333.7036, 5333.339, 5329.4297]
2025-09-16 15:03:54,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 199.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:03:54,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 31 seconds)
2025-09-16 15:05:56,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:06:11,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5377.36279 ± 6.825
2025-09-16 15:06:11,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5378.6777, 5375.87, 5381.9546, 5370.596, 5373.5166, 5361.7734, 5384.2656, 5385.0693, 5379.49, 5382.419]
2025-09-16 15:06:11,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:06:11,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5377.36) for latency 9
2025-09-16 15:06:11,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 21 seconds)
2025-09-16 15:08:13,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:08:29,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5270.64404 ± 13.471
2025-09-16 15:08:29,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5248.9106, 5260.6685, 5269.0522, 5290.929, 5268.9106, 5273.2686, 5251.973, 5290.9478, 5274.6416, 5277.1353]
2025-09-16 15:08:29,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:08:29,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 8 seconds)
2025-09-16 15:10:38,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:10:54,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5244.14160 ± 11.361
2025-09-16 15:10:54,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5226.721, 5252.431, 5244.4604, 5239.812, 5263.2734, 5243.053, 5223.707, 5254.007, 5247.8467, 5246.104]
2025-09-16 15:10:54,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:10:54,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 12 seconds)
2025-09-16 15:12:52,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:13:09,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5317.59863 ± 14.986
2025-09-16 15:13:09,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5324.1455, 5335.7627, 5325.707, 5319.2207, 5321.4614, 5290.0137, 5311.3438, 5336.026, 5320.0864, 5292.224]
2025-09-16 15:13:09,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:13:09,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 53 seconds)
2025-09-16 15:15:16,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:15:32,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5366.26709 ± 24.673
2025-09-16 15:15:32,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5369.011, 5358.5215, 5350.5186, 5386.96, 5388.42, 5368.761, 5339.8813, 5364.981, 5413.3276, 5322.287]
2025-09-16 15:15:32,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:15:32,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 56 seconds)
2025-09-16 15:17:34,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:17:50,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5200.12256 ± 17.080
2025-09-16 15:17:50,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5188.5015, 5204.8955, 5191.7656, 5189.327, 5179.078, 5241.8745, 5193.9463, 5215.3057, 5205.7295, 5190.7993]
2025-09-16 15:17:50,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:17:50,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 37 seconds)
2025-09-16 15:19:39,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:19:55,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5321.61182 ± 18.271
2025-09-16 15:19:55,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5317.3296, 5313.794, 5303.2305, 5318.9746, 5331.8496, 5338.3984, 5312.082, 5332.2036, 5358.126, 5290.1313]
2025-09-16 15:19:55,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:19:55,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes)
2025-09-16 15:21:58,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:22:14,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5385.20654 ± 12.654
2025-09-16 15:22:14,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5384.496, 5364.032, 5372.0103, 5398.9185, 5384.719, 5371.6753, 5407.0103, 5396.053, 5388.8228, 5384.327]
2025-09-16 15:22:14,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:22:14,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5385.21) for latency 9
2025-09-16 15:22:14,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 36 seconds)
2025-09-16 15:24:24,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:24:38,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4915.67480 ± 1386.227
2025-09-16 15:24:38,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5381.197, 5387.402, 5370.6753, 5348.7246, 5395.452, 5380.6924, 757.15204, 5388.301, 5372.586, 5374.5635]
2025-09-16 15:24:38,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 162.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:24:38,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 29 seconds)
2025-09-16 15:26:46,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:27:01,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4867.73047 ± 1415.216
2025-09-16 15:27:01,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5317.9775, 622.2562, 5353.125, 5351.127, 5356.4917, 5346.035, 5339.135, 5320.065, 5328.3647, 5342.73]
2025-09-16 15:27:01,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 132.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:27:01,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 10 seconds)
2025-09-16 15:29:05,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:29:22,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5317.51465 ± 20.099
2025-09-16 15:29:22,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5346.599, 5280.5625, 5337.1045, 5333.6216, 5314.7925, 5304.0923, 5291.3516, 5313.845, 5334.807, 5318.3716]
2025-09-16 15:29:22,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:29:22,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 55 seconds)
2025-09-16 15:31:19,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:31:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5107.89746 ± 999.465
2025-09-16 15:31:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5445.211, 5466.471, 5431.9194, 5438.9033, 2109.708, 5432.0874, 5450.3013, 5449.0063, 5424.2026, 5431.1655]
2025-09-16 15:31:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 391.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:31:34,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 39 seconds)
2025-09-16 15:33:35,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:33:51,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5383.91455 ± 11.471
2025-09-16 15:33:51,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5392.3325, 5387.3916, 5397.1357, 5366.072, 5388.5835, 5393.38, 5397.9272, 5370.4385, 5370.7554, 5375.134]
2025-09-16 15:33:51,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:33:51,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 19 seconds)
2025-09-16 15:35:55,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:36:10,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4848.05566 ± 1329.721
2025-09-16 15:36:10,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [859.0058, 5290.802, 5282.552, 5290.575, 5295.0464, 5268.376, 5305.158, 5304.5225, 5295.443, 5289.0806]
2025-09-16 15:36:10,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:36:10,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1251 [DEBUG]: Training session finished
