2025-09-16 11:16:49,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_3
2025-09-16 11:16:49,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_3
2025-09-16 11:16:49,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x1495e78386d0>}
2025-09-16 11:16:49,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:16:49,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:16:49,612 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=427, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:16:49,612 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:16:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:16:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:18:35,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:18:35,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 244.09074 ± 38.252
2025-09-16 11:18:35,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [242.975, 240.80214, 278.02774, 312.96628, 227.17607, 215.03983, 219.9708, 171.01195, 248.16116, 284.7764]
2025-09-16 11:18:35,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 43.0, 50.0, 56.0, 41.0, 40.0, 42.0, 33.0, 46.0, 53.0]
2025-09-16 11:18:35,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (244.09) for latency 3
2025-09-16 11:18:35,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 52 minutes, 22 seconds)
2025-09-16 11:20:29,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:20:30,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 376.62698 ± 84.759
2025-09-16 11:20:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [339.84232, 483.22522, 333.89246, 283.51706, 399.2437, 306.02182, 460.11182, 304.63956, 314.47354, 541.3022]
2025-09-16 11:20:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 107.0, 76.0, 62.0, 79.0, 66.0, 100.0, 61.0, 62.0, 114.0]
2025-09-16 11:20:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (376.63) for latency 3
2025-09-16 11:20:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 58 minutes, 41 seconds)
2025-09-16 11:22:24,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:22:25,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 376.26276 ± 46.589
2025-09-16 11:22:25,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [318.289, 438.0031, 431.93277, 293.20563, 380.0782, 330.36765, 381.84222, 417.72736, 369.95004, 401.23203]
2025-09-16 11:22:25,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 85.0, 80.0, 53.0, 73.0, 70.0, 72.0, 78.0, 70.0, 74.0]
2025-09-16 11:22:25,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 14 seconds)
2025-09-16 11:24:20,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:24:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 443.47086 ± 96.838
2025-09-16 11:24:21,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [548.18976, 574.3125, 445.06543, 413.13754, 617.2175, 355.2327, 417.7499, 340.95087, 328.57263, 394.27994]
2025-09-16 11:24:21,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 108.0, 92.0, 82.0, 127.0, 68.0, 80.0, 66.0, 64.0, 81.0]
2025-09-16 11:24:21,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (443.47) for latency 3
2025-09-16 11:24:21,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 seconds)
2025-09-16 11:26:16,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:26:17,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 418.46075 ± 68.451
2025-09-16 11:26:17,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [348.66724, 419.8699, 501.34586, 416.67752, 498.11734, 543.9295, 344.1693, 357.64038, 363.29953, 390.891]
2025-09-16 11:26:17,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 79.0, 95.0, 76.0, 95.0, 103.0, 63.0, 71.0, 67.0, 72.0]
2025-09-16 11:26:17,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 59 minutes, 14 seconds)
2025-09-16 11:28:11,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:28:12,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 488.25494 ± 110.366
2025-09-16 11:28:12,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [323.59174, 465.8763, 545.20013, 381.7772, 453.33926, 380.32303, 544.12036, 506.92575, 732.33905, 549.0565]
2025-09-16 11:28:12,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 90.0, 103.0, 72.0, 94.0, 73.0, 117.0, 99.0, 144.0, 108.0]
2025-09-16 11:28:12,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (488.25) for latency 3
2025-09-16 11:28:12,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 40 seconds)
2025-09-16 11:30:06,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:30:07,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 568.21521 ± 221.756
2025-09-16 11:30:07,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [650.2027, 260.977, 441.35004, 501.36798, 415.98975, 480.6494, 551.8203, 443.82013, 1011.90424, 924.0701]
2025-09-16 11:30:07,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 52.0, 97.0, 96.0, 77.0, 102.0, 106.0, 100.0, 205.0, 182.0]
2025-09-16 11:30:07,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (568.22) for latency 3
2025-09-16 11:30:07,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 2 seconds)
2025-09-16 11:32:01,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:32:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 477.85660 ± 71.162
2025-09-16 11:32:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [433.22183, 456.07593, 426.8821, 540.7013, 359.2984, 451.9316, 516.4304, 436.32727, 538.30994, 619.38715]
2025-09-16 11:32:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 101.0, 96.0, 101.0, 80.0, 89.0, 93.0, 81.0, 117.0, 122.0]
2025-09-16 11:32:03,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 57 minutes, 6 seconds)
2025-09-16 11:33:57,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:33:58,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 499.01959 ± 102.958
2025-09-16 11:33:58,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [556.8295, 484.35544, 664.65894, 415.07248, 499.81253, 286.22708, 614.42444, 452.90125, 448.47162, 567.44244]
2025-09-16 11:33:58,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 93.0, 130.0, 80.0, 110.0, 54.0, 119.0, 103.0, 85.0, 110.0]
2025-09-16 11:33:58,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 55 minutes)
2025-09-16 11:35:52,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:35:53,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 453.54248 ± 64.330
2025-09-16 11:35:53,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [397.20755, 367.34924, 457.40823, 361.74518, 412.9083, 443.9378, 522.6392, 547.1602, 519.48486, 505.5841]
2025-09-16 11:35:53,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 84.0, 90.0, 84.0, 78.0, 103.0, 109.0, 105.0, 118.0, 101.0]
2025-09-16 11:35:53,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 56 seconds)
2025-09-16 11:37:48,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:37:50,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 585.88635 ± 156.793
2025-09-16 11:37:50,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [473.9107, 675.89526, 471.34726, 358.9371, 529.0753, 905.37463, 570.91486, 805.3687, 506.65674, 561.38275]
2025-09-16 11:37:50,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 146.0, 91.0, 80.0, 100.0, 198.0, 129.0, 163.0, 96.0, 107.0]
2025-09-16 11:37:50,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (585.89) for latency 3
2025-09-16 11:37:50,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 51 minutes, 29 seconds)
2025-09-16 11:39:43,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:39:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 646.65137 ± 125.376
2025-09-16 11:39:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [668.27795, 772.0982, 610.1828, 777.39606, 482.90475, 542.6854, 528.0995, 735.9979, 499.2809, 849.5905]
2025-09-16 11:39:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 150.0, 115.0, 157.0, 89.0, 105.0, 100.0, 162.0, 95.0, 166.0]
2025-09-16 11:39:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (646.65) for latency 3
2025-09-16 11:39:45,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 49 minutes, 31 seconds)
2025-09-16 11:41:40,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:41:41,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 524.10400 ± 60.407
2025-09-16 11:41:41,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [620.5139, 500.3296, 616.53174, 417.58334, 549.6604, 459.61307, 490.292, 518.531, 550.6979, 517.2874]
2025-09-16 11:41:41,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 95.0, 121.0, 81.0, 103.0, 87.0, 108.0, 98.0, 109.0, 99.0]
2025-09-16 11:41:41,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 47 minutes, 41 seconds)
2025-09-16 11:43:36,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:43:37,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 610.22357 ± 166.138
2025-09-16 11:43:37,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [570.71906, 595.1919, 536.762, 816.4298, 489.14392, 623.9703, 493.91916, 982.78094, 367.53906, 625.77985]
2025-09-16 11:43:37,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 133.0, 103.0, 159.0, 93.0, 119.0, 93.0, 193.0, 82.0, 119.0]
2025-09-16 11:43:37,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 46 minutes, 3 seconds)
2025-09-16 11:45:32,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:45:33,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 588.60254 ± 127.864
2025-09-16 11:45:33,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [723.43384, 684.20764, 511.74222, 463.83267, 598.67444, 838.80884, 522.49976, 410.83893, 479.38663, 652.6005]
2025-09-16 11:45:33,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 131.0, 112.0, 85.0, 116.0, 176.0, 109.0, 76.0, 106.0, 124.0]
2025-09-16 11:45:33,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 44 minutes, 16 seconds)
2025-09-16 11:47:28,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:47:29,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 538.02448 ± 154.279
2025-09-16 11:47:29,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [546.1356, 430.69897, 434.35068, 502.22308, 329.07623, 729.5254, 785.73267, 424.193, 760.8391, 437.4704]
2025-09-16 11:47:29,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 96.0, 80.0, 97.0, 75.0, 149.0, 154.0, 79.0, 143.0, 91.0]
2025-09-16 11:47:29,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 16 seconds)
2025-09-16 11:49:24,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:49:26,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 696.50232 ± 160.121
2025-09-16 11:49:26,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [893.65027, 695.6237, 773.15314, 436.7519, 592.9805, 700.9963, 606.3914, 599.81647, 634.33905, 1031.3215]
2025-09-16 11:49:26,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 147.0, 147.0, 80.0, 130.0, 130.0, 127.0, 111.0, 120.0, 209.0]
2025-09-16 11:49:26,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (696.50) for latency 3
2025-09-16 11:49:26,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 40 minutes, 36 seconds)
2025-09-16 11:51:21,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:51:23,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 631.02185 ± 128.589
2025-09-16 11:51:23,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [596.26154, 644.1344, 506.342, 405.95486, 589.90955, 925.9579, 596.62897, 669.8171, 686.4262, 688.786]
2025-09-16 11:51:23,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 130.0, 110.0, 75.0, 129.0, 169.0, 129.0, 135.0, 130.0, 131.0]
2025-09-16 11:51:23,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 39 minutes, 4 seconds)
2025-09-16 11:53:17,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:53:19,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 544.40900 ± 128.970
2025-09-16 11:53:19,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [485.56308, 600.4078, 543.4399, 440.36386, 277.90274, 536.82904, 632.2708, 647.7959, 496.17395, 783.3432]
2025-09-16 11:53:19,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 115.0, 123.0, 83.0, 55.0, 116.0, 125.0, 122.0, 89.0, 146.0]
2025-09-16 11:53:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 37 minutes, 1 second)
2025-09-16 11:55:15,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:55:17,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 791.97913 ± 110.707
2025-09-16 11:55:17,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [748.771, 769.288, 763.53687, 1052.5325, 896.6459, 670.9594, 734.403, 801.8914, 833.7425, 648.0199]
2025-09-16 11:55:17,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 153.0, 143.0, 200.0, 182.0, 127.0, 134.0, 157.0, 154.0, 122.0]
2025-09-16 11:55:17,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (791.98) for latency 3
2025-09-16 11:55:17,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 39 seconds)
2025-09-16 11:57:12,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:57:14,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 752.35193 ± 233.030
2025-09-16 11:57:14,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [703.4278, 570.67584, 973.2192, 1038.8632, 432.13693, 750.3665, 1213.8003, 579.2487, 660.9762, 600.8053]
2025-09-16 11:57:14,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 107.0, 188.0, 207.0, 82.0, 143.0, 221.0, 113.0, 126.0, 115.0]
2025-09-16 11:57:14,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 47 seconds)
2025-09-16 11:59:08,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:59:10,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 808.22974 ± 212.211
2025-09-16 11:59:10,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [688.3339, 811.71106, 1058.8397, 645.94293, 663.1226, 654.3611, 912.90265, 929.9997, 1226.7979, 490.2857]
2025-09-16 11:59:10,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 151.0, 214.0, 123.0, 127.0, 142.0, 176.0, 203.0, 236.0, 100.0]
2025-09-16 11:59:10,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (808.23) for latency 3
2025-09-16 11:59:10,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes)
2025-09-16 12:01:06,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:01:08,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 742.96594 ± 128.764
2025-09-16 12:01:08,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [887.6883, 813.82776, 658.9239, 859.3956, 612.6929, 500.72128, 862.31287, 884.1266, 683.6235, 666.3467]
2025-09-16 12:01:08,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 153.0, 122.0, 168.0, 114.0, 95.0, 172.0, 170.0, 133.0, 127.0]
2025-09-16 12:01:08,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 9 seconds)
2025-09-16 12:03:01,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:03:03,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 815.67249 ± 232.294
2025-09-16 12:03:03,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1253.7407, 1002.27734, 856.3247, 636.75226, 681.9034, 423.96707, 1044.7812, 762.9365, 893.97394, 600.067]
2025-09-16 12:03:03,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 189.0, 164.0, 121.0, 138.0, 81.0, 201.0, 141.0, 176.0, 131.0]
2025-09-16 12:03:03,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (815.67) for latency 3
2025-09-16 12:03:03,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes)
2025-09-16 12:04:59,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:05:01,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 873.82141 ± 290.912
2025-09-16 12:05:01,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [694.9236, 807.3946, 684.22815, 786.492, 715.0185, 703.51385, 1049.3718, 1679.9781, 915.60736, 701.6851]
2025-09-16 12:05:01,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 158.0, 132.0, 149.0, 137.0, 132.0, 195.0, 319.0, 173.0, 137.0]
2025-09-16 12:05:01,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (873.82) for latency 3
2025-09-16 12:05:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes)
2025-09-16 12:06:56,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:06:58,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 789.55457 ± 135.481
2025-09-16 12:06:58,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [675.7447, 905.14966, 906.9099, 635.0795, 897.93304, 857.9289, 783.64874, 635.01996, 997.9182, 600.213]
2025-09-16 12:06:58,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 175.0, 175.0, 119.0, 173.0, 171.0, 151.0, 121.0, 192.0, 113.0]
2025-09-16 12:06:58,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 9 seconds)
2025-09-16 12:08:54,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:08:56,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 776.69641 ± 149.675
2025-09-16 12:08:56,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [774.39825, 736.00104, 535.59686, 870.3079, 954.475, 906.3971, 641.9008, 749.1848, 1008.0284, 590.6739]
2025-09-16 12:08:56,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 139.0, 100.0, 157.0, 180.0, 173.0, 128.0, 142.0, 195.0, 114.0]
2025-09-16 12:08:56,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 22 minutes, 29 seconds)
2025-09-16 12:10:50,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:10:52,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 802.33704 ± 159.736
2025-09-16 12:10:52,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1064.4625, 740.9747, 880.69824, 864.9435, 547.23035, 725.8597, 653.8371, 823.9803, 1050.9027, 670.4807]
2025-09-16 12:10:52,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 153.0, 174.0, 164.0, 105.0, 144.0, 127.0, 151.0, 203.0, 126.0]
2025-09-16 12:10:52,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 20 minutes, 14 seconds)
2025-09-16 12:12:50,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:12:53,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1254.70801 ± 575.867
2025-09-16 12:12:53,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1482.8691, 1107.3926, 863.6098, 1078.9141, 759.88324, 1983.0022, 609.38983, 2513.6162, 744.886, 1403.5167]
2025-09-16 12:12:53,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [291.0, 214.0, 164.0, 207.0, 143.0, 380.0, 119.0, 488.0, 136.0, 270.0]
2025-09-16 12:12:53,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1254.71) for latency 3
2025-09-16 12:12:53,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 34 seconds)
2025-09-16 12:14:47,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:14:49,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1081.64917 ± 169.159
2025-09-16 12:14:49,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1081.7357, 774.7282, 832.33826, 1103.171, 1242.605, 1217.3739, 1154.5132, 1217.5719, 1270.5719, 921.88214]
2025-09-16 12:14:49,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [208.0, 146.0, 160.0, 230.0, 242.0, 238.0, 237.0, 253.0, 244.0, 177.0]
2025-09-16 12:14:49,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 18 seconds)
2025-09-16 12:16:47,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:16:50,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1064.29907 ± 349.437
2025-09-16 12:16:50,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [641.8151, 1097.7767, 782.1821, 693.7781, 1425.8103, 1682.8536, 1127.5938, 1037.847, 1467.9855, 685.3484]
2025-09-16 12:16:50,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 231.0, 144.0, 126.0, 299.0, 323.0, 216.0, 202.0, 290.0, 148.0]
2025-09-16 12:16:50,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 7 seconds)
2025-09-16 12:18:47,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:18:50,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1325.79370 ± 574.390
2025-09-16 12:18:50,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [722.7944, 1778.2021, 1174.6744, 1706.1885, 509.33466, 1027.6918, 805.2458, 2540.104, 1541.0562, 1452.647]
2025-09-16 12:18:50,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 368.0, 231.0, 332.0, 113.0, 191.0, 147.0, 498.0, 300.0, 283.0]
2025-09-16 12:18:50,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1325.79) for latency 3
2025-09-16 12:18:50,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes, 44 seconds)
2025-09-16 12:20:45,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:20:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1431.79419 ± 444.107
2025-09-16 12:20:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1857.6431, 1354.5297, 1075.2676, 2369.8425, 1188.8307, 1872.2003, 930.2521, 1052.8463, 1085.452, 1531.0774]
2025-09-16 12:20:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [365.0, 261.0, 210.0, 452.0, 224.0, 369.0, 179.0, 203.0, 207.0, 289.0]
2025-09-16 12:20:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1431.79) for latency 3
2025-09-16 12:20:49,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 8 seconds)
2025-09-16 12:22:49,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:22:52,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1094.84045 ± 398.728
2025-09-16 12:22:52,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1337.9169, 1005.7989, 788.4337, 912.7603, 1250.5509, 866.6609, 839.5638, 796.92615, 2168.4673, 981.3247]
2025-09-16 12:22:52,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 192.0, 150.0, 179.0, 240.0, 175.0, 161.0, 149.0, 421.0, 190.0]
2025-09-16 12:22:52,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 51 seconds)
2025-09-16 12:24:45,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:24:48,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1113.98413 ± 852.883
2025-09-16 12:24:48,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [922.3853, 716.4201, 3554.324, 718.9427, 634.0299, 619.1093, 874.90546, 1300.9714, 506.62082, 1292.1326]
2025-09-16 12:24:48,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 153.0, 695.0, 157.0, 139.0, 133.0, 167.0, 250.0, 110.0, 271.0]
2025-09-16 12:24:48,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 40 seconds)
2025-09-16 12:26:44,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:26:48,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1435.62134 ± 401.714
2025-09-16 12:26:48,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1711.0422, 1399.196, 843.71387, 1139.2509, 1112.7904, 1482.0658, 1566.5189, 1000.3005, 2170.6523, 1930.6821]
2025-09-16 12:26:48,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 289.0, 156.0, 240.0, 216.0, 312.0, 322.0, 217.0, 445.0, 406.0]
2025-09-16 12:26:48,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1435.62) for latency 3
2025-09-16 12:26:48,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 35 seconds)
2025-09-16 12:28:46,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:28:50,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1333.84424 ± 562.673
2025-09-16 12:28:50,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1396.5231, 2488.2385, 811.21423, 1982.9762, 1432.5052, 772.11127, 1563.0652, 814.2518, 1434.3893, 643.16675]
2025-09-16 12:28:50,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [289.0, 516.0, 175.0, 407.0, 302.0, 165.0, 334.0, 176.0, 302.0, 125.0]
2025-09-16 12:28:50,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 5 minutes, 53 seconds)
2025-09-16 12:30:47,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:30:50,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1481.65161 ± 751.086
2025-09-16 12:30:50,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1406.3231, 2638.2625, 947.5316, 1094.2471, 1254.6842, 1647.713, 881.7667, 1255.4188, 3091.865, 598.7045]
2025-09-16 12:30:50,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [271.0, 508.0, 198.0, 209.0, 234.0, 318.0, 182.0, 240.0, 623.0, 115.0]
2025-09-16 12:30:50,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1481.65) for latency 3
2025-09-16 12:30:50,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 23 seconds)
2025-09-16 12:32:49,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:32:53,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1775.06909 ± 692.180
2025-09-16 12:32:53,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2279.1443, 1545.3826, 1613.9325, 1496.732, 3139.575, 1258.4918, 1021.4727, 2857.346, 1243.8513, 1294.7639]
2025-09-16 12:32:53,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [434.0, 304.0, 313.0, 282.0, 600.0, 239.0, 196.0, 540.0, 248.0, 254.0]
2025-09-16 12:32:53,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1775.07) for latency 3
2025-09-16 12:32:53,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 12 seconds)
2025-09-16 12:34:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:34:55,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2119.72363 ± 1167.262
2025-09-16 12:34:55,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3454.5256, 4085.5125, 1091.401, 1474.408, 2318.737, 992.8053, 3771.4275, 717.84814, 1484.0417, 1806.53]
2025-09-16 12:34:55,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [708.0, 811.0, 231.0, 300.0, 441.0, 212.0, 761.0, 148.0, 287.0, 348.0]
2025-09-16 12:34:55,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2119.72) for latency 3
2025-09-16 12:34:55,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 27 seconds)
2025-09-16 12:36:56,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:37:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2334.17310 ± 1107.745
2025-09-16 12:37:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1931.6239, 3267.012, 1331.4019, 2698.7925, 2419.3735, 1195.6777, 2003.6935, 5113.682, 1432.7155, 1947.7585]
2025-09-16 12:37:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [395.0, 649.0, 252.0, 526.0, 470.0, 227.0, 381.0, 1000.0, 280.0, 380.0]
2025-09-16 12:37:02,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2334.17) for latency 3
2025-09-16 12:37:02,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 45 seconds)
2025-09-16 12:38:58,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:39:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1654.72717 ± 680.408
2025-09-16 12:39:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1952.7721, 1154.0004, 2360.3975, 808.6685, 2767.3157, 1379.0111, 1974.9833, 1005.64325, 2354.8184, 789.6614]
2025-09-16 12:39:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [376.0, 234.0, 491.0, 164.0, 538.0, 288.0, 385.0, 192.0, 461.0, 151.0]
2025-09-16 12:39:03,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 31 seconds)
2025-09-16 12:41:07,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:41:16,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3028.32690 ± 1709.517
2025-09-16 12:41:16,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [776.7559, 2163.079, 3535.1406, 4890.05, 4935.564, 4822.1885, 1231.9576, 634.7642, 2400.5698, 4893.198]
2025-09-16 12:41:16,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 413.0, 709.0, 1000.0, 1000.0, 996.0, 239.0, 137.0, 476.0, 1000.0]
2025-09-16 12:41:16,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3028.33) for latency 3
2025-09-16 12:41:16,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 58 minutes, 50 seconds)
2025-09-16 12:43:12,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:43:21,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3424.47192 ± 1652.688
2025-09-16 12:43:21,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [855.7193, 2894.8918, 5018.7607, 4956.4004, 814.8113, 4898.196, 4936.4033, 3111.2847, 1934.1427, 4824.107]
2025-09-16 12:43:21,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 595.0, 1000.0, 1000.0, 177.0, 1000.0, 1000.0, 629.0, 404.0, 979.0]
2025-09-16 12:43:21,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3424.47) for latency 3
2025-09-16 12:43:21,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 57 minutes, 17 seconds)
2025-09-16 12:45:14,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:45:20,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2215.81104 ± 1557.867
2025-09-16 12:45:20,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [817.85376, 960.9975, 643.8636, 2213.818, 1303.6187, 3445.4963, 4871.8066, 4814.15, 763.8061, 2322.6965]
2025-09-16 12:45:20,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 205.0, 142.0, 464.0, 274.0, 707.0, 989.0, 924.0, 160.0, 478.0]
2025-09-16 12:45:20,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 54 minutes, 34 seconds)
2025-09-16 12:47:16,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:47:26,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3649.70972 ± 1190.411
2025-09-16 12:47:26,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4917.7773, 4110.559, 5055.5366, 3316.8977, 2898.0867, 1347.5929, 3143.7422, 4194.311, 5138.9204, 2373.6746]
2025-09-16 12:47:26,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [976.0, 835.0, 1000.0, 652.0, 598.0, 282.0, 639.0, 845.0, 1000.0, 489.0]
2025-09-16 12:47:26,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3649.71) for latency 3
2025-09-16 12:47:26,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 52 minutes, 26 seconds)
2025-09-16 12:49:24,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:49:33,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3446.38135 ± 981.462
2025-09-16 12:49:33,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1266.165, 3226.7527, 3497.7568, 4554.1274, 3283.9126, 5239.2905, 3705.8713, 2911.467, 3374.369, 3404.0986]
2025-09-16 12:49:33,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 661.0, 674.0, 864.0, 641.0, 1000.0, 753.0, 566.0, 666.0, 644.0]
2025-09-16 12:49:33,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 51 minutes, 18 seconds)
2025-09-16 12:51:29,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:51:39,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3833.75439 ± 1353.766
2025-09-16 12:51:39,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3092.5237, 5048.538, 2893.2808, 5113.438, 4905.444, 4120.7974, 1286.1827, 4836.5234, 1962.0977, 5078.7173]
2025-09-16 12:51:39,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [660.0, 1000.0, 530.0, 1000.0, 954.0, 814.0, 244.0, 1000.0, 413.0, 1000.0]
2025-09-16 12:51:39,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3833.75) for latency 3
2025-09-16 12:51:39,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 48 minutes, 1 second)
2025-09-16 12:53:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:53:45,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2180.25830 ± 1176.190
2025-09-16 12:53:45,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4928.508, 1318.0809, 3182.714, 1894.9684, 1502.595, 2547.392, 774.5861, 882.77167, 2146.6646, 2624.2998]
2025-09-16 12:53:45,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 282.0, 652.0, 391.0, 314.0, 527.0, 174.0, 188.0, 440.0, 533.0]
2025-09-16 12:53:45,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 46 minutes, 4 seconds)
2025-09-16 12:55:41,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:55:53,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4134.84277 ± 1077.181
2025-09-16 12:55:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5070.144, 5100.789, 5008.72, 5100.8345, 3806.2827, 4279.5947, 2326.9746, 5092.946, 2538.277, 3023.865]
2025-09-16 12:55:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 763.0, 840.0, 492.0, 1000.0, 514.0, 611.0]
2025-09-16 12:55:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4134.84) for latency 3
2025-09-16 12:55:53,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 45 minutes, 26 seconds)
2025-09-16 12:57:49,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:58:00,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4032.65234 ± 1488.203
2025-09-16 12:58:00,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4994.7573, 5102.8164, 4991.0337, 1299.7537, 1687.3667, 4968.229, 5020.539, 4991.5034, 4880.762, 2389.763]
2025-09-16 12:58:00,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 281.0, 364.0, 1000.0, 1000.0, 1000.0, 1000.0, 493.0]
2025-09-16 12:58:00,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 43 minutes, 30 seconds)
2025-09-16 12:59:56,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:00:03,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2710.57764 ± 1476.119
2025-09-16 13:00:03,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3091.0225, 3660.8801, 930.62366, 1863.8484, 5048.3306, 3010.705, 5109.9077, 2217.9746, 975.06, 1197.4247]
2025-09-16 13:00:03,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [621.0, 710.0, 176.0, 359.0, 1000.0, 596.0, 1000.0, 428.0, 189.0, 228.0]
2025-09-16 13:00:03,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 40 minutes, 53 seconds)
2025-09-16 13:01:56,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:02:06,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3842.63794 ± 1512.906
2025-09-16 13:02:06,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1906.8365, 1131.3595, 5002.0713, 2524.2515, 5031.9575, 2634.2332, 5024.5015, 5057.645, 5134.509, 4979.014]
2025-09-16 13:02:06,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [402.0, 244.0, 1000.0, 492.0, 1000.0, 513.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:02:06,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 38 minutes, 13 seconds)
2025-09-16 13:04:08,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:04:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4756.65527 ± 569.246
2025-09-16 13:04:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4983.217, 4878.2896, 3055.2722, 4896.7104, 4994.423, 4895.581, 4896.9556, 4968.2793, 4965.0884, 5032.739]
2025-09-16 13:04:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 619.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:04:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4756.66) for latency 3
2025-09-16 13:04:22,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 37 minutes, 38 seconds)
2025-09-16 13:06:13,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:06:25,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4457.75781 ± 1235.523
2025-09-16 13:06:25,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5057.062, 2709.888, 5064.6636, 5064.6714, 5039.731, 5102.0103, 1405.6954, 4997.5576, 5134.8867, 5001.408]
2025-09-16 13:06:25,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 519.0, 1000.0, 1000.0, 1000.0, 1000.0, 259.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:06:25,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 34 minutes, 51 seconds)
2025-09-16 13:08:28,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:08:40,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4208.12207 ± 1483.623
2025-09-16 13:08:40,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2696.0051, 5185.042, 640.58484, 5191.6416, 5146.608, 5198.9385, 5075.8164, 5141.1133, 3065.0273, 4740.4434]
2025-09-16 13:08:40,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [522.0, 1000.0, 126.0, 1000.0, 1000.0, 1000.0, 988.0, 1000.0, 604.0, 919.0]
2025-09-16 13:08:40,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 33 minutes, 47 seconds)
2025-09-16 13:10:29,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:10:38,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3511.93506 ± 1641.666
2025-09-16 13:10:38,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1981.9022, 1434.4086, 1380.334, 1433.3123, 5030.3247, 5062.28, 5016.289, 5018.705, 3799.3315, 4962.466]
2025-09-16 13:10:38,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [398.0, 291.0, 280.0, 281.0, 1000.0, 1000.0, 1000.0, 1000.0, 767.0, 1000.0]
2025-09-16 13:10:38,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 31 minutes)
2025-09-16 13:12:34,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:12:45,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4095.21143 ± 1291.100
2025-09-16 13:12:45,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1698.2815, 4737.851, 4821.8706, 5213.142, 5217.171, 5189.0835, 3257.479, 4316.2295, 4682.331, 1818.6716]
2025-09-16 13:12:45,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [318.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 631.0, 821.0, 905.0, 348.0]
2025-09-16 13:12:45,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 29 minutes, 28 seconds)
2025-09-16 13:14:48,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:15:00,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4387.35742 ± 1140.373
2025-09-16 13:15:00,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5012.898, 4986.8535, 5102.94, 5085.3223, 5205.4854, 1862.3135, 5048.9756, 5114.234, 3752.5212, 2702.0322]
2025-09-16 13:15:00,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 363.0, 1000.0, 1000.0, 725.0, 514.0]
2025-09-16 13:15:00,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 27 minutes, 7 seconds)
2025-09-16 13:16:56,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:17:10,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4999.63672 ± 50.934
2025-09-16 13:17:10,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4932.682, 4986.9634, 4992.7026, 5002.774, 4940.9297, 4952.225, 4987.1274, 5037.3066, 5096.9976, 5066.66]
2025-09-16 13:17:10,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:17:10,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4999.64) for latency 3
2025-09-16 13:17:10,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 25 minutes, 59 seconds)
2025-09-16 13:19:06,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:19:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4335.27197 ± 1356.449
2025-09-16 13:19:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5191.1387, 3614.8367, 1289.6176, 2364.8323, 5137.0845, 5155.2256, 5169.2114, 5193.582, 5032.61, 5204.577]
2025-09-16 13:19:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 717.0, 278.0, 444.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:19:18,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 22 minutes, 57 seconds)
2025-09-16 13:21:23,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:21:35,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4112.46191 ± 1492.545
2025-09-16 13:21:35,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5147.2466, 4906.001, 2029.5675, 5149.3413, 5103.9814, 5108.585, 2873.5266, 914.9067, 5086.2563, 4805.203]
2025-09-16 13:21:35,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 417.0, 1000.0, 1000.0, 1000.0, 559.0, 180.0, 1000.0, 1000.0]
2025-09-16 13:21:35,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 23 minutes, 10 seconds)
2025-09-16 13:23:28,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:23:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5066.12695 ± 16.356
2025-09-16 13:23:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5048.175, 5040.097, 5079.3887, 5058.1416, 5067.7715, 5099.423, 5069.113, 5080.529, 5056.554, 5062.0796]
2025-09-16 13:23:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:23:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5066.13) for latency 3
2025-09-16 13:23:42,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 20 minutes, 58 seconds)
2025-09-16 13:25:39,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:25:52,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5070.16748 ± 27.871
2025-09-16 13:25:52,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5056.5244, 5033.868, 5076.756, 5091.3374, 5049.9165, 5100.9336, 5107.9595, 5095.2666, 5067.4253, 5021.687]
2025-09-16 13:25:52,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:25:52,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5070.17) for latency 3
2025-09-16 13:25:52,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 18 minutes, 19 seconds)
2025-09-16 13:27:49,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:28:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4021.73438 ± 1519.688
2025-09-16 13:28:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4967.6094, 5004.593, 4963.733, 4940.4136, 1490.8811, 2231.1953, 4951.475, 5039.437, 1444.6013, 5183.4053]
2025-09-16 13:28:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 325.0, 479.0, 1000.0, 1000.0, 308.0, 1000.0]
2025-09-16 13:28:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 15 minutes, 51 seconds)
2025-09-16 13:30:02,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:30:13,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3988.95239 ± 1276.098
2025-09-16 13:30:13,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3649.79, 1382.312, 4489.705, 4902.7397, 2252.535, 3064.2769, 5104.647, 4912.684, 5104.9683, 5025.8657]
2025-09-16 13:30:13,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [719.0, 291.0, 900.0, 1000.0, 483.0, 598.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:30:13,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 14 minutes, 12 seconds)
2025-09-16 13:32:10,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:32:24,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4993.18994 ± 74.007
2025-09-16 13:32:24,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5011.531, 5004.3286, 5015.0605, 4804.663, 5042.113, 5029.2944, 5019.0225, 5064.0845, 4908.0317, 5033.7695]
2025-09-16 13:32:24,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 938.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:32:24,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 11 minutes, 21 seconds)
2025-09-16 13:34:20,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:34:34,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5123.80078 ± 163.597
2025-09-16 13:34:34,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5128.651, 5259.199, 5159.4316, 5156.904, 5114.4194, 5206.3286, 4648.302, 5158.4834, 5211.091, 5195.2]
2025-09-16 13:34:34,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 890.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:34:34,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5123.80) for latency 3
2025-09-16 13:34:34,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 35 seconds)
2025-09-16 13:36:30,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:36:44,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5109.01807 ± 59.266
2025-09-16 13:36:44,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5125.713, 5131.9346, 5079.495, 5146.5864, 5130.7627, 4978.597, 5090.3315, 5224.9463, 5083.4785, 5098.3345]
2025-09-16 13:36:44,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:36:44,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 7 minutes, 18 seconds)
2025-09-16 13:38:36,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:38:49,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4898.95654 ± 860.728
2025-09-16 13:38:49,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2319.3342, 5195.9824, 5234.172, 5179.57, 5196.909, 5166.088, 5208.8193, 5172.141, 5089.4727, 5227.0767]
2025-09-16 13:38:49,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [448.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:38:49,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 4 minutes, 52 seconds)
2025-09-16 13:40:47,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:41:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5145.16309 ± 21.727
2025-09-16 13:41:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5164.156, 5130.7153, 5143.8076, 5155.2886, 5136.9033, 5099.209, 5157.359, 5182.673, 5129.0884, 5152.429]
2025-09-16 13:41:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:41:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5145.16) for latency 3
2025-09-16 13:41:01,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 2 minutes, 39 seconds)
2025-09-16 13:42:55,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:43:07,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4474.79248 ± 946.329
2025-09-16 13:43:07,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3758.8188, 3578.452, 5154.695, 5193.1846, 5237.554, 5193.6426, 5202.294, 5182.664, 3786.2942, 2460.3298]
2025-09-16 13:43:07,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [717.0, 729.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 716.0, 482.0]
2025-09-16 13:43:07,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 2 seconds)
2025-09-16 13:45:03,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:45:17,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5008.83936 ± 447.267
2025-09-16 13:45:17,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5151.225, 5168.905, 5073.152, 5171.2793, 5196.81, 5097.9033, 5193.5684, 5196.3555, 5166.96, 3672.2358]
2025-09-16 13:45:17,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 978.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 704.0]
2025-09-16 13:45:17,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 48 seconds)
2025-09-16 13:47:17,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:47:31,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5047.12891 ± 38.227
2025-09-16 13:47:31,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5078.3794, 5078.2915, 4941.237, 5055.629, 5037.519, 5068.6724, 5064.8506, 5045.141, 5066.2583, 5035.3105]
2025-09-16 13:47:31,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:47:31,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 7 seconds)
2025-09-16 13:49:30,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:49:44,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5049.25879 ± 53.614
2025-09-16 13:49:44,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5069.298, 4929.1763, 5053.8813, 5090.534, 5051.2505, 5084.3975, 5074.4863, 4977.1807, 5120.4907, 5041.8916]
2025-09-16 13:49:44,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:49:44,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 38 seconds)
2025-09-16 13:51:41,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:51:55,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5109.35449 ± 21.291
2025-09-16 13:51:55,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5125.853, 5088.271, 5121.92, 5094.3154, 5108.8413, 5135.4316, 5093.0063, 5076.1885, 5103.598, 5146.1206]
2025-09-16 13:51:55,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:51:55,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 18 seconds)
2025-09-16 13:53:52,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:54:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4604.07520 ± 1320.269
2025-09-16 13:54:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [645.0296, 5077.129, 5063.9634, 4942.288, 5091.7544, 5059.677, 5053.084, 5050.0015, 5012.0186, 5045.799]
2025-09-16 13:54:04,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:54:04,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 24 seconds)
2025-09-16 13:55:59,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:56:13,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5100.02734 ± 12.744
2025-09-16 13:56:13,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5084.025, 5085.374, 5094.1934, 5111.9395, 5084.038, 5105.9253, 5119.2446, 5117.773, 5099.146, 5098.618]
2025-09-16 13:56:13,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:56:13,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 48 minutes, 8 seconds)
2025-09-16 13:58:10,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:58:24,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4969.43213 ± 13.715
2025-09-16 13:58:24,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4953.4824, 4964.278, 4981.1025, 4974.0957, 4961.7554, 5003.7983, 4967.9473, 4968.0386, 4957.009, 4962.818]
2025-09-16 13:58:24,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:58:24,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes, 41 seconds)
2025-09-16 14:00:23,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:00:36,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4806.96777 ± 829.755
2025-09-16 14:00:36,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5072.764, 5093.188, 5015.2676, 2318.8633, 5096.1484, 5099.71, 5094.5264, 5088.0723, 5114.183, 5076.952]
2025-09-16 14:00:36,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 450.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 27 seconds)
2025-09-16 14:02:36,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:02:50,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5102.57764 ± 16.054
2025-09-16 14:02:50,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5089.2153, 5084.654, 5113.6646, 5087.6685, 5111.894, 5122.87, 5119.462, 5102.1895, 5118.151, 5076.0063]
2025-09-16 14:02:50,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:50,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 30 seconds)
2025-09-16 14:04:49,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:05:03,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5120.02734 ± 12.118
2025-09-16 14:05:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5108.961, 5116.985, 5120.82, 5131.726, 5119.5796, 5109.069, 5149.7603, 5115.4087, 5106.8936, 5121.069]
2025-09-16 14:05:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:05:03,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 31 seconds)
2025-09-16 14:06:58,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:07:10,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4393.04395 ± 1486.066
2025-09-16 14:07:10,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5136.217, 5143.3013, 2437.4297, 5136.838, 5153.5894, 5137.465, 5092.098, 4931.0073, 5129.5093, 632.98956]
2025-09-16 14:07:10,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 493.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 132.0]
2025-09-16 14:07:10,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 14 seconds)
2025-09-16 14:09:07,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:09:21,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5140.79346 ± 18.059
2025-09-16 14:09:21,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5152.5864, 5144.5415, 5136.081, 5134.0615, 5180.1924, 5132.441, 5113.1797, 5118.159, 5144.4854, 5152.2085]
2025-09-16 14:09:21,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:09:21,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes)
2025-09-16 14:11:17,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:11:31,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5076.18262 ± 11.754
2025-09-16 14:11:31,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5071.526, 5051.3174, 5079.4683, 5084.9937, 5089.4824, 5083.7583, 5083.4766, 5069.6816, 5086.602, 5061.525]
2025-09-16 14:11:31,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:11:31,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 44 seconds)
2025-09-16 14:13:27,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:13:42,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5096.96094 ± 64.836
2025-09-16 14:13:42,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5134.7227, 4928.1226, 5084.31, 5077.349, 5191.9385, 5124.4844, 5103.0933, 5136.113, 5104.766, 5084.711]
2025-09-16 14:13:42,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:42,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 23 seconds)
2025-09-16 14:15:38,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:15:52,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5194.30615 ± 44.010
2025-09-16 14:15:52,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5217.272, 5231.0645, 5192.827, 5210.725, 5092.4062, 5217.84, 5213.125, 5146.44, 5250.0474, 5171.318]
2025-09-16 14:15:52,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:15:52,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5194.31) for latency 3
2025-09-16 14:15:52,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 5 seconds)
2025-09-16 14:17:47,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:17:54,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2438.34180 ± 1079.512
2025-09-16 14:17:54,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2519.357, 3607.896, 2298.8335, 1278.6523, 891.4876, 2189.051, 2054.851, 2798.2622, 1882.911, 4862.116]
2025-09-16 14:17:54,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [505.0, 708.0, 456.0, 263.0, 170.0, 469.0, 399.0, 541.0, 372.0, 1000.0]
2025-09-16 14:17:54,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 44 seconds)
2025-09-16 14:19:46,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:19:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4280.16113 ± 1200.228
2025-09-16 14:19:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1513.3252, 4720.0586, 5082.321, 5075.2925, 5110.5317, 3484.932, 4821.6816, 5089.3174, 2769.765, 5134.3853]
2025-09-16 14:19:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [299.0, 930.0, 1000.0, 1000.0, 1000.0, 688.0, 987.0, 1000.0, 549.0, 1000.0]
2025-09-16 14:19:58,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 21 seconds)
2025-09-16 14:22:00,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:22:13,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4696.77734 ± 1200.516
2025-09-16 14:22:13,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5154.886, 5051.2925, 5184.875, 1105.3348, 4903.8574, 5192.24, 5097.288, 5171.318, 4976.168, 5130.5127]
2025-09-16 14:22:13,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 245.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:22:13,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 23 seconds)
2025-09-16 14:24:03,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:24:17,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4993.71436 ± 668.347
2025-09-16 14:24:17,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5189.876, 5245.834, 5242.605, 5243.737, 5177.293, 2990.329, 5190.6133, 5256.225, 5206.225, 5194.4053]
2025-09-16 14:24:17,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 603.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:24:17,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 3 seconds)
2025-09-16 14:26:21,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:26:35,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5099.29297 ± 44.755
2025-09-16 14:26:35,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5136.542, 5099.6772, 5064.4995, 5134.9688, 5093.873, 5049.7485, 5168.2305, 5013.666, 5138.684, 5093.0415]
2025-09-16 14:26:35,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:26:35,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 8 seconds)
2025-09-16 14:28:32,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:28:46,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5029.06934 ± 60.505
2025-09-16 14:28:46,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4981.77, 5076.891, 5068.638, 5076.2524, 4893.3936, 5004.1284, 5117.394, 5054.419, 5010.6094, 5007.201]
2025-09-16 14:28:46,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:28:46,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 12 seconds)
2025-09-16 14:30:40,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:30:53,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4967.45996 ± 656.256
2025-09-16 14:30:53,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5174.449, 5216.679, 5198.8267, 5127.8667, 5259.662, 3001.8237, 5218.7344, 5174.1924, 5153.3345, 5149.031]
2025-09-16 14:30:53,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 578.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:30:53,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 6 seconds)
2025-09-16 14:32:48,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:33:01,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4742.66602 ± 1152.294
2025-09-16 14:33:01,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5162.5454, 1286.5205, 5086.2134, 5100.9062, 5156.328, 5117.6685, 5105.5996, 5150.6743, 5135.859, 5124.349]
2025-09-16 14:33:01,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 262.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:33:01,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 48 seconds)
2025-09-16 14:34:56,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:35:08,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4119.63574 ± 1821.840
2025-09-16 14:35:08,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4972.7334, 5009.055, 538.5376, 5003.205, 415.02335, 5073.585, 5050.656, 5045.116, 5046.7925, 5041.657]
2025-09-16 14:35:08,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 102.0, 1000.0, 80.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:35:08,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 40 seconds)
2025-09-16 14:37:04,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:37:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5142.33350 ± 14.847
2025-09-16 14:37:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5137.5933, 5159.5596, 5135.704, 5127.888, 5140.838, 5162.9077, 5134.6104, 5113.8076, 5156.1455, 5154.2847]
2025-09-16 14:37:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:37:18,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 25 seconds)
2025-09-16 14:39:06,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:39:21,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5098.15332 ± 22.399
2025-09-16 14:39:21,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5127.261, 5109.304, 5120.9766, 5104.286, 5102.4146, 5119.0293, 5088.5073, 5062.2007, 5058.63, 5088.9253]
2025-09-16 14:39:21,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:39:21,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 13 seconds)
2025-09-16 14:41:16,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:41:28,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4859.23779 ± 1139.022
2025-09-16 14:41:28,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1442.5845, 5258.763, 5206.2905, 5242.622, 5239.5337, 5219.8765, 5262.802, 5262.026, 5227.0625, 5230.817]
2025-09-16 14:41:28,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [278.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:28,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 7 seconds)
2025-09-16 14:43:26,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:43:40,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5090.49414 ± 34.850
2025-09-16 14:43:40,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5122.554, 5099.7905, 5106.3145, 5089.7334, 5107.4365, 4991.5566, 5102.134, 5112.2793, 5092.2095, 5080.933]
2025-09-16 14:43:40,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:43:40,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
