2025-09-16 11:56:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_6
2025-09-16 11:56:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_6
2025-09-16 11:56:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x14fa12be48d0>}
2025-09-16 11:56:30,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:56:31,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:56:31,020 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=478, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:56:31,020 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:56:32,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:56:32,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:58:17,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:58:18,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 358.45505 ± 39.622
2025-09-16 11:58:18,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [356.86044, 385.57104, 393.49338, 359.26596, 277.54834, 419.56653, 306.79095, 340.58243, 378.46683, 366.40457]
2025-09-16 11:58:18,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 73.0, 75.0, 67.0, 52.0, 78.0, 57.0, 62.0, 80.0, 68.0]
2025-09-16 11:58:18,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (358.46) for latency 6
2025-09-16 11:58:18,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 54 minutes, 44 seconds)
2025-09-16 12:00:12,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:00:13,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 371.13120 ± 89.914
2025-09-16 12:00:13,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [200.1476, 469.21182, 369.30505, 350.22235, 256.89505, 474.20685, 476.85944, 396.81433, 305.2768, 412.37262]
2025-09-16 12:00:13,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 101.0, 75.0, 78.0, 58.0, 90.0, 94.0, 83.0, 58.0, 81.0]
2025-09-16 12:00:13,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (371.13) for latency 6
2025-09-16 12:00:13,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 41 seconds)
2025-09-16 12:02:07,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:02:08,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 341.93268 ± 51.156
2025-09-16 12:02:08,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [278.6351, 380.6448, 392.26517, 427.4595, 273.35086, 347.41464, 325.17902, 284.137, 322.6128, 387.62756]
2025-09-16 12:02:08,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 70.0, 76.0, 79.0, 53.0, 66.0, 60.0, 53.0, 60.0, 72.0]
2025-09-16 12:02:08,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 5 seconds)
2025-09-16 12:04:03,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:04:04,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 465.45551 ± 102.484
2025-09-16 12:04:04,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [322.7113, 318.4758, 608.6414, 482.9178, 447.09836, 440.55304, 475.39728, 404.59543, 656.3379, 497.82645]
2025-09-16 12:04:04,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 60.0, 128.0, 91.0, 85.0, 86.0, 89.0, 77.0, 132.0, 93.0]
2025-09-16 12:04:04,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (465.46) for latency 6
2025-09-16 12:04:04,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 52 seconds)
2025-09-16 12:06:00,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:06:01,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 504.11508 ± 159.066
2025-09-16 12:06:01,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [375.31253, 604.66785, 299.3059, 522.8195, 443.33124, 500.50732, 911.33344, 437.1141, 409.70404, 537.05536]
2025-09-16 12:06:01,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 113.0, 62.0, 99.0, 83.0, 96.0, 194.0, 82.0, 77.0, 101.0]
2025-09-16 12:06:01,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (504.12) for latency 6
2025-09-16 12:06:01,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 11 seconds)
2025-09-16 12:07:56,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:07:57,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 385.90433 ± 58.402
2025-09-16 12:07:57,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [352.13898, 401.92896, 471.76242, 382.20624, 424.39325, 353.13785, 492.85037, 328.2724, 305.23383, 347.11884]
2025-09-16 12:07:57,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 83.0, 87.0, 73.0, 88.0, 78.0, 93.0, 64.0, 57.0, 73.0]
2025-09-16 12:07:57,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 1 minute, 27 seconds)
2025-09-16 12:09:51,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:09:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 472.19818 ± 86.742
2025-09-16 12:09:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [370.4286, 485.9747, 403.08212, 579.56995, 358.37582, 503.6211, 494.935, 427.63943, 448.94604, 649.4088]
2025-09-16 12:09:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 93.0, 84.0, 111.0, 78.0, 95.0, 94.0, 92.0, 90.0, 136.0]
2025-09-16 12:09:52,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 29 seconds)
2025-09-16 12:11:48,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:11:50,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 422.05341 ± 63.523
2025-09-16 12:11:50,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [361.803, 379.87875, 481.9158, 319.48224, 502.87262, 371.0176, 442.27243, 438.3987, 398.50653, 524.3864]
2025-09-16 12:11:50,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 78.0, 100.0, 60.0, 96.0, 71.0, 84.0, 88.0, 74.0, 103.0]
2025-09-16 12:11:50,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 58 minutes, 16 seconds)
2025-09-16 12:13:44,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:13:45,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 444.02057 ± 112.718
2025-09-16 12:13:45,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [549.94507, 605.8126, 383.58725, 356.4677, 305.3049, 653.27637, 345.8186, 405.1641, 380.39883, 454.43033]
2025-09-16 12:13:45,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 118.0, 78.0, 76.0, 71.0, 128.0, 65.0, 89.0, 79.0, 86.0]
2025-09-16 12:13:45,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 56 minutes, 13 seconds)
2025-09-16 12:15:42,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:15:43,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 460.93823 ± 134.952
2025-09-16 12:15:43,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [460.89026, 273.24155, 386.79987, 361.3225, 337.95093, 737.1444, 548.3842, 618.2324, 501.88083, 383.53568]
2025-09-16 12:15:43,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 53.0, 80.0, 67.0, 62.0, 144.0, 102.0, 116.0, 112.0, 80.0]
2025-09-16 12:15:43,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 54 minutes, 29 seconds)
2025-09-16 12:17:39,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:17:40,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 526.86859 ± 110.789
2025-09-16 12:17:40,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [367.26358, 413.62003, 658.67316, 586.1066, 611.2913, 683.01654, 494.5014, 358.2392, 576.93384, 519.04047]
2025-09-16 12:17:40,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 77.0, 127.0, 110.0, 126.0, 131.0, 92.0, 65.0, 108.0, 105.0]
2025-09-16 12:17:40,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (526.87) for latency 6
2025-09-16 12:17:40,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 53 minutes, 2 seconds)
2025-09-16 12:19:35,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:19:36,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 509.40161 ± 110.352
2025-09-16 12:19:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [426.41043, 418.51074, 538.48004, 443.28305, 489.9823, 797.3334, 560.16693, 481.851, 393.08032, 544.9184]
2025-09-16 12:19:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 82.0, 102.0, 81.0, 90.0, 146.0, 103.0, 96.0, 72.0, 103.0]
2025-09-16 12:19:36,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 15 seconds)
2025-09-16 12:21:33,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:21:35,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 563.92480 ± 116.868
2025-09-16 12:21:35,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [596.219, 424.5025, 758.46826, 771.7203, 592.2092, 515.6747, 489.01074, 443.48816, 589.69336, 458.2617]
2025-09-16 12:21:35,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 89.0, 154.0, 145.0, 110.0, 96.0, 95.0, 81.0, 126.0, 85.0]
2025-09-16 12:21:35,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (563.92) for latency 6
2025-09-16 12:21:35,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 49 minutes, 42 seconds)
2025-09-16 12:23:29,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:23:31,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 543.34845 ± 28.281
2025-09-16 12:23:31,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [527.86914, 587.9726, 560.8955, 563.0323, 494.82507, 501.3855, 547.0474, 554.9335, 528.1421, 567.38116]
2025-09-16 12:23:31,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 109.0, 103.0, 117.0, 92.0, 107.0, 118.0, 118.0, 96.0, 112.0]
2025-09-16 12:23:31,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 47 minutes, 48 seconds)
2025-09-16 12:25:28,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:25:29,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 626.72693 ± 179.550
2025-09-16 12:25:29,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [487.40918, 687.33154, 399.61295, 962.4379, 371.5165, 797.3615, 524.9204, 727.437, 751.3267, 557.91534]
2025-09-16 12:25:29,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 132.0, 81.0, 178.0, 80.0, 167.0, 97.0, 153.0, 143.0, 119.0]
2025-09-16 12:25:29,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (626.73) for latency 6
2025-09-16 12:25:29,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 10 seconds)
2025-09-16 12:27:26,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:27:28,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 732.99481 ± 147.564
2025-09-16 12:27:28,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [864.56836, 799.25134, 761.9666, 1032.7922, 644.27106, 680.7229, 430.5297, 722.84186, 711.944, 681.05994]
2025-09-16 12:27:28,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 146.0, 156.0, 203.0, 120.0, 126.0, 92.0, 142.0, 144.0, 141.0]
2025-09-16 12:27:28,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (732.99) for latency 6
2025-09-16 12:27:28,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 34 seconds)
2025-09-16 12:29:24,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:29:26,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 719.83539 ± 240.917
2025-09-16 12:29:26,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [543.3845, 588.16376, 753.28424, 555.5054, 492.25073, 657.6676, 1323.8099, 571.0483, 965.7369, 747.503]
2025-09-16 12:29:26,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 124.0, 146.0, 118.0, 106.0, 132.0, 278.0, 111.0, 196.0, 157.0]
2025-09-16 12:29:26,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 11 seconds)
2025-09-16 12:31:21,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:31:22,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 711.42999 ± 175.675
2025-09-16 12:31:22,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [863.3057, 573.2024, 471.62265, 1069.9108, 616.19006, 878.5564, 582.1466, 723.5901, 561.17834, 774.5969]
2025-09-16 12:31:22,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 105.0, 98.0, 220.0, 128.0, 160.0, 107.0, 140.0, 104.0, 151.0]
2025-09-16 12:31:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 38 seconds)
2025-09-16 12:33:19,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:33:20,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 663.08301 ± 151.226
2025-09-16 12:33:20,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [967.3194, 600.4018, 565.4789, 589.82526, 805.6468, 677.0677, 432.79462, 739.70544, 493.66504, 758.9252]
2025-09-16 12:33:20,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 125.0, 109.0, 111.0, 149.0, 124.0, 80.0, 136.0, 105.0, 141.0]
2025-09-16 12:33:20,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 14 seconds)
2025-09-16 12:35:17,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:35:18,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 680.31256 ± 109.247
2025-09-16 12:35:18,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [668.0988, 733.8444, 709.8784, 699.3563, 483.52426, 836.24274, 827.94653, 507.87793, 657.3953, 678.96027]
2025-09-16 12:35:18,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 149.0, 130.0, 136.0, 103.0, 158.0, 158.0, 99.0, 125.0, 132.0]
2025-09-16 12:35:18,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 37 minutes, 3 seconds)
2025-09-16 12:37:15,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:37:16,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 644.29700 ± 98.704
2025-09-16 12:37:16,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [636.7117, 514.21796, 634.038, 534.2133, 795.0642, 838.14886, 591.26056, 588.93567, 685.1715, 625.2078]
2025-09-16 12:37:16,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 111.0, 122.0, 98.0, 146.0, 155.0, 110.0, 109.0, 130.0, 117.0]
2025-09-16 12:37:16,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 48 seconds)
2025-09-16 12:39:12,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:39:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 628.81244 ± 155.913
2025-09-16 12:39:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [647.27026, 674.9827, 573.3274, 465.682, 575.2961, 657.01166, 699.1894, 1018.11383, 412.9467, 564.3048]
2025-09-16 12:39:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 145.0, 105.0, 86.0, 106.0, 121.0, 127.0, 198.0, 77.0, 119.0]
2025-09-16 12:39:13,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 42 seconds)
2025-09-16 12:41:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:41:12,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 617.87793 ± 161.281
2025-09-16 12:41:12,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [482.13727, 961.00476, 631.4531, 483.77722, 556.602, 697.08905, 591.7896, 369.30338, 605.6983, 799.9245]
2025-09-16 12:41:12,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 185.0, 126.0, 99.0, 109.0, 135.0, 127.0, 83.0, 116.0, 156.0]
2025-09-16 12:41:12,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 31 minutes, 16 seconds)
2025-09-16 12:43:09,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:43:11,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 681.56769 ± 173.358
2025-09-16 12:43:11,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [920.0907, 914.8717, 527.4477, 437.7494, 691.42426, 647.84424, 722.41815, 786.91754, 392.7301, 774.18317]
2025-09-16 12:43:11,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 168.0, 101.0, 92.0, 129.0, 119.0, 136.0, 149.0, 81.0, 144.0]
2025-09-16 12:43:11,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 29 minutes, 38 seconds)
2025-09-16 12:45:06,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:45:08,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 736.71375 ± 151.801
2025-09-16 12:45:08,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [672.92004, 790.0862, 587.0501, 818.57654, 594.3764, 901.26385, 458.97928, 990.5908, 742.7291, 810.5656]
2025-09-16 12:45:08,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 147.0, 106.0, 154.0, 112.0, 174.0, 85.0, 190.0, 143.0, 163.0]
2025-09-16 12:45:08,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (736.71) for latency 6
2025-09-16 12:45:08,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 27 minutes, 31 seconds)
2025-09-16 12:47:06,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:47:08,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 711.95581 ± 199.421
2025-09-16 12:47:08,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1127.2512, 558.59827, 861.2603, 543.5704, 548.2568, 580.37213, 811.9456, 720.8397, 465.47342, 901.98956]
2025-09-16 12:47:08,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 102.0, 162.0, 101.0, 98.0, 107.0, 160.0, 155.0, 84.0, 168.0]
2025-09-16 12:47:08,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 1 second)
2025-09-16 12:49:04,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:49:06,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 857.60468 ± 219.845
2025-09-16 12:49:06,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [677.1682, 714.6337, 1243.2533, 1081.8721, 616.50165, 836.77277, 639.8801, 1188.8674, 733.5133, 843.5843]
2025-09-16 12:49:06,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 131.0, 242.0, 203.0, 122.0, 156.0, 132.0, 220.0, 140.0, 161.0]
2025-09-16 12:49:06,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (857.60) for latency 6
2025-09-16 12:49:06,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 9 seconds)
2025-09-16 12:51:03,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:51:05,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 878.37177 ± 233.816
2025-09-16 12:51:05,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [376.50333, 852.52277, 660.09064, 743.2489, 925.2858, 828.9453, 961.9225, 1168.8365, 1167.8406, 1098.5214]
2025-09-16 12:51:05,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 160.0, 124.0, 142.0, 195.0, 153.0, 190.0, 222.0, 229.0, 224.0]
2025-09-16 12:51:05,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (878.37) for latency 6
2025-09-16 12:51:05,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 21 seconds)
2025-09-16 12:53:01,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:53:03,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 762.14417 ± 255.333
2025-09-16 12:53:03,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [620.8875, 965.8484, 531.9325, 626.3476, 1013.5693, 1368.9309, 700.66504, 633.5068, 600.9679, 558.78595]
2025-09-16 12:53:03,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 186.0, 102.0, 117.0, 188.0, 282.0, 133.0, 120.0, 109.0, 100.0]
2025-09-16 12:53:03,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 6 seconds)
2025-09-16 12:55:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:55:02,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 746.88037 ± 151.774
2025-09-16 12:55:02,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [899.2258, 716.39923, 937.9438, 508.7702, 602.5259, 689.9816, 587.02997, 985.27466, 718.83234, 822.8196]
2025-09-16 12:55:02,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 134.0, 174.0, 97.0, 111.0, 128.0, 107.0, 187.0, 133.0, 156.0]
2025-09-16 12:55:02,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 18 minutes, 38 seconds)
2025-09-16 12:56:59,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:57:02,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 900.59277 ± 309.112
2025-09-16 12:57:02,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [695.3085, 1022.6273, 802.806, 1742.1222, 662.4804, 867.9127, 612.9278, 867.7359, 1001.5465, 730.4596]
2025-09-16 12:57:02,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 199.0, 153.0, 337.0, 124.0, 161.0, 113.0, 167.0, 201.0, 138.0]
2025-09-16 12:57:02,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (900.59) for latency 6
2025-09-16 12:57:02,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 32 seconds)
2025-09-16 12:58:59,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:59:02,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1069.41772 ± 382.177
2025-09-16 12:59:02,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1079.4468, 669.42633, 951.8459, 1904.7236, 1216.2131, 360.93378, 1297.03, 1102.856, 977.92053, 1133.7815]
2025-09-16 12:59:02,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 129.0, 188.0, 378.0, 238.0, 72.0, 258.0, 211.0, 193.0, 216.0]
2025-09-16 12:59:02,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1069.42) for latency 6
2025-09-16 12:59:02,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 11 seconds)
2025-09-16 13:00:59,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:01:01,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 730.44336 ± 364.177
2025-09-16 13:01:01,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [500.9758, 661.1059, 647.9178, 560.0324, 1164.0116, 1634.86, 732.66675, 502.1998, 545.0892, 355.5743]
2025-09-16 13:01:01,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 119.0, 125.0, 106.0, 239.0, 315.0, 139.0, 108.0, 103.0, 77.0]
2025-09-16 13:01:01,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 2 seconds)
2025-09-16 13:02:59,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:03:03,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1142.17200 ± 287.175
2025-09-16 13:03:03,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [621.6668, 1394.366, 1355.3945, 1317.7874, 1082.853, 711.4382, 1169.8292, 916.20624, 1349.3801, 1502.7987]
2025-09-16 13:03:03,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 266.0, 265.0, 259.0, 210.0, 152.0, 228.0, 172.0, 265.0, 304.0]
2025-09-16 13:03:03,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1142.17) for latency 6
2025-09-16 13:03:03,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 51 seconds)
2025-09-16 13:04:58,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:05:02,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1396.72009 ± 300.529
2025-09-16 13:05:02,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1269.2859, 1540.8184, 1658.29, 1999.1836, 1635.949, 1340.4054, 1363.203, 1005.0208, 1167.139, 987.90607]
2025-09-16 13:05:02,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 298.0, 326.0, 392.0, 324.0, 258.0, 269.0, 191.0, 224.0, 185.0]
2025-09-16 13:05:02,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1396.72) for latency 6
2025-09-16 13:05:02,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 51 seconds)
2025-09-16 13:06:59,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:07:02,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1186.98267 ± 423.688
2025-09-16 13:07:02,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1407.705, 1975.0195, 1001.6359, 1586.2024, 441.57156, 592.0989, 1116.5342, 1285.6897, 1190.2047, 1273.1647]
2025-09-16 13:07:02,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [277.0, 385.0, 202.0, 317.0, 91.0, 119.0, 211.0, 245.0, 232.0, 254.0]
2025-09-16 13:07:02,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 6 seconds)
2025-09-16 13:09:01,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:09:04,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1144.67749 ± 317.717
2025-09-16 13:09:04,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1217.6582, 978.3723, 1645.2443, 1407.8256, 1136.5071, 1608.7648, 1157.632, 830.758, 762.75146, 701.26184]
2025-09-16 13:09:04,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [237.0, 189.0, 327.0, 273.0, 225.0, 317.0, 228.0, 158.0, 144.0, 133.0]
2025-09-16 13:09:04,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 24 seconds)
2025-09-16 13:11:01,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:11:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1279.67261 ± 303.699
2025-09-16 13:11:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [953.4092, 1288.7863, 1050.9553, 1049.0431, 1029.2041, 1274.8922, 1604.5408, 1823.8658, 1019.9708, 1702.0577]
2025-09-16 13:11:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 252.0, 204.0, 206.0, 221.0, 253.0, 313.0, 353.0, 195.0, 334.0]
2025-09-16 13:11:05,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 51 seconds)
2025-09-16 13:13:02,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:13:05,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1163.80920 ± 364.568
2025-09-16 13:13:05,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [467.3644, 1217.5278, 709.0023, 1477.4954, 1703.8838, 1141.389, 1556.9875, 1312.2828, 1146.8551, 905.3032]
2025-09-16 13:13:05,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 239.0, 127.0, 289.0, 332.0, 213.0, 305.0, 259.0, 220.0, 179.0]
2025-09-16 13:13:05,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 27 seconds)
2025-09-16 13:15:03,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:15:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1170.15283 ± 393.246
2025-09-16 13:15:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [643.0411, 610.7044, 1040.6776, 1200.1371, 793.2621, 1247.0942, 1230.0035, 1722.2579, 1384.0269, 1830.3232]
2025-09-16 13:15:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 117.0, 210.0, 238.0, 160.0, 246.0, 233.0, 342.0, 272.0, 360.0]
2025-09-16 13:15:06,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 48 seconds)
2025-09-16 13:17:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:17:07,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1490.18774 ± 345.042
2025-09-16 13:17:07,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1498.0139, 2237.671, 1299.0933, 956.58307, 1596.6669, 1869.1395, 1238.4786, 1464.5004, 1548.4347, 1193.2959]
2025-09-16 13:17:07,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [301.0, 434.0, 256.0, 184.0, 320.0, 371.0, 247.0, 281.0, 305.0, 228.0]
2025-09-16 13:17:07,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1490.19) for latency 6
2025-09-16 13:17:07,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 52 seconds)
2025-09-16 13:19:06,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:19:08,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 836.43671 ± 330.388
2025-09-16 13:19:08,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1075.9834, 705.6902, 609.4128, 408.81342, 444.5243, 1514.0486, 1079.3981, 671.4548, 1110.6678, 744.37317]
2025-09-16 13:19:08,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 140.0, 127.0, 74.0, 81.0, 292.0, 199.0, 132.0, 211.0, 147.0]
2025-09-16 13:19:08,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 49 seconds)
2025-09-16 13:21:09,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:21:14,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1636.08679 ± 278.615
2025-09-16 13:21:14,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1256.2948, 1497.957, 1476.0925, 1712.1978, 1681.6061, 1776.773, 1372.6671, 1537.6942, 2323.4807, 1726.1057]
2025-09-16 13:21:14,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [241.0, 292.0, 290.0, 330.0, 327.0, 350.0, 266.0, 296.0, 453.0, 337.0]
2025-09-16 13:21:14,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1636.09) for latency 6
2025-09-16 13:21:14,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 39 seconds)
2025-09-16 13:23:06,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:23:10,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1504.56287 ± 445.194
2025-09-16 13:23:10,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [532.2429, 1389.5568, 1421.2223, 2155.2012, 1189.8363, 1415.2996, 1531.0996, 1665.0739, 1566.4386, 2179.6584]
2025-09-16 13:23:10,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 273.0, 278.0, 417.0, 228.0, 271.0, 296.0, 326.0, 303.0, 420.0]
2025-09-16 13:23:10,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 3 seconds)
2025-09-16 13:25:09,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:25:13,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1506.43384 ± 674.931
2025-09-16 13:25:13,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [819.66327, 1422.7461, 800.23193, 994.9207, 1916.369, 912.6082, 1470.8606, 1836.621, 3124.4653, 1765.8522]
2025-09-16 13:25:13,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 270.0, 150.0, 195.0, 386.0, 184.0, 278.0, 357.0, 609.0, 341.0]
2025-09-16 13:25:13,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 21 seconds)
2025-09-16 13:27:13,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:27:18,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1762.66089 ± 324.842
2025-09-16 13:27:18,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2182.112, 2000.7731, 2075.2778, 1550.3146, 1505.3497, 2087.9773, 1172.2959, 1708.8955, 1407.4264, 1936.1864]
2025-09-16 13:27:18,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [425.0, 385.0, 393.0, 306.0, 286.0, 396.0, 226.0, 331.0, 275.0, 378.0]
2025-09-16 13:27:18,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1762.66) for latency 6
2025-09-16 13:27:18,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 50 minutes)
2025-09-16 13:29:16,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:29:20,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1607.27429 ± 389.127
2025-09-16 13:29:20,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2052.7236, 1743.5414, 1262.5068, 1755.5663, 691.0872, 1413.2313, 1613.4766, 2113.971, 1709.8329, 1716.8054]
2025-09-16 13:29:20,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [398.0, 342.0, 240.0, 341.0, 137.0, 272.0, 313.0, 403.0, 330.0, 336.0]
2025-09-16 13:29:20,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 3 seconds)
2025-09-16 13:31:15,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:31:20,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1938.41858 ± 275.887
2025-09-16 13:31:20,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1951.205, 2060.1223, 1883.8191, 1742.2245, 1824.6804, 1853.9227, 2206.6143, 2023.961, 2468.6345, 1369.0015]
2025-09-16 13:31:20,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [372.0, 398.0, 370.0, 339.0, 352.0, 360.0, 422.0, 397.0, 473.0, 260.0]
2025-09-16 13:31:20,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1938.42) for latency 6
2025-09-16 13:31:20,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 45 minutes, 12 seconds)
2025-09-16 13:33:18,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:33:22,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1460.17859 ± 484.980
2025-09-16 13:33:22,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2420.4524, 1139.1033, 1374.269, 2058.5403, 496.16168, 1460.9945, 1414.71, 1403.8016, 1309.4167, 1524.3362]
2025-09-16 13:33:22,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [484.0, 220.0, 268.0, 403.0, 96.0, 291.0, 277.0, 276.0, 254.0, 300.0]
2025-09-16 13:33:22,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 44 minutes, 2 seconds)
2025-09-16 13:35:24,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:35:29,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1636.82837 ± 215.720
2025-09-16 13:35:29,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1328.3121, 1541.8009, 1694.7711, 1895.432, 1500.4965, 1397.2549, 1485.9604, 2051.2087, 1719.5914, 1753.4536]
2025-09-16 13:35:29,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [261.0, 303.0, 331.0, 385.0, 293.0, 273.0, 291.0, 403.0, 341.0, 343.0]
2025-09-16 13:35:29,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 37 seconds)
2025-09-16 13:37:26,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:37:31,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1775.18066 ± 419.785
2025-09-16 13:37:31,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2542.9565, 940.652, 2133.8054, 1578.1437, 1469.0505, 1787.8212, 1919.8073, 1993.4827, 1950.8224, 1435.2662]
2025-09-16 13:37:31,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [496.0, 186.0, 427.0, 299.0, 285.0, 349.0, 373.0, 392.0, 384.0, 282.0]
2025-09-16 13:37:31,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 40 minutes, 9 seconds)
2025-09-16 13:39:29,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:39:34,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2039.28845 ± 949.475
2025-09-16 13:39:34,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4711.3887, 1347.4298, 2382.4941, 1496.2944, 1678.841, 1725.567, 1903.3513, 1854.3818, 1202.6412, 2090.4968]
2025-09-16 13:39:34,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [905.0, 257.0, 466.0, 285.0, 319.0, 341.0, 368.0, 358.0, 228.0, 392.0]
2025-09-16 13:39:34,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2039.29) for latency 6
2025-09-16 13:39:34,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 38 minutes, 16 seconds)
2025-09-16 13:41:35,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:41:41,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2182.74463 ± 763.471
2025-09-16 13:41:41,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2326.638, 2393.5664, 2014.0029, 1560.3214, 1141.5276, 2600.5605, 2472.8274, 1121.1409, 2323.5027, 3873.3591]
2025-09-16 13:41:41,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [457.0, 469.0, 399.0, 312.0, 230.0, 509.0, 480.0, 229.0, 449.0, 771.0]
2025-09-16 13:41:41,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2182.74) for latency 6
2025-09-16 13:41:41,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 14 seconds)
2025-09-16 13:43:40,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:43:45,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2020.50952 ± 638.097
2025-09-16 13:43:45,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1850.5486, 1261.3389, 1536.2173, 2049.6694, 3494.2068, 2061.7593, 1541.5188, 1800.0558, 2860.6768, 1749.1027]
2025-09-16 13:43:45,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [355.0, 248.0, 300.0, 393.0, 663.0, 397.0, 318.0, 346.0, 552.0, 336.0]
2025-09-16 13:43:45,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 35 minutes, 29 seconds)
2025-09-16 13:45:45,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:45:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2209.83423 ± 592.304
2025-09-16 13:45:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2391.2883, 2492.6296, 3666.606, 2343.7854, 2194.2024, 1894.1403, 1545.3002, 1457.1991, 2280.509, 1832.6798]
2025-09-16 13:45:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [471.0, 487.0, 721.0, 451.0, 416.0, 361.0, 299.0, 293.0, 446.0, 350.0]
2025-09-16 13:45:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2209.83) for latency 6
2025-09-16 13:45:51,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 33 minutes, 17 seconds)
2025-09-16 13:47:48,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:47:55,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2405.73389 ± 950.104
2025-09-16 13:47:55,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2878.1826, 2807.3857, 2704.1157, 675.6755, 2684.911, 2312.8188, 2865.856, 701.41766, 2488.095, 3938.8806]
2025-09-16 13:47:55,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [556.0, 549.0, 538.0, 145.0, 528.0, 460.0, 564.0, 145.0, 473.0, 774.0]
2025-09-16 13:47:55,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2405.73) for latency 6
2025-09-16 13:47:55,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 31 seconds)
2025-09-16 13:49:54,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:49:59,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1903.82593 ± 253.237
2025-09-16 13:49:59,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2233.3965, 1693.5447, 1724.1274, 1813.4048, 1684.7233, 2273.6162, 1924.2794, 1872.6522, 2271.0515, 1547.4645]
2025-09-16 13:49:59,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [438.0, 325.0, 340.0, 356.0, 328.0, 451.0, 376.0, 372.0, 437.0, 301.0]
2025-09-16 13:49:59,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 32 seconds)
2025-09-16 13:51:59,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:52:04,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1944.92419 ± 529.657
2025-09-16 13:52:04,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1419.0846, 2014.3698, 2029.3813, 3440.8806, 1735.9204, 1554.2811, 1872.0468, 1760.2076, 1749.4402, 1873.6306]
2025-09-16 13:52:04,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [283.0, 391.0, 399.0, 659.0, 343.0, 306.0, 365.0, 341.0, 344.0, 368.0]
2025-09-16 13:52:04,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 27 minutes, 12 seconds)
2025-09-16 13:54:02,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:54:07,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1850.00415 ± 279.113
2025-09-16 13:54:07,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1727.2543, 2171.8984, 1651.8958, 1925.7732, 1674.9221, 1997.9598, 2375.604, 1352.2764, 1945.0762, 1677.3805]
2025-09-16 13:54:07,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [345.0, 434.0, 328.0, 373.0, 328.0, 394.0, 462.0, 267.0, 389.0, 325.0]
2025-09-16 13:54:07,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 24 minutes, 58 seconds)
2025-09-16 13:56:08,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:56:13,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1895.83325 ± 515.571
2025-09-16 13:56:13,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2628.8464, 2100.714, 2291.8105, 1932.2013, 1790.5696, 1889.9885, 607.7918, 1509.6464, 2015.9916, 2190.7705]
2025-09-16 13:56:13,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [513.0, 401.0, 446.0, 376.0, 352.0, 363.0, 113.0, 291.0, 393.0, 429.0]
2025-09-16 13:56:13,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 22 minutes, 56 seconds)
2025-09-16 13:58:10,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:58:17,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2437.23926 ± 428.844
2025-09-16 13:58:17,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2522.9436, 2267.2349, 2198.828, 2807.9514, 2648.552, 3408.2886, 1878.3246, 2349.2754, 1889.9451, 2401.048]
2025-09-16 13:58:17,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [492.0, 448.0, 426.0, 550.0, 512.0, 668.0, 364.0, 463.0, 378.0, 475.0]
2025-09-16 13:58:17,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2437.24) for latency 6
2025-09-16 13:58:17,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 20 minutes, 51 seconds)
2025-09-16 14:00:15,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:00:24,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3262.75146 ± 782.740
2025-09-16 14:00:24,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3578.326, 2379.3918, 3520.8242, 5274.7256, 2929.5063, 2746.852, 2797.1702, 3632.9534, 2644.9573, 3122.806]
2025-09-16 14:00:24,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [695.0, 466.0, 678.0, 1000.0, 571.0, 533.0, 538.0, 698.0, 523.0, 602.0]
2025-09-16 14:00:24,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3262.75) for latency 6
2025-09-16 14:00:24,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 19 minutes, 8 seconds)
2025-09-16 14:02:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:02:30,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3067.58252 ± 857.270
2025-09-16 14:02:30,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4359.9727, 2673.3525, 2843.3804, 4201.144, 4060.478, 2879.9893, 2251.068, 1872.9835, 2098.7517, 3434.7036]
2025-09-16 14:02:30,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [842.0, 513.0, 563.0, 821.0, 778.0, 550.0, 441.0, 371.0, 419.0, 665.0]
2025-09-16 14:02:30,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 17 minutes, 9 seconds)
2025-09-16 14:04:25,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:04:34,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3225.00342 ± 845.882
2025-09-16 14:04:34,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3944.16, 3250.077, 2811.6858, 3074.6938, 2322.4912, 3849.1216, 4738.591, 2125.3967, 3986.095, 2147.7212]
2025-09-16 14:04:34,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [772.0, 634.0, 541.0, 592.0, 456.0, 756.0, 919.0, 425.0, 769.0, 424.0]
2025-09-16 14:04:34,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 minutes, 15 seconds)
2025-09-16 14:06:33,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:06:42,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2956.26489 ± 974.096
2025-09-16 14:06:42,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3230.1643, 4633.3745, 3438.888, 2627.063, 1849.6216, 3764.861, 1155.5454, 3626.9448, 2138.5042, 3097.682]
2025-09-16 14:06:42,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [634.0, 900.0, 687.0, 504.0, 360.0, 742.0, 238.0, 723.0, 422.0, 607.0]
2025-09-16 14:06:42,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 13 minutes, 19 seconds)
2025-09-16 14:08:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:08:46,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3282.85425 ± 603.888
2025-09-16 14:08:46,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2611.2903, 3054.9536, 3715.0535, 3561.999, 3960.2795, 2950.0178, 4498.303, 3046.568, 2971.4473, 2458.6304]
2025-09-16 14:08:46,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [520.0, 592.0, 705.0, 686.0, 764.0, 578.0, 878.0, 587.0, 569.0, 474.0]
2025-09-16 14:08:46,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3282.85) for latency 6
2025-09-16 14:08:46,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 11 minutes, 19 seconds)
2025-09-16 14:10:42,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:10:52,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3400.29297 ± 1454.875
2025-09-16 14:10:52,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5219.1865, 2984.9282, 2332.1309, 4654.5664, 3656.0115, 672.93225, 4315.9785, 1596.9442, 5236.769, 3333.4822]
2025-09-16 14:10:52,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 574.0, 467.0, 900.0, 711.0, 124.0, 838.0, 326.0, 1000.0, 647.0]
2025-09-16 14:10:52,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3400.29) for latency 6
2025-09-16 14:10:52,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 9 minutes, 6 seconds)
2025-09-16 14:12:48,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:12:59,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3921.53638 ± 1445.943
2025-09-16 14:12:59,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5140.526, 4841.1177, 4564.034, 2663.1187, 4886.8237, 3290.108, 2767.2817, 659.5921, 5216.481, 5186.2827]
2025-09-16 14:12:59,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [994.0, 926.0, 874.0, 516.0, 947.0, 634.0, 535.0, 121.0, 1000.0, 1000.0]
2025-09-16 14:12:59,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3921.54) for latency 6
2025-09-16 14:12:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2025-09-16 14:14:56,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:15:07,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3873.75439 ± 998.181
2025-09-16 14:15:07,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3123.11, 3654.0027, 3485.0288, 4575.8745, 1762.3201, 5092.4175, 5040.9146, 4889.983, 3847.4631, 3266.4314]
2025-09-16 14:15:07,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [612.0, 697.0, 670.0, 884.0, 358.0, 959.0, 961.0, 946.0, 743.0, 640.0]
2025-09-16 14:15:07,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 5 minutes, 24 seconds)
2025-09-16 14:17:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:17:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4165.21289 ± 997.326
2025-09-16 14:17:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3935.8066, 2723.6897, 5232.443, 5230.766, 3112.3755, 4998.227, 5205.4277, 3564.0168, 4840.52, 2808.8599]
2025-09-16 14:17:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [773.0, 536.0, 1000.0, 1000.0, 611.0, 972.0, 1000.0, 688.0, 943.0, 549.0]
2025-09-16 14:17:16,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4165.21) for latency 6
2025-09-16 14:17:16,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 25 seconds)
2025-09-16 14:19:23,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:19:35,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4436.15967 ± 1359.293
2025-09-16 14:19:35,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5253.2466, 5292.6216, 1390.2888, 5322.0425, 5286.1025, 3464.8877, 5229.8936, 5240.0513, 2603.0283, 5279.4365]
2025-09-16 14:19:35,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 266.0, 1000.0, 1000.0, 672.0, 1000.0, 1000.0, 517.0, 1000.0]
2025-09-16 14:19:35,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4436.16) for latency 6
2025-09-16 14:19:35,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 2 minutes, 42 seconds)
2025-09-16 14:21:31,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:21:43,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4128.73193 ± 1160.577
2025-09-16 14:21:43,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5090.7446, 5321.566, 3225.3323, 3781.4834, 2069.3616, 5284.0566, 2476.8687, 3915.433, 4842.9526, 5279.524]
2025-09-16 14:21:43,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [992.0, 1000.0, 630.0, 726.0, 397.0, 1000.0, 467.0, 743.0, 937.0, 1000.0]
2025-09-16 14:21:43,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 42 seconds)
2025-09-16 14:23:37,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:23:48,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3888.64404 ± 1244.425
2025-09-16 14:23:48,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2246.1885, 3612.4185, 3405.0212, 1770.2675, 3271.3691, 5261.4487, 5213.714, 5251.9404, 3552.1602, 5301.913]
2025-09-16 14:23:48,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [442.0, 697.0, 666.0, 341.0, 639.0, 1000.0, 1000.0, 1000.0, 694.0, 1000.0]
2025-09-16 14:23:48,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 58 minutes, 24 seconds)
2025-09-16 14:25:40,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:25:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4517.86914 ± 926.541
2025-09-16 14:25:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3858.362, 5264.308, 3579.0037, 5304.8423, 5281.2554, 5281.252, 2702.9648, 5037.338, 5255.8994, 3613.4656]
2025-09-16 14:25:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [759.0, 1000.0, 700.0, 1000.0, 1000.0, 1000.0, 512.0, 962.0, 1000.0, 680.0]
2025-09-16 14:25:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4517.87) for latency 6
2025-09-16 14:25:52,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 55 minutes, 55 seconds)
2025-09-16 14:27:59,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:28:10,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3947.47583 ± 1555.831
2025-09-16 14:28:10,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4361.556, 5258.058, 881.84296, 5303.9824, 5270.191, 3417.029, 5249.2363, 5221.206, 2317.1936, 2194.4626]
2025-09-16 14:28:10,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [848.0, 1000.0, 174.0, 1000.0, 1000.0, 664.0, 1000.0, 1000.0, 456.0, 421.0]
2025-09-16 14:28:10,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 31 seconds)
2025-09-16 14:30:00,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:30:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3835.09106 ± 1290.693
2025-09-16 14:30:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2349.3037, 2748.842, 2928.2065, 3009.2493, 5175.0645, 1890.1497, 5189.0312, 4738.059, 5058.1626, 5264.8413]
2025-09-16 14:30:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [475.0, 538.0, 581.0, 584.0, 1000.0, 368.0, 1000.0, 924.0, 998.0, 1000.0]
2025-09-16 14:30:11,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 50 minutes, 49 seconds)
2025-09-16 14:32:17,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:32:31,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4903.88330 ± 670.356
2025-09-16 14:32:31,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5238.742, 5225.8154, 5282.659, 5207.3223, 3639.801, 5199.1055, 5288.4243, 5241.866, 3492.0073, 5223.0913]
2025-09-16 14:32:31,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 693.0, 1000.0, 1000.0, 1000.0, 664.0, 1000.0]
2025-09-16 14:32:31,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4903.88) for latency 6
2025-09-16 14:32:31,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 41 seconds)
2025-09-16 14:34:26,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:34:39,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4957.58740 ± 622.790
2025-09-16 14:34:39,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5313.9893, 5277.8604, 3836.8384, 5264.405, 5214.073, 5191.535, 5315.749, 5272.445, 3600.5718, 5288.4106]
2025-09-16 14:34:39,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 737.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 696.0, 1000.0]
2025-09-16 14:34:39,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4957.59) for latency 6
2025-09-16 14:34:39,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 44 seconds)
2025-09-16 14:36:38,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:36:51,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4646.36914 ± 1076.520
2025-09-16 14:36:51,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5286.53, 4715.4907, 5251.854, 5263.88, 5192.007, 5265.318, 2451.1611, 5121.457, 5327.9014, 2588.0945]
2025-09-16 14:36:51,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 906.0, 1000.0, 1000.0, 1000.0, 1000.0, 483.0, 985.0, 1000.0, 501.0]
2025-09-16 14:36:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 5 seconds)
2025-09-16 14:38:51,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:39:05,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4979.44580 ± 462.993
2025-09-16 14:39:05,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5172.094, 5234.356, 4496.86, 5204.689, 5169.817, 3739.8994, 5195.467, 5173.465, 5211.6763, 5196.1333]
2025-09-16 14:39:05,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 874.0, 1000.0, 1000.0, 737.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:39:05,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4979.45) for latency 6
2025-09-16 14:39:05,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 39 seconds)
2025-09-16 14:40:57,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:41:08,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3781.82031 ± 1210.510
2025-09-16 14:41:08,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5225.6787, 4952.3804, 3954.4905, 5176.9155, 2294.275, 2070.0137, 2779.1853, 3130.5918, 5218.4043, 3016.2646]
2025-09-16 14:41:08,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 968.0, 770.0, 1000.0, 471.0, 417.0, 546.0, 623.0, 1000.0, 604.0]
2025-09-16 14:41:08,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 39 seconds)
2025-09-16 14:43:07,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:43:21,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4765.82275 ± 1025.954
2025-09-16 14:43:21,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5186.685, 5154.457, 5192.195, 5186.068, 5221.079, 5177.353, 5087.4937, 5176.228, 4533.56, 1743.1106]
2025-09-16 14:43:21,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 898.0, 357.0]
2025-09-16 14:43:21,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 59 seconds)
2025-09-16 14:45:24,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:45:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4850.67676 ± 921.868
2025-09-16 14:45:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5105.831, 5201.0327, 5213.31, 5145.659, 5205.1006, 5153.955, 2088.0205, 5097.9644, 5194.2104, 5101.683]
2025-09-16 14:45:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 405.0, 1000.0, 1000.0, 995.0]
2025-09-16 14:45:38,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 19 seconds)
2025-09-16 14:47:43,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:47:58,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5145.16504 ± 337.349
2025-09-16 14:47:58,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5269.6494, 5293.269, 5224.5, 5257.635, 5247.7954, 5265.939, 5267.7114, 4135.8047, 5206.458, 5282.8867]
2025-09-16 14:47:58,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 810.0, 1000.0, 1000.0]
2025-09-16 14:47:58,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5145.17) for latency 6
2025-09-16 14:47:58,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 33 seconds)
2025-09-16 14:49:53,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:50:08,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5275.93066 ± 28.354
2025-09-16 14:50:08,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5309.6074, 5292.967, 5277.3076, 5235.483, 5318.0425, 5263.6504, 5241.367, 5301.0684, 5278.7847, 5241.029]
2025-09-16 14:50:08,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:50:08,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5275.93) for latency 6
2025-09-16 14:50:08,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 8 seconds)
2025-09-16 14:52:04,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:52:18,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5075.42773 ± 425.829
2025-09-16 14:52:18,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5113.946, 5217.064, 3807.5615, 5268.93, 5135.988, 5215.448, 5261.39, 5199.9033, 5272.637, 5261.4097]
2025-09-16 14:52:18,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 739.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:52:18,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 15 seconds)
2025-09-16 14:54:18,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:54:32,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5084.48291 ± 647.320
2025-09-16 14:54:32,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5286.616, 3143.2593, 5284.206, 5312.1436, 5337.1187, 5312.4043, 5308.1567, 5277.5957, 5305.202, 5278.1255]
2025-09-16 14:54:32,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 609.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:54:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 5 seconds)
2025-09-16 14:56:26,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:56:40,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4803.73096 ± 752.459
2025-09-16 14:56:40,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5132.9756, 5106.889, 5136.4697, 5114.7915, 5129.054, 4437.925, 5111.883, 5126.324, 5110.183, 2630.813]
2025-09-16 14:56:40,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 874.0, 1000.0, 1000.0, 1000.0, 513.0]
2025-09-16 14:56:40,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 28 seconds)
2025-09-16 14:58:44,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:58:58,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4759.56543 ± 1228.311
2025-09-16 14:58:58,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5307.2627, 5303.116, 5252.5674, 5350.9585, 5314.387, 5259.9536, 5279.708, 1277.826, 5300.12, 3949.7485]
2025-09-16 14:58:58,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 244.0, 1000.0, 763.0]
2025-09-16 14:58:58,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 11 seconds)
2025-09-16 15:00:56,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:01:11,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5218.89990 ± 28.272
2025-09-16 15:01:11,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5277.2, 5197.668, 5198.18, 5216.1074, 5240.503, 5211.5454, 5166.817, 5215.745, 5239.188, 5226.0474]
2025-09-16 15:01:11,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:01:11,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 7 seconds)
2025-09-16 15:03:00,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:03:13,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4747.61768 ± 1319.559
2025-09-16 15:03:13,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5207.597, 5096.2427, 5188.217, 5216.201, 790.3576, 5217.4434, 5217.0293, 5155.142, 5196.596, 5191.347]
2025-09-16 15:03:13,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 152.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:03:13,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 39 seconds)
2025-09-16 15:05:10,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:05:23,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4501.41748 ± 1437.145
2025-09-16 15:05:23,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5147.7036, 942.3207, 5213.0215, 5173.119, 5198.993, 5208.4907, 5219.3516, 5196.832, 2479.54, 5234.803]
2025-09-16 15:05:23,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 180.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 497.0, 1000.0]
2025-09-16 15:05:23,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 22 seconds)
2025-09-16 15:07:19,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:07:34,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5236.25342 ± 71.895
2025-09-16 15:07:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5253.415, 5258.1743, 5272.015, 5253.472, 5244.592, 5208.9185, 5032.1245, 5255.752, 5280.872, 5303.199]
2025-09-16 15:07:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 979.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:07:34,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 15 seconds)
2025-09-16 15:09:32,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:09:46,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5034.05518 ± 668.676
2025-09-16 15:09:46,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3029.1946, 5286.584, 5278.2456, 5223.0537, 5278.4214, 5240.3467, 5258.021, 5281.9873, 5223.834, 5240.862]
2025-09-16 15:09:46,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [613.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:09:46,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 57 seconds)
2025-09-16 15:11:51,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:12:06,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5299.12012 ± 29.144
2025-09-16 15:12:06,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5312.1113, 5355.002, 5331.42, 5284.323, 5254.574, 5304.2183, 5280.9004, 5317.632, 5267.133, 5283.887]
2025-09-16 15:12:06,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:12:06,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5299.12) for latency 6
2025-09-16 15:12:06,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 55 seconds)
2025-09-16 15:14:03,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:14:17,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5255.31738 ± 109.968
2025-09-16 15:14:17,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5245.438, 5333.754, 5286.561, 5314.634, 5272.732, 4936.2896, 5288.5396, 5308.32, 5247.3486, 5319.558]
2025-09-16 15:14:17,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 934.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:14:17,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 51 seconds)
2025-09-16 15:16:15,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:16:29,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4853.56299 ± 1378.393
2025-09-16 15:16:29,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5328.929, 5299.39, 5291.2827, 5280.625, 5369.4053, 5347.6353, 5306.2344, 5279.275, 5313.6494, 719.203]
2025-09-16 15:16:29,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 134.0]
2025-09-16 15:16:29,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 39 seconds)
2025-09-16 15:18:28,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:18:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5247.29297 ± 26.616
2025-09-16 15:18:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5235.25, 5194.8696, 5237.317, 5245.011, 5282.336, 5261.671, 5217.5977, 5244.949, 5275.673, 5278.259]
2025-09-16 15:18:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:18:43,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 27 seconds)
2025-09-16 15:20:42,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:20:56,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5025.43262 ± 1086.680
2025-09-16 15:20:56,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5387.9243, 5345.817, 5392.878, 5394.7466, 5402.5264, 5408.534, 1766.2666, 5365.6187, 5435.328, 5354.6885]
2025-09-16 15:20:56,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 361.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:20:56,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 13 seconds)
2025-09-16 15:22:49,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:23:04,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5216.11768 ± 18.937
2025-09-16 15:23:04,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5215.92, 5240.441, 5240.1533, 5196.404, 5214.61, 5206.802, 5181.9844, 5242.7803, 5213.4478, 5208.632]
2025-09-16 15:23:04,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:23:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
