2025-09-16 12:08:09,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_9
2025-09-16 12:08:09,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_9
2025-09-16 12:08:09,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x14cf47c08650>}
2025-09-16 12:08:09,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:08:09,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:08:09,678 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=529, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:08:09,678 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:08:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:08:12,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:09:58,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:09:59,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 400.53015 ± 79.637
2025-09-16 12:09:59,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [415.83206, 465.19278, 559.2775, 393.7597, 391.53387, 285.9709, 437.55862, 402.98297, 262.39133, 390.80182]
2025-09-16 12:09:59,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 89.0, 106.0, 73.0, 72.0, 53.0, 82.0, 75.0, 48.0, 87.0]
2025-09-16 12:09:59,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (400.53) for latency 9
2025-09-16 12:09:59,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 56 seconds)
2025-09-16 12:11:53,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:11:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 325.00708 ± 56.519
2025-09-16 12:11:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [364.16205, 400.36526, 282.27832, 379.25015, 290.80734, 335.7402, 299.30692, 268.07205, 229.66705, 400.42145]
2025-09-16 12:11:54,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 73.0, 56.0, 72.0, 60.0, 64.0, 58.0, 50.0, 47.0, 75.0]
2025-09-16 12:11:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 1 minute, 26 seconds)
2025-09-16 12:13:50,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:13:51,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 409.75058 ± 103.338
2025-09-16 12:13:51,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [274.82632, 530.0981, 350.43222, 398.21936, 636.777, 380.96323, 401.57812, 446.3995, 281.28036, 396.9317]
2025-09-16 12:13:51,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 101.0, 64.0, 75.0, 133.0, 71.0, 79.0, 96.0, 52.0, 83.0]
2025-09-16 12:13:51,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (409.75) for latency 9
2025-09-16 12:13:51,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 2 minutes, 44 seconds)
2025-09-16 12:15:48,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:15:48,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 373.00726 ± 59.430
2025-09-16 12:15:48,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [438.3433, 377.35336, 347.74063, 276.16916, 411.25626, 410.23672, 285.0273, 471.05478, 367.93106, 344.96008]
2025-09-16 12:15:48,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 69.0, 65.0, 49.0, 76.0, 87.0, 61.0, 85.0, 72.0, 64.0]
2025-09-16 12:15:48,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 2 minutes, 32 seconds)
2025-09-16 12:17:45,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:17:46,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 376.35614 ± 51.975
2025-09-16 12:17:46,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [356.151, 494.18033, 373.8167, 376.48868, 379.1636, 284.65567, 419.99844, 387.03702, 331.1695, 360.9004]
2025-09-16 12:17:46,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 91.0, 69.0, 69.0, 81.0, 62.0, 77.0, 80.0, 67.0, 72.0]
2025-09-16 12:17:46,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 1 minute, 42 seconds)
2025-09-16 12:19:42,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:19:43,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 399.22107 ± 90.978
2025-09-16 12:19:43,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [349.75552, 584.771, 335.2624, 493.8643, 308.55402, 471.3699, 321.4197, 444.8262, 388.9054, 293.48212]
2025-09-16 12:19:43,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 112.0, 61.0, 93.0, 57.0, 94.0, 60.0, 85.0, 84.0, 60.0]
2025-09-16 12:19:43,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 3 seconds)
2025-09-16 12:21:39,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:21:40,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 406.94885 ± 93.371
2025-09-16 12:21:40,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [565.4493, 379.5684, 389.7213, 325.1841, 341.14697, 599.7123, 434.01993, 328.51315, 361.81815, 344.35495]
2025-09-16 12:21:40,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 72.0, 73.0, 68.0, 72.0, 114.0, 89.0, 61.0, 66.0, 62.0]
2025-09-16 12:21:40,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 35 seconds)
2025-09-16 12:23:37,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:23:38,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 453.68643 ± 204.512
2025-09-16 12:23:38,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [318.0744, 409.47726, 341.8042, 578.22974, 354.42215, 416.4996, 358.4381, 1019.2425, 459.2781, 281.39816]
2025-09-16 12:23:38,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 80.0, 75.0, 107.0, 66.0, 77.0, 66.0, 198.0, 92.0, 51.0]
2025-09-16 12:23:38,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (453.69) for latency 9
2025-09-16 12:23:38,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 51 seconds)
2025-09-16 12:25:35,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:25:36,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 432.33072 ± 69.370
2025-09-16 12:25:36,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [529.7731, 407.6465, 450.6242, 394.9421, 486.44867, 448.16782, 270.4601, 447.86514, 387.1025, 500.2769]
2025-09-16 12:25:36,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 75.0, 82.0, 71.0, 92.0, 83.0, 52.0, 85.0, 80.0, 104.0]
2025-09-16 12:25:36,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 14 seconds)
2025-09-16 12:27:32,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:27:34,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 464.23941 ± 104.190
2025-09-16 12:27:34,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [381.28262, 588.3229, 587.53796, 337.2943, 441.9716, 471.70923, 426.9477, 283.8929, 534.43835, 588.99677]
2025-09-16 12:27:34,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 117.0, 106.0, 68.0, 80.0, 87.0, 90.0, 60.0, 100.0, 113.0]
2025-09-16 12:27:34,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (464.24) for latency 9
2025-09-16 12:27:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 19 seconds)
2025-09-16 12:29:30,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:29:32,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 422.48682 ± 55.772
2025-09-16 12:29:32,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [433.0036, 371.31723, 344.51755, 531.11487, 504.6007, 424.02936, 392.35046, 396.31555, 382.11835, 445.50015]
2025-09-16 12:29:32,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 78.0, 64.0, 98.0, 95.0, 78.0, 82.0, 73.0, 69.0, 92.0]
2025-09-16 12:29:32,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 54 minutes, 36 seconds)
2025-09-16 12:31:28,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:31:29,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 482.38004 ± 67.092
2025-09-16 12:31:29,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [445.04883, 505.94293, 604.7565, 509.28207, 374.77313, 445.9213, 384.93698, 523.15735, 490.78964, 539.1914]
2025-09-16 12:31:29,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 99.0, 115.0, 100.0, 83.0, 88.0, 75.0, 97.0, 91.0, 102.0]
2025-09-16 12:31:29,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (482.38) for latency 9
2025-09-16 12:31:29,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 52 minutes, 51 seconds)
2025-09-16 12:33:26,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:33:27,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 445.03717 ± 101.669
2025-09-16 12:33:27,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [343.3161, 573.67975, 361.01944, 415.7066, 425.94992, 386.6402, 357.1635, 420.5067, 678.7913, 487.59833]
2025-09-16 12:33:27,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 103.0, 76.0, 75.0, 79.0, 73.0, 68.0, 81.0, 147.0, 87.0]
2025-09-16 12:33:27,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 50 minutes, 54 seconds)
2025-09-16 12:35:24,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:35:25,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 428.78241 ± 75.341
2025-09-16 12:35:25,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [369.05875, 398.7425, 430.92197, 330.73547, 535.5391, 391.33716, 500.9971, 396.51205, 568.7274, 365.2527]
2025-09-16 12:35:25,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 71.0, 81.0, 60.0, 104.0, 74.0, 94.0, 74.0, 109.0, 67.0]
2025-09-16 12:35:25,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 43 seconds)
2025-09-16 12:37:23,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:37:25,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 596.26080 ± 183.564
2025-09-16 12:37:25,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [371.28845, 409.8791, 642.08057, 572.29095, 825.332, 516.89435, 952.4865, 668.56134, 643.7162, 360.07803]
2025-09-16 12:37:25,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 72.0, 135.0, 106.0, 158.0, 99.0, 187.0, 128.0, 121.0, 79.0]
2025-09-16 12:37:25,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (596.26) for latency 9
2025-09-16 12:37:25,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 30 seconds)
2025-09-16 12:39:23,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:39:24,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 494.14349 ± 160.092
2025-09-16 12:39:24,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [229.4393, 308.49805, 812.0256, 526.8214, 440.5301, 546.0652, 411.32794, 685.84717, 475.6333, 505.24707]
2025-09-16 12:39:24,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 56.0, 152.0, 97.0, 83.0, 113.0, 81.0, 128.0, 99.0, 93.0]
2025-09-16 12:39:24,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes, 55 seconds)
2025-09-16 12:41:22,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:41:24,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 472.48770 ± 72.817
2025-09-16 12:41:24,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [540.98615, 360.0289, 488.47885, 476.2618, 393.34875, 562.45435, 593.96246, 399.86905, 440.64926, 468.83755]
2025-09-16 12:41:24,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 80.0, 91.0, 88.0, 74.0, 101.0, 112.0, 74.0, 92.0, 88.0]
2025-09-16 12:41:24,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 25 seconds)
2025-09-16 12:43:21,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:43:23,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 535.81726 ± 139.939
2025-09-16 12:43:23,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [455.40588, 845.28357, 481.99014, 433.90823, 641.5028, 629.10724, 523.013, 292.18542, 507.99677, 547.77954]
2025-09-16 12:43:23,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 156.0, 87.0, 80.0, 120.0, 116.0, 108.0, 56.0, 94.0, 119.0]
2025-09-16 12:43:23,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 49 seconds)
2025-09-16 12:45:22,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:45:23,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 568.31152 ± 95.608
2025-09-16 12:45:23,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [730.47217, 594.4928, 533.1879, 456.22144, 640.7358, 623.46906, 493.82532, 387.27527, 622.9183, 600.517]
2025-09-16 12:45:23,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 126.0, 100.0, 83.0, 133.0, 119.0, 95.0, 69.0, 115.0, 109.0]
2025-09-16 12:45:23,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 34 seconds)
2025-09-16 12:47:21,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:47:22,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 652.21539 ± 150.105
2025-09-16 12:47:22,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [676.3205, 874.1248, 640.6869, 318.8538, 615.0632, 639.64526, 538.102, 661.05347, 872.327, 685.9769]
2025-09-16 12:47:22,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 162.0, 115.0, 56.0, 127.0, 117.0, 106.0, 123.0, 162.0, 128.0]
2025-09-16 12:47:22,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (652.22) for latency 9
2025-09-16 12:47:22,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 19 seconds)
2025-09-16 12:49:21,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:49:22,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 537.68024 ± 214.513
2025-09-16 12:49:22,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [486.4197, 360.95145, 613.13824, 525.26086, 928.6888, 412.28165, 682.08026, 117.958046, 765.6598, 484.36322]
2025-09-16 12:49:22,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 67.0, 116.0, 112.0, 187.0, 79.0, 123.0, 23.0, 142.0, 88.0]
2025-09-16 12:49:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 24 seconds)
2025-09-16 12:51:21,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:51:23,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 626.73230 ± 205.502
2025-09-16 12:51:23,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1021.634, 648.22253, 668.7238, 744.1822, 682.238, 308.76956, 594.26917, 814.51355, 394.65988, 390.111]
2025-09-16 12:51:23,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 119.0, 126.0, 136.0, 123.0, 57.0, 128.0, 152.0, 71.0, 85.0]
2025-09-16 12:51:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 46 seconds)
2025-09-16 12:53:22,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:53:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 653.32025 ± 200.270
2025-09-16 12:53:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [670.6112, 659.79083, 864.8338, 556.9723, 1137.4872, 427.10422, 663.78217, 541.1884, 555.05725, 456.3748]
2025-09-16 12:53:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 124.0, 163.0, 115.0, 205.0, 93.0, 127.0, 113.0, 102.0, 83.0]
2025-09-16 12:53:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (653.32) for latency 9
2025-09-16 12:53:23,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 11 seconds)
2025-09-16 12:55:20,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:55:21,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 657.75128 ± 238.155
2025-09-16 12:55:21,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [337.63132, 572.41077, 763.7516, 1205.3094, 693.4984, 770.0695, 561.523, 427.49832, 438.8414, 806.9789]
2025-09-16 12:55:21,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 107.0, 144.0, 227.0, 126.0, 144.0, 100.0, 77.0, 93.0, 151.0]
2025-09-16 12:55:21,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (657.75) for latency 9
2025-09-16 12:55:21,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 32 seconds)
2025-09-16 12:57:22,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:57:23,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 608.08624 ± 191.982
2025-09-16 12:57:23,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [362.12415, 489.96216, 885.0128, 883.92615, 804.26434, 669.44916, 437.51205, 359.81638, 657.57916, 531.2158]
2025-09-16 12:57:23,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 107.0, 167.0, 164.0, 153.0, 136.0, 78.0, 64.0, 120.0, 95.0]
2025-09-16 12:57:23,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 12 seconds)
2025-09-16 12:59:20,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:59:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 737.48969 ± 335.102
2025-09-16 12:59:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [426.3454, 627.9599, 423.91217, 1549.6714, 552.83276, 1044.3878, 558.53204, 786.2026, 915.8689, 489.18427]
2025-09-16 12:59:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 117.0, 76.0, 311.0, 115.0, 199.0, 118.0, 153.0, 176.0, 91.0]
2025-09-16 12:59:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (737.49) for latency 9
2025-09-16 12:59:22,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 1 second)
2025-09-16 13:01:21,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:01:23,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 768.85522 ± 225.323
2025-09-16 13:01:23,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [668.4102, 591.23706, 824.54736, 916.28204, 926.02234, 537.05695, 738.7806, 1302.0924, 678.78357, 505.33966]
2025-09-16 13:01:23,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 111.0, 156.0, 169.0, 174.0, 105.0, 144.0, 241.0, 126.0, 93.0]
2025-09-16 13:01:23,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (768.86) for latency 9
2025-09-16 13:01:23,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 58 seconds)
2025-09-16 13:03:21,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:03:23,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 801.07825 ± 171.843
2025-09-16 13:03:23,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [735.00604, 852.33167, 688.5472, 797.1249, 702.05164, 1091.7976, 542.2264, 1022.03546, 970.4221, 609.2392]
2025-09-16 13:03:23,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 161.0, 128.0, 150.0, 127.0, 213.0, 101.0, 203.0, 190.0, 124.0]
2025-09-16 13:03:23,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (801.08) for latency 9
2025-09-16 13:03:23,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 58 seconds)
2025-09-16 13:05:22,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:05:24,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 797.08044 ± 266.590
2025-09-16 13:05:24,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [705.01416, 573.61365, 616.56067, 1506.4308, 810.23975, 1029.1702, 613.4779, 699.68134, 750.2902, 666.32605]
2025-09-16 13:05:24,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 104.0, 113.0, 282.0, 149.0, 205.0, 119.0, 140.0, 158.0, 121.0]
2025-09-16 13:05:24,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 37 seconds)
2025-09-16 13:07:22,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:07:24,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 803.53552 ± 230.121
2025-09-16 13:07:24,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1088.2795, 672.2808, 643.2141, 455.50345, 772.51074, 433.4476, 988.5398, 994.62524, 951.0724, 1035.8812]
2025-09-16 13:07:24,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 121.0, 134.0, 91.0, 142.0, 76.0, 177.0, 198.0, 175.0, 214.0]
2025-09-16 13:07:24,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (803.54) for latency 9
2025-09-16 13:07:24,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 18 seconds)
2025-09-16 13:09:26,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:09:28,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 797.61536 ± 252.437
2025-09-16 13:09:28,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1056.4346, 836.3085, 614.195, 734.55817, 1040.4396, 356.45264, 778.1988, 1208.2417, 477.14502, 874.17944]
2025-09-16 13:09:28,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 167.0, 113.0, 154.0, 186.0, 65.0, 137.0, 218.0, 89.0, 176.0]
2025-09-16 13:09:28,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 19 minutes, 26 seconds)
2025-09-16 13:11:24,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:11:26,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 870.32019 ± 215.651
2025-09-16 13:11:26,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [600.2309, 1144.461, 750.2584, 720.63696, 1139.5332, 533.0173, 1099.9884, 752.1638, 1016.62604, 946.2858]
2025-09-16 13:11:26,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 218.0, 156.0, 140.0, 232.0, 109.0, 202.0, 140.0, 206.0, 178.0]
2025-09-16 13:11:26,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (870.32) for latency 9
2025-09-16 13:11:26,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 47 seconds)
2025-09-16 13:13:26,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:13:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1011.36896 ± 337.928
2025-09-16 13:13:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1175.9941, 863.83514, 639.73175, 1560.4666, 602.26416, 1193.7577, 1135.2825, 1366.662, 1112.9042, 462.7914]
2025-09-16 13:13:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [231.0, 161.0, 135.0, 296.0, 110.0, 219.0, 224.0, 271.0, 222.0, 100.0]
2025-09-16 13:13:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1011.37) for latency 9
2025-09-16 13:13:28,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 5 seconds)
2025-09-16 13:15:27,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:15:29,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 928.79913 ± 405.660
2025-09-16 13:15:29,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [523.1872, 782.4643, 1798.2239, 723.44293, 793.5968, 1476.989, 785.48706, 907.64215, 1106.8021, 390.15662]
2025-09-16 13:15:29,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 153.0, 351.0, 149.0, 152.0, 270.0, 150.0, 183.0, 206.0, 69.0]
2025-09-16 13:15:29,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 11 seconds)
2025-09-16 13:17:27,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:17:29,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 812.42572 ± 302.841
2025-09-16 13:17:29,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1370.9734, 735.77594, 551.35535, 679.04565, 745.8888, 552.45483, 614.94336, 831.02167, 1418.0297, 624.76874]
2025-09-16 13:17:29,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [264.0, 142.0, 106.0, 129.0, 140.0, 123.0, 116.0, 153.0, 277.0, 126.0]
2025-09-16 13:17:29,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 58 seconds)
2025-09-16 13:19:28,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:19:30,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1159.18677 ± 478.022
2025-09-16 13:19:30,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [829.2365, 1313.8579, 2155.106, 1129.7974, 1671.8063, 756.0769, 929.3164, 1290.076, 344.25412, 1172.3408]
2025-09-16 13:19:30,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 248.0, 406.0, 219.0, 315.0, 140.0, 173.0, 238.0, 66.0, 216.0]
2025-09-16 13:19:30,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1159.19) for latency 9
2025-09-16 13:19:30,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 27 seconds)
2025-09-16 13:21:31,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:21:34,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1319.34155 ± 360.275
2025-09-16 13:21:34,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2047.5914, 1527.9453, 1070.0023, 1200.0234, 1552.4288, 912.8218, 1437.4883, 1120.996, 758.72906, 1565.3883]
2025-09-16 13:21:34,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [409.0, 287.0, 213.0, 225.0, 297.0, 169.0, 290.0, 215.0, 146.0, 307.0]
2025-09-16 13:21:34,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1319.34) for latency 9
2025-09-16 13:21:34,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 44 seconds)
2025-09-16 13:23:34,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:23:37,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1294.88293 ± 737.366
2025-09-16 13:23:37,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [669.93207, 1690.3456, 840.1026, 664.46906, 3256.3022, 998.6096, 1007.3572, 1057.0791, 1695.7146, 1068.9171]
2025-09-16 13:23:37,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 331.0, 173.0, 136.0, 649.0, 199.0, 208.0, 206.0, 328.0, 223.0]
2025-09-16 13:23:37,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 51 seconds)
2025-09-16 13:25:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:25:41,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1564.53931 ± 752.348
2025-09-16 13:25:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1619.0586, 1131.2689, 1212.5321, 2896.9233, 541.4262, 2614.4907, 937.41943, 1786.2626, 2143.6375, 762.37256]
2025-09-16 13:25:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [312.0, 209.0, 223.0, 546.0, 99.0, 499.0, 175.0, 333.0, 415.0, 141.0]
2025-09-16 13:25:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1564.54) for latency 9
2025-09-16 13:25:41,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 4 minutes, 18 seconds)
2025-09-16 13:27:40,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:27:44,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1216.82788 ± 472.114
2025-09-16 13:27:44,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1266.1385, 1124.3561, 1116.7213, 897.01495, 775.61755, 1241.6858, 793.2036, 2457.0896, 1556.9873, 939.46436]
2025-09-16 13:27:44,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [261.0, 233.0, 230.0, 189.0, 140.0, 259.0, 144.0, 495.0, 297.0, 190.0]
2025-09-16 13:27:44,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes, 56 seconds)
2025-09-16 13:29:43,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:29:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1334.68640 ± 638.652
2025-09-16 13:29:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1061.2582, 885.3999, 571.04126, 557.9009, 2212.0132, 928.90796, 1421.4976, 1253.4413, 2037.7692, 2417.634]
2025-09-16 13:29:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 171.0, 123.0, 105.0, 418.0, 191.0, 293.0, 241.0, 379.0, 452.0]
2025-09-16 13:29:46,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 1 minute, 7 seconds)
2025-09-16 13:31:46,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:31:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1474.48657 ± 599.921
2025-09-16 13:31:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1300.4918, 1748.0454, 2217.0012, 829.7407, 790.2386, 2737.4294, 1041.1107, 974.22375, 1634.5262, 1472.0588]
2025-09-16 13:31:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [241.0, 333.0, 407.0, 167.0, 145.0, 516.0, 201.0, 193.0, 307.0, 272.0]
2025-09-16 13:31:50,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 58 seconds)
2025-09-16 13:33:51,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:33:54,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1299.52380 ± 654.939
2025-09-16 13:33:54,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [675.16656, 1465.7517, 1191.8485, 737.6947, 969.4135, 1294.8713, 2667.9463, 1740.13, 1935.2101, 317.20602]
2025-09-16 13:33:54,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 266.0, 229.0, 136.0, 183.0, 247.0, 515.0, 325.0, 383.0, 58.0]
2025-09-16 13:33:54,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 57 minutes, 10 seconds)
2025-09-16 13:35:53,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:35:57,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1709.23657 ± 870.959
2025-09-16 13:35:57,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1766.8303, 734.1515, 827.0737, 4035.0437, 1734.9769, 1797.1411, 1975.8329, 1334.5391, 1646.8391, 1239.9368]
2025-09-16 13:35:57,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 156.0, 157.0, 764.0, 341.0, 335.0, 376.0, 258.0, 320.0, 226.0]
2025-09-16 13:35:57,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1709.24) for latency 9
2025-09-16 13:35:57,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 55 minutes, 7 seconds)
2025-09-16 13:37:58,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:38:04,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2392.99487 ± 1366.965
2025-09-16 13:38:04,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [685.4955, 1729.4088, 1580.2499, 696.2162, 2810.8274, 2018.4237, 2473.8105, 5349.665, 2548.8274, 4037.0242]
2025-09-16 13:38:04,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 346.0, 294.0, 130.0, 515.0, 400.0, 453.0, 1000.0, 484.0, 753.0]
2025-09-16 13:38:04,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2392.99) for latency 9
2025-09-16 13:38:04,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 53 minutes, 42 seconds)
2025-09-16 13:40:01,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:40:06,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1747.12915 ± 958.248
2025-09-16 13:40:06,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1194.686, 1992.1226, 1766.9266, 632.9387, 3704.3782, 410.98834, 844.1404, 2493.5972, 2487.2817, 1944.2302]
2025-09-16 13:40:06,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 382.0, 347.0, 118.0, 708.0, 85.0, 156.0, 464.0, 454.0, 377.0]
2025-09-16 13:40:06,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 31 seconds)
2025-09-16 13:42:07,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:42:12,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1932.59155 ± 1425.469
2025-09-16 13:42:12,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2126.852, 618.1351, 600.8874, 1271.968, 2551.9453, 321.39203, 4555.2896, 3900.7283, 2806.1042, 572.6142]
2025-09-16 13:42:12,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [393.0, 121.0, 129.0, 264.0, 477.0, 59.0, 863.0, 773.0, 535.0, 106.0]
2025-09-16 13:42:12,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 57 seconds)
2025-09-16 13:44:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:44:15,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1303.22107 ± 569.759
2025-09-16 13:44:15,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2197.3123, 850.0375, 852.34766, 2242.9592, 1174.0835, 935.7999, 1843.6099, 1460.0546, 583.32196, 892.68506]
2025-09-16 13:44:15,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [424.0, 158.0, 155.0, 429.0, 221.0, 179.0, 336.0, 268.0, 107.0, 170.0]
2025-09-16 13:44:15,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 41 seconds)
2025-09-16 13:46:17,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:46:20,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1078.53223 ± 334.752
2025-09-16 13:46:20,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1609.7322, 1123.6467, 813.2631, 790.18463, 621.5546, 1348.5565, 909.2485, 973.30634, 933.8155, 1662.0139]
2025-09-16 13:46:20,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [301.0, 220.0, 166.0, 146.0, 114.0, 263.0, 169.0, 179.0, 178.0, 338.0]
2025-09-16 13:46:20,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 45 minutes, 48 seconds)
2025-09-16 13:48:16,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:48:19,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1400.88599 ± 887.391
2025-09-16 13:48:19,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1913.1433, 3729.5225, 1006.84766, 893.20996, 1445.1469, 502.39786, 448.80963, 1178.6283, 1507.1757, 1383.9777]
2025-09-16 13:48:19,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [355.0, 711.0, 191.0, 168.0, 271.0, 93.0, 81.0, 242.0, 280.0, 254.0]
2025-09-16 13:48:19,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 35 seconds)
2025-09-16 13:50:24,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:50:27,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1275.97437 ± 663.296
2025-09-16 13:50:27,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [427.14203, 1494.3679, 645.3645, 1221.0175, 858.3853, 1185.9253, 2488.214, 820.1086, 1176.6189, 2442.5994]
2025-09-16 13:50:27,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 273.0, 121.0, 222.0, 177.0, 221.0, 462.0, 163.0, 216.0, 445.0]
2025-09-16 13:50:27,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 41 minutes, 27 seconds)
2025-09-16 13:52:34,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:52:39,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1851.08459 ± 818.767
2025-09-16 13:52:39,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1079.565, 1215.1611, 1379.126, 1467.1195, 3292.2449, 3407.157, 2013.9451, 1334.259, 2145.8433, 1176.4257]
2025-09-16 13:52:39,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 227.0, 290.0, 298.0, 642.0, 640.0, 385.0, 255.0, 419.0, 247.0]
2025-09-16 13:52:39,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 40 minutes, 17 seconds)
2025-09-16 13:54:31,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:54:36,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1782.85645 ± 1452.377
2025-09-16 13:54:36,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1557.8887, 5494.678, 3277.1152, 1628.0752, 841.3168, 1909.8231, 619.6784, 846.4945, 982.2174, 671.2779]
2025-09-16 13:54:36,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 1000.0, 597.0, 293.0, 156.0, 356.0, 113.0, 174.0, 176.0, 131.0]
2025-09-16 13:54:36,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 11 seconds)
2025-09-16 13:56:41,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:56:46,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2297.08789 ± 1205.208
2025-09-16 13:56:46,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2380.0334, 1064.7253, 1474.2717, 814.0031, 3426.5574, 4748.036, 3325.9927, 2361.425, 968.4455, 2407.3892]
2025-09-16 13:56:46,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [427.0, 203.0, 276.0, 163.0, 627.0, 877.0, 628.0, 430.0, 188.0, 446.0]
2025-09-16 13:56:46,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 36 minutes, 5 seconds)
2025-09-16 13:58:41,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:58:49,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2871.47437 ± 2088.268
2025-09-16 13:58:49,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5228.8276, 3592.3923, 1020.7967, 5053.4688, 5326.5366, 5288.369, 1233.2971, 429.8124, 723.4217, 817.823]
2025-09-16 13:58:49,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 677.0, 185.0, 977.0, 1000.0, 1000.0, 241.0, 94.0, 139.0, 175.0]
2025-09-16 13:58:49,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (2871.47) for latency 9
2025-09-16 13:58:49,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 34 minutes, 29 seconds)
2025-09-16 14:00:49,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:00:54,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1823.44360 ± 839.229
2025-09-16 14:00:54,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2626.4333, 761.0981, 813.33466, 2028.3038, 1279.4884, 1421.2123, 1487.5334, 3076.6958, 1513.3052, 3227.0317]
2025-09-16 14:00:54,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [517.0, 148.0, 145.0, 389.0, 247.0, 273.0, 286.0, 593.0, 281.0, 611.0]
2025-09-16 14:00:54,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 55 seconds)
2025-09-16 14:02:58,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:03:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2699.50146 ± 1912.983
2025-09-16 14:03:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2602.2825, 340.95013, 773.4129, 5413.4355, 2404.4607, 5323.0317, 870.4144, 5357.085, 1100.3811, 2809.5603]
2025-09-16 14:03:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [491.0, 62.0, 152.0, 1000.0, 449.0, 1000.0, 163.0, 1000.0, 211.0, 557.0]
2025-09-16 14:03:05,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 44 seconds)
2025-09-16 14:05:04,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:05:09,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1922.05725 ± 1236.951
2025-09-16 14:05:09,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1773.8113, 755.47296, 4744.6523, 1822.7074, 861.5567, 3507.8274, 1923.4738, 1134.5751, 633.1646, 2063.3298]
2025-09-16 14:05:09,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [333.0, 146.0, 863.0, 342.0, 161.0, 631.0, 349.0, 231.0, 138.0, 382.0]
2025-09-16 14:05:09,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 28 minutes, 36 seconds)
2025-09-16 14:07:06,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:07:16,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3674.80542 ± 1985.881
2025-09-16 14:07:16,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [821.0507, 1088.5737, 5432.1724, 1758.1451, 4948.08, 5309.413, 5301.7354, 5433.5415, 1382.3477, 5272.996]
2025-09-16 14:07:16,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 225.0, 1000.0, 344.0, 933.0, 1000.0, 1000.0, 1000.0, 281.0, 1000.0]
2025-09-16 14:07:16,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (3674.81) for latency 9
2025-09-16 14:07:16,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 26 minutes, 4 seconds)
2025-09-16 14:09:17,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:09:28,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4229.76465 ± 1584.282
2025-09-16 14:09:28,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2260.8276, 2792.1042, 5463.8574, 5455.19, 5416.675, 5556.8945, 3130.456, 1283.4459, 5463.0674, 5475.1304]
2025-09-16 14:09:28,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [412.0, 513.0, 1000.0, 1000.0, 1000.0, 1000.0, 571.0, 245.0, 1000.0, 1000.0]
2025-09-16 14:09:28,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4229.76) for latency 9
2025-09-16 14:09:28,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 25 minutes, 8 seconds)
2025-09-16 14:11:36,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:11:41,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1781.47241 ± 1489.124
2025-09-16 14:11:41,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [773.93207, 4407.213, 706.01105, 978.8137, 480.19254, 1593.1362, 447.44638, 4542.0796, 2743.841, 1142.0586]
2025-09-16 14:11:41,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 851.0, 131.0, 209.0, 98.0, 321.0, 79.0, 859.0, 518.0, 215.0]
2025-09-16 14:11:41,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 6 seconds)
2025-09-16 14:13:36,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:13:45,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3594.79248 ± 1621.586
2025-09-16 14:13:45,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3174.5198, 2313.346, 2973.8508, 3724.5303, 5437.124, 898.393, 5332.919, 5359.655, 5260.1655, 1473.4209]
2025-09-16 14:13:45,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [603.0, 435.0, 573.0, 693.0, 1000.0, 164.0, 1000.0, 1000.0, 1000.0, 280.0]
2025-09-16 14:13:45,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 21 minutes, 6 seconds)
2025-09-16 14:15:43,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:15:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3779.83398 ± 1484.715
2025-09-16 14:15:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3902.9011, 5459.1997, 2965.5212, 4718.2246, 2435.1985, 916.2664, 2308.2788, 5350.781, 5437.5786, 4304.389]
2025-09-16 14:15:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [724.0, 1000.0, 548.0, 868.0, 461.0, 170.0, 427.0, 1000.0, 1000.0, 802.0]
2025-09-16 14:15:53,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 19 minutes, 31 seconds)
2025-09-16 14:18:00,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:18:11,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3928.67627 ± 1935.056
2025-09-16 14:18:11,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5372.573, 4241.411, 646.5664, 1970.8728, 582.32355, 5399.5747, 5257.772, 4998.7114, 5364.7075, 5452.249]
2025-09-16 14:18:11,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 801.0, 139.0, 390.0, 104.0, 1000.0, 1000.0, 938.0, 1000.0, 1000.0]
2025-09-16 14:18:11,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 18 minutes, 31 seconds)
2025-09-16 14:20:13,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:20:25,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4414.14746 ± 1665.943
2025-09-16 14:20:25,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5411.6523, 5501.4087, 5063.2627, 5452.646, 5206.925, 5419.4688, 4343.4185, 963.4418, 1337.0874, 5442.168]
2025-09-16 14:20:25,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 913.0, 1000.0, 1000.0, 1000.0, 800.0, 198.0, 259.0, 1000.0]
2025-09-16 14:20:25,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4414.15) for latency 9
2025-09-16 14:20:25,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 16 minutes, 40 seconds)
2025-09-16 14:22:19,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:22:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3141.67065 ± 1824.789
2025-09-16 14:22:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1305.6979, 680.61127, 5572.1846, 1673.4314, 1346.5538, 3034.7554, 2998.9434, 5646.9106, 5614.8164, 3542.8037]
2025-09-16 14:22:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 152.0, 1000.0, 310.0, 250.0, 539.0, 539.0, 1000.0, 1000.0, 647.0]
2025-09-16 14:22:27,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 13 minutes, 12 seconds)
2025-09-16 14:24:28,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:24:36,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3375.10742 ± 2170.213
2025-09-16 14:24:36,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5551.051, 5484.2725, 5496.575, 950.5484, 740.11237, 5595.1035, 682.75555, 1908.4971, 5415.9766, 1926.1813]
2025-09-16 14:24:36,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 190.0, 151.0, 1000.0, 132.0, 347.0, 1000.0, 373.0]
2025-09-16 14:24:37,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 11 minutes, 37 seconds)
2025-09-16 14:26:40,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:26:48,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 2892.17114 ± 1762.743
2025-09-16 14:26:48,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5107.567, 5182.325, 4060.9807, 2568.8652, 1577.678, 5165.5044, 1202.0035, 2600.911, 378.67712, 1077.1976]
2025-09-16 14:26:48,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 818.0, 507.0, 333.0, 1000.0, 245.0, 514.0, 85.0, 215.0]
2025-09-16 14:26:48,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 50 seconds)
2025-09-16 14:28:46,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:28:55,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3324.89136 ± 1709.511
2025-09-16 14:28:55,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3476.9863, 344.7877, 5293.747, 843.01886, 5440.2905, 3545.6045, 3001.8862, 5357.2197, 2173.428, 3771.9453]
2025-09-16 14:28:55,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [665.0, 62.0, 1000.0, 166.0, 1000.0, 684.0, 562.0, 1000.0, 429.0, 702.0]
2025-09-16 14:28:55,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 6 minutes, 38 seconds)
2025-09-16 14:31:06,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:31:16,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3894.40747 ± 1628.527
2025-09-16 14:31:16,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2555.7766, 4611.0386, 4682.846, 2091.728, 5645.2925, 4696.392, 5609.887, 2930.0854, 5445.034, 675.99207]
2025-09-16 14:31:16,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [448.0, 829.0, 814.0, 369.0, 1000.0, 825.0, 1000.0, 523.0, 975.0, 151.0]
2025-09-16 14:31:16,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 5 minutes, 3 seconds)
2025-09-16 14:33:06,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:33:19,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4884.51709 ± 1427.217
2025-09-16 14:33:19,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [4201.4595, 5427.493, 5461.6523, 5545.687, 5372.75, 5483.682, 5525.7153, 5550.513, 762.61835, 5513.6035]
2025-09-16 14:33:19,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [772.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 141.0, 1000.0]
2025-09-16 14:33:19,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (4884.52) for latency 9
2025-09-16 14:33:19,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 3 minutes, 2 seconds)
2025-09-16 14:35:27,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:35:39,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4194.85645 ± 1656.784
2025-09-16 14:35:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [301.87766, 5409.899, 3191.5466, 5284.394, 5398.9067, 3985.4702, 2389.8828, 5344.0054, 5405.679, 5236.904]
2025-09-16 14:35:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 1000.0, 589.0, 1000.0, 1000.0, 752.0, 445.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:35:39,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 48 seconds)
2025-09-16 14:37:43,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:37:55,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4388.13281 ± 1751.468
2025-09-16 14:37:55,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [254.42653, 5402.825, 3339.6165, 5435.5806, 5415.043, 5436.4414, 5404.8086, 5466.1807, 5515.8286, 2210.5796]
2025-09-16 14:37:55,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 1000.0, 614.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 404.0]
2025-09-16 14:37:55,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 59 seconds)
2025-09-16 14:39:46,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:39:58,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4556.41699 ± 1575.932
2025-09-16 14:39:58,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5458.569, 5512.587, 5504.4785, 1293.4597, 4791.1816, 5562.7344, 4751.1523, 5519.874, 5539.6455, 1630.4888]
2025-09-16 14:39:58,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 260.0, 878.0, 1000.0, 862.0, 1000.0, 1000.0, 304.0]
2025-09-16 14:39:58,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 57 minutes, 27 seconds)
2025-09-16 14:41:58,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:42:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4377.38721 ± 1816.001
2025-09-16 14:42:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5502.5713, 5371.125, 5378.5425, 5319.183, 5329.4683, 470.50375, 5317.5645, 5382.1797, 4599.868, 1102.8619]
2025-09-16 14:42:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 83.0, 1000.0, 1000.0, 879.0, 197.0]
2025-09-16 14:42:10,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 33 seconds)
2025-09-16 14:44:13,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:44:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5049.02588 ± 809.781
2025-09-16 14:44:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5582.238, 5551.1836, 4648.1343, 2856.0527, 5631.6963, 4771.808, 5572.412, 5556.433, 5358.4907, 4961.8066]
2025-09-16 14:44:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 825.0, 516.0, 1000.0, 882.0, 1000.0, 1000.0, 954.0, 885.0]
2025-09-16 14:44:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5049.03) for latency 9
2025-09-16 14:44:27,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 24 seconds)
2025-09-16 14:46:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:46:36,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3470.80005 ± 2011.722
2025-09-16 14:46:36,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5372.205, 5433.418, 5417.2046, 1046.6102, 1028.063, 2594.9878, 2330.1772, 5431.149, 674.8491, 5379.335]
2025-09-16 14:46:36,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 214.0, 188.0, 487.0, 433.0, 1000.0, 138.0, 1000.0]
2025-09-16 14:46:36,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 25 seconds)
2025-09-16 14:48:38,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:48:50,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4390.64160 ± 1535.776
2025-09-16 14:48:50,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5514.129, 5446.541, 5569.9214, 1209.7056, 5523.6963, 5473.315, 3235.1418, 4242.1367, 5472.749, 2219.0762]
2025-09-16 14:48:50,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 235.0, 1000.0, 1000.0, 577.0, 759.0, 1000.0, 407.0]
2025-09-16 14:48:50,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 48 minutes, 1 second)
2025-09-16 14:50:51,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:51:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3723.63037 ± 2040.740
2025-09-16 14:51:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5393.2993, 1523.3733, 5365.8027, 633.61127, 5403.343, 1641.8452, 1172.4048, 5342.84, 5374.7656, 5385.0215]
2025-09-16 14:51:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 297.0, 1000.0, 107.0, 1000.0, 336.0, 221.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:51:01,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 23 seconds)
2025-09-16 14:53:04,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:53:19,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5509.74316 ± 40.072
2025-09-16 14:53:19,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5531.5044, 5442.822, 5514.5796, 5528.492, 5511.483, 5530.5366, 5563.007, 5508.852, 5538.011, 5428.1445]
2025-09-16 14:53:19,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:53:19,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5509.74) for latency 9
2025-09-16 14:53:19,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 32 seconds)
2025-09-16 14:55:28,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:55:40,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4893.01855 ± 1158.968
2025-09-16 14:55:40,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5696.3477, 5569.901, 5521.1646, 5544.6587, 5642.296, 5606.118, 2459.8613, 3009.0508, 5635.1396, 4245.646]
2025-09-16 14:55:40,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 985.0, 1000.0, 432.0, 532.0, 1000.0, 752.0]
2025-09-16 14:55:40,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 38 seconds)
2025-09-16 14:57:37,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:57:48,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3905.71436 ± 1611.734
2025-09-16 14:57:48,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3055.3872, 5242.7056, 1776.9255, 5061.1436, 5157.066, 5166.298, 490.4651, 5155.6553, 3198.4229, 4753.078]
2025-09-16 14:57:48,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [598.0, 1000.0, 361.0, 1000.0, 1000.0, 1000.0, 86.0, 1000.0, 618.0, 892.0]
2025-09-16 14:57:48,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 18 seconds)
2025-09-16 14:59:52,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:00:07,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5516.88379 ± 42.327
2025-09-16 15:00:07,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5488.3335, 5520.236, 5601.176, 5459.05, 5510.306, 5505.313, 5468.266, 5569.9614, 5547.633, 5498.566]
2025-09-16 15:00:07,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:00:07,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (5516.88) for latency 9
2025-09-16 15:00:07,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 24 seconds)
2025-09-16 15:02:02,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:02:14,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4267.72656 ± 1610.762
2025-09-16 15:02:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5336.587, 5323.712, 5293.3276, 5304.082, 1966.0139, 5305.687, 2554.7498, 5200.8325, 1064.792, 5327.4805]
2025-09-16 15:02:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 394.0, 1000.0, 506.0, 1000.0, 212.0, 1000.0]
2025-09-16 15:02:14,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 53 seconds)
2025-09-16 15:04:20,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:04:34,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4939.53955 ± 1221.549
2025-09-16 15:04:34,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5414.3647, 5326.871, 5340.4683, 5291.526, 1276.7637, 5363.692, 5294.8867, 5358.334, 5405.852, 5322.6377]
2025-09-16 15:04:34,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 257.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:04:34,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 44 seconds)
2025-09-16 15:06:29,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:06:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5390.21680 ± 46.601
2025-09-16 15:06:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5437.1284, 5329.52, 5377.4873, 5385.285, 5354.115, 5438.5415, 5323.345, 5468.6, 5367.9854, 5420.1597]
2025-09-16 15:06:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:06:44,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 57 seconds)
2025-09-16 15:08:47,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:09:01,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4840.92285 ± 1317.639
2025-09-16 15:09:01,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5255.831, 5252.407, 890.9358, 5408.51, 5293.4565, 5217.971, 5259.6836, 5260.054, 5246.676, 5323.703]
2025-09-16 15:09:01,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 182.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:09:01,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 10 seconds)
2025-09-16 15:11:02,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:11:16,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4899.75293 ± 1263.840
2025-09-16 15:11:16,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5384.3677, 1109.1016, 5307.1763, 5310.134, 5299.339, 5288.9204, 5300.271, 5330.52, 5315.735, 5351.9634]
2025-09-16 15:11:16,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 216.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:11:16,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 44 seconds)
2025-09-16 15:13:13,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:13:26,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4503.80664 ± 1632.329
2025-09-16 15:13:26,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5256.6685, 5275.5723, 5303.4756, 669.3124, 5327.1963, 5329.4614, 5242.219, 5355.215, 1905.1466, 5373.8003]
2025-09-16 15:13:26,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 130.0, 1000.0, 1000.0, 1000.0, 1000.0, 353.0, 1000.0]
2025-09-16 15:13:26,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 37 seconds)
2025-09-16 15:15:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:15:46,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4756.10889 ± 1109.895
2025-09-16 15:15:46,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5309.521, 5331.095, 5301.0093, 5231.439, 1954.0974, 5270.6724, 5324.4985, 5249.3276, 3282.0972, 5307.3325]
2025-09-16 15:15:46,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 365.0, 1000.0, 1000.0, 1000.0, 628.0, 1000.0]
2025-09-16 15:15:46,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 24 seconds)
2025-09-16 15:17:48,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:18:02,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5026.46826 ± 1306.565
2025-09-16 15:18:02,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5428.561, 5495.2197, 5462.633, 5494.257, 5524.9614, 5400.8677, 5397.8276, 5465.5127, 5486.2695, 1108.5748]
2025-09-16 15:18:02,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 199.0]
2025-09-16 15:18:02,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 21 seconds)
2025-09-16 15:20:03,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:20:13,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 3700.17041 ± 2168.308
2025-09-16 15:20:13,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5439.821, 5434.504, 295.77933, 5380.8774, 5369.5503, 969.13226, 664.1686, 5417.2236, 2608.731, 5421.92]
2025-09-16 15:20:13,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 53.0, 1000.0, 1000.0, 184.0, 128.0, 1000.0, 488.0, 1000.0]
2025-09-16 15:20:13,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 54 seconds)
2025-09-16 15:22:16,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:22:29,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4393.66797 ± 1552.436
2025-09-16 15:22:29,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [971.5142, 5303.2817, 1813.5914, 5224.2837, 5296.952, 5312.0557, 5271.7495, 4109.988, 5335.9497, 5297.3105]
2025-09-16 15:22:29,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 1000.0, 347.0, 1000.0, 1000.0, 1000.0, 1000.0, 776.0, 1000.0, 1000.0]
2025-09-16 15:22:29,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-09-16 15:24:27,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:24:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5131.84424 ± 581.452
2025-09-16 15:24:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5315.1787, 5393.1445, 5331.428, 5346.971, 5295.0835, 5307.0645, 5310.6978, 3389.2786, 5310.027, 5319.567]
2025-09-16 15:24:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 636.0, 1000.0, 1000.0]
2025-09-16 15:24:41,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 30 seconds)
2025-09-16 15:26:41,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:26:55,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4925.56104 ± 1494.101
2025-09-16 15:26:55,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5388.9775, 5370.5024, 5462.832, 5420.7393, 5507.6987, 5406.2153, 5423.542, 5400.4185, 444.61407, 5430.0684]
2025-09-16 15:26:55,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 80.0, 1000.0]
2025-09-16 15:26:55,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 8 seconds)
2025-09-16 15:28:56,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:29:10,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5036.95557 ± 1085.368
2025-09-16 15:29:10,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5399.358, 5427.225, 5417.3945, 5414.0596, 5439.7393, 1782.3341, 5351.5103, 5340.9424, 5430.7656, 5366.2256]
2025-09-16 15:29:10,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 340.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:29:10,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 54 seconds)
2025-09-16 15:31:17,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:31:32,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5271.21484 ± 205.328
2025-09-16 15:31:32,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5253.4233, 5389.476, 5417.528, 5401.4536, 5241.984, 5305.0044, 4681.073, 5293.8276, 5384.951, 5343.43]
2025-09-16 15:31:32,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 883.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:31:32,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 47 seconds)
2025-09-16 15:33:24,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:33:39,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 5416.04736 ± 27.930
2025-09-16 15:33:39,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5470.1826, 5373.673, 5396.451, 5431.1543, 5414.827, 5439.774, 5389.7505, 5416.3037, 5389.6895, 5438.669]
2025-09-16 15:33:39,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:33:39,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 28 seconds)
2025-09-16 15:35:43,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:35:56,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4766.45752 ± 1535.127
2025-09-16 15:35:56,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [5393.9727, 5468.8, 5485.711, 5539.9414, 5492.48, 654.73096, 5477.8066, 5425.771, 3166.1616, 5559.1987]
2025-09-16 15:35:56,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 130.0, 1000.0, 1000.0, 580.0, 1000.0]
2025-09-16 15:35:56,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 14 seconds)
2025-09-16 15:37:54,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:38:06,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 4535.55176 ± 1656.702
2025-09-16 15:38:06,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1444.8362, 5418.785, 5294.9604, 5411.088, 5368.639, 5347.729, 1012.10803, 5338.9214, 5367.4136, 5351.033]
2025-09-16 15:38:06,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [287.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 200.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:38:06,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
