2025-09-16 13:36:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.025-delay_18
2025-09-16 13:36:02,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.025-delay_18
2025-09-16 13:36:02,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x154e997a0b50>}
2025-09-16 13:36:02,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 13:36:02,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 13:36:02,479 baseline-bpql-noisepromille25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=682, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 13:36:02,479 baseline-bpql-noisepromille25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 13:36:04,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 13:36:04,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 13:37:51,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:37:53,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 449.55478 ± 87.729
2025-09-16 13:37:53,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [459.03186, 382.64685, 345.8903, 398.2114, 594.79144, 634.76544, 411.0124, 427.01392, 419.7604, 422.4237]
2025-09-16 13:37:53,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 76.0, 65.0, 75.0, 116.0, 127.0, 77.0, 80.0, 82.0, 79.0]
2025-09-16 13:37:53,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (449.55) for latency 18
2025-09-16 13:37:53,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 25 seconds)
2025-09-16 13:39:48,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:39:49,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 410.27045 ± 47.379
2025-09-16 13:39:49,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [407.03806, 410.69873, 415.86237, 356.3586, 529.5116, 345.55472, 416.90054, 381.58063, 414.53528, 424.66397]
2025-09-16 13:39:49,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 78.0, 82.0, 68.0, 102.0, 65.0, 79.0, 72.0, 78.0, 83.0]
2025-09-16 13:39:49,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 47 seconds)
2025-09-16 13:41:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:41:46,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 415.62784 ± 60.293
2025-09-16 13:41:46,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [315.39435, 433.36737, 352.35324, 395.14447, 408.9137, 454.90668, 386.59753, 407.39017, 548.3062, 453.90497]
2025-09-16 13:41:46,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 82.0, 67.0, 76.0, 77.0, 88.0, 72.0, 77.0, 105.0, 88.0]
2025-09-16 13:41:46,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 21 seconds)
2025-09-16 13:43:41,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:43:42,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 435.75772 ± 86.645
2025-09-16 13:43:42,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [428.93384, 390.14175, 448.90524, 392.625, 316.40546, 512.404, 364.4246, 414.73636, 440.54654, 648.4542]
2025-09-16 13:43:42,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 75.0, 92.0, 78.0, 62.0, 99.0, 71.0, 77.0, 89.0, 120.0]
2025-09-16 13:43:42,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 17 seconds)
2025-09-16 13:45:38,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:45:40,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 505.68262 ± 221.878
2025-09-16 13:45:40,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [382.4767, 1021.9712, 479.33914, 283.93658, 568.345, 373.8795, 806.51434, 385.20203, 349.5869, 405.57495]
2025-09-16 13:45:40,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 214.0, 103.0, 56.0, 120.0, 71.0, 163.0, 73.0, 66.0, 76.0]
2025-09-16 13:45:40,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (505.68) for latency 18
2025-09-16 13:45:40,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 25 seconds)
2025-09-16 13:47:35,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:47:37,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 481.82159 ± 112.235
2025-09-16 13:47:37,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [484.91852, 411.48032, 520.5323, 372.21005, 502.33746, 473.73523, 721.13654, 616.9088, 374.31647, 340.6403]
2025-09-16 13:47:37,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 92.0, 97.0, 78.0, 106.0, 90.0, 145.0, 116.0, 79.0, 71.0]
2025-09-16 13:47:37,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 1 second)
2025-09-16 13:49:32,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:49:33,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 456.30704 ± 95.237
2025-09-16 13:49:33,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [449.15173, 492.56995, 449.80884, 312.68826, 348.4466, 653.4359, 426.52188, 380.9366, 491.33215, 558.17804]
2025-09-16 13:49:33,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 93.0, 84.0, 60.0, 66.0, 120.0, 92.0, 75.0, 93.0, 115.0]
2025-09-16 13:49:33,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 6 seconds)
2025-09-16 13:51:30,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:51:31,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 503.30780 ± 89.306
2025-09-16 13:51:31,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [462.5186, 420.0028, 458.84055, 674.8622, 456.16385, 501.58252, 390.31097, 471.5981, 545.75977, 651.4392]
2025-09-16 13:51:31,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 80.0, 99.0, 132.0, 95.0, 104.0, 86.0, 99.0, 117.0, 123.0]
2025-09-16 13:51:31,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 24 seconds)
2025-09-16 13:53:26,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:53:28,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 476.27228 ± 73.405
2025-09-16 13:53:28,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [500.65637, 471.44397, 415.97916, 409.04752, 569.9215, 627.49896, 428.66248, 411.5589, 406.03607, 521.91785]
2025-09-16 13:53:28,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 89.0, 92.0, 85.0, 124.0, 120.0, 81.0, 75.0, 76.0, 113.0]
2025-09-16 13:53:28,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 36 seconds)
2025-09-16 13:55:23,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:55:25,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 489.93369 ± 91.945
2025-09-16 13:55:25,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [363.79556, 449.03964, 544.9237, 464.29578, 516.20197, 524.1797, 519.26587, 445.07184, 700.9681, 371.59482]
2025-09-16 13:55:25,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 94.0, 101.0, 92.0, 98.0, 99.0, 101.0, 87.0, 130.0, 71.0]
2025-09-16 13:55:25,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 55 minutes, 28 seconds)
2025-09-16 13:57:21,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:57:23,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 520.82556 ± 114.528
2025-09-16 13:57:23,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [616.96857, 462.72253, 503.31543, 427.03748, 785.54865, 406.3949, 411.67386, 516.5457, 456.32907, 621.71985]
2025-09-16 13:57:23,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 97.0, 95.0, 90.0, 157.0, 84.0, 85.0, 95.0, 99.0, 131.0]
2025-09-16 13:57:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (520.83) for latency 18
2025-09-16 13:57:23,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 53 minutes, 50 seconds)
2025-09-16 13:59:18,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:59:20,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 509.78247 ± 65.736
2025-09-16 13:59:20,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [568.792, 420.1199, 572.06854, 580.28326, 490.10803, 452.20673, 555.4845, 578.51935, 476.31287, 403.9298]
2025-09-16 13:59:20,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 77.0, 106.0, 111.0, 92.0, 92.0, 120.0, 111.0, 89.0, 79.0]
2025-09-16 13:59:20,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 52 minutes, 6 seconds)
2025-09-16 14:01:16,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:01:17,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 495.57794 ± 122.246
2025-09-16 14:01:17,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [636.59564, 461.78583, 322.57492, 533.3972, 378.1578, 740.3375, 421.79663, 379.2852, 529.16205, 552.6865]
2025-09-16 14:01:17,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 94.0, 65.0, 111.0, 72.0, 141.0, 80.0, 86.0, 100.0, 103.0]
2025-09-16 14:01:17,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 50 minutes, 3 seconds)
2025-09-16 14:03:14,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:03:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 500.89127 ± 77.360
2025-09-16 14:03:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [579.39905, 473.5903, 654.55774, 435.14798, 456.46362, 390.96548, 547.88806, 525.3041, 415.52365, 530.07275]
2025-09-16 14:03:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 92.0, 125.0, 84.0, 98.0, 80.0, 101.0, 98.0, 89.0, 96.0]
2025-09-16 14:03:15,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 25 seconds)
2025-09-16 14:05:12,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:05:13,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 534.60388 ± 91.418
2025-09-16 14:05:13,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [433.93973, 589.1845, 537.4423, 706.0705, 409.13724, 510.2703, 542.73584, 633.89276, 417.15466, 566.2103]
2025-09-16 14:05:13,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 113.0, 113.0, 138.0, 79.0, 103.0, 110.0, 132.0, 81.0, 111.0]
2025-09-16 14:05:13,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (534.60) for latency 18
2025-09-16 14:05:13,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 40 seconds)
2025-09-16 14:07:10,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:07:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 488.23163 ± 120.674
2025-09-16 14:07:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [416.3052, 582.48175, 425.65988, 538.90247, 595.31256, 315.73618, 744.00745, 434.60535, 365.98578, 463.31982]
2025-09-16 14:07:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 110.0, 90.0, 98.0, 111.0, 66.0, 142.0, 84.0, 69.0, 90.0]
2025-09-16 14:07:12,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 56 seconds)
2025-09-16 14:09:09,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:09:10,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 552.43396 ± 123.794
2025-09-16 14:09:10,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [451.6152, 416.41833, 529.9352, 737.97, 409.57568, 722.5063, 543.4532, 468.66223, 730.53955, 513.6635]
2025-09-16 14:09:10,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 81.0, 98.0, 142.0, 79.0, 140.0, 105.0, 96.0, 134.0, 97.0]
2025-09-16 14:09:10,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (552.43) for latency 18
2025-09-16 14:09:10,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 19 seconds)
2025-09-16 14:11:07,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:11:08,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 480.35245 ± 172.555
2025-09-16 14:11:08,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [446.2014, 598.6787, 451.9091, 278.39023, 553.89355, 516.37427, 894.7515, 392.24677, 255.3839, 415.69553]
2025-09-16 14:11:08,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 115.0, 92.0, 57.0, 121.0, 95.0, 171.0, 75.0, 49.0, 79.0]
2025-09-16 14:11:08,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 41 minutes, 32 seconds)
2025-09-16 14:13:05,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:13:06,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 577.77728 ± 182.169
2025-09-16 14:13:06,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [968.4368, 743.9269, 524.78595, 538.2832, 577.73535, 645.65607, 297.071, 466.58755, 363.31546, 651.9746]
2025-09-16 14:13:06,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 141.0, 110.0, 102.0, 121.0, 124.0, 59.0, 89.0, 71.0, 130.0]
2025-09-16 14:13:06,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (577.78) for latency 18
2025-09-16 14:13:06,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 35 seconds)
2025-09-16 14:15:03,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:15:05,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 573.97772 ± 131.846
2025-09-16 14:15:05,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [827.3957, 449.89374, 801.81665, 610.4882, 536.3947, 462.3582, 583.1751, 466.649, 451.87146, 549.7345]
2025-09-16 14:15:05,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 82.0, 163.0, 118.0, 116.0, 88.0, 126.0, 94.0, 98.0, 120.0]
2025-09-16 14:15:05,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 37 minutes, 47 seconds)
2025-09-16 14:17:02,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:17:04,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 586.76355 ± 72.831
2025-09-16 14:17:04,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [522.91144, 607.0201, 710.6791, 487.72983, 616.9929, 569.5379, 518.5393, 657.2339, 667.7303, 509.26093]
2025-09-16 14:17:04,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 113.0, 128.0, 104.0, 126.0, 105.0, 113.0, 140.0, 124.0, 96.0]
2025-09-16 14:17:04,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (586.76) for latency 18
2025-09-16 14:17:04,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 35 minutes, 53 seconds)
2025-09-16 14:19:00,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:19:02,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 544.35718 ± 106.217
2025-09-16 14:19:02,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [680.115, 605.49646, 465.01395, 343.79837, 465.10944, 690.33997, 628.25653, 584.1493, 523.4319, 457.8609]
2025-09-16 14:19:02,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 129.0, 100.0, 75.0, 87.0, 139.0, 119.0, 107.0, 100.0, 98.0]
2025-09-16 14:19:02,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 33 minutes, 50 seconds)
2025-09-16 14:20:58,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:21:00,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 564.28748 ± 83.706
2025-09-16 14:21:00,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [580.6297, 470.2397, 633.99615, 603.5404, 596.3756, 502.87634, 697.4916, 648.3981, 475.88614, 433.4403]
2025-09-16 14:21:00,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 90.0, 120.0, 112.0, 113.0, 106.0, 137.0, 123.0, 87.0, 85.0]
2025-09-16 14:21:00,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 31 minutes, 44 seconds)
2025-09-16 14:22:57,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:22:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 591.18646 ± 161.101
2025-09-16 14:22:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [506.31125, 738.74744, 557.27954, 762.4068, 325.02823, 578.2381, 454.15747, 416.04788, 826.091, 747.557]
2025-09-16 14:22:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 138.0, 102.0, 154.0, 64.0, 107.0, 86.0, 79.0, 155.0, 136.0]
2025-09-16 14:22:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (591.19) for latency 18
2025-09-16 14:22:58,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 29 minutes, 58 seconds)
2025-09-16 14:24:54,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:24:56,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 600.94641 ± 103.993
2025-09-16 14:24:56,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [616.33936, 522.3871, 492.8189, 571.1768, 602.4097, 799.48444, 666.17145, 465.61008, 525.5436, 747.52295]
2025-09-16 14:24:56,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 98.0, 107.0, 108.0, 111.0, 169.0, 124.0, 87.0, 111.0, 140.0]
2025-09-16 14:24:56,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (600.95) for latency 18
2025-09-16 14:24:56,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 27 minutes, 47 seconds)
2025-09-16 14:26:54,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:26:56,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 652.95758 ± 127.511
2025-09-16 14:26:56,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [529.1784, 640.26984, 741.03876, 917.2832, 643.01245, 594.2831, 518.1366, 724.35065, 750.9205, 471.10257]
2025-09-16 14:26:56,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 127.0, 146.0, 167.0, 137.0, 108.0, 100.0, 151.0, 135.0, 90.0]
2025-09-16 14:26:56,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (652.96) for latency 18
2025-09-16 14:26:56,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 1 second)
2025-09-16 14:28:52,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:28:54,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 530.04706 ± 55.050
2025-09-16 14:28:54,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [522.32776, 559.5083, 502.10416, 483.60803, 541.23456, 577.2249, 474.18137, 638.44385, 562.2444, 439.59384]
2025-09-16 14:28:54,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 111.0, 101.0, 88.0, 104.0, 111.0, 105.0, 131.0, 104.0, 86.0]
2025-09-16 14:28:54,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 3 seconds)
2025-09-16 14:30:50,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:30:52,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 553.79150 ± 87.167
2025-09-16 14:30:52,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [686.3854, 626.54706, 538.82806, 566.7206, 550.2443, 411.26562, 595.24036, 611.6447, 388.71033, 562.3286]
2025-09-16 14:30:52,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 112.0, 105.0, 109.0, 99.0, 77.0, 122.0, 122.0, 73.0, 104.0]
2025-09-16 14:30:52,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 8 seconds)
2025-09-16 14:32:49,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:32:51,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 595.37958 ± 100.391
2025-09-16 14:32:51,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [490.89944, 585.8502, 686.37756, 649.6388, 669.41315, 579.1584, 727.5936, 414.89725, 470.29132, 679.6758]
2025-09-16 14:32:51,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 110.0, 123.0, 137.0, 130.0, 108.0, 134.0, 85.0, 97.0, 137.0]
2025-09-16 14:32:51,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 16 seconds)
2025-09-16 14:34:47,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:34:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 584.45197 ± 97.323
2025-09-16 14:34:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [452.00232, 682.5229, 548.58026, 578.68823, 562.2576, 424.93958, 589.5695, 573.41815, 768.2344, 664.3066]
2025-09-16 14:34:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 128.0, 105.0, 125.0, 110.0, 78.0, 111.0, 125.0, 147.0, 131.0]
2025-09-16 14:34:49,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 18 minutes, 15 seconds)
2025-09-16 14:36:47,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:36:49,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 674.64587 ± 163.372
2025-09-16 14:36:49,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [449.3484, 680.90735, 637.0446, 692.0297, 853.28656, 1007.5002, 497.76807, 750.8902, 489.19, 688.49384]
2025-09-16 14:36:49,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 133.0, 126.0, 142.0, 158.0, 200.0, 107.0, 137.0, 93.0, 127.0]
2025-09-16 14:36:49,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (674.65) for latency 18
2025-09-16 14:36:49,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 21 seconds)
2025-09-16 14:38:45,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:38:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 578.68610 ± 181.270
2025-09-16 14:38:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [559.986, 438.54596, 481.23227, 998.7918, 712.2818, 353.0076, 535.9709, 516.8975, 749.4372, 440.71005]
2025-09-16 14:38:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 94.0, 107.0, 189.0, 131.0, 69.0, 104.0, 110.0, 142.0, 96.0]
2025-09-16 14:38:46,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes, 16 seconds)
2025-09-16 14:40:43,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:40:45,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 626.35413 ± 137.843
2025-09-16 14:40:45,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [629.1015, 463.52667, 455.58707, 686.9564, 874.6934, 762.2574, 682.0209, 550.5377, 715.92053, 442.93976]
2025-09-16 14:40:45,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 88.0, 86.0, 149.0, 168.0, 144.0, 130.0, 120.0, 131.0, 93.0]
2025-09-16 14:40:45,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 12 minutes, 25 seconds)
2025-09-16 14:42:43,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:42:44,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 655.82068 ± 132.544
2025-09-16 14:42:44,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [764.2379, 589.5741, 394.49045, 758.4882, 438.52313, 674.34515, 668.69556, 780.07275, 716.0064, 773.77344]
2025-09-16 14:42:44,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 119.0, 84.0, 138.0, 88.0, 125.0, 128.0, 145.0, 131.0, 143.0]
2025-09-16 14:42:44,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 10 minutes, 34 seconds)
2025-09-16 14:44:40,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:44:42,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 556.61072 ± 88.088
2025-09-16 14:44:42,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [576.7241, 615.7566, 711.9784, 494.21533, 491.02966, 506.26468, 485.7574, 710.4074, 471.90613, 502.0673]
2025-09-16 14:44:42,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 116.0, 135.0, 109.0, 97.0, 103.0, 93.0, 134.0, 101.0, 93.0]
2025-09-16 14:44:42,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 8 minutes, 34 seconds)
2025-09-16 14:46:40,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:46:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 609.09509 ± 120.877
2025-09-16 14:46:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [561.21454, 629.9224, 666.1553, 580.7042, 575.7441, 406.65134, 872.40735, 546.57697, 734.0198, 517.5549]
2025-09-16 14:46:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 123.0, 128.0, 105.0, 116.0, 89.0, 167.0, 109.0, 144.0, 107.0]
2025-09-16 14:46:42,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-09-16 14:48:38,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:48:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 661.67255 ± 160.107
2025-09-16 14:48:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [587.46155, 537.9274, 525.6221, 961.3524, 711.5723, 606.628, 637.5057, 916.5749, 428.49945, 703.58185]
2025-09-16 14:48:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 111.0, 112.0, 201.0, 135.0, 114.0, 124.0, 191.0, 91.0, 137.0]
2025-09-16 14:48:40,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 4 minutes, 44 seconds)
2025-09-16 14:50:37,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:50:39,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 750.15192 ± 205.163
2025-09-16 14:50:39,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1133.4897, 739.4793, 450.05978, 773.3295, 422.58093, 631.3735, 940.048, 834.44507, 699.56104, 877.1523]
2025-09-16 14:50:39,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 145.0, 84.0, 145.0, 83.0, 131.0, 176.0, 164.0, 138.0, 166.0]
2025-09-16 14:50:39,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (750.15) for latency 18
2025-09-16 14:50:39,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 53 seconds)
2025-09-16 14:52:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:52:39,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 641.36981 ± 160.112
2025-09-16 14:52:39,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [598.1472, 482.67374, 574.4133, 940.244, 736.4852, 472.46576, 704.311, 493.46817, 527.146, 884.34424]
2025-09-16 14:52:39,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 93.0, 112.0, 175.0, 140.0, 88.0, 135.0, 97.0, 102.0, 169.0]
2025-09-16 14:52:39,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 52 seconds)
2025-09-16 14:54:36,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:54:38,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 629.96747 ± 95.502
2025-09-16 14:54:38,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [637.67975, 774.596, 599.4463, 624.13135, 509.10696, 598.76556, 812.8754, 494.88037, 593.1844, 655.00824]
2025-09-16 14:54:38,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 147.0, 115.0, 119.0, 96.0, 118.0, 156.0, 93.0, 113.0, 123.0]
2025-09-16 14:54:38,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 11 seconds)
2025-09-16 14:56:35,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:56:37,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 685.11487 ± 66.718
2025-09-16 14:56:37,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [658.3335, 780.2867, 634.7474, 695.138, 677.94946, 710.62964, 741.8944, 523.6857, 728.6934, 699.7903]
2025-09-16 14:56:37,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 148.0, 117.0, 135.0, 127.0, 139.0, 140.0, 113.0, 132.0, 129.0]
2025-09-16 14:56:37,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 3 seconds)
2025-09-16 14:58:34,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:58:35,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 664.87506 ± 101.695
2025-09-16 14:58:35,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [526.0178, 563.46606, 823.9593, 690.85114, 765.9143, 600.14716, 643.5871, 760.49506, 742.84326, 531.4694]
2025-09-16 14:58:35,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 104.0, 156.0, 147.0, 139.0, 114.0, 117.0, 144.0, 139.0, 110.0]
2025-09-16 14:58:35,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 2 seconds)
2025-09-16 15:00:33,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:00:35,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 751.85260 ± 128.387
2025-09-16 15:00:35,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [739.5192, 652.91846, 580.3934, 641.53143, 814.3539, 682.83765, 730.7337, 738.15393, 1038.6045, 899.4798]
2025-09-16 15:00:35,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 127.0, 107.0, 123.0, 148.0, 137.0, 152.0, 138.0, 209.0, 164.0]
2025-09-16 15:00:35,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (751.85) for latency 18
2025-09-16 15:00:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 12 seconds)
2025-09-16 15:02:32,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:02:34,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 716.95325 ± 116.003
2025-09-16 15:02:34,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [956.55286, 775.24615, 543.15674, 728.16345, 633.00665, 609.45215, 755.1756, 667.7306, 847.19116, 653.85626]
2025-09-16 15:02:34,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 153.0, 99.0, 135.0, 124.0, 131.0, 147.0, 136.0, 161.0, 120.0]
2025-09-16 15:02:34,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes)
2025-09-16 15:04:31,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:04:33,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 713.82910 ± 87.948
2025-09-16 15:04:33,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [712.99927, 894.7646, 741.97656, 712.2966, 647.37915, 630.9576, 709.8944, 718.6476, 560.6018, 808.7735]
2025-09-16 15:04:33,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 171.0, 147.0, 140.0, 126.0, 114.0, 150.0, 141.0, 106.0, 159.0]
2025-09-16 15:04:34,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 11 seconds)
2025-09-16 15:06:31,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:06:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 756.97223 ± 137.941
2025-09-16 15:06:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [743.7374, 957.39685, 754.4198, 909.5261, 832.3783, 933.70447, 649.1771, 612.73004, 602.86273, 573.78925]
2025-09-16 15:06:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 177.0, 144.0, 165.0, 180.0, 177.0, 118.0, 111.0, 126.0, 106.0]
2025-09-16 15:06:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (756.97) for latency 18
2025-09-16 15:06:33,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 11 seconds)
2025-09-16 15:08:29,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:08:31,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 675.50018 ± 83.387
2025-09-16 15:08:31,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [732.6385, 688.7011, 551.83325, 481.03656, 683.0897, 704.9421, 740.99963, 745.0623, 719.1475, 707.5514]
2025-09-16 15:08:31,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 128.0, 119.0, 87.0, 127.0, 131.0, 142.0, 152.0, 143.0, 136.0]
2025-09-16 15:08:31,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 17 seconds)
2025-09-16 15:10:29,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:10:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 715.18719 ± 65.916
2025-09-16 15:10:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [811.5623, 739.08563, 796.5225, 703.3054, 581.7594, 638.4562, 753.3414, 678.50055, 733.6589, 715.6803]
2025-09-16 15:10:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 133.0, 168.0, 131.0, 119.0, 118.0, 144.0, 125.0, 139.0, 128.0]
2025-09-16 15:10:32,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 21 seconds)
2025-09-16 15:12:30,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:12:32,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 717.22968 ± 155.492
2025-09-16 15:12:32,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [535.64417, 866.50244, 625.87427, 774.16394, 593.0728, 690.8347, 829.7572, 1037.4858, 508.54352, 710.4187]
2025-09-16 15:12:32,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 160.0, 117.0, 151.0, 129.0, 138.0, 166.0, 199.0, 96.0, 152.0]
2025-09-16 15:12:32,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 43 seconds)
2025-09-16 15:14:28,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:14:30,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 734.96204 ± 119.625
2025-09-16 15:14:30,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [879.43994, 551.7931, 671.11676, 706.5242, 653.8636, 659.345, 965.3173, 726.95935, 677.53455, 857.72656]
2025-09-16 15:14:30,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 108.0, 131.0, 130.0, 123.0, 131.0, 182.0, 132.0, 131.0, 159.0]
2025-09-16 15:14:30,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 22 seconds)
2025-09-16 15:16:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:16:29,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 762.87360 ± 169.970
2025-09-16 15:16:29,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [614.7779, 620.13837, 1093.7449, 681.06335, 671.6364, 722.06586, 949.76886, 638.8059, 997.18427, 639.55035]
2025-09-16 15:16:29,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 124.0, 229.0, 125.0, 129.0, 154.0, 197.0, 121.0, 214.0, 137.0]
2025-09-16 15:16:29,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (762.87) for latency 18
2025-09-16 15:16:29,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 20 seconds)
2025-09-16 15:18:27,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:18:30,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 864.01367 ± 220.071
2025-09-16 15:18:30,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [918.6215, 841.8891, 858.8044, 701.8062, 776.9331, 733.3203, 604.6325, 671.078, 1240.9763, 1292.0753]
2025-09-16 15:18:30,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 172.0, 162.0, 139.0, 151.0, 136.0, 117.0, 130.0, 242.0, 246.0]
2025-09-16 15:18:30,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (864.01) for latency 18
2025-09-16 15:18:30,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 42 seconds)
2025-09-16 15:20:27,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:20:29,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 696.56738 ± 103.118
2025-09-16 15:20:29,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [861.6666, 508.27402, 600.869, 758.043, 638.40564, 825.139, 691.1077, 657.5616, 778.34796, 646.2591]
2025-09-16 15:20:29,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 98.0, 112.0, 160.0, 124.0, 159.0, 129.0, 143.0, 142.0, 124.0]
2025-09-16 15:20:29,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 37 seconds)
2025-09-16 15:22:26,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:22:28,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 767.71814 ± 73.064
2025-09-16 15:22:28,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [761.0707, 710.81134, 870.86786, 794.74677, 668.76337, 860.5728, 776.4514, 686.80725, 856.0852, 691.0043]
2025-09-16 15:22:28,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 135.0, 160.0, 147.0, 142.0, 178.0, 144.0, 129.0, 162.0, 131.0]
2025-09-16 15:22:28,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 24 seconds)
2025-09-16 15:24:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:24:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 762.29834 ± 221.545
2025-09-16 15:24:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [603.66016, 716.3387, 563.9636, 620.884, 735.433, 1034.6084, 660.9818, 1312.4186, 738.3555, 636.3397]
2025-09-16 15:24:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 134.0, 103.0, 130.0, 138.0, 212.0, 124.0, 253.0, 158.0, 136.0]
2025-09-16 15:24:27,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 35 seconds)
2025-09-16 15:26:26,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:26:28,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 740.32361 ± 98.358
2025-09-16 15:26:28,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [661.43024, 598.9707, 798.8557, 697.51874, 804.2127, 838.51385, 905.7558, 588.95856, 775.4175, 733.6026]
2025-09-16 15:26:28,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 114.0, 152.0, 133.0, 169.0, 168.0, 176.0, 112.0, 143.0, 147.0]
2025-09-16 15:26:28,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2025-09-16 15:28:24,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:28:26,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 773.93280 ± 131.331
2025-09-16 15:28:26,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [943.88, 731.0853, 1032.0441, 767.61835, 668.9807, 581.50793, 841.9353, 660.9754, 822.7522, 688.54877]
2025-09-16 15:28:26,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 159.0, 219.0, 157.0, 123.0, 125.0, 173.0, 139.0, 173.0, 124.0]
2025-09-16 15:28:26,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 32 seconds)
2025-09-16 15:30:24,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:30:27,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 759.88977 ± 156.115
2025-09-16 15:30:27,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [610.80304, 729.04987, 855.4192, 672.6903, 1029.5604, 707.22345, 606.3319, 674.6894, 1055.04, 658.09045]
2025-09-16 15:30:27,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 134.0, 159.0, 126.0, 190.0, 127.0, 116.0, 122.0, 199.0, 124.0]
2025-09-16 15:30:27,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 39 seconds)
2025-09-16 15:32:23,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:32:26,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 830.69122 ± 200.108
2025-09-16 15:32:26,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [723.02936, 864.74243, 970.08844, 542.6959, 977.73883, 1052.7512, 524.05145, 760.74207, 734.498, 1156.5742]
2025-09-16 15:32:26,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 162.0, 182.0, 112.0, 203.0, 199.0, 102.0, 147.0, 132.0, 238.0]
2025-09-16 15:32:26,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 37 seconds)
2025-09-16 15:34:24,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:34:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 771.85956 ± 140.010
2025-09-16 15:34:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [626.06976, 983.7015, 735.3575, 734.27106, 882.804, 505.9683, 717.5991, 965.3774, 742.0692, 825.37744]
2025-09-16 15:34:26,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 182.0, 142.0, 133.0, 167.0, 96.0, 152.0, 177.0, 140.0, 156.0]
2025-09-16 15:34:26,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 49 seconds)
2025-09-16 15:36:23,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:36:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 779.27289 ± 136.781
2025-09-16 15:36:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [950.73315, 559.13477, 770.4318, 751.5805, 573.26715, 998.44763, 736.3371, 873.41815, 848.7964, 730.58185]
2025-09-16 15:36:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 111.0, 142.0, 160.0, 112.0, 173.0, 141.0, 162.0, 162.0, 134.0]
2025-09-16 15:36:25,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 38 seconds)
2025-09-16 15:38:21,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:38:23,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 816.12939 ± 265.649
2025-09-16 15:38:23,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [454.47375, 1243.1866, 670.73615, 1315.9569, 768.6634, 781.5648, 964.2125, 657.8097, 558.906, 745.78436]
2025-09-16 15:38:23,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 245.0, 121.0, 259.0, 141.0, 154.0, 179.0, 125.0, 118.0, 141.0]
2025-09-16 15:38:23,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 36 seconds)
2025-09-16 15:40:22,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:40:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 856.38721 ± 180.501
2025-09-16 15:40:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [915.40314, 1069.4902, 1085.4429, 791.37714, 538.4894, 784.75885, 734.2756, 913.63416, 646.78644, 1084.2145]
2025-09-16 15:40:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 195.0, 200.0, 148.0, 103.0, 153.0, 132.0, 191.0, 126.0, 208.0]
2025-09-16 15:40:25,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 44 seconds)
2025-09-16 15:42:20,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:42:23,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 832.59375 ± 170.638
2025-09-16 15:42:23,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [650.7299, 702.02563, 876.4764, 706.95874, 881.97284, 605.3199, 1132.5251, 1022.3576, 1015.0091, 732.5625]
2025-09-16 15:42:23,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 135.0, 164.0, 127.0, 183.0, 124.0, 215.0, 209.0, 193.0, 160.0]
2025-09-16 15:42:23,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 40 seconds)
2025-09-16 15:44:21,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:44:23,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 911.58838 ± 288.302
2025-09-16 15:44:23,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [846.9423, 771.51385, 955.8787, 1602.4673, 698.5636, 1038.6724, 1169.0098, 780.23773, 525.61115, 726.98706]
2025-09-16 15:44:23,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 143.0, 176.0, 309.0, 127.0, 209.0, 226.0, 153.0, 102.0, 139.0]
2025-09-16 15:44:23,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (911.59) for latency 18
2025-09-16 15:44:23,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 43 seconds)
2025-09-16 15:46:21,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:46:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 830.24170 ± 254.964
2025-09-16 15:46:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [727.17316, 682.0352, 645.218, 729.45624, 869.3831, 791.46857, 1467.4196, 1013.0963, 901.2894, 475.87674]
2025-09-16 15:46:23,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 128.0, 126.0, 133.0, 167.0, 150.0, 287.0, 204.0, 196.0, 87.0]
2025-09-16 15:46:23,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 44 seconds)
2025-09-16 15:48:22,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:48:24,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 843.22577 ± 241.697
2025-09-16 15:48:24,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [629.98517, 781.4547, 1108.0328, 1004.1706, 1161.5471, 588.8907, 877.5908, 1165.9318, 512.3122, 602.3428]
2025-09-16 15:48:24,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 142.0, 212.0, 197.0, 223.0, 113.0, 164.0, 233.0, 96.0, 113.0]
2025-09-16 15:48:24,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 4 seconds)
2025-09-16 15:50:22,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:50:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 901.35583 ± 207.702
2025-09-16 15:50:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1278.0574, 1189.3124, 919.8219, 744.4398, 794.81226, 750.68835, 627.96246, 899.8485, 1095.0964, 713.51984]
2025-09-16 15:50:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 224.0, 173.0, 137.0, 155.0, 146.0, 136.0, 175.0, 224.0, 141.0]
2025-09-16 15:50:25,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes)
2025-09-16 15:52:22,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:52:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 951.67120 ± 350.949
2025-09-16 15:52:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1761.862, 731.2166, 428.56863, 1140.1759, 771.9435, 1275.2567, 887.30005, 824.8648, 708.7576, 986.7658]
2025-09-16 15:52:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [342.0, 155.0, 91.0, 214.0, 142.0, 252.0, 163.0, 153.0, 151.0, 182.0]
2025-09-16 15:52:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (951.67) for latency 18
2025-09-16 15:52:24,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 10 seconds)
2025-09-16 15:54:21,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:54:23,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 916.66943 ± 278.854
2025-09-16 15:54:23,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [491.77405, 1096.7144, 755.8784, 751.2329, 1186.9062, 1499.2861, 965.6659, 984.8213, 638.6767, 795.73785]
2025-09-16 15:54:23,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 229.0, 162.0, 143.0, 221.0, 300.0, 201.0, 184.0, 138.0, 147.0]
2025-09-16 15:54:23,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour)
2025-09-16 15:56:22,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:56:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 786.64777 ± 156.345
2025-09-16 15:56:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [967.0913, 920.23206, 654.65967, 1036.0415, 619.2059, 706.39655, 642.77045, 618.3281, 744.4432, 957.3086]
2025-09-16 15:56:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 200.0, 139.0, 209.0, 122.0, 138.0, 136.0, 117.0, 142.0, 178.0]
2025-09-16 15:56:24,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 7 seconds)
2025-09-16 15:58:22,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:58:25,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1057.77271 ± 299.810
2025-09-16 15:58:25,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1070.1523, 855.45404, 1291.5908, 610.7343, 716.1604, 1367.5972, 795.6394, 1576.7834, 1011.48004, 1282.135]
2025-09-16 15:58:25,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 162.0, 250.0, 118.0, 134.0, 259.0, 147.0, 313.0, 190.0, 240.0]
2025-09-16 15:58:25,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1057.77) for latency 18
2025-09-16 15:58:25,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 5 seconds)
2025-09-16 16:00:21,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:00:24,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1039.26123 ± 231.382
2025-09-16 16:00:24,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [784.09357, 731.98145, 880.2669, 1032.2274, 823.1526, 1187.8759, 1269.7252, 1285.881, 956.8915, 1440.5162]
2025-09-16 16:00:24,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 146.0, 162.0, 190.0, 160.0, 249.0, 234.0, 246.0, 175.0, 288.0]
2025-09-16 16:00:24,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 59 seconds)
2025-09-16 16:02:23,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:02:25,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 915.91553 ± 279.402
2025-09-16 16:02:25,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [730.56305, 1533.1232, 922.66345, 828.27783, 782.8049, 1260.7251, 813.68616, 477.8288, 1000.65137, 808.831]
2025-09-16 16:02:25,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 272.0, 171.0, 165.0, 146.0, 242.0, 158.0, 103.0, 187.0, 147.0]
2025-09-16 16:02:25,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 5 seconds)
2025-09-16 16:04:23,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:04:26,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 894.10565 ± 249.754
2025-09-16 16:04:26,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [686.0863, 767.66656, 1154.6903, 1060.6453, 593.4631, 925.03625, 876.577, 1445.5656, 647.0015, 784.32465]
2025-09-16 16:04:26,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 153.0, 236.0, 196.0, 115.0, 171.0, 167.0, 295.0, 124.0, 154.0]
2025-09-16 16:04:26,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 11 seconds)
2025-09-16 16:06:23,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:06:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 816.54181 ± 140.747
2025-09-16 16:06:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [667.1722, 701.2001, 1060.416, 711.00586, 966.3294, 778.1202, 718.02954, 849.28937, 692.7365, 1021.1191]
2025-09-16 16:06:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 151.0, 202.0, 127.0, 187.0, 167.0, 132.0, 187.0, 124.0, 189.0]
2025-09-16 16:06:25,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 4 seconds)
2025-09-16 16:08:23,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:08:25,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 830.27325 ± 248.832
2025-09-16 16:08:25,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [439.50003, 839.24036, 442.13248, 827.7507, 1022.56665, 967.3952, 780.7785, 685.20685, 1020.27936, 1277.8822]
2025-09-16 16:08:25,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 156.0, 84.0, 160.0, 182.0, 188.0, 155.0, 141.0, 193.0, 244.0]
2025-09-16 16:08:25,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 59 seconds)
2025-09-16 16:10:22,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:10:25,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 954.52704 ± 291.520
2025-09-16 16:10:25,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1375.8484, 630.64905, 1150.1313, 1046.1409, 804.14, 845.5888, 657.218, 1508.8025, 857.1818, 669.56995]
2025-09-16 16:10:25,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [276.0, 130.0, 206.0, 193.0, 162.0, 175.0, 132.0, 295.0, 166.0, 124.0]
2025-09-16 16:10:25,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 3 seconds)
2025-09-16 16:12:24,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:12:27,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 982.19531 ± 255.769
2025-09-16 16:12:27,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1003.6679, 1054.1984, 1456.8105, 720.2712, 559.4811, 657.8757, 1113.9211, 1193.4467, 986.2554, 1076.0251]
2025-09-16 16:12:27,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 199.0, 277.0, 138.0, 109.0, 116.0, 231.0, 217.0, 184.0, 243.0]
2025-09-16 16:12:27,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 6 seconds)
2025-09-16 16:14:25,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:14:28,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 907.97900 ± 223.294
2025-09-16 16:14:28,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [758.40314, 1015.61725, 825.29364, 1003.4548, 883.74054, 790.21625, 1514.6151, 726.5667, 758.14716, 803.7349]
2025-09-16 16:14:28,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 203.0, 155.0, 203.0, 178.0, 160.0, 306.0, 150.0, 137.0, 164.0]
2025-09-16 16:14:28,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 8 seconds)
2025-09-16 16:16:26,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:16:29,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1049.90527 ± 346.476
2025-09-16 16:16:29,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1270.065, 944.5624, 1156.1985, 603.42944, 973.90717, 700.9907, 1576.579, 1654.9492, 661.91486, 956.45605]
2025-09-16 16:16:29,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 181.0, 229.0, 130.0, 215.0, 127.0, 296.0, 340.0, 130.0, 178.0]
2025-09-16 16:16:29,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 12 seconds)
2025-09-16 16:18:25,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:18:28,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1033.34973 ± 292.366
2025-09-16 16:18:28,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1073.0453, 1374.2178, 1039.2721, 960.0165, 1128.9374, 467.6598, 896.2071, 668.823, 1492.8151, 1232.5029]
2025-09-16 16:18:28,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 260.0, 194.0, 182.0, 213.0, 94.0, 170.0, 126.0, 282.0, 234.0]
2025-09-16 16:18:28,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 9 seconds)
2025-09-16 16:20:26,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:20:29,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1027.56946 ± 376.517
2025-09-16 16:20:29,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1055.5209, 852.2683, 1284.6198, 856.51074, 1135.0403, 944.17773, 675.2904, 570.1861, 1989.7179, 912.3624]
2025-09-16 16:20:29,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 165.0, 243.0, 175.0, 217.0, 189.0, 143.0, 106.0, 422.0, 169.0]
2025-09-16 16:20:29,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 14 seconds)
2025-09-16 16:22:26,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:22:29,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1097.48303 ± 269.005
2025-09-16 16:22:29,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [885.4208, 752.04584, 1043.8431, 1183.515, 1258.1256, 1607.296, 968.47156, 1364.5723, 695.7688, 1215.7703]
2025-09-16 16:22:29,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 141.0, 196.0, 231.0, 232.0, 312.0, 188.0, 262.0, 136.0, 235.0]
2025-09-16 16:22:29,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1097.48) for latency 18
2025-09-16 16:22:29,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 6 seconds)
2025-09-16 16:24:29,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:24:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 998.36536 ± 241.146
2025-09-16 16:24:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1130.702, 994.39734, 1015.5789, 758.49207, 1152.5464, 1033.6111, 675.19684, 1272.06, 1363.3153, 587.75433]
2025-09-16 16:24:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 197.0, 201.0, 151.0, 222.0, 208.0, 140.0, 249.0, 264.0, 114.0]
2025-09-16 16:24:32,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 13 seconds)
2025-09-16 16:26:28,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:26:31,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 909.92157 ± 246.121
2025-09-16 16:26:31,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [756.94446, 672.0122, 1077.8622, 607.6747, 1145.638, 655.1857, 971.89996, 745.5346, 1384.5884, 1081.8752]
2025-09-16 16:26:31,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 140.0, 206.0, 123.0, 242.0, 135.0, 195.0, 163.0, 268.0, 207.0]
2025-09-16 16:26:31,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 6 seconds)
2025-09-16 16:28:28,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:28:31,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 933.32507 ± 238.284
2025-09-16 16:28:31,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1171.1223, 1087.7278, 1071.4847, 714.1007, 525.86426, 894.9441, 1334.3317, 632.6456, 928.67365, 972.35657]
2025-09-16 16:28:31,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [218.0, 198.0, 197.0, 133.0, 98.0, 171.0, 251.0, 123.0, 174.0, 221.0]
2025-09-16 16:28:31,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 8 seconds)
2025-09-16 16:30:29,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:30:33,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1092.93750 ± 342.073
2025-09-16 16:30:33,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [709.30023, 1097.254, 1019.4207, 1854.4594, 659.6941, 777.2982, 1220.4143, 1193.6617, 973.7375, 1424.1348]
2025-09-16 16:30:33,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 199.0, 190.0, 351.0, 142.0, 162.0, 226.0, 227.0, 179.0, 296.0]
2025-09-16 16:30:33,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 7 seconds)
2025-09-16 16:32:30,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:32:34,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1203.44202 ± 349.145
2025-09-16 16:32:34,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [767.1883, 860.51154, 1203.3857, 1417.5662, 1652.4406, 1538.8016, 801.9137, 1435.6729, 1587.6056, 769.33405]
2025-09-16 16:32:34,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 173.0, 222.0, 265.0, 327.0, 295.0, 145.0, 280.0, 309.0, 139.0]
2025-09-16 16:32:34,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1203.44) for latency 18
2025-09-16 16:32:34,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 10 seconds)
2025-09-16 16:34:30,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:34:32,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 973.98621 ± 268.476
2025-09-16 16:34:32,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1018.37445, 1008.35626, 1074.4148, 1641.6917, 762.6152, 779.9672, 1021.94696, 1041.5387, 788.7741, 602.184]
2025-09-16 16:34:32,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 199.0, 221.0, 344.0, 142.0, 150.0, 198.0, 226.0, 158.0, 121.0]
2025-09-16 16:34:32,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes)
2025-09-16 16:36:29,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:36:33,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1140.63269 ± 340.098
2025-09-16 16:36:33,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1212.5541, 1005.58484, 1265.3708, 972.5353, 1454.7281, 526.7748, 1277.5397, 1232.1353, 1763.8894, 695.21484]
2025-09-16 16:36:33,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 180.0, 251.0, 177.0, 287.0, 113.0, 243.0, 224.0, 339.0, 126.0]
2025-09-16 16:36:33,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 2 seconds)
2025-09-16 16:38:30,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:38:33,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1097.64563 ± 306.636
2025-09-16 16:38:33,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [767.1685, 1690.4309, 620.8361, 972.9565, 1122.0398, 1005.0384, 929.2837, 1055.7529, 1464.4729, 1348.4763]
2025-09-16 16:38:33,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 336.0, 122.0, 176.0, 223.0, 198.0, 179.0, 198.0, 272.0, 263.0]
2025-09-16 16:38:33,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 2 seconds)
2025-09-16 16:40:30,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:40:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1692.15588 ± 660.113
2025-09-16 16:40:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1879.5712, 1582.0472, 1785.5818, 1416.1543, 1071.4622, 1390.4701, 1136.4142, 1469.5443, 1657.118, 3533.1953]
2025-09-16 16:40:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [375.0, 330.0, 359.0, 285.0, 196.0, 263.0, 240.0, 275.0, 314.0, 691.0]
2025-09-16 16:40:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1692.16) for latency 18
2025-09-16 16:40:35,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 3 seconds)
2025-09-16 16:42:31,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:42:35,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1320.93347 ± 415.025
2025-09-16 16:42:35,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1171.2913, 609.3049, 1809.5243, 1951.7029, 887.63727, 1584.5194, 1136.5315, 1071.878, 1762.0426, 1224.9023]
2025-09-16 16:42:35,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 112.0, 335.0, 410.0, 163.0, 316.0, 213.0, 200.0, 348.0, 233.0]
2025-09-16 16:42:35,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 1 second)
2025-09-16 16:44:29,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:44:34,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1454.79712 ± 665.549
2025-09-16 16:44:34,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [827.7693, 759.33154, 2795.9248, 2016.581, 1622.5316, 928.9806, 2266.0015, 1211.2404, 831.26184, 1288.3496]
2025-09-16 16:44:34,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 142.0, 555.0, 407.0, 308.0, 187.0, 453.0, 225.0, 172.0, 235.0]
2025-09-16 16:44:34,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 1 second)
2025-09-16 16:46:31,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:46:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1255.19458 ± 269.000
2025-09-16 16:46:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1036.8308, 1009.3179, 1181.7842, 1877.2533, 1016.367, 1564.0073, 1225.1538, 1069.3353, 1415.1462, 1156.75]
2025-09-16 16:46:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 196.0, 237.0, 362.0, 224.0, 305.0, 226.0, 195.0, 285.0, 205.0]
2025-09-16 16:46:35,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 1 second)
2025-09-16 16:48:30,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:48:34,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1302.73535 ± 497.546
2025-09-16 16:48:34,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1354.5654, 1280.3176, 1011.5118, 1101.6052, 807.2129, 515.963, 1882.0046, 2322.4636, 1156.2836, 1595.4253]
2025-09-16 16:48:34,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [259.0, 252.0, 189.0, 205.0, 158.0, 101.0, 361.0, 492.0, 223.0, 316.0]
2025-09-16 16:48:34,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes)
2025-09-16 16:50:32,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:50:35,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1115.66992 ± 342.814
2025-09-16 16:50:35,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1365.7878, 1069.7454, 1428.3068, 938.0891, 978.206, 1487.069, 1658.6561, 648.32684, 566.8632, 1015.6482]
2025-09-16 16:50:35,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [247.0, 210.0, 283.0, 178.0, 183.0, 283.0, 310.0, 124.0, 109.0, 195.0]
2025-09-16 16:50:35,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes)
2025-09-16 16:52:30,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:52:33,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1247.36597 ± 289.452
2025-09-16 16:52:33,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1470.3218, 1623.5476, 543.29443, 1328.7827, 1143.6954, 1023.6723, 1385.0883, 1152.671, 1465.1188, 1337.4675]
2025-09-16 16:52:33,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 306.0, 112.0, 249.0, 211.0, 196.0, 256.0, 213.0, 301.0, 251.0]
2025-09-16 16:52:33,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-09-16 16:54:31,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:54:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1350.18140 ± 563.281
2025-09-16 16:54:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1735.9446, 1603.1517, 1598.4365, 703.61096, 896.4055, 716.21277, 1049.663, 948.59467, 2583.4739, 1666.32]
2025-09-16 16:54:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [319.0, 300.0, 295.0, 143.0, 172.0, 150.0, 209.0, 180.0, 498.0, 312.0]
2025-09-16 16:54:35,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1251 [DEBUG]: Training session finished
