2025-09-16 12:11:40,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.025-delay_12
2025-09-16 12:11:40,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.025-delay_12
2025-09-16 12:11:40,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x14cc5a69c8d0>}
2025-09-16 12:11:40,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:11:40,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:11:41,014 baseline-bpql-noisepromille25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=580, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:11:41,014 baseline-bpql-noisepromille25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:11:43,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:11:43,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:13:30,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:13:31,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 446.03302 ± 77.266
2025-09-16 12:13:31,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [479.9322, 572.12067, 502.86017, 490.30737, 365.07138, 340.41446, 508.8515, 339.3963, 475.21152, 386.1643]
2025-09-16 12:13:31,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 109.0, 95.0, 91.0, 74.0, 63.0, 99.0, 71.0, 96.0, 81.0]
2025-09-16 12:13:31,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (446.03) for latency 12
2025-09-16 12:13:31,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 58 minutes, 46 seconds)
2025-09-16 12:15:27,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:15:28,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 405.21512 ± 38.100
2025-09-16 12:15:28,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [400.3939, 376.10446, 421.26746, 389.07358, 474.25836, 456.54208, 383.4837, 375.40048, 431.1677, 344.45938]
2025-09-16 12:15:28,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 72.0, 89.0, 74.0, 91.0, 93.0, 73.0, 70.0, 81.0, 65.0]
2025-09-16 12:15:28,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 33 seconds)
2025-09-16 12:17:25,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:17:26,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 449.06305 ± 60.698
2025-09-16 12:17:26,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [491.49686, 398.71866, 527.8401, 465.83847, 472.77557, 386.1199, 547.3662, 374.5941, 455.82043, 370.06006]
2025-09-16 12:17:26,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 74.0, 104.0, 89.0, 87.0, 72.0, 102.0, 69.0, 87.0, 70.0]
2025-09-16 12:17:26,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (449.06) for latency 12
2025-09-16 12:17:26,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 43 seconds)
2025-09-16 12:19:22,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:19:23,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 442.40912 ± 51.935
2025-09-16 12:19:23,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [445.69757, 435.7794, 513.43304, 462.0376, 546.5658, 360.32938, 390.28568, 421.60904, 430.8192, 417.53445]
2025-09-16 12:19:23,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 81.0, 97.0, 87.0, 103.0, 73.0, 76.0, 79.0, 92.0, 79.0]
2025-09-16 12:19:23,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 54 seconds)
2025-09-16 12:21:18,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:21:20,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 476.43683 ± 99.783
2025-09-16 12:21:20,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [426.2277, 445.13745, 396.86563, 715.73987, 415.58655, 403.06165, 609.60974, 459.3155, 404.283, 488.54092]
2025-09-16 12:21:20,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 93.0, 75.0, 137.0, 88.0, 76.0, 130.0, 86.0, 83.0, 90.0]
2025-09-16 12:21:20,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (476.44) for latency 12
2025-09-16 12:21:20,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 38 seconds)
2025-09-16 12:23:17,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:23:18,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 454.45294 ± 71.888
2025-09-16 12:23:18,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [426.9596, 370.0298, 380.95956, 340.9255, 474.93567, 464.46204, 572.6503, 458.9259, 540.96124, 513.7198]
2025-09-16 12:23:18,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 75.0, 72.0, 65.0, 93.0, 91.0, 105.0, 85.0, 102.0, 102.0]
2025-09-16 12:23:18,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 52 seconds)
2025-09-16 12:25:14,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:25:16,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 542.95502 ± 91.957
2025-09-16 12:25:16,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [497.52457, 521.6569, 426.31656, 712.2592, 502.94162, 601.09406, 683.60345, 436.15527, 561.9869, 486.0117]
2025-09-16 12:25:16,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 99.0, 80.0, 147.0, 93.0, 115.0, 135.0, 85.0, 104.0, 93.0]
2025-09-16 12:25:16,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (542.96) for latency 12
2025-09-16 12:25:16,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 2 minutes, 11 seconds)
2025-09-16 12:27:12,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:27:13,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 554.33722 ± 108.149
2025-09-16 12:27:13,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [425.65555, 566.5531, 622.2821, 432.9354, 638.92285, 371.19574, 510.88507, 664.1831, 601.38544, 709.3742]
2025-09-16 12:27:13,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 106.0, 117.0, 89.0, 120.0, 70.0, 97.0, 129.0, 115.0, 144.0]
2025-09-16 12:27:13,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (554.34) for latency 12
2025-09-16 12:27:14,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 13 seconds)
2025-09-16 12:29:09,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:29:11,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 541.00867 ± 128.843
2025-09-16 12:29:11,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [598.61487, 395.85645, 499.973, 375.11856, 601.1584, 398.74103, 451.33325, 727.61115, 738.23505, 623.44464]
2025-09-16 12:29:11,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 73.0, 94.0, 69.0, 113.0, 75.0, 83.0, 139.0, 151.0, 118.0]
2025-09-16 12:29:11,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 21 seconds)
2025-09-16 12:31:07,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:31:09,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 528.69214 ± 102.836
2025-09-16 12:31:09,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [339.4638, 600.1249, 677.88873, 478.91565, 541.59827, 666.369, 572.1201, 492.35776, 395.5035, 522.5796]
2025-09-16 12:31:09,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 112.0, 135.0, 93.0, 106.0, 147.0, 116.0, 93.0, 74.0, 97.0]
2025-09-16 12:31:09,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 44 seconds)
2025-09-16 12:33:05,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:33:06,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 512.26501 ± 59.449
2025-09-16 12:33:06,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [572.8097, 596.63794, 518.73035, 567.86005, 486.8148, 486.12943, 498.5859, 522.8873, 500.09375, 372.10046]
2025-09-16 12:33:06,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 113.0, 96.0, 103.0, 94.0, 92.0, 106.0, 99.0, 98.0, 72.0]
2025-09-16 12:33:06,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 54 minutes, 31 seconds)
2025-09-16 12:35:04,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:35:05,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 566.72076 ± 83.620
2025-09-16 12:35:05,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [500.48892, 592.15985, 610.87256, 391.0543, 645.7151, 535.46735, 717.4757, 588.95074, 557.8626, 527.16]
2025-09-16 12:35:05,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 109.0, 124.0, 78.0, 128.0, 101.0, 140.0, 127.0, 115.0, 116.0]
2025-09-16 12:35:05,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (566.72) for latency 12
2025-09-16 12:35:05,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 3 seconds)
2025-09-16 12:37:02,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:37:04,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 622.36487 ± 180.209
2025-09-16 12:37:04,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [661.71857, 615.6267, 1015.8977, 588.34973, 374.8169, 582.7583, 476.53168, 800.7128, 409.12558, 698.1109]
2025-09-16 12:37:04,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 117.0, 199.0, 112.0, 71.0, 109.0, 100.0, 159.0, 75.0, 143.0]
2025-09-16 12:37:04,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (622.36) for latency 12
2025-09-16 12:37:04,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 13 seconds)
2025-09-16 12:39:03,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:39:04,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 519.15448 ± 150.734
2025-09-16 12:39:04,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [550.7574, 466.5805, 475.03345, 743.5176, 565.75714, 456.63675, 165.7663, 469.63953, 603.86835, 693.98785]
2025-09-16 12:39:04,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 89.0, 98.0, 143.0, 120.0, 94.0, 32.0, 88.0, 113.0, 129.0]
2025-09-16 12:39:04,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 50 minutes, 12 seconds)
2025-09-16 12:41:03,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:41:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 573.79486 ± 158.193
2025-09-16 12:41:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [442.34665, 620.1466, 465.86676, 571.87195, 540.69415, 934.48846, 556.3245, 379.79422, 766.3881, 460.02725]
2025-09-16 12:41:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 118.0, 95.0, 122.0, 101.0, 187.0, 120.0, 82.0, 149.0, 97.0]
2025-09-16 12:41:05,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 46 seconds)
2025-09-16 12:43:01,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:43:03,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 484.24960 ± 116.357
2025-09-16 12:43:03,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [498.7037, 594.0314, 273.7943, 490.65448, 418.90063, 737.2046, 531.68713, 428.79712, 423.49777, 445.22534]
2025-09-16 12:43:03,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 109.0, 55.0, 94.0, 81.0, 136.0, 102.0, 81.0, 82.0, 86.0]
2025-09-16 12:43:03,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 57 seconds)
2025-09-16 12:45:00,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:45:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 648.87097 ± 108.795
2025-09-16 12:45:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [635.05865, 758.0088, 736.5292, 657.6456, 557.76544, 783.0677, 424.6925, 570.0029, 601.1476, 764.7916]
2025-09-16 12:45:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 148.0, 141.0, 126.0, 119.0, 167.0, 93.0, 121.0, 114.0, 163.0]
2025-09-16 12:45:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (648.87) for latency 12
2025-09-16 12:45:02,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 59 seconds)
2025-09-16 12:47:01,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:47:03,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 589.61096 ± 125.824
2025-09-16 12:47:03,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [441.76822, 465.61978, 649.1031, 462.18347, 508.28812, 574.6551, 549.9902, 817.2854, 641.79083, 785.42456]
2025-09-16 12:47:03,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 101.0, 123.0, 101.0, 111.0, 123.0, 113.0, 155.0, 122.0, 151.0]
2025-09-16 12:47:03,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 39 seconds)
2025-09-16 12:49:00,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:49:02,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 628.44934 ± 193.496
2025-09-16 12:49:02,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [563.41327, 637.2894, 770.8205, 834.2079, 426.746, 1029.8243, 596.84985, 616.3751, 450.5375, 358.42947]
2025-09-16 12:49:02,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 121.0, 146.0, 159.0, 93.0, 209.0, 110.0, 115.0, 85.0, 69.0]
2025-09-16 12:49:02,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 16 seconds)
2025-09-16 12:51:00,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:51:02,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 638.83691 ± 137.954
2025-09-16 12:51:02,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [739.0954, 646.3406, 972.143, 489.26743, 468.98297, 685.0108, 531.68335, 625.83417, 576.4518, 653.5599]
2025-09-16 12:51:02,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 127.0, 182.0, 92.0, 100.0, 149.0, 97.0, 119.0, 111.0, 131.0]
2025-09-16 12:51:02,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 19 seconds)
2025-09-16 12:53:00,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:53:01,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 680.86304 ± 289.908
2025-09-16 12:53:01,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [272.22974, 633.11224, 495.70114, 1392.5323, 443.6315, 674.5839, 641.28687, 569.58484, 781.2921, 904.67596]
2025-09-16 12:53:01,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 116.0, 92.0, 277.0, 95.0, 135.0, 118.0, 107.0, 158.0, 192.0]
2025-09-16 12:53:01,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (680.86) for latency 12
2025-09-16 12:53:01,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 39 seconds)
2025-09-16 12:55:00,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:55:01,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 604.67261 ± 131.869
2025-09-16 12:55:01,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [393.7886, 532.305, 713.8549, 548.0589, 810.86865, 729.01465, 530.4543, 767.5599, 535.623, 485.1983]
2025-09-16 12:55:01,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 100.0, 136.0, 98.0, 151.0, 153.0, 97.0, 152.0, 115.0, 101.0]
2025-09-16 12:55:01,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 53 seconds)
2025-09-16 12:57:00,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:57:02,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 614.74622 ± 96.615
2025-09-16 12:57:02,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [559.18414, 585.4486, 544.3294, 628.00665, 644.2665, 421.4586, 818.0541, 622.4394, 653.55054, 670.72363]
2025-09-16 12:57:02,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 118.0, 117.0, 116.0, 119.0, 87.0, 159.0, 114.0, 124.0, 129.0]
2025-09-16 12:57:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 43 seconds)
2025-09-16 12:59:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:59:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 647.42139 ± 145.504
2025-09-16 12:59:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [695.6876, 831.7494, 588.6062, 899.26025, 434.62442, 760.72955, 508.9365, 549.33856, 508.84335, 696.4381]
2025-09-16 12:59:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 183.0, 124.0, 183.0, 93.0, 150.0, 113.0, 120.0, 106.0, 134.0]
2025-09-16 12:59:02,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 58 seconds)
2025-09-16 13:01:00,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:01:02,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 618.67535 ± 233.994
2025-09-16 13:01:02,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [590.56726, 731.1063, 439.89435, 1193.8936, 617.52075, 319.17535, 431.36063, 546.3026, 520.5027, 796.43]
2025-09-16 13:01:02,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 136.0, 91.0, 227.0, 124.0, 68.0, 86.0, 103.0, 96.0, 147.0]
2025-09-16 13:01:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 58 seconds)
2025-09-16 13:03:01,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:03:02,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 740.79626 ± 139.765
2025-09-16 13:03:02,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [825.24054, 459.074, 897.03784, 666.59143, 693.3337, 821.2789, 811.42224, 609.74756, 949.70605, 674.5309]
2025-09-16 13:03:02,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 85.0, 176.0, 141.0, 141.0, 159.0, 164.0, 115.0, 178.0, 129.0]
2025-09-16 13:03:02,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (740.80) for latency 12
2025-09-16 13:03:02,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 15 seconds)
2025-09-16 13:05:01,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:05:03,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 832.45154 ± 358.480
2025-09-16 13:05:03,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1822.2133, 626.32556, 806.37933, 891.8882, 556.6826, 721.54944, 805.4053, 889.23615, 428.32895, 776.5068]
2025-09-16 13:05:03,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [354.0, 121.0, 154.0, 191.0, 108.0, 137.0, 153.0, 167.0, 83.0, 144.0]
2025-09-16 13:05:03,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (832.45) for latency 12
2025-09-16 13:05:03,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 23 seconds)
2025-09-16 13:07:00,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:07:02,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 667.87537 ± 130.062
2025-09-16 13:07:02,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [570.30994, 632.22626, 725.0393, 836.4221, 741.7658, 837.66815, 677.9, 401.1677, 533.2248, 723.0301]
2025-09-16 13:07:02,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 131.0, 148.0, 178.0, 138.0, 173.0, 122.0, 88.0, 106.0, 136.0]
2025-09-16 13:07:02,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 5 seconds)
2025-09-16 13:09:01,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:09:03,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 782.31561 ± 200.792
2025-09-16 13:09:03,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [893.14264, 877.5256, 640.24365, 832.4471, 545.5554, 848.935, 462.38632, 1162.5135, 941.01385, 619.3934]
2025-09-16 13:09:03,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 164.0, 128.0, 174.0, 108.0, 162.0, 95.0, 223.0, 191.0, 129.0]
2025-09-16 13:09:03,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 14 seconds)
2025-09-16 13:11:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:11:04,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 749.31458 ± 135.581
2025-09-16 13:11:04,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1039.9384, 603.78705, 680.0273, 659.20764, 863.7651, 901.628, 648.10077, 782.3289, 675.254, 639.1085]
2025-09-16 13:11:04,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 125.0, 124.0, 127.0, 158.0, 164.0, 136.0, 151.0, 124.0, 124.0]
2025-09-16 13:11:04,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 26 seconds)
2025-09-16 13:13:01,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:13:03,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 800.26038 ± 132.074
2025-09-16 13:13:03,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [748.62866, 787.8679, 759.4687, 876.4016, 668.47534, 762.7781, 1028.4574, 1013.4928, 773.90106, 583.13245]
2025-09-16 13:13:03,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 149.0, 138.0, 171.0, 123.0, 142.0, 194.0, 205.0, 149.0, 116.0]
2025-09-16 13:13:03,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 3 seconds)
2025-09-16 13:15:02,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:15:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 959.90314 ± 154.814
2025-09-16 13:15:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [811.27515, 809.9563, 1029.1217, 831.27295, 919.2581, 969.12524, 1241.7966, 1111.9398, 1122.6514, 752.63464]
2025-09-16 13:15:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 158.0, 212.0, 157.0, 191.0, 191.0, 247.0, 225.0, 235.0, 139.0]
2025-09-16 13:15:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (959.90) for latency 12
2025-09-16 13:15:05,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 28 seconds)
2025-09-16 13:17:03,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:17:06,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 887.14758 ± 188.880
2025-09-16 13:17:06,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [967.3197, 700.9036, 1042.2145, 1189.0115, 887.33746, 870.6537, 706.52423, 884.9437, 531.94403, 1090.6227]
2025-09-16 13:17:06,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 138.0, 205.0, 236.0, 168.0, 163.0, 127.0, 181.0, 103.0, 224.0]
2025-09-16 13:17:06,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 48 seconds)
2025-09-16 13:19:05,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:19:07,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 753.79901 ± 206.206
2025-09-16 13:19:07,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [836.2115, 460.03, 650.3948, 722.61487, 1016.30634, 1193.4481, 676.0865, 665.7986, 543.82855, 773.27106]
2025-09-16 13:19:07,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 101.0, 141.0, 144.0, 207.0, 244.0, 134.0, 128.0, 106.0, 149.0]
2025-09-16 13:19:07,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 55 seconds)
2025-09-16 13:21:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:21:08,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 954.86243 ± 225.678
2025-09-16 13:21:08,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1001.11865, 906.5973, 883.2435, 510.1772, 864.2893, 1061.9377, 1063.1287, 1459.7482, 967.97424, 830.40875]
2025-09-16 13:21:08,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 170.0, 164.0, 110.0, 172.0, 205.0, 203.0, 288.0, 184.0, 171.0]
2025-09-16 13:21:08,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 50 seconds)
2025-09-16 13:23:07,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:23:09,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 960.96600 ± 177.029
2025-09-16 13:23:09,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1212.2421, 1048.1736, 1078.702, 643.3576, 805.0587, 808.5151, 1023.08405, 843.235, 937.1761, 1210.1163]
2025-09-16 13:23:09,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 212.0, 209.0, 137.0, 148.0, 155.0, 216.0, 161.0, 199.0, 221.0]
2025-09-16 13:23:09,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (960.97) for latency 12
2025-09-16 13:23:09,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 25 seconds)
2025-09-16 13:25:08,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:25:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1054.52466 ± 336.091
2025-09-16 13:25:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1473.1481, 898.0435, 773.7016, 746.0923, 1508.1357, 852.01807, 1119.6375, 900.71954, 1620.4111, 653.3395]
2025-09-16 13:25:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [287.0, 193.0, 153.0, 146.0, 301.0, 180.0, 236.0, 175.0, 322.0, 138.0]
2025-09-16 13:25:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1054.52) for latency 12
2025-09-16 13:25:11,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 20 seconds)
2025-09-16 13:27:10,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:27:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 830.08215 ± 188.150
2025-09-16 13:27:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [784.36523, 622.2855, 701.566, 577.4436, 959.5775, 1136.4326, 1081.3484, 663.9682, 988.3673, 785.4668]
2025-09-16 13:27:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 119.0, 144.0, 107.0, 172.0, 208.0, 208.0, 124.0, 185.0, 140.0]
2025-09-16 13:27:12,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 18 seconds)
2025-09-16 13:29:10,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:29:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1000.40613 ± 397.224
2025-09-16 13:29:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1409.08, 1898.1556, 1145.1328, 743.1623, 652.8026, 1163.7744, 852.6387, 847.01215, 838.38, 453.92313]
2025-09-16 13:29:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 382.0, 208.0, 137.0, 136.0, 213.0, 161.0, 155.0, 163.0, 97.0]
2025-09-16 13:29:13,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 15 seconds)
2025-09-16 13:31:12,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:31:14,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 866.81873 ± 133.754
2025-09-16 13:31:14,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1161.8385, 640.9008, 758.03, 841.35394, 975.2033, 870.6932, 765.8427, 883.65454, 937.36957, 833.3008]
2025-09-16 13:31:14,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 130.0, 144.0, 154.0, 191.0, 160.0, 138.0, 166.0, 168.0, 155.0]
2025-09-16 13:31:14,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 17 seconds)
2025-09-16 13:33:13,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:33:15,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 725.37732 ± 234.316
2025-09-16 13:33:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [473.87872, 1184.7441, 1037.7408, 836.62335, 538.3133, 821.2043, 529.05505, 443.3563, 712.5158, 676.342]
2025-09-16 13:33:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 241.0, 197.0, 158.0, 108.0, 155.0, 107.0, 91.0, 156.0, 127.0]
2025-09-16 13:33:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 1 second)
2025-09-16 13:35:12,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:35:16,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1138.17041 ± 186.237
2025-09-16 13:35:16,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1288.6521, 1301.6624, 1111.6437, 1251.6682, 988.2305, 1103.8785, 1093.231, 1416.8513, 1111.797, 714.08875]
2025-09-16 13:35:16,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [257.0, 260.0, 218.0, 251.0, 190.0, 215.0, 206.0, 270.0, 219.0, 144.0]
2025-09-16 13:35:16,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1138.17) for latency 12
2025-09-16 13:35:16,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 48 seconds)
2025-09-16 13:37:15,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:37:18,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 997.38153 ± 106.665
2025-09-16 13:37:18,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1126.6012, 958.9537, 948.0836, 1078.8202, 1187.2576, 946.1024, 907.9223, 1037.325, 980.20355, 802.5459]
2025-09-16 13:37:18,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 172.0, 183.0, 204.0, 235.0, 174.0, 167.0, 195.0, 185.0, 146.0]
2025-09-16 13:37:18,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 3 seconds)
2025-09-16 13:39:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:39:20,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1124.58911 ± 219.607
2025-09-16 13:39:20,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1049.541, 1005.1305, 776.04645, 1224.1655, 1195.4238, 1434.2333, 1152.3756, 1043.7273, 853.9676, 1511.2804]
2025-09-16 13:39:20,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 196.0, 155.0, 228.0, 236.0, 266.0, 242.0, 203.0, 170.0, 276.0]
2025-09-16 13:39:20,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 16 seconds)
2025-09-16 13:41:20,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:41:22,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1092.93823 ± 504.292
2025-09-16 13:41:22,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1320.1409, 1656.9739, 807.6968, 660.2786, 740.1732, 1022.73724, 2318.484, 768.4889, 907.0135, 727.3953]
2025-09-16 13:41:22,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 328.0, 158.0, 123.0, 141.0, 205.0, 455.0, 143.0, 186.0, 140.0]
2025-09-16 13:41:22,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 31 seconds)
2025-09-16 13:43:19,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:43:24,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1653.48474 ± 827.684
2025-09-16 13:43:24,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [942.8826, 1152.8046, 3755.4285, 1216.9161, 1617.1106, 1345.3534, 1171.6965, 2621.2107, 1207.4031, 1504.0413]
2025-09-16 13:43:24,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 219.0, 750.0, 259.0, 324.0, 274.0, 236.0, 510.0, 230.0, 269.0]
2025-09-16 13:43:24,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1653.48) for latency 12
2025-09-16 13:43:24,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 42 seconds)
2025-09-16 13:45:23,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:45:27,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1618.37610 ± 826.404
2025-09-16 13:45:27,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1385.4163, 3134.454, 1768.5696, 691.95306, 1240.9536, 1450.5372, 601.7818, 2007.65, 2955.8782, 946.56757]
2025-09-16 13:45:27,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 613.0, 336.0, 125.0, 234.0, 270.0, 111.0, 373.0, 586.0, 180.0]
2025-09-16 13:45:27,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 5 seconds)
2025-09-16 13:47:27,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:47:32,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1560.45679 ± 610.002
2025-09-16 13:47:32,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2578.0076, 558.68964, 1255.9297, 994.7645, 1750.7513, 1970.0133, 2291.2065, 1412.7816, 918.91125, 1873.5115]
2025-09-16 13:47:32,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [480.0, 99.0, 239.0, 195.0, 326.0, 364.0, 440.0, 267.0, 174.0, 356.0]
2025-09-16 13:47:32,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 46 minutes, 26 seconds)
2025-09-16 13:49:32,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:49:35,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1175.45129 ± 353.326
2025-09-16 13:49:35,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2069.861, 970.86554, 691.4602, 1154.407, 1072.95, 1247.9805, 1048.9423, 1444.8082, 918.79083, 1134.4469]
2025-09-16 13:49:35,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [395.0, 203.0, 131.0, 212.0, 202.0, 236.0, 191.0, 259.0, 161.0, 230.0]
2025-09-16 13:49:35,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 44 minutes, 33 seconds)
2025-09-16 13:51:35,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:51:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1356.28540 ± 350.831
2025-09-16 13:51:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1504.2743, 1288.3263, 1275.21, 2076.0652, 1134.534, 899.0019, 1773.38, 1541.018, 1096.7433, 974.30035]
2025-09-16 13:51:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [275.0, 238.0, 259.0, 395.0, 208.0, 170.0, 337.0, 290.0, 197.0, 185.0]
2025-09-16 13:51:38,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 37 seconds)
2025-09-16 13:53:37,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:53:40,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 987.37793 ± 421.631
2025-09-16 13:53:40,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [803.6696, 920.3319, 973.17145, 819.50433, 1120.9983, 2191.9797, 845.4166, 697.1498, 637.1266, 864.431]
2025-09-16 13:53:40,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 184.0, 176.0, 154.0, 221.0, 404.0, 176.0, 143.0, 124.0, 159.0]
2025-09-16 13:53:40,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 40 minutes, 33 seconds)
2025-09-16 13:55:37,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:55:42,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2058.13965 ± 1143.725
2025-09-16 13:55:42,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [814.7795, 3220.8513, 2393.5776, 1336.0789, 2017.7747, 1277.4425, 4812.2295, 1080.8342, 2187.2637, 1440.5641]
2025-09-16 13:55:42,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 592.0, 451.0, 249.0, 376.0, 259.0, 909.0, 226.0, 421.0, 270.0]
2025-09-16 13:55:42,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2058.14) for latency 12
2025-09-16 13:55:42,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 38 minutes, 23 seconds)
2025-09-16 13:57:42,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:57:45,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1202.05762 ± 434.693
2025-09-16 13:57:45,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1091.2502, 1128.0425, 793.71533, 1402.0592, 665.53424, 1187.3184, 1403.0035, 578.8924, 1970.3372, 1800.4233]
2025-09-16 13:57:45,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 224.0, 152.0, 270.0, 140.0, 246.0, 277.0, 117.0, 397.0, 342.0]
2025-09-16 13:57:45,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 36 minutes, 5 seconds)
2025-09-16 13:59:47,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:59:51,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1794.76953 ± 605.715
2025-09-16 13:59:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1886.3079, 2236.0615, 2633.7202, 1201.5984, 1907.1873, 2731.553, 1512.4755, 1453.8689, 645.44763, 1739.4773]
2025-09-16 13:59:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [344.0, 414.0, 520.0, 223.0, 346.0, 515.0, 277.0, 286.0, 121.0, 320.0]
2025-09-16 13:59:52,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 34 minutes, 32 seconds)
2025-09-16 14:01:48,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:01:53,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1853.75317 ± 848.638
2025-09-16 14:01:53,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1457.3075, 2248.664, 1579.5457, 2266.131, 2265.9128, 2069.8992, 3826.7236, 787.926, 1031.1493, 1004.2731]
2025-09-16 14:01:53,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [268.0, 436.0, 305.0, 437.0, 413.0, 412.0, 712.0, 150.0, 199.0, 183.0]
2025-09-16 14:01:53,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2025-09-16 14:03:52,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:04:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3103.62061 ± 1306.491
2025-09-16 14:04:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3216.668, 3260.2349, 2554.932, 5094.3887, 5167.8555, 1248.2015, 1132.3914, 3497.0608, 3626.761, 2237.7139]
2025-09-16 14:04:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [621.0, 635.0, 507.0, 1000.0, 1000.0, 231.0, 217.0, 693.0, 713.0, 422.0]
2025-09-16 14:04:01,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3103.62) for latency 12
2025-09-16 14:04:01,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 7 seconds)
2025-09-16 14:06:03,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:06:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2559.09497 ± 1483.457
2025-09-16 14:06:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [919.4442, 2651.118, 2439.8547, 1832.2382, 2699.654, 1903.7546, 1297.8937, 5287.473, 1249.4434, 5310.076]
2025-09-16 14:06:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 491.0, 439.0, 351.0, 517.0, 351.0, 249.0, 1000.0, 234.0, 1000.0]
2025-09-16 14:06:10,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 53 seconds)
2025-09-16 14:08:08,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:08:13,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2038.01050 ± 1164.972
2025-09-16 14:08:13,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [936.8271, 969.7797, 3757.224, 1194.3593, 657.4385, 2898.2615, 3421.3384, 3090.4, 807.9613, 2646.5154]
2025-09-16 14:08:13,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 173.0, 738.0, 245.0, 120.0, 532.0, 623.0, 566.0, 166.0, 492.0]
2025-09-16 14:08:13,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 27 minutes, 57 seconds)
2025-09-16 14:10:10,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:10:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2666.65210 ± 1293.327
2025-09-16 14:10:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1745.704, 5024.8022, 2891.7358, 1505.4493, 3109.3909, 2419.085, 1449.9739, 2322.2378, 1270.321, 4927.8223]
2025-09-16 14:10:18,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 1000.0, 571.0, 314.0, 613.0, 470.0, 282.0, 460.0, 256.0, 1000.0]
2025-09-16 14:10:18,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 25 minutes, 38 seconds)
2025-09-16 14:12:20,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:12:27,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2906.55664 ± 1466.886
2025-09-16 14:12:27,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5333.6064, 5318.83, 2619.5518, 1590.1904, 1765.0126, 4284.8804, 2825.7283, 1573.1031, 1211.5304, 2543.132]
2025-09-16 14:12:27,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 489.0, 306.0, 327.0, 769.0, 527.0, 293.0, 235.0, 470.0]
2025-09-16 14:12:27,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 24 minutes, 34 seconds)
2025-09-16 14:14:31,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:14:38,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2598.90918 ± 1855.510
2025-09-16 14:14:38,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1067.9247, 5335.161, 3565.3372, 4882.132, 1152.5159, 5323.09, 1791.1681, 611.2428, 1232.1383, 1028.3823]
2025-09-16 14:14:38,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 672.0, 907.0, 232.0, 1000.0, 345.0, 124.0, 237.0, 192.0]
2025-09-16 14:14:38,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 22 minutes, 45 seconds)
2025-09-16 14:16:40,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:16:50,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3322.90698 ± 1937.848
2025-09-16 14:16:50,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5386.619, 5189.1265, 1318.4242, 1839.7961, 1396.6127, 5196.9165, 1193.3635, 5197.2905, 1216.6616, 5294.259]
2025-09-16 14:16:50,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 254.0, 378.0, 270.0, 1000.0, 252.0, 976.0, 255.0, 1000.0]
2025-09-16 14:16:50,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3322.91) for latency 12
2025-09-16 14:16:50,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 21 minutes, 4 seconds)
2025-09-16 14:18:47,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:18:53,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2326.08008 ± 1021.567
2025-09-16 14:18:53,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2347.761, 927.19763, 1565.7711, 1174.5626, 2247.1438, 3731.355, 1793.0819, 2724.7075, 2386.7505, 4362.4688]
2025-09-16 14:18:53,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [437.0, 192.0, 280.0, 216.0, 416.0, 707.0, 341.0, 490.0, 433.0, 787.0]
2025-09-16 14:18:53,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 18 minutes, 57 seconds)
2025-09-16 14:20:57,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:21:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3958.55518 ± 1351.636
2025-09-16 14:21:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5275.594, 5158.018, 4435.701, 5253.582, 3695.0515, 3194.4255, 911.156, 3604.315, 5277.015, 2780.697]
2025-09-16 14:21:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 861.0, 1000.0, 694.0, 595.0, 189.0, 681.0, 1000.0, 541.0]
2025-09-16 14:21:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3958.56) for latency 12
2025-09-16 14:21:08,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 17 minutes, 57 seconds)
2025-09-16 14:23:05,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:23:18,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4136.48682 ± 1531.957
2025-09-16 14:23:18,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3447.7087, 1664.0051, 5081.117, 5090.0234, 5127.931, 5044.3403, 834.5461, 5014.997, 5008.229, 5051.9707]
2025-09-16 14:23:18,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [700.0, 342.0, 1000.0, 1000.0, 1000.0, 1000.0, 173.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:23:18,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4136.49) for latency 12
2025-09-16 14:23:18,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 15 minutes, 51 seconds)
2025-09-16 14:25:14,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:25:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3425.01099 ± 1848.354
2025-09-16 14:25:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2652.2834, 1368.7612, 1935.788, 5360.4014, 748.6495, 5255.6865, 1507.9589, 5332.5347, 5434.4956, 4653.5483]
2025-09-16 14:25:24,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [508.0, 276.0, 364.0, 1000.0, 164.0, 1000.0, 290.0, 1000.0, 1000.0, 862.0]
2025-09-16 14:25:24,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 13 minutes, 14 seconds)
2025-09-16 14:27:31,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:27:43,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4130.99854 ± 1285.412
2025-09-16 14:27:43,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5008.447, 2987.554, 5287.4414, 2322.7036, 5258.7993, 5244.088, 5282.5107, 2785.021, 2233.893, 4899.5264]
2025-09-16 14:27:43,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [940.0, 572.0, 1000.0, 419.0, 1000.0, 1000.0, 1000.0, 548.0, 445.0, 932.0]
2025-09-16 14:27:43,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 11 minutes, 49 seconds)
2025-09-16 14:29:36,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:29:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4701.25537 ± 619.797
2025-09-16 14:29:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [4713.888, 5356.072, 3956.9238, 4570.2266, 5448.1206, 4133.598, 4754.422, 5358.923, 3556.0464, 5164.3364]
2025-09-16 14:29:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [876.0, 1000.0, 726.0, 857.0, 1000.0, 776.0, 879.0, 1000.0, 669.0, 1000.0]
2025-09-16 14:29:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4701.26) for latency 12
2025-09-16 14:29:49,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 55 seconds)
2025-09-16 14:31:55,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:32:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5335.85107 ± 54.200
2025-09-16 14:32:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5348.1597, 5426.448, 5246.5234, 5425.6294, 5350.25, 5319.922, 5276.724, 5338.206, 5313.447, 5313.2056]
2025-09-16 14:32:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:32:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5335.85) for latency 12
2025-09-16 14:32:10,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 8 minutes, 26 seconds)
2025-09-16 14:34:14,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:34:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4919.20752 ± 704.365
2025-09-16 14:34:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5345.5283, 5432.0264, 5245.9805, 4765.885, 5365.4956, 5175.871, 5353.965, 4134.082, 3127.746, 5245.493]
2025-09-16 14:34:28,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 892.0, 1000.0, 1000.0, 1000.0, 781.0, 610.0, 1000.0]
2025-09-16 14:34:28,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 7 minutes, 4 seconds)
2025-09-16 14:36:26,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:36:37,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3679.97729 ± 1534.066
2025-09-16 14:36:37,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1451.103, 5273.967, 2687.8086, 3936.8298, 2731.9978, 5382.888, 5374.395, 3043.6802, 5433.6406, 1483.4601]
2025-09-16 14:36:37,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [273.0, 1000.0, 503.0, 727.0, 514.0, 1000.0, 1000.0, 588.0, 1000.0, 310.0]
2025-09-16 14:36:37,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 5 minutes, 1 second)
2025-09-16 14:38:36,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:38:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3538.85425 ± 2017.947
2025-09-16 14:38:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5138.8145, 2115.9312, 696.5251, 5176.8115, 843.0434, 5148.1396, 5175.4917, 5151.921, 780.96106, 5160.9053]
2025-09-16 14:38:46,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 425.0, 153.0, 1000.0, 181.0, 1000.0, 1000.0, 1000.0, 168.0, 1000.0]
2025-09-16 14:38:46,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 57 seconds)
2025-09-16 14:40:50,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:41:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5071.02734 ± 526.557
2025-09-16 14:41:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5241.4346, 5126.5806, 5365.543, 5228.782, 5237.4243, 5263.797, 3500.153, 5256.5186, 5271.677, 5218.364]
2025-09-16 14:41:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 688.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:05,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 49 seconds)
2025-09-16 14:43:01,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:43:15,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5025.13574 ± 569.220
2025-09-16 14:43:15,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5376.7266, 5331.1924, 5300.689, 5268.77, 4000.3381, 5296.5376, 3785.918, 5331.0254, 5327.2383, 5232.9214]
2025-09-16 14:43:15,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 785.0, 1000.0, 728.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:43:15,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 57 minutes, 39 seconds)
2025-09-16 14:45:16,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:45:30,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4903.78418 ± 819.947
2025-09-16 14:45:30,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3699.769, 3245.1206, 5467.891, 4122.0073, 5497.1914, 5418.7183, 5349.0884, 5398.662, 5426.8384, 5412.5522]
2025-09-16 14:45:30,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [686.0, 583.0, 1000.0, 766.0, 1000.0, 1000.0, 1000.0, 980.0, 1000.0, 987.0]
2025-09-16 14:45:30,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 6 seconds)
2025-09-16 14:47:35,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:47:50,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5213.96191 ± 26.726
2025-09-16 14:47:50,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5217.298, 5237.8477, 5210.3037, 5256.2666, 5146.7817, 5217.622, 5226.142, 5210.1045, 5206.2437, 5211.0073]
2025-09-16 14:47:50,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:47:50,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 53 seconds)
2025-09-16 14:49:51,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:50:06,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4892.13770 ± 863.157
2025-09-16 14:50:06,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5146.301, 5178.595, 5209.7954, 5122.3145, 5172.193, 5234.723, 2304.447, 5210.997, 5189.751, 5152.261]
2025-09-16 14:50:06,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 443.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:50:06,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 52 minutes, 4 seconds)
2025-09-16 14:52:04,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:52:19,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5343.97168 ± 48.221
2025-09-16 14:52:19,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5403.1997, 5385.6587, 5331.7227, 5318.432, 5369.1514, 5332.694, 5310.204, 5230.9673, 5385.943, 5371.7417]
2025-09-16 14:52:19,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:52:19,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5343.97) for latency 12
2025-09-16 14:52:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 49 minutes, 27 seconds)
2025-09-16 14:54:17,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:54:28,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3897.80713 ± 1424.534
2025-09-16 14:54:28,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5037.042, 2738.6528, 1796.3838, 5089.6055, 1854.7135, 4986.562, 5093.8867, 2325.7415, 5027.816, 5027.6665]
2025-09-16 14:54:28,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 531.0, 360.0, 1000.0, 368.0, 1000.0, 1000.0, 470.0, 1000.0, 1000.0]
2025-09-16 14:54:28,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 47 minutes, 6 seconds)
2025-09-16 14:56:34,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:56:50,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5231.06738 ± 57.785
2025-09-16 14:56:50,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5334.826, 5212.7173, 5231.202, 5187.4116, 5228.5747, 5285.0083, 5220.0044, 5246.782, 5260.4854, 5103.663]
2025-09-16 14:56:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:56:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 45 minutes, 20 seconds)
2025-09-16 14:58:57,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:59:12,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5119.96289 ± 34.657
2025-09-16 14:59:12,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5067.0537, 5087.928, 5114.6533, 5127.9424, 5147.4546, 5166.0894, 5064.1157, 5156.827, 5120.546, 5147.016]
2025-09-16 14:59:12,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:59:12,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 43 minutes, 10 seconds)
2025-09-16 15:01:03,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:01:19,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5038.61035 ± 32.642
2025-09-16 15:01:19,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5039.2954, 5043.443, 5056.6343, 5012.9834, 5016.683, 5023.1514, 5055.5713, 4971.047, 5080.4395, 5086.8555]
2025-09-16 15:01:19,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:01:19,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 24 seconds)
2025-09-16 15:03:31,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:03:47,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5267.24609 ± 40.828
2025-09-16 15:03:47,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5252.0073, 5309.883, 5289.643, 5329.742, 5259.7993, 5253.743, 5239.4634, 5316.9307, 5223.3047, 5197.947]
2025-09-16 15:03:47,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:03:47,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 56 seconds)
2025-09-16 15:05:40,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:05:55,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5070.20850 ± 320.722
2025-09-16 15:05:55,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5116.252, 5148.133, 5155.916, 4117.3965, 5195.5835, 5169.76, 5183.639, 5292.8667, 5178.308, 5144.2324]
2025-09-16 15:05:55,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 816.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:05:55,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 36 minutes, 37 seconds)
2025-09-16 15:08:04,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:08:19,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5203.96631 ± 28.492
2025-09-16 15:08:19,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5167.0176, 5222.739, 5228.0527, 5225.5425, 5212.3833, 5237.436, 5192.2095, 5222.639, 5146.2456, 5185.3984]
2025-09-16 15:08:19,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:08:19,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 29 seconds)
2025-09-16 15:10:11,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:10:26,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4856.03955 ± 1166.292
2025-09-16 15:10:26,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5279.991, 5264.0723, 5265.3594, 5165.3955, 5259.8364, 5175.2534, 5272.7505, 5246.8604, 1359.0833, 5271.7915]
2025-09-16 15:10:26,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 290.0, 1000.0]
2025-09-16 15:10:26,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 27 seconds)
2025-09-16 15:12:28,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:12:43,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5250.70410 ± 38.523
2025-09-16 15:12:43,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5273.354, 5213.109, 5257.5005, 5218.711, 5163.0645, 5276.8228, 5287.148, 5283.9663, 5250.3184, 5283.051]
2025-09-16 15:12:43,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:12:43,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 39 seconds)
2025-09-16 15:14:48,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:15:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5281.50000 ± 37.549
2025-09-16 15:15:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5308.451, 5262.765, 5309.451, 5246.5947, 5260.284, 5303.8975, 5318.309, 5197.551, 5322.1416, 5285.5527]
2025-09-16 15:15:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:15:03,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 3 seconds)
2025-09-16 15:17:11,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:17:27,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5133.97168 ± 31.476
2025-09-16 15:17:27,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5131.103, 5085.516, 5088.8413, 5139.5205, 5168.2603, 5096.021, 5155.7075, 5145.044, 5178.3735, 5151.324]
2025-09-16 15:17:27,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:17:27,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 21 seconds)
2025-09-16 15:19:25,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:19:40,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5229.33350 ± 76.310
2025-09-16 15:19:40,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5255.7295, 5313.613, 5068.479, 5171.9556, 5199.473, 5162.7534, 5256.2964, 5335.643, 5291.2646, 5238.125]
2025-09-16 15:19:40,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:19:40,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 40 seconds)
2025-09-16 15:21:36,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:21:51,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5248.36572 ± 66.032
2025-09-16 15:21:51,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5233.739, 5266.4453, 5280.518, 5317.4604, 5237.5767, 5294.7964, 5280.2383, 5274.455, 5066.6826, 5231.744]
2025-09-16 15:21:51,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:21:51,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 33 seconds)
2025-09-16 15:23:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:24:09,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5081.57275 ± 19.853
2025-09-16 15:24:09,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5091.457, 5074.324, 5089.8423, 5040.451, 5078.3794, 5113.77, 5073.427, 5091.283, 5061.093, 5101.698]
2025-09-16 15:24:09,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:24:09,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 16 seconds)
2025-09-16 15:26:11,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:26:27,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5392.97168 ± 38.693
2025-09-16 15:26:27,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5392.5586, 5341.5776, 5435.354, 5370.7896, 5366.351, 5422.27, 5432.0, 5361.955, 5349.713, 5457.1445]
2025-09-16 15:26:27,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:26:27,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5392.97) for latency 12
2025-09-16 15:26:27,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 57 seconds)
2025-09-16 15:28:27,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:28:43,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5372.22705 ± 39.082
2025-09-16 15:28:43,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5409.882, 5370.3926, 5304.7373, 5420.797, 5382.216, 5370.338, 5296.824, 5393.5796, 5398.4077, 5375.0986]
2025-09-16 15:28:43,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:28:43,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 31 seconds)
2025-09-16 15:30:51,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:31:05,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4767.03223 ± 1290.006
2025-09-16 15:31:05,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5257.175, 5143.2607, 5197.2964, 898.78674, 5209.1494, 5162.656, 5176.5947, 5234.4756, 5248.2524, 5142.672]
2025-09-16 15:31:05,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 189.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:31:05,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 25 seconds)
2025-09-16 15:33:06,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:33:20,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4940.81689 ± 1339.976
2025-09-16 15:33:20,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5310.8433, 5449.304, 5341.1675, 5355.618, 5370.794, 922.78613, 5436.653, 5416.344, 5394.302, 5410.3555]
2025-09-16 15:33:20,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 180.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:33:20,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 10 seconds)
2025-09-16 15:35:19,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:35:33,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4923.85645 ± 1291.918
2025-09-16 15:35:33,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5358.542, 5293.212, 5397.187, 5352.362, 5356.969, 5369.845, 1049.2878, 5361.198, 5396.231, 5303.729]
2025-09-16 15:35:33,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 202.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:35:33,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 50 seconds)
2025-09-16 15:37:35,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:37:50,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5230.06885 ± 25.831
2025-09-16 15:37:50,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5254.364, 5250.4565, 5224.712, 5264.47, 5233.247, 5219.3687, 5168.052, 5216.7095, 5223.1787, 5246.1284]
2025-09-16 15:37:50,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:37:50,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 33 seconds)
2025-09-16 15:39:45,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:40:00,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5150.85059 ± 38.563
2025-09-16 15:40:00,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5159.021, 5113.1836, 5176.5425, 5221.7983, 5097.2036, 5195.2603, 5135.102, 5168.5776, 5136.496, 5105.317]
2025-09-16 15:40:00,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:40:00,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 15 seconds)
2025-09-16 15:41:59,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:42:13,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4936.23828 ± 1075.870
2025-09-16 15:42:13,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5239.8784, 5321.7197, 5327.197, 1709.5619, 5305.681, 5303.5903, 5267.2705, 5316.3105, 5298.177, 5273.001]
2025-09-16 15:42:13,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 346.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:42:13,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1251 [DEBUG]: Training session finished
