2025-09-16 14:43:03,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.025-delay_21
2025-09-16 14:43:03,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.025-delay_21
2025-09-16 14:43:03,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x153bea078810>}
2025-09-16 14:43:03,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:43:04,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:43:04,020 baseline-bpql-noisepromille25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=733, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:43:04,020 baseline-bpql-noisepromille25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:43:05,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:43:05,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:44:56,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:44:57,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 390.01910 ± 52.559
2025-09-16 14:44:57,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [322.63727, 365.74673, 365.46234, 369.534, 529.1743, 365.98325, 391.511, 377.0298, 426.7064, 386.4062]
2025-09-16 14:44:57,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 70.0, 70.0, 69.0, 105.0, 70.0, 76.0, 72.0, 83.0, 72.0]
2025-09-16 14:44:57,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (390.02) for latency 21
2025-09-16 14:44:57,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 4 minutes)
2025-09-16 14:46:56,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:46:58,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 418.00726 ± 117.827
2025-09-16 14:46:58,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [558.7955, 435.18723, 607.3416, 429.3432, 226.68863, 393.6271, 418.39236, 446.70615, 209.68018, 454.311]
2025-09-16 14:46:58,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 81.0, 116.0, 85.0, 44.0, 78.0, 77.0, 88.0, 41.0, 85.0]
2025-09-16 14:46:58,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (418.01) for latency 21
2025-09-16 14:46:58,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 9 minutes, 34 seconds)
2025-09-16 14:48:56,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:48:57,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 404.32364 ± 151.185
2025-09-16 14:48:57,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [263.60355, 378.31833, 410.86084, 161.85367, 406.32788, 692.6816, 635.8997, 434.82535, 342.2098, 316.65564]
2025-09-16 14:48:57,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 76.0, 81.0, 31.0, 83.0, 130.0, 126.0, 91.0, 70.0, 63.0]
2025-09-16 14:48:57,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 9 minutes, 39 seconds)
2025-09-16 14:50:55,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:50:56,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 297.87689 ± 114.159
2025-09-16 14:50:56,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [451.445, 499.78534, 298.0582, 299.5751, 417.00577, 164.6979, 186.59247, 267.13562, 182.15233, 212.32118]
2025-09-16 14:50:56,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 98.0, 63.0, 60.0, 87.0, 32.0, 36.0, 55.0, 35.0, 44.0]
2025-09-16 14:50:56,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 8 minutes, 22 seconds)
2025-09-16 14:52:56,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:52:58,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 445.35611 ± 92.101
2025-09-16 14:52:58,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [472.56873, 404.74396, 333.94348, 420.18173, 367.7338, 407.46915, 439.30054, 685.6978, 503.62408, 418.2977]
2025-09-16 14:52:58,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 78.0, 63.0, 83.0, 71.0, 78.0, 84.0, 138.0, 109.0, 79.0]
2025-09-16 14:52:58,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (445.36) for latency 21
2025-09-16 14:52:58,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 7 minutes, 34 seconds)
2025-09-16 14:54:57,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:54:58,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 451.12979 ± 88.654
2025-09-16 14:54:58,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [402.00583, 502.77557, 440.5061, 560.9779, 621.43024, 375.25705, 478.73605, 442.3376, 379.8783, 307.39316]
2025-09-16 14:54:58,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 100.0, 82.0, 108.0, 116.0, 75.0, 102.0, 84.0, 72.0, 59.0]
2025-09-16 14:54:58,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (451.13) for latency 21
2025-09-16 14:54:58,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 8 minutes, 23 seconds)
2025-09-16 14:56:58,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:57:00,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 498.18359 ± 85.246
2025-09-16 14:57:00,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [411.32574, 496.5987, 569.56683, 540.05695, 335.8837, 461.64645, 547.9719, 548.99554, 641.7609, 428.02936]
2025-09-16 14:57:00,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 93.0, 116.0, 104.0, 64.0, 87.0, 120.0, 104.0, 122.0, 92.0]
2025-09-16 14:57:00,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (498.18) for latency 21
2025-09-16 14:57:00,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 6 minutes, 37 seconds)
2025-09-16 14:58:58,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:59:00,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 499.82294 ± 131.040
2025-09-16 14:59:00,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [363.36542, 454.75766, 770.5956, 430.17358, 663.6227, 349.79367, 568.7433, 371.75385, 544.8974, 480.5262]
2025-09-16 14:59:00,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 88.0, 163.0, 94.0, 128.0, 76.0, 112.0, 79.0, 106.0, 103.0]
2025-09-16 14:59:00,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (499.82) for latency 21
2025-09-16 14:59:00,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 4 minutes, 48 seconds)
2025-09-16 15:01:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:01:01,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 426.69305 ± 122.339
2025-09-16 15:01:01,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [531.42993, 257.75198, 175.95135, 390.16702, 550.10596, 510.95224, 383.783, 504.88882, 411.02863, 550.8718]
2025-09-16 15:01:01,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 51.0, 34.0, 84.0, 119.0, 95.0, 74.0, 109.0, 91.0, 114.0]
2025-09-16 15:01:01,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 3 minutes, 25 seconds)
2025-09-16 15:03:00,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:03:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 436.67441 ± 103.789
2025-09-16 15:03:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [546.1689, 191.85378, 374.67316, 504.4192, 368.23746, 522.9696, 443.1848, 420.97766, 558.9104, 435.34958]
2025-09-16 15:03:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 37.0, 71.0, 94.0, 69.0, 98.0, 95.0, 80.0, 117.0, 82.0]
2025-09-16 15:03:02,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 1 minute, 8 seconds)
2025-09-16 15:05:00,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:05:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 499.31903 ± 116.654
2025-09-16 15:05:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [339.236, 408.1346, 725.57336, 408.3796, 553.5447, 669.324, 456.14273, 512.26416, 406.6341, 513.95715]
2025-09-16 15:05:02,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 78.0, 141.0, 81.0, 106.0, 133.0, 87.0, 97.0, 76.0, 100.0]
2025-09-16 15:05:02,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 59 minutes, 4 seconds)
2025-09-16 15:07:01,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:07:03,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 450.94385 ± 76.789
2025-09-16 15:07:03,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [420.24448, 383.7181, 369.46835, 407.3948, 428.88522, 605.654, 452.6143, 381.4555, 489.90222, 570.1016]
2025-09-16 15:07:03,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 71.0, 70.0, 76.0, 83.0, 115.0, 87.0, 73.0, 91.0, 124.0]
2025-09-16 15:07:03,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 56 minutes, 52 seconds)
2025-09-16 15:09:01,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:09:03,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 454.61163 ± 53.968
2025-09-16 15:09:03,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [390.99557, 400.0222, 486.7141, 401.3446, 462.22214, 416.11087, 536.01166, 554.45575, 453.2806, 444.95868]
2025-09-16 15:09:03,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 75.0, 92.0, 76.0, 89.0, 78.0, 104.0, 104.0, 85.0, 82.0]
2025-09-16 15:09:03,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 54 minutes, 48 seconds)
2025-09-16 15:11:03,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:11:05,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 450.60321 ± 126.528
2025-09-16 15:11:05,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [445.17776, 209.7254, 585.59436, 458.79272, 487.18335, 490.61554, 591.8361, 390.39, 256.3907, 590.3259]
2025-09-16 15:11:05,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 41.0, 112.0, 87.0, 104.0, 92.0, 107.0, 72.0, 49.0, 112.0]
2025-09-16 15:11:05,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 53 minutes, 1 second)
2025-09-16 15:13:04,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:13:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 477.17480 ± 54.863
2025-09-16 15:13:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [418.31827, 421.82706, 534.9043, 447.50635, 497.21945, 462.28812, 423.7165, 553.4858, 570.91144, 441.5709]
2025-09-16 15:13:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 77.0, 99.0, 83.0, 103.0, 86.0, 90.0, 105.0, 124.0, 83.0]
2025-09-16 15:13:05,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 51 minutes, 4 seconds)
2025-09-16 15:15:05,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:15:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 502.10297 ± 124.679
2025-09-16 15:15:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [389.05908, 520.4404, 466.76166, 419.5797, 557.20557, 691.1142, 270.20816, 702.1654, 531.5121, 472.98358]
2025-09-16 15:15:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 111.0, 88.0, 78.0, 103.0, 143.0, 52.0, 135.0, 102.0, 87.0]
2025-09-16 15:15:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (502.10) for latency 21
2025-09-16 15:15:06,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 17 seconds)
2025-09-16 15:17:05,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:17:07,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 491.31647 ± 139.237
2025-09-16 15:17:07,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [668.69574, 156.53389, 439.5733, 611.3352, 431.95877, 433.9729, 436.80252, 556.34485, 580.06116, 597.88605]
2025-09-16 15:17:07,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 30.0, 83.0, 114.0, 84.0, 80.0, 83.0, 107.0, 122.0, 111.0]
2025-09-16 15:17:07,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 47 minutes, 14 seconds)
2025-09-16 15:19:07,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:19:09,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 558.92712 ± 156.987
2025-09-16 15:19:09,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [492.14786, 530.1118, 947.00275, 496.04187, 534.9789, 491.0378, 568.0436, 732.28235, 401.16757, 396.45682]
2025-09-16 15:19:09,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 115.0, 181.0, 93.0, 102.0, 93.0, 106.0, 139.0, 85.0, 75.0]
2025-09-16 15:19:09,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (558.93) for latency 21
2025-09-16 15:19:09,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 45 minutes, 38 seconds)
2025-09-16 15:21:08,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:21:09,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 452.93195 ± 145.784
2025-09-16 15:21:09,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [511.55603, 237.19922, 457.12405, 349.2674, 410.27713, 665.4253, 220.79851, 550.7065, 471.48236, 655.483]
2025-09-16 15:21:09,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 46.0, 87.0, 68.0, 78.0, 139.0, 43.0, 103.0, 90.0, 125.0]
2025-09-16 15:21:10,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 43 minutes, 19 seconds)
2025-09-16 15:23:09,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:23:10,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 448.66373 ± 133.361
2025-09-16 15:23:10,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [715.56, 382.77576, 510.16104, 487.72995, 359.52463, 489.75168, 405.34534, 161.57442, 493.52383, 480.69055]
2025-09-16 15:23:10,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 73.0, 98.0, 92.0, 69.0, 92.0, 76.0, 31.0, 91.0, 104.0]
2025-09-16 15:23:10,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 41 minutes, 17 seconds)
2025-09-16 15:25:11,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:25:12,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 514.77667 ± 139.863
2025-09-16 15:25:12,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [408.92633, 792.3158, 254.1832, 543.3429, 577.5277, 366.09637, 504.1894, 526.0172, 617.38477, 557.7833]
2025-09-16 15:25:12,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 153.0, 49.0, 100.0, 107.0, 72.0, 106.0, 110.0, 116.0, 104.0]
2025-09-16 15:25:12,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 39 minutes, 30 seconds)
2025-09-16 15:27:11,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:27:12,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 524.11340 ± 67.944
2025-09-16 15:27:12,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [530.72565, 435.73825, 504.83154, 525.57605, 577.55927, 495.53482, 649.455, 615.7231, 457.6835, 448.30646]
2025-09-16 15:27:12,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 80.0, 95.0, 99.0, 125.0, 92.0, 121.0, 118.0, 86.0, 84.0]
2025-09-16 15:27:12,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 37 minutes, 24 seconds)
2025-09-16 15:29:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:29:15,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 556.68512 ± 95.722
2025-09-16 15:29:15,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [605.46155, 489.29678, 477.8459, 684.33563, 560.9456, 656.0976, 455.80972, 711.79486, 449.04843, 476.21494]
2025-09-16 15:29:15,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 103.0, 91.0, 142.0, 115.0, 126.0, 85.0, 141.0, 95.0, 89.0]
2025-09-16 15:29:15,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 35 minutes, 30 seconds)
2025-09-16 15:31:15,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:31:16,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 523.90051 ± 74.008
2025-09-16 15:31:16,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [604.4753, 583.55853, 667.6522, 467.0618, 536.6622, 423.01956, 535.88873, 445.11176, 460.16666, 515.4086]
2025-09-16 15:31:16,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 109.0, 127.0, 87.0, 113.0, 86.0, 98.0, 82.0, 88.0, 113.0]
2025-09-16 15:31:16,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 33 minutes, 40 seconds)
2025-09-16 15:33:16,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:33:17,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 496.64078 ± 75.433
2025-09-16 15:33:17,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [449.86176, 500.58463, 470.57065, 574.0779, 449.5539, 563.44104, 404.75198, 456.07245, 662.37604, 435.11725]
2025-09-16 15:33:17,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 93.0, 100.0, 123.0, 86.0, 115.0, 85.0, 98.0, 139.0, 94.0]
2025-09-16 15:33:17,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 31 minutes, 44 seconds)
2025-09-16 15:35:16,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:35:18,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 526.44427 ± 95.124
2025-09-16 15:35:18,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [456.5235, 427.4815, 551.9878, 487.9763, 671.20734, 558.946, 703.06177, 467.1033, 542.8403, 397.31473]
2025-09-16 15:35:18,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 91.0, 103.0, 104.0, 128.0, 103.0, 152.0, 98.0, 114.0, 84.0]
2025-09-16 15:35:18,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 29 minutes, 25 seconds)
2025-09-16 15:37:18,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:37:20,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 645.44440 ± 156.493
2025-09-16 15:37:20,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [474.36722, 582.1868, 553.48505, 983.91956, 781.3479, 453.63904, 734.6803, 712.4002, 503.82953, 674.5882]
2025-09-16 15:37:20,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 121.0, 104.0, 195.0, 154.0, 98.0, 140.0, 135.0, 99.0, 128.0]
2025-09-16 15:37:20,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (645.44) for latency 21
2025-09-16 15:37:20,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 27 minutes, 49 seconds)
2025-09-16 15:39:20,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:39:22,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 540.41345 ± 89.298
2025-09-16 15:39:22,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [500.60193, 468.35315, 689.7811, 454.53363, 452.45023, 695.00806, 529.31354, 606.3455, 551.30786, 456.44058]
2025-09-16 15:39:22,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 92.0, 129.0, 87.0, 84.0, 134.0, 113.0, 113.0, 115.0, 87.0]
2025-09-16 15:39:22,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 25 minutes, 43 seconds)
2025-09-16 15:41:21,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:41:23,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 619.68958 ± 175.792
2025-09-16 15:41:23,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [528.52686, 527.2759, 580.71954, 738.7069, 1081.0735, 548.63055, 714.73126, 474.73676, 516.5388, 485.95505]
2025-09-16 15:41:23,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 97.0, 126.0, 155.0, 211.0, 106.0, 152.0, 90.0, 97.0, 100.0]
2025-09-16 15:41:23,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 23 minutes, 41 seconds)
2025-09-16 15:43:23,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:43:25,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 574.27777 ± 103.333
2025-09-16 15:43:25,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [534.4826, 479.18338, 381.77133, 550.4752, 679.1034, 549.8281, 553.02734, 638.14954, 597.9074, 778.8497]
2025-09-16 15:43:25,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 88.0, 74.0, 118.0, 127.0, 103.0, 119.0, 133.0, 111.0, 149.0]
2025-09-16 15:43:25,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 21 minutes, 51 seconds)
2025-09-16 15:45:26,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:45:27,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 575.87567 ± 109.197
2025-09-16 15:45:27,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [589.8567, 656.18976, 515.31024, 431.42523, 548.304, 446.90143, 672.47687, 449.42496, 748.9841, 699.88367]
2025-09-16 15:45:27,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 139.0, 98.0, 91.0, 104.0, 95.0, 131.0, 95.0, 154.0, 138.0]
2025-09-16 15:45:27,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 20 minutes, 12 seconds)
2025-09-16 15:47:26,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:47:28,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 543.13708 ± 50.469
2025-09-16 15:47:28,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [535.3264, 543.69165, 494.09058, 510.78137, 530.4339, 523.3296, 651.7613, 468.64667, 575.0786, 598.23114]
2025-09-16 15:47:28,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 106.0, 92.0, 96.0, 100.0, 98.0, 130.0, 90.0, 121.0, 113.0]
2025-09-16 15:47:28,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 17 minutes, 51 seconds)
2025-09-16 15:49:29,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:49:31,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 642.10370 ± 138.777
2025-09-16 15:49:31,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [557.08594, 510.1538, 860.77795, 563.4393, 540.14856, 756.432, 601.23456, 646.3546, 488.17035, 897.23956]
2025-09-16 15:49:31,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 106.0, 165.0, 104.0, 115.0, 142.0, 114.0, 137.0, 89.0, 173.0]
2025-09-16 15:49:31,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 57 seconds)
2025-09-16 15:51:30,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:51:32,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 604.41522 ± 182.102
2025-09-16 15:51:32,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [879.26764, 583.6159, 567.56946, 732.58325, 455.06625, 205.99637, 647.56824, 505.17773, 657.1923, 810.115]
2025-09-16 15:51:32,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 121.0, 105.0, 150.0, 97.0, 40.0, 121.0, 93.0, 125.0, 171.0]
2025-09-16 15:51:32,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 55 seconds)
2025-09-16 15:53:32,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:53:33,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 574.00226 ± 151.393
2025-09-16 15:53:33,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [696.52246, 580.9097, 638.8404, 620.4252, 661.8029, 499.13562, 620.548, 502.14975, 176.0552, 743.63336]
2025-09-16 15:53:33,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 110.0, 135.0, 125.0, 122.0, 90.0, 112.0, 92.0, 34.0, 140.0]
2025-09-16 15:53:33,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 48 seconds)
2025-09-16 15:55:34,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:55:36,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 534.20618 ± 55.393
2025-09-16 15:55:36,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [479.48535, 514.91583, 581.3212, 507.2015, 467.58163, 589.0356, 476.52094, 505.80923, 584.8846, 635.30634]
2025-09-16 15:55:36,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 98.0, 108.0, 109.0, 87.0, 124.0, 102.0, 108.0, 110.0, 134.0]
2025-09-16 15:55:36,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 50 seconds)
2025-09-16 15:57:36,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:57:38,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 532.47180 ± 161.180
2025-09-16 15:57:38,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [727.37714, 492.546, 459.99017, 182.87439, 553.608, 757.91327, 377.7409, 572.45593, 658.3199, 541.8926]
2025-09-16 15:57:38,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 90.0, 86.0, 35.0, 103.0, 141.0, 81.0, 106.0, 120.0, 101.0]
2025-09-16 15:57:38,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 8 minutes, 3 seconds)
2025-09-16 15:59:37,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:59:39,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 609.18304 ± 102.876
2025-09-16 15:59:39,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [672.0316, 761.5495, 503.14386, 672.9333, 450.43253, 693.2563, 520.9702, 644.6525, 688.93884, 483.92203]
2025-09-16 15:59:39,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 144.0, 93.0, 125.0, 83.0, 130.0, 94.0, 120.0, 145.0, 105.0]
2025-09-16 15:59:39,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 46 seconds)
2025-09-16 16:01:38,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:01:41,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 719.34930 ± 183.993
2025-09-16 16:01:41,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [765.3033, 635.1741, 571.0763, 1109.862, 613.8697, 569.81464, 995.46625, 779.8859, 558.3461, 594.69495]
2025-09-16 16:01:41,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 121.0, 105.0, 228.0, 113.0, 102.0, 190.0, 151.0, 122.0, 124.0]
2025-09-16 16:01:41,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (719.35) for latency 21
2025-09-16 16:01:41,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 43 seconds)
2025-09-16 16:03:41,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:03:44,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 642.49921 ± 136.214
2025-09-16 16:03:44,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [557.4127, 653.9492, 975.9487, 574.29834, 630.05145, 484.54047, 622.7891, 485.59335, 728.5865, 711.8225]
2025-09-16 16:03:44,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 138.0, 186.0, 110.0, 134.0, 103.0, 118.0, 93.0, 136.0, 149.0]
2025-09-16 16:03:44,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes)
2025-09-16 16:05:43,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:05:45,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 621.49402 ± 95.096
2025-09-16 16:05:45,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [563.18024, 599.6714, 601.03796, 592.5508, 680.8737, 490.16956, 551.15564, 561.00806, 806.4724, 768.82]
2025-09-16 16:05:45,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 115.0, 114.0, 111.0, 134.0, 92.0, 99.0, 104.0, 179.0, 151.0]
2025-09-16 16:05:45,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 43 seconds)
2025-09-16 16:07:47,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:07:48,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 577.34631 ± 101.655
2025-09-16 16:07:48,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [499.0793, 770.8139, 614.0461, 629.8989, 541.5137, 469.03568, 710.37305, 465.2576, 469.27588, 604.1689]
2025-09-16 16:07:48,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 149.0, 126.0, 138.0, 100.0, 87.0, 131.0, 101.0, 103.0, 119.0]
2025-09-16 16:07:48,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes)
2025-09-16 16:09:47,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:09:49,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 651.31769 ± 136.509
2025-09-16 16:09:49,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [578.8475, 784.63806, 486.9321, 586.4405, 919.9971, 542.77686, 636.916, 836.54565, 584.0298, 556.05334]
2025-09-16 16:09:49,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 165.0, 105.0, 129.0, 179.0, 114.0, 140.0, 157.0, 125.0, 111.0]
2025-09-16 16:09:49,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 48 seconds)
2025-09-16 16:11:50,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:11:52,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 608.44501 ± 106.992
2025-09-16 16:11:52,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [637.25287, 573.2074, 522.28845, 660.7816, 553.3493, 504.1417, 857.24994, 462.17355, 669.45764, 644.5481]
2025-09-16 16:11:52,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 105.0, 105.0, 127.0, 101.0, 95.0, 163.0, 100.0, 125.0, 120.0]
2025-09-16 16:11:52,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 54 minutes, 3 seconds)
2025-09-16 16:13:52,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:13:54,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 759.68860 ± 249.487
2025-09-16 16:13:54,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [729.68274, 178.10638, 747.089, 831.7675, 823.8332, 596.5685, 847.66724, 648.7614, 1099.441, 1093.969]
2025-09-16 16:13:54,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 34.0, 156.0, 168.0, 163.0, 111.0, 170.0, 125.0, 203.0, 211.0]
2025-09-16 16:13:54,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (759.69) for latency 21
2025-09-16 16:13:54,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 52 minutes)
2025-09-16 16:15:54,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:15:56,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 631.60187 ± 125.487
2025-09-16 16:15:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [415.0767, 670.24164, 763.7061, 814.0358, 640.95996, 723.4932, 569.5769, 420.8833, 626.9695, 671.0754]
2025-09-16 16:15:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 130.0, 152.0, 173.0, 121.0, 132.0, 123.0, 90.0, 117.0, 131.0]
2025-09-16 16:15:56,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 56 seconds)
2025-09-16 16:17:56,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:17:58,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 813.74774 ± 211.522
2025-09-16 16:17:58,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [877.2207, 589.2345, 928.81934, 671.53723, 546.20557, 1095.3762, 1055.1775, 904.077, 480.21402, 989.615]
2025-09-16 16:17:58,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 109.0, 179.0, 125.0, 111.0, 204.0, 221.0, 178.0, 87.0, 182.0]
2025-09-16 16:17:58,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (813.75) for latency 21
2025-09-16 16:17:58,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 44 seconds)
2025-09-16 16:19:58,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:20:01,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 785.81378 ± 251.076
2025-09-16 16:20:01,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [637.503, 1434.9247, 724.7468, 928.0137, 815.8229, 670.71344, 522.1825, 910.23627, 603.835, 610.1591]
2025-09-16 16:20:01,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 296.0, 148.0, 189.0, 152.0, 148.0, 96.0, 187.0, 112.0, 113.0]
2025-09-16 16:20:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 46 minutes, 4 seconds)
2025-09-16 16:22:01,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:22:03,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 716.81531 ± 160.294
2025-09-16 16:22:03,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [812.7024, 666.7067, 557.6881, 976.78613, 465.8732, 624.739, 761.29315, 551.87335, 902.606, 847.88513]
2025-09-16 16:22:03,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 135.0, 113.0, 186.0, 91.0, 120.0, 143.0, 113.0, 158.0, 173.0]
2025-09-16 16:22:03,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 57 seconds)
2025-09-16 16:24:03,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:24:05,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 654.68250 ± 224.949
2025-09-16 16:24:05,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [593.9525, 581.6036, 465.1531, 513.1288, 601.5582, 642.8415, 1311.7521, 636.7433, 595.66156, 604.4307]
2025-09-16 16:24:05,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 111.0, 93.0, 94.0, 114.0, 120.0, 259.0, 120.0, 109.0, 130.0]
2025-09-16 16:24:05,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 41 minutes, 43 seconds)
2025-09-16 16:26:04,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:26:06,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 717.98083 ± 140.549
2025-09-16 16:26:06,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [767.038, 667.2864, 859.212, 677.6882, 1035.8019, 621.38104, 705.32623, 718.9798, 643.51874, 483.57608]
2025-09-16 16:26:06,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 130.0, 156.0, 136.0, 191.0, 133.0, 151.0, 136.0, 122.0, 106.0]
2025-09-16 16:26:06,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 39 minutes, 44 seconds)
2025-09-16 16:28:05,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:28:07,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 658.86926 ± 116.036
2025-09-16 16:28:07,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [957.7664, 646.79193, 694.2274, 529.586, 695.4483, 651.71564, 682.9536, 578.19934, 526.08356, 625.9205]
2025-09-16 16:28:07,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 124.0, 128.0, 101.0, 133.0, 127.0, 129.0, 104.0, 96.0, 128.0]
2025-09-16 16:28:07,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes, 23 seconds)
2025-09-16 16:30:04,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:30:07,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 776.19196 ± 127.630
2025-09-16 16:30:07,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [622.20294, 638.3132, 787.3271, 700.6077, 1036.0308, 869.989, 787.7253, 615.20294, 866.3623, 838.158]
2025-09-16 16:30:07,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 118.0, 164.0, 155.0, 198.0, 169.0, 162.0, 128.0, 156.0, 167.0]
2025-09-16 16:30:07,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 57 seconds)
2025-09-16 16:32:05,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:32:07,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 738.70752 ± 223.304
2025-09-16 16:32:07,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [802.4412, 1204.4442, 748.5198, 781.4705, 594.22797, 704.9893, 569.6648, 453.97388, 496.4222, 1030.9213]
2025-09-16 16:32:07,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 224.0, 136.0, 146.0, 106.0, 127.0, 106.0, 99.0, 93.0, 197.0]
2025-09-16 16:32:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 36 seconds)
2025-09-16 16:34:06,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:34:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 655.49347 ± 134.547
2025-09-16 16:34:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [831.4387, 647.3749, 958.9621, 650.03796, 625.6098, 515.02136, 560.79816, 510.34186, 574.41766, 680.93256]
2025-09-16 16:34:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 137.0, 193.0, 145.0, 130.0, 98.0, 103.0, 108.0, 119.0, 121.0]
2025-09-16 16:34:08,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 27 seconds)
2025-09-16 16:36:18,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:36:21,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 724.65820 ± 131.626
2025-09-16 16:36:21,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [796.12354, 690.8344, 736.7277, 775.7359, 652.9572, 612.22626, 664.23865, 1073.1809, 608.4267, 636.13074]
2025-09-16 16:36:21,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 123.0, 135.0, 151.0, 140.0, 126.0, 119.0, 200.0, 125.0, 130.0]
2025-09-16 16:36:21,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 30 minutes, 5 seconds)
2025-09-16 16:38:40,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:38:43,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 815.00403 ± 247.241
2025-09-16 16:38:43,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [525.64294, 1033.7544, 978.71466, 809.3959, 669.6998, 672.2813, 495.21704, 833.6609, 763.3962, 1368.277]
2025-09-16 16:38:43,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 194.0, 204.0, 175.0, 142.0, 127.0, 93.0, 176.0, 161.0, 271.0]
2025-09-16 16:38:43,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (815.00) for latency 21
2025-09-16 16:38:43,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 31 minutes, 9 seconds)
2025-09-16 16:40:56,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:40:58,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 740.47687 ± 167.619
2025-09-16 16:40:58,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [640.8453, 1142.5852, 954.6869, 650.6344, 618.2534, 672.6335, 601.76953, 638.8543, 798.2586, 686.24713]
2025-09-16 16:40:58,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 213.0, 184.0, 144.0, 119.0, 130.0, 110.0, 138.0, 151.0, 135.0]
2025-09-16 16:40:58,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 31 minutes, 13 seconds)
2025-09-16 16:43:18,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:43:20,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 747.30066 ± 125.887
2025-09-16 16:43:20,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [862.0258, 762.3549, 770.28314, 634.9557, 661.1786, 643.58563, 558.04315, 806.0486, 754.38885, 1020.1421]
2025-09-16 16:43:20,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 146.0, 143.0, 132.0, 128.0, 116.0, 122.0, 155.0, 137.0, 216.0]
2025-09-16 16:43:20,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 32 minutes)
2025-09-16 16:45:37,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:45:40,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 907.02814 ± 194.872
2025-09-16 16:45:40,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [718.0068, 586.3472, 1213.7256, 841.7296, 754.61316, 940.5474, 839.3089, 953.69055, 991.98254, 1230.3292]
2025-09-16 16:45:40,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 108.0, 225.0, 160.0, 139.0, 179.0, 174.0, 180.0, 182.0, 260.0]
2025-09-16 16:45:40,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (907.03) for latency 21
2025-09-16 16:45:40,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 32 minutes, 16 seconds)
2025-09-16 16:47:56,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:47:59,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 908.15027 ± 235.482
2025-09-16 16:47:59,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [720.17615, 843.1453, 1373.1239, 531.1896, 1169.6456, 924.8744, 747.96313, 909.74493, 755.7393, 1105.9008]
2025-09-16 16:47:59,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 175.0, 271.0, 98.0, 238.0, 173.0, 154.0, 170.0, 166.0, 199.0]
2025-09-16 16:47:59,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (908.15) for latency 21
2025-09-16 16:47:59,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 30 minutes, 48 seconds)
2025-09-16 16:50:20,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:50:23,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 830.08484 ± 260.495
2025-09-16 16:50:23,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1206.9163, 434.99854, 1148.7041, 675.5077, 1053.0842, 846.39233, 657.0671, 478.05902, 1043.9778, 756.142]
2025-09-16 16:50:23,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [241.0, 90.0, 239.0, 142.0, 195.0, 162.0, 116.0, 87.0, 185.0, 135.0]
2025-09-16 16:50:23,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 28 minutes, 38 seconds)
2025-09-16 16:52:40,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:52:44,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1040.91528 ± 453.186
2025-09-16 16:52:44,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [870.1107, 619.5665, 801.51355, 582.27246, 696.4388, 2096.5625, 985.35596, 1379.2806, 1493.0737, 884.9773]
2025-09-16 16:52:44,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 124.0, 149.0, 104.0, 142.0, 412.0, 178.0, 292.0, 285.0, 174.0]
2025-09-16 16:52:44,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1040.92) for latency 21
2025-09-16 16:52:44,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 27 minutes, 1 second)
2025-09-16 16:55:01,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:55:05,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1055.19800 ± 313.562
2025-09-16 16:55:05,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1181.9127, 743.63947, 847.3628, 1843.6691, 1055.1914, 1005.52203, 1162.8752, 891.27563, 1157.513, 663.01917]
2025-09-16 16:55:05,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 158.0, 165.0, 341.0, 216.0, 193.0, 214.0, 171.0, 199.0, 130.0]
2025-09-16 16:55:05,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1055.20) for latency 21
2025-09-16 16:55:05,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 24 minutes, 33 seconds)
2025-09-16 16:57:22,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:57:25,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 908.68427 ± 216.665
2025-09-16 16:57:25,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1018.88495, 957.88916, 729.3928, 610.3623, 1330.2374, 662.0132, 937.9718, 1176.2311, 757.9186, 905.9415]
2025-09-16 16:57:25,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 177.0, 145.0, 134.0, 240.0, 121.0, 175.0, 235.0, 166.0, 195.0]
2025-09-16 16:57:25,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 22 minutes, 13 seconds)
2025-09-16 16:59:44,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:59:47,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 805.21155 ± 163.332
2025-09-16 16:59:47,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1069.3315, 535.0413, 919.0421, 753.96094, 826.02905, 796.44446, 833.27985, 1036.2743, 646.50165, 636.2103]
2025-09-16 16:59:47,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 102.0, 176.0, 164.0, 151.0, 156.0, 157.0, 190.0, 132.0, 137.0]
2025-09-16 16:59:47,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 20 minutes, 12 seconds)
2025-09-16 17:02:04,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:02:07,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 900.32947 ± 243.826
2025-09-16 17:02:07,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [908.7143, 801.2143, 697.9751, 1096.278, 774.82336, 944.02496, 693.349, 946.9086, 1508.7653, 631.2425]
2025-09-16 17:02:07,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 153.0, 146.0, 193.0, 165.0, 176.0, 136.0, 204.0, 283.0, 132.0]
2025-09-16 17:02:07,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 17 minutes, 28 seconds)
2025-09-16 17:04:26,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:04:29,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 829.61249 ± 192.093
2025-09-16 17:04:29,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [810.31525, 602.2627, 860.5155, 926.16046, 1253.5123, 589.2028, 990.1348, 637.7109, 758.2105, 868.0999]
2025-09-16 17:04:29,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 112.0, 157.0, 172.0, 231.0, 116.0, 198.0, 127.0, 135.0, 161.0]
2025-09-16 17:04:29,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 15 minutes, 9 seconds)
2025-09-16 17:06:48,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:06:52,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1016.16522 ± 221.976
2025-09-16 17:06:52,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1167.4901, 1184.6816, 872.9155, 885.44885, 881.0828, 1004.6272, 1156.3066, 1380.0868, 538.79083, 1090.2217]
2025-09-16 17:06:52,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 230.0, 176.0, 170.0, 169.0, 199.0, 223.0, 264.0, 103.0, 213.0]
2025-09-16 17:06:52,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 13 minutes, 1 second)
2025-09-16 17:09:09,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:09:12,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1011.25378 ± 265.913
2025-09-16 17:09:12,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [967.1065, 1225.1693, 1025.4373, 930.7906, 878.33344, 864.1242, 811.5358, 1586.261, 1245.9059, 577.8741]
2025-09-16 17:09:12,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 235.0, 183.0, 186.0, 179.0, 177.0, 149.0, 302.0, 234.0, 125.0]
2025-09-16 17:09:12,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 10 minutes, 43 seconds)
2025-09-16 17:11:30,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:11:33,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 899.91730 ± 180.703
2025-09-16 17:11:33,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [704.738, 722.4062, 1166.4669, 1199.1746, 938.57104, 765.69806, 900.16956, 983.5267, 972.56885, 645.8527]
2025-09-16 17:11:33,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 157.0, 235.0, 218.0, 180.0, 142.0, 167.0, 194.0, 201.0, 126.0]
2025-09-16 17:11:33,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 8 minutes, 13 seconds)
2025-09-16 17:13:54,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:13:57,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 808.98578 ± 147.467
2025-09-16 17:13:57,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [935.17035, 797.95355, 607.2252, 743.5203, 797.9665, 960.9669, 535.61365, 900.089, 780.22363, 1031.1284]
2025-09-16 17:13:57,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 166.0, 111.0, 132.0, 163.0, 181.0, 110.0, 161.0, 164.0, 183.0]
2025-09-16 17:13:57,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 6 minutes, 13 seconds)
2025-09-16 17:16:12,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:16:16,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1178.39368 ± 576.950
2025-09-16 17:16:16,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1066.9482, 607.066, 1598.1536, 628.08014, 974.2606, 1891.2366, 1021.4365, 2424.4717, 981.00604, 591.2776]
2025-09-16 17:16:16,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 106.0, 302.0, 112.0, 200.0, 381.0, 197.0, 473.0, 188.0, 109.0]
2025-09-16 17:16:16,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1178.39) for latency 21
2025-09-16 17:16:16,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 3 minutes, 37 seconds)
2025-09-16 17:18:36,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:18:40,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1147.64941 ± 389.124
2025-09-16 17:18:40,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [838.34076, 760.271, 1291.0792, 1470.9569, 1807.6951, 1506.6229, 1309.5917, 499.6119, 1204.4204, 787.90393]
2025-09-16 17:18:40,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 161.0, 262.0, 268.0, 332.0, 286.0, 257.0, 104.0, 218.0, 172.0]
2025-09-16 17:18:40,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 1 minute, 24 seconds)
2025-09-16 17:20:54,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:20:57,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 999.06854 ± 283.026
2025-09-16 17:20:57,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [496.8769, 958.20953, 945.54944, 1467.3307, 648.8308, 789.388, 1288.8896, 1075.1095, 1251.5333, 1068.9683]
2025-09-16 17:20:57,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 176.0, 177.0, 293.0, 129.0, 154.0, 224.0, 212.0, 257.0, 200.0]
2025-09-16 17:20:57,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 58 minutes, 46 seconds)
2025-09-16 17:23:18,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:23:21,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 912.01642 ± 295.649
2025-09-16 17:23:21,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1029.165, 527.73724, 1306.3793, 984.4046, 686.6204, 806.65735, 971.577, 1498.6553, 584.90155, 724.06665]
2025-09-16 17:23:21,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 109.0, 260.0, 190.0, 141.0, 153.0, 186.0, 275.0, 107.0, 131.0]
2025-09-16 17:23:21,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 56 minutes, 40 seconds)
2025-09-16 17:25:38,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:25:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1004.06525 ± 575.701
2025-09-16 17:25:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1779.3109, 1712.0901, 947.0458, 865.5588, 196.79329, 591.92896, 873.3624, 1796.5172, 161.28937, 1116.7563]
2025-09-16 17:25:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [348.0, 330.0, 190.0, 153.0, 38.0, 121.0, 184.0, 326.0, 31.0, 237.0]
2025-09-16 17:25:42,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 54 minutes, 2 seconds)
2025-09-16 17:27:58,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:28:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1140.73962 ± 434.690
2025-09-16 17:28:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2244.048, 1139.5679, 862.0584, 581.0928, 1051.2654, 1156.8636, 1461.2998, 1130.6735, 740.6098, 1039.9177]
2025-09-16 17:28:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [445.0, 211.0, 175.0, 113.0, 192.0, 211.0, 269.0, 233.0, 137.0, 185.0]
2025-09-16 17:28:02,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 51 minutes, 48 seconds)
2025-09-16 17:30:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:30:26,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1024.71704 ± 308.313
2025-09-16 17:30:26,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [521.40155, 1133.3799, 931.0884, 1403.8079, 1077.0088, 1314.5043, 1435.3762, 844.98254, 1055.2422, 530.37915]
2025-09-16 17:30:26,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 207.0, 203.0, 291.0, 235.0, 263.0, 295.0, 148.0, 202.0, 116.0]
2025-09-16 17:30:26,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 49 minutes, 23 seconds)
2025-09-16 17:32:43,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:32:48,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1470.47925 ± 946.859
2025-09-16 17:32:48,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1326.8304, 1202.2762, 3170.5605, 1135.9408, 1286.4906, 1198.1619, 787.55145, 3325.4172, 155.14392, 1116.4186]
2025-09-16 17:32:48,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 237.0, 622.0, 205.0, 250.0, 221.0, 157.0, 631.0, 30.0, 208.0]
2025-09-16 17:32:48,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1470.48) for latency 21
2025-09-16 17:32:48,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 47 minutes, 22 seconds)
2025-09-16 17:35:05,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:35:10,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1498.14685 ± 497.607
2025-09-16 17:35:10,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1846.746, 1447.4832, 1678.8153, 961.99335, 1113.8915, 1336.434, 527.4634, 2267.289, 1889.0245, 1912.3296]
2025-09-16 17:35:10,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [352.0, 263.0, 326.0, 175.0, 195.0, 255.0, 100.0, 425.0, 362.0, 341.0]
2025-09-16 17:35:10,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1498.15) for latency 21
2025-09-16 17:35:10,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 44 minutes, 52 seconds)
2025-09-16 17:37:29,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:37:33,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1360.96094 ± 423.110
2025-09-16 17:37:33,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [699.8455, 1744.8827, 1285.4957, 1013.6952, 1303.0413, 2161.9814, 1621.0784, 1122.4164, 1720.8977, 936.2752]
2025-09-16 17:37:33,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 348.0, 248.0, 180.0, 258.0, 408.0, 289.0, 196.0, 355.0, 178.0]
2025-09-16 17:37:33,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 42 minutes, 42 seconds)
2025-09-16 17:39:53,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:39:57,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1314.00366 ± 411.480
2025-09-16 17:39:57,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2040.7988, 1913.8586, 1577.7999, 1101.019, 1095.7465, 1435.8789, 646.28357, 1209.0509, 1203.9442, 915.65674]
2025-09-16 17:39:57,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [378.0, 337.0, 319.0, 222.0, 217.0, 272.0, 121.0, 235.0, 237.0, 166.0]
2025-09-16 17:39:57,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 40 minutes, 29 seconds)
2025-09-16 17:42:16,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:42:20,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1212.71509 ± 497.279
2025-09-16 17:42:20,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1422.7743, 1522.7579, 1306.0491, 527.1967, 1621.6948, 942.2925, 765.59515, 696.7758, 1050.462, 2271.5518]
2025-09-16 17:42:20,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [280.0, 275.0, 247.0, 100.0, 290.0, 184.0, 145.0, 143.0, 213.0, 455.0]
2025-09-16 17:42:20,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 38 minutes, 4 seconds)
2025-09-16 17:44:38,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:44:42,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1268.28149 ± 399.999
2025-09-16 17:44:42,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1164.0106, 1792.8671, 824.33, 667.1127, 1094.6771, 1332.8195, 1686.4264, 1957.5471, 1108.262, 1054.7631]
2025-09-16 17:44:42,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [246.0, 316.0, 175.0, 121.0, 194.0, 252.0, 326.0, 381.0, 232.0, 184.0]
2025-09-16 17:44:42,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 35 minutes, 42 seconds)
2025-09-16 17:47:05,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:47:11,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1746.56897 ± 961.156
2025-09-16 17:47:11,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [708.7052, 3378.229, 795.27136, 980.10913, 1870.6244, 2421.9712, 544.5056, 1678.0452, 1906.5656, 3181.6633]
2025-09-16 17:47:11,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 662.0, 159.0, 177.0, 353.0, 472.0, 97.0, 327.0, 348.0, 607.0]
2025-09-16 17:47:11,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1746.57) for latency 21
2025-09-16 17:47:11,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 33 minutes, 38 seconds)
2025-09-16 17:49:24,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:49:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1761.16382 ± 916.924
2025-09-16 17:49:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1346.3916, 981.7039, 2091.492, 2905.141, 3776.1628, 1983.3896, 759.1407, 1723.6104, 982.23425, 1062.3729]
2025-09-16 17:49:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 193.0, 403.0, 530.0, 731.0, 358.0, 142.0, 314.0, 198.0, 198.0]
2025-09-16 17:49:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1761.16) for latency 21
2025-09-16 17:49:30,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 31 minutes, 3 seconds)
2025-09-16 17:51:48,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:51:51,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1129.54663 ± 400.113
2025-09-16 17:51:51,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [550.43567, 1070.9285, 611.86786, 1384.1176, 1566.5087, 680.28, 1124.8295, 1831.7736, 1337.6884, 1137.0359]
2025-09-16 17:51:51,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 198.0, 127.0, 269.0, 295.0, 125.0, 196.0, 321.0, 237.0, 215.0]
2025-09-16 17:51:51,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 28 minutes, 34 seconds)
2025-09-16 17:54:10,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:54:16,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1556.19739 ± 784.151
2025-09-16 17:54:16,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [716.8772, 2015.8142, 683.3928, 1810.3844, 1595.6041, 682.4847, 1825.2753, 2840.1409, 725.97504, 2666.0254]
2025-09-16 17:54:16,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 376.0, 121.0, 376.0, 325.0, 152.0, 360.0, 536.0, 137.0, 513.0]
2025-09-16 17:54:16,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 26 minutes, 14 seconds)
2025-09-16 17:56:38,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:56:41,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1061.91638 ± 347.895
2025-09-16 17:56:41,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1019.2335, 927.4134, 1097.6697, 1024.0786, 678.33044, 1209.8765, 527.49805, 1564.1411, 849.06775, 1721.8553]
2025-09-16 17:56:41,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 175.0, 195.0, 216.0, 132.0, 253.0, 99.0, 285.0, 147.0, 317.0]
2025-09-16 17:56:41,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 57 seconds)
2025-09-16 17:58:56,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:59:00,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1300.18689 ± 539.566
2025-09-16 17:59:00,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [815.24475, 829.87427, 924.07275, 866.40216, 2667.017, 1706.6045, 1558.2468, 1211.3506, 1182.288, 1240.7681]
2025-09-16 17:59:00,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 154.0, 181.0, 170.0, 497.0, 316.0, 298.0, 214.0, 241.0, 227.0]
2025-09-16 17:59:00,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 17 seconds)
2025-09-16 18:01:20,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:01:26,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1674.06714 ± 550.845
2025-09-16 18:01:26,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2563.2473, 1160.306, 1461.889, 2362.8794, 1523.0338, 1958.7976, 1078.274, 1160.5225, 2340.02, 1131.7041]
2025-09-16 18:01:26,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [454.0, 211.0, 264.0, 449.0, 284.0, 380.0, 209.0, 244.0, 437.0, 222.0]
2025-09-16 18:01:26,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 4 seconds)
2025-09-16 18:03:45,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:03:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1255.67151 ± 420.788
2025-09-16 18:03:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1235.7584, 927.6501, 858.73376, 1058.5353, 1274.217, 1728.8796, 860.07306, 899.6305, 1512.818, 2200.4194]
2025-09-16 18:03:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 185.0, 162.0, 207.0, 228.0, 330.0, 170.0, 175.0, 286.0, 419.0]
2025-09-16 18:03:49,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 44 seconds)
2025-09-16 18:06:05,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:06:10,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1493.84863 ± 834.053
2025-09-16 18:06:10,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [616.66724, 916.0151, 910.65094, 607.1383, 2515.3372, 1962.228, 3176.3499, 773.18726, 1700.5286, 1760.3833]
2025-09-16 18:06:10,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 161.0, 200.0, 110.0, 481.0, 382.0, 599.0, 139.0, 353.0, 351.0]
2025-09-16 18:06:10,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 17 seconds)
2025-09-16 18:08:35,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:08:39,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1463.47241 ± 602.535
2025-09-16 18:08:39,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [740.5761, 775.2149, 894.1028, 1706.492, 1310.1722, 1713.1185, 1484.1321, 2041.0125, 2787.225, 1182.6776]
2025-09-16 18:08:39,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 138.0, 184.0, 316.0, 251.0, 338.0, 298.0, 370.0, 534.0, 230.0]
2025-09-16 18:08:39,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 58 seconds)
2025-09-16 18:10:55,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:11:00,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1632.70874 ± 848.249
2025-09-16 18:11:00,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [967.3873, 1151.8536, 2580.3467, 2495.2908, 485.1006, 1379.0956, 2070.9272, 3003.217, 443.64716, 1750.2206]
2025-09-16 18:11:00,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 209.0, 486.0, 456.0, 88.0, 271.0, 369.0, 579.0, 84.0, 315.0]
2025-09-16 18:11:00,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 36 seconds)
2025-09-16 18:13:20,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:13:25,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1590.76599 ± 1133.598
2025-09-16 18:13:25,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [617.05676, 1313.2096, 534.96545, 2624.9697, 1154.436, 1098.7305, 1067.3936, 2014.4583, 993.65063, 4488.7905]
2025-09-16 18:13:25,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 266.0, 101.0, 505.0, 229.0, 224.0, 218.0, 356.0, 193.0, 822.0]
2025-09-16 18:13:25,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 11 seconds)
2025-09-16 18:15:45,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:15:51,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2199.76416 ± 866.638
2025-09-16 18:15:51,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3731.7913, 3042.6292, 3203.4836, 1077.7064, 1552.8287, 1516.2477, 2457.4583, 2056.3145, 2259.5737, 1099.6096]
2025-09-16 18:15:51,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [681.0, 599.0, 582.0, 224.0, 286.0, 263.0, 480.0, 366.0, 422.0, 198.0]
2025-09-16 18:15:51,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2199.76) for latency 21
2025-09-16 18:15:52,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 49 seconds)
2025-09-16 18:18:11,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:18:16,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1439.64526 ± 428.647
2025-09-16 18:18:16,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1008.32947, 1410.4415, 1502.4171, 1673.349, 1419.0153, 2146.3406, 1734.8435, 552.5847, 1146.2092, 1802.9222]
2025-09-16 18:18:16,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 283.0, 317.0, 344.0, 293.0, 413.0, 315.0, 105.0, 236.0, 360.0]
2025-09-16 18:18:16,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 25 seconds)
2025-09-16 18:20:31,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:20:36,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1681.22656 ± 718.071
2025-09-16 18:20:36,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1957.7345, 1974.9443, 2114.2915, 1223.526, 956.0721, 3077.7432, 2508.1726, 833.94086, 1119.7709, 1046.0701]
2025-09-16 18:20:36,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [358.0, 362.0, 385.0, 238.0, 187.0, 558.0, 443.0, 143.0, 194.0, 190.0]
2025-09-16 18:20:36,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1251 [DEBUG]: Training session finished
