2025-09-16 12:14:14,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_12
2025-09-16 12:14:14,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_12
2025-09-16 12:14:14,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x14bb202088d0>}
2025-09-16 12:14:14,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:14:14,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:14:14,048 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=580, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:14:14,049 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:14:15,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:14:15,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:16:04,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:16:05,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 300.35861 ± 30.663
2025-09-16 12:16:05,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [321.58435, 293.25272, 273.10754, 343.57526, 300.48215, 330.22455, 296.22665, 281.90753, 235.03639, 328.18912]
2025-09-16 12:16:05,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 57.0, 52.0, 66.0, 57.0, 63.0, 56.0, 53.0, 45.0, 63.0]
2025-09-16 12:16:05,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (300.36) for latency 12
2025-09-16 12:16:05,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 26 seconds)
2025-09-16 12:18:01,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:18:02,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 393.48615 ± 84.911
2025-09-16 12:18:02,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [189.24751, 440.04807, 375.85486, 430.82916, 365.09644, 400.27585, 488.12888, 464.0491, 314.15552, 467.1762]
2025-09-16 12:18:02,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 84.0, 69.0, 90.0, 69.0, 86.0, 90.0, 96.0, 60.0, 90.0]
2025-09-16 12:18:02,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (393.49) for latency 12
2025-09-16 12:18:02,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 2 seconds)
2025-09-16 12:19:59,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:20:00,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 465.14282 ± 167.936
2025-09-16 12:20:00,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [179.19757, 540.9341, 842.0032, 449.36676, 292.6111, 488.71268, 400.45413, 580.83746, 400.25974, 477.05167]
2025-09-16 12:20:00,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 102.0, 181.0, 84.0, 55.0, 94.0, 87.0, 109.0, 76.0, 89.0]
2025-09-16 12:20:00,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (465.14) for latency 12
2025-09-16 12:20:00,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 5 minutes, 47 seconds)
2025-09-16 12:21:57,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:21:59,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 436.39331 ± 43.495
2025-09-16 12:21:59,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [406.34796, 416.61066, 432.50348, 437.8393, 495.10156, 458.31253, 440.5538, 390.59412, 518.7874, 367.2823]
2025-09-16 12:21:59,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 80.0, 81.0, 88.0, 91.0, 84.0, 88.0, 78.0, 101.0, 72.0]
2025-09-16 12:21:59,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 5 minutes, 21 seconds)
2025-09-16 12:23:56,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:23:57,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 439.47955 ± 63.341
2025-09-16 12:23:57,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [413.6789, 438.0934, 303.84946, 467.67203, 497.62302, 401.70203, 425.8508, 475.46655, 415.60315, 555.256]
2025-09-16 12:23:57,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 81.0, 57.0, 86.0, 92.0, 74.0, 80.0, 102.0, 76.0, 103.0]
2025-09-16 12:23:57,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 4 minutes, 17 seconds)
2025-09-16 12:25:55,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:25:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 474.84784 ± 84.499
2025-09-16 12:25:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [489.92157, 422.99988, 400.74515, 549.46857, 523.04144, 561.2767, 618.92377, 332.00272, 397.5682, 452.5305]
2025-09-16 12:25:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 78.0, 77.0, 102.0, 98.0, 117.0, 128.0, 65.0, 83.0, 85.0]
2025-09-16 12:25:56,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (474.85) for latency 12
2025-09-16 12:25:56,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes, 25 seconds)
2025-09-16 12:27:54,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:27:55,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 525.53448 ± 153.556
2025-09-16 12:27:55,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [603.1285, 355.66455, 471.4852, 616.536, 499.67133, 414.1582, 923.3957, 416.89557, 446.5969, 507.81323]
2025-09-16 12:27:55,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 76.0, 103.0, 132.0, 108.0, 79.0, 188.0, 77.0, 82.0, 94.0]
2025-09-16 12:27:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (525.53) for latency 12
2025-09-16 12:27:55,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 54 seconds)
2025-09-16 12:29:53,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:29:54,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 474.83838 ± 134.820
2025-09-16 12:29:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [358.2635, 487.84015, 493.97888, 453.21335, 550.5332, 460.3233, 610.6111, 449.20703, 177.62227, 706.791]
2025-09-16 12:29:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 108.0, 107.0, 99.0, 111.0, 93.0, 113.0, 84.0, 34.0, 138.0]
2025-09-16 12:29:54,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 2 minutes, 11 seconds)
2025-09-16 12:31:53,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:31:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 523.42511 ± 99.030
2025-09-16 12:31:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [570.6256, 383.11667, 609.31995, 548.98285, 482.997, 573.4904, 609.40424, 631.5972, 311.81662, 512.9007]
2025-09-16 12:31:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 71.0, 115.0, 118.0, 90.0, 107.0, 111.0, 118.0, 60.0, 95.0]
2025-09-16 12:31:54,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 36 seconds)
2025-09-16 12:33:52,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:33:53,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 532.14514 ± 102.841
2025-09-16 12:33:53,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [710.1054, 439.21478, 688.7859, 571.74414, 437.01804, 529.563, 373.55753, 573.7647, 523.32227, 474.37524]
2025-09-16 12:33:53,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 81.0, 129.0, 107.0, 82.0, 97.0, 70.0, 108.0, 96.0, 88.0]
2025-09-16 12:33:53,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (532.15) for latency 12
2025-09-16 12:33:53,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 47 seconds)
2025-09-16 12:35:51,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:35:53,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 580.65564 ± 189.959
2025-09-16 12:35:53,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [484.08804, 707.8257, 421.89902, 423.19537, 629.78467, 588.12, 521.51605, 1084.7212, 504.25778, 441.14813]
2025-09-16 12:35:53,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 140.0, 78.0, 94.0, 132.0, 109.0, 96.0, 223.0, 93.0, 81.0]
2025-09-16 12:35:53,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (580.66) for latency 12
2025-09-16 12:35:53,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 55 seconds)
2025-09-16 12:37:51,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:37:53,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 595.24341 ± 123.741
2025-09-16 12:37:53,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [669.67584, 434.6478, 767.82855, 528.11523, 540.4975, 608.535, 469.63272, 527.4798, 560.31665, 845.7044]
2025-09-16 12:37:53,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 93.0, 149.0, 99.0, 104.0, 128.0, 87.0, 102.0, 106.0, 174.0]
2025-09-16 12:37:53,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (595.24) for latency 12
2025-09-16 12:37:53,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 55 minutes, 23 seconds)
2025-09-16 12:39:52,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:39:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 580.64270 ± 168.944
2025-09-16 12:39:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [676.95966, 638.8993, 460.30893, 495.38388, 780.74725, 338.623, 470.00467, 924.682, 433.22223, 587.5967]
2025-09-16 12:39:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 138.0, 86.0, 92.0, 165.0, 63.0, 92.0, 199.0, 97.0, 110.0]
2025-09-16 12:39:53,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 44 seconds)
2025-09-16 12:41:53,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:41:55,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 637.74127 ± 144.421
2025-09-16 12:41:55,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [821.56287, 556.5935, 519.96216, 727.7197, 667.9128, 461.654, 900.24677, 477.86197, 720.29736, 523.6015]
2025-09-16 12:41:55,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 104.0, 110.0, 145.0, 127.0, 85.0, 173.0, 89.0, 146.0, 115.0]
2025-09-16 12:41:55,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (637.74) for latency 12
2025-09-16 12:41:55,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 52 minutes, 11 seconds)
2025-09-16 12:43:53,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:43:54,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 552.17249 ± 134.678
2025-09-16 12:43:54,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [668.6428, 777.4057, 375.4641, 583.4861, 452.05115, 580.2643, 428.39658, 521.5505, 397.888, 736.5752]
2025-09-16 12:43:54,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 160.0, 72.0, 116.0, 85.0, 109.0, 81.0, 109.0, 74.0, 138.0]
2025-09-16 12:43:54,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 50 minutes, 14 seconds)
2025-09-16 12:45:53,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:45:54,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 600.89069 ± 101.887
2025-09-16 12:45:54,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [793.03143, 552.9395, 559.9661, 699.86945, 415.81546, 574.9565, 521.1965, 667.8155, 551.9407, 671.3755]
2025-09-16 12:45:54,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 122.0, 122.0, 128.0, 78.0, 129.0, 102.0, 127.0, 101.0, 126.0]
2025-09-16 12:45:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 48 minutes, 23 seconds)
2025-09-16 12:47:54,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:47:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 645.34839 ± 211.645
2025-09-16 12:47:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [576.7851, 852.45825, 513.22363, 431.4476, 539.5208, 1056.3099, 895.5933, 390.37933, 710.893, 486.8732]
2025-09-16 12:47:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 178.0, 108.0, 84.0, 100.0, 206.0, 175.0, 76.0, 134.0, 89.0]
2025-09-16 12:47:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (645.35) for latency 12
2025-09-16 12:47:56,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 46 minutes, 50 seconds)
2025-09-16 12:49:55,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:49:56,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 600.02112 ± 156.752
2025-09-16 12:49:56,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [414.05875, 597.4071, 688.79083, 999.4237, 587.88947, 466.78717, 644.2901, 456.021, 544.123, 601.4201]
2025-09-16 12:49:56,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 112.0, 131.0, 192.0, 111.0, 87.0, 120.0, 89.0, 106.0, 118.0]
2025-09-16 12:49:56,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 44 minutes, 49 seconds)
2025-09-16 12:51:55,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:51:57,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 703.59192 ± 287.726
2025-09-16 12:51:57,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1509.569, 494.39868, 643.9983, 475.8157, 510.936, 729.2362, 584.18396, 600.42114, 662.51105, 824.84924]
2025-09-16 12:51:57,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [297.0, 94.0, 122.0, 90.0, 95.0, 140.0, 111.0, 134.0, 127.0, 162.0]
2025-09-16 12:51:57,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (703.59) for latency 12
2025-09-16 12:51:57,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 34 seconds)
2025-09-16 12:53:55,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:53:56,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 585.02014 ± 129.549
2025-09-16 12:53:56,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [659.45465, 584.64844, 439.9291, 485.8095, 637.357, 749.5768, 501.42957, 524.70056, 426.01605, 841.27954]
2025-09-16 12:53:56,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 111.0, 82.0, 90.0, 120.0, 147.0, 94.0, 113.0, 79.0, 162.0]
2025-09-16 12:53:56,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2025-09-16 12:55:55,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:55:57,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 665.26624 ± 131.766
2025-09-16 12:55:57,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [780.7234, 632.23444, 592.5075, 875.62573, 504.88986, 805.0714, 792.1481, 470.8501, 629.248, 569.36316]
2025-09-16 12:55:57,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 124.0, 117.0, 185.0, 95.0, 152.0, 154.0, 88.0, 118.0, 124.0]
2025-09-16 12:55:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 43 seconds)
2025-09-16 12:57:56,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:57:58,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 682.25800 ± 191.216
2025-09-16 12:57:58,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [519.41284, 723.4446, 981.1076, 879.8088, 893.3203, 679.3179, 549.4282, 302.45975, 656.4193, 637.8606]
2025-09-16 12:57:58,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 136.0, 191.0, 172.0, 176.0, 130.0, 115.0, 59.0, 124.0, 118.0]
2025-09-16 12:57:58,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-09-16 12:59:57,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:59:58,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 621.65985 ± 198.674
2025-09-16 12:59:58,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1066.7784, 555.14307, 414.36618, 573.8889, 683.00555, 441.21692, 393.07468, 714.8512, 543.8311, 830.4425]
2025-09-16 12:59:58,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 103.0, 90.0, 107.0, 145.0, 94.0, 75.0, 137.0, 101.0, 156.0]
2025-09-16 12:59:58,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 32 seconds)
2025-09-16 13:01:58,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:02:00,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 664.30389 ± 96.647
2025-09-16 13:02:00,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [434.0852, 800.26373, 599.4651, 733.4116, 713.1617, 667.4785, 749.4598, 609.88025, 680.25397, 655.5789]
2025-09-16 13:02:00,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 167.0, 113.0, 159.0, 135.0, 132.0, 144.0, 118.0, 149.0, 124.0]
2025-09-16 13:02:00,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 45 seconds)
2025-09-16 13:03:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:03:59,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 657.40863 ± 165.472
2025-09-16 13:03:59,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [526.9196, 606.9652, 635.9332, 514.5205, 543.1694, 593.51215, 1113.0083, 636.4527, 748.03815, 655.5673]
2025-09-16 13:03:59,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 112.0, 126.0, 93.0, 99.0, 126.0, 221.0, 136.0, 147.0, 125.0]
2025-09-16 13:03:59,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 41 seconds)
2025-09-16 13:05:58,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:06:00,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 761.87195 ± 206.758
2025-09-16 13:06:00,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [543.74554, 502.72458, 617.2394, 841.02856, 1050.5278, 613.7456, 1062.376, 563.48505, 964.89624, 858.9508]
2025-09-16 13:06:00,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 92.0, 114.0, 158.0, 198.0, 114.0, 227.0, 120.0, 190.0, 166.0]
2025-09-16 13:06:00,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (761.87) for latency 12
2025-09-16 13:06:00,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 41 seconds)
2025-09-16 13:08:00,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:08:02,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 734.55334 ± 297.590
2025-09-16 13:08:02,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [362.31433, 466.24823, 515.7901, 1162.1718, 637.82294, 1356.1364, 594.7169, 649.44366, 719.42737, 881.4612]
2025-09-16 13:08:02,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 87.0, 106.0, 225.0, 118.0, 261.0, 130.0, 121.0, 141.0, 166.0]
2025-09-16 13:08:02,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 27 minutes, 7 seconds)
2025-09-16 13:10:02,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:10:04,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 846.30255 ± 252.797
2025-09-16 13:10:04,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [867.16345, 1084.0325, 886.33276, 668.7623, 1405.3375, 820.46173, 969.0165, 491.76813, 581.1472, 689.0033]
2025-09-16 13:10:04,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 209.0, 181.0, 125.0, 275.0, 154.0, 208.0, 110.0, 113.0, 133.0]
2025-09-16 13:10:04,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (846.30) for latency 12
2025-09-16 13:10:04,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 25 minutes, 25 seconds)
2025-09-16 13:12:03,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:12:04,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 668.95868 ± 179.319
2025-09-16 13:12:04,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [594.69696, 631.713, 576.0161, 669.13574, 864.4144, 592.88654, 1096.1249, 451.5086, 495.9161, 717.1747]
2025-09-16 13:12:04,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 119.0, 105.0, 125.0, 163.0, 111.0, 216.0, 84.0, 92.0, 133.0]
2025-09-16 13:12:04,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 23 minutes, 4 seconds)
2025-09-16 13:14:05,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:14:07,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 756.76410 ± 181.590
2025-09-16 13:14:07,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [853.3475, 688.627, 536.9435, 452.2365, 886.81604, 908.413, 837.5472, 518.26404, 909.2683, 976.1783]
2025-09-16 13:14:07,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 133.0, 116.0, 95.0, 177.0, 178.0, 160.0, 111.0, 174.0, 193.0]
2025-09-16 13:14:07,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 21 minutes, 57 seconds)
2025-09-16 13:16:07,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:16:09,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 731.53058 ± 224.693
2025-09-16 13:16:09,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [617.34644, 649.0304, 679.68097, 1265.5452, 685.6991, 1050.2338, 638.75354, 535.95667, 525.16516, 667.8948]
2025-09-16 13:16:09,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 143.0, 148.0, 259.0, 134.0, 191.0, 135.0, 99.0, 116.0, 124.0]
2025-09-16 13:16:09,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 20 minutes, 4 seconds)
2025-09-16 13:18:11,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:18:13,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 758.68182 ± 175.991
2025-09-16 13:18:13,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [726.3705, 747.1791, 941.8349, 602.65753, 967.30695, 686.2427, 1061.8469, 450.02255, 637.59644, 765.7606]
2025-09-16 13:18:13,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 139.0, 191.0, 109.0, 184.0, 131.0, 206.0, 82.0, 121.0, 157.0]
2025-09-16 13:18:13,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 18 minutes, 23 seconds)
2025-09-16 13:20:14,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:20:17,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1007.34766 ± 505.944
2025-09-16 13:20:17,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [545.51764, 1078.6403, 699.90814, 454.99783, 835.7991, 951.77026, 849.2693, 1379.0652, 2327.556, 950.95276]
2025-09-16 13:20:17,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 216.0, 151.0, 83.0, 172.0, 186.0, 178.0, 286.0, 441.0, 177.0]
2025-09-16 13:20:17,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1007.35) for latency 12
2025-09-16 13:20:17,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 53 seconds)
2025-09-16 13:22:19,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:22:21,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 871.98279 ± 193.673
2025-09-16 13:22:21,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [857.1036, 670.8427, 648.3127, 935.56854, 869.0194, 901.55365, 1092.5962, 1298.6385, 676.0887, 770.10376]
2025-09-16 13:22:21,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 123.0, 116.0, 175.0, 159.0, 169.0, 209.0, 249.0, 123.0, 146.0]
2025-09-16 13:22:21,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 15 minutes, 45 seconds)
2025-09-16 13:24:22,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:24:25,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 887.79053 ± 221.152
2025-09-16 13:24:25,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [889.71094, 1238.031, 850.8269, 947.29956, 1317.3243, 591.3035, 673.8874, 762.65356, 885.442, 721.426]
2025-09-16 13:24:25,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 236.0, 179.0, 185.0, 268.0, 122.0, 140.0, 149.0, 192.0, 149.0]
2025-09-16 13:24:25,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 13 minutes, 48 seconds)
2025-09-16 13:26:27,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:26:29,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1029.67383 ± 372.445
2025-09-16 13:26:29,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [660.3059, 794.1036, 1126.7078, 1337.7722, 610.0772, 853.86365, 941.60376, 867.1906, 1163.3602, 1941.7533]
2025-09-16 13:26:29,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 167.0, 218.0, 258.0, 127.0, 168.0, 194.0, 178.0, 218.0, 379.0]
2025-09-16 13:26:29,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1029.67) for latency 12
2025-09-16 13:26:29,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 12 minutes, 23 seconds)
2025-09-16 13:28:28,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:28:31,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 962.63977 ± 344.194
2025-09-16 13:28:31,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [838.44684, 1678.3254, 1196.4373, 378.6515, 675.6641, 1214.7073, 994.1835, 862.77325, 1111.5114, 675.6971]
2025-09-16 13:28:31,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 312.0, 224.0, 69.0, 129.0, 239.0, 205.0, 181.0, 226.0, 133.0]
2025-09-16 13:28:31,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 9 minutes, 43 seconds)
2025-09-16 13:30:30,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:30:33,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 894.21301 ± 225.579
2025-09-16 13:30:33,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [932.161, 1295.9203, 922.68695, 847.78217, 988.3328, 1206.887, 882.6632, 705.63257, 620.92615, 539.13794]
2025-09-16 13:30:33,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 257.0, 186.0, 160.0, 184.0, 231.0, 179.0, 139.0, 114.0, 115.0]
2025-09-16 13:30:33,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 7 minutes, 10 seconds)
2025-09-16 13:32:34,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:32:37,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1240.35803 ± 374.172
2025-09-16 13:32:37,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1308.806, 1783.7269, 759.05884, 699.31323, 1212.648, 1318.0693, 1475.427, 904.04254, 1087.8756, 1854.612]
2025-09-16 13:32:37,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 333.0, 148.0, 143.0, 228.0, 261.0, 278.0, 184.0, 216.0, 367.0]
2025-09-16 13:32:37,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1240.36) for latency 12
2025-09-16 13:32:37,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 5 minutes, 14 seconds)
2025-09-16 13:34:36,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:34:39,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1134.83521 ± 388.018
2025-09-16 13:34:39,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [732.34357, 1736.7537, 973.9572, 595.194, 1851.1624, 1341.3818, 914.20953, 1185.2369, 1122.0591, 896.05225]
2025-09-16 13:34:39,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 349.0, 192.0, 114.0, 353.0, 266.0, 169.0, 218.0, 208.0, 164.0]
2025-09-16 13:34:39,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes, 47 seconds)
2025-09-16 13:36:40,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:36:43,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 905.55743 ± 332.856
2025-09-16 13:36:43,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1388.7524, 598.6719, 1151.9963, 902.70447, 445.24435, 976.6089, 682.25397, 1256.6058, 428.27563, 1224.4608]
2025-09-16 13:36:43,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [283.0, 125.0, 228.0, 183.0, 80.0, 196.0, 142.0, 240.0, 93.0, 257.0]
2025-09-16 13:36:43,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 35 seconds)
2025-09-16 13:38:41,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:38:47,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1809.03931 ± 1555.107
2025-09-16 13:38:47,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [650.25604, 1619.0022, 977.02167, 5012.975, 548.2085, 922.9218, 4545.249, 935.50854, 750.60065, 2128.649]
2025-09-16 13:38:47,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 317.0, 205.0, 1000.0, 99.0, 179.0, 898.0, 186.0, 155.0, 409.0]
2025-09-16 13:38:47,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1809.04) for latency 12
2025-09-16 13:38:47,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 59 minutes, 7 seconds)
2025-09-16 13:40:54,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:40:59,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1667.23572 ± 907.533
2025-09-16 13:40:59,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1409.8632, 3763.616, 1660.8495, 893.42126, 886.86035, 744.53503, 1338.0101, 1080.6302, 2599.6958, 2294.877]
2025-09-16 13:40:59,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [268.0, 752.0, 317.0, 172.0, 185.0, 149.0, 269.0, 212.0, 515.0, 462.0]
2025-09-16 13:40:59,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 58 minutes, 57 seconds)
2025-09-16 13:42:52,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:42:56,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1515.47253 ± 1029.660
2025-09-16 13:42:56,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1080.0608, 839.1961, 2949.9126, 746.0898, 714.18866, 2412.9338, 3612.9805, 1474.723, 685.29456, 639.3449]
2025-09-16 13:42:56,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 171.0, 572.0, 146.0, 134.0, 466.0, 697.0, 297.0, 124.0, 115.0]
2025-09-16 13:42:56,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 55 minutes, 27 seconds)
2025-09-16 13:44:59,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:45:04,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1793.37048 ± 961.309
2025-09-16 13:45:04,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1210.68, 1745.8661, 828.85565, 2084.2732, 1334.9177, 3241.5684, 1184.3068, 1521.296, 904.7413, 3877.2007]
2025-09-16 13:45:04,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 358.0, 154.0, 393.0, 245.0, 614.0, 227.0, 279.0, 175.0, 736.0]
2025-09-16 13:45:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2025-09-16 13:47:13,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:47:19,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2082.43311 ± 1618.666
2025-09-16 13:47:19,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1011.7688, 607.28253, 650.39734, 2120.6504, 1212.5426, 5132.764, 5129.506, 996.1203, 1607.5226, 2355.7747]
2025-09-16 13:47:19,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 126.0, 124.0, 406.0, 251.0, 1000.0, 1000.0, 184.0, 334.0, 472.0]
2025-09-16 13:47:19,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2082.43) for latency 12
2025-09-16 13:47:19,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2025-09-16 13:49:12,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:49:19,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2525.31860 ± 1548.074
2025-09-16 13:49:19,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1274.9792, 4968.317, 630.12006, 5109.6, 3011.2812, 1586.9531, 1779.7092, 840.5717, 2267.0632, 3784.5925]
2025-09-16 13:49:19,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 971.0, 127.0, 1000.0, 576.0, 311.0, 351.0, 183.0, 460.0, 742.0]
2025-09-16 13:49:19,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2525.32) for latency 12
2025-09-16 13:49:19,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 51 minutes, 46 seconds)
2025-09-16 13:51:20,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:51:25,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1711.81372 ± 673.749
2025-09-16 13:51:25,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1785.9619, 1399.2911, 1387.0099, 3448.407, 1554.577, 1673.3071, 1269.9008, 2271.3135, 863.6598, 1464.708]
2025-09-16 13:51:25,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [333.0, 258.0, 255.0, 642.0, 289.0, 320.0, 236.0, 425.0, 162.0, 272.0]
2025-09-16 13:51:25,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 48 minutes, 34 seconds)
2025-09-16 13:53:26,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:53:30,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1412.91296 ± 752.949
2025-09-16 13:53:30,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1839.7728, 627.0455, 647.2835, 747.5398, 1792.6638, 2850.0369, 805.3669, 1127.7936, 1211.04, 2480.5867]
2025-09-16 13:53:30,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [379.0, 119.0, 139.0, 149.0, 376.0, 590.0, 174.0, 227.0, 239.0, 486.0]
2025-09-16 13:53:30,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 47 minutes, 52 seconds)
2025-09-16 13:55:32,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:55:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4449.78711 ± 1350.216
2025-09-16 13:55:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5117.373, 1465.5876, 5118.351, 5105.0674, 5092.5083, 5059.2686, 5176.062, 5142.603, 2060.9124, 5160.136]
2025-09-16 13:55:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 287.0, 1000.0, 1000.0, 1000.0, 990.0, 1000.0, 1000.0, 397.0, 1000.0]
2025-09-16 13:55:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4449.79) for latency 12
2025-09-16 13:55:46,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 47 minutes, 2 seconds)
2025-09-16 13:57:57,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:58:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3107.75000 ± 1804.258
2025-09-16 13:58:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2340.0076, 5126.971, 743.07764, 5159.8926, 5171.701, 1400.6953, 577.0968, 2663.34, 2743.5527, 5151.165]
2025-09-16 13:58:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [482.0, 1000.0, 143.0, 1000.0, 1000.0, 273.0, 112.0, 514.0, 533.0, 1000.0]
2025-09-16 13:58:07,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 45 minutes, 45 seconds)
2025-09-16 14:00:09,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:00:24,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4827.30713 ± 1117.941
2025-09-16 14:00:24,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1597.9346, 5331.2827, 5298.8735, 5355.2476, 4297.1807, 5323.1006, 5301.6064, 5295.331, 5302.017, 5170.502]
2025-09-16 14:00:24,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [321.0, 1000.0, 1000.0, 1000.0, 838.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:24,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4827.31) for latency 12
2025-09-16 14:00:24,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 46 minutes, 21 seconds)
2025-09-16 14:02:24,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:02:34,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3236.73486 ± 1608.411
2025-09-16 14:02:34,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2460.512, 5169.117, 5147.163, 5125.216, 2697.3518, 3643.6294, 4325.983, 2074.5618, 1010.06433, 713.74854]
2025-09-16 14:02:34,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [486.0, 1000.0, 1000.0, 1000.0, 555.0, 711.0, 834.0, 411.0, 189.0, 147.0]
2025-09-16 14:02:34,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 44 minutes, 49 seconds)
2025-09-16 14:04:43,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:04:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4223.71729 ± 1524.166
2025-09-16 14:04:56,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5171.902, 5215.418, 5141.368, 5161.0347, 5199.2114, 3310.7944, 5077.902, 5193.9375, 1260.2158, 1505.3878]
2025-09-16 14:04:56,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 629.0, 1000.0, 1000.0, 246.0, 291.0]
2025-09-16 14:04:56,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 45 minutes, 9 seconds)
2025-09-16 14:06:56,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:07:09,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4523.04297 ± 1493.892
2025-09-16 14:07:09,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [781.14624, 2489.417, 5269.05, 5284.3267, 5245.1978, 5255.259, 5213.338, 5272.4307, 5253.275, 5166.9897]
2025-09-16 14:07:09,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 495.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:07:09,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 42 minutes, 26 seconds)
2025-09-16 14:09:16,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:09:31,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4685.98877 ± 1355.002
2025-09-16 14:09:31,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5161.201, 5111.1606, 5135.1157, 621.9778, 5118.0044, 5168.591, 5111.954, 5171.5264, 5177.3926, 5082.9653]
2025-09-16 14:09:31,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 111.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:09:31,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 40 minutes, 24 seconds)
2025-09-16 14:11:33,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:11:44,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3576.45264 ± 1663.074
2025-09-16 14:11:44,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5068.4463, 934.6311, 2493.5977, 4985.158, 5001.8467, 5030.509, 800.59326, 2474.4502, 5021.2593, 3954.0366]
2025-09-16 14:11:44,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 187.0, 495.0, 1000.0, 1000.0, 1000.0, 178.0, 513.0, 1000.0, 779.0]
2025-09-16 14:11:44,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 37 minutes, 30 seconds)
2025-09-16 14:13:51,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:14:05,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4481.87207 ± 1232.231
2025-09-16 14:14:05,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5155.9727, 5076.2256, 2755.8628, 5131.184, 5108.3096, 1429.0231, 5035.445, 5125.52, 5047.4053, 4953.775]
2025-09-16 14:14:05,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 550.0, 1000.0, 1000.0, 266.0, 1000.0, 1000.0, 1000.0, 975.0]
2025-09-16 14:14:05,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 36 minutes, 41 seconds)
2025-09-16 14:15:57,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:16:13,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5145.47607 ± 11.413
2025-09-16 14:16:13,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5153.9937, 5143.638, 5136.9365, 5139.996, 5140.275, 5168.035, 5151.8203, 5139.5796, 5125.1133, 5155.3745]
2025-09-16 14:16:13,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:16:13,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5145.48) for latency 12
2025-09-16 14:16:13,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 32 minutes, 31 seconds)
2025-09-16 14:18:19,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:18:34,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4839.10107 ± 691.963
2025-09-16 14:18:34,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4984.8965, 5117.0854, 5006.203, 5126.653, 5113.0737, 5106.469, 2768.1558, 5077.442, 5073.777, 5017.259]
2025-09-16 14:18:34,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 550.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:18:34,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 31 minutes, 21 seconds)
2025-09-16 14:20:44,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:20:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3939.19263 ± 1922.047
2025-09-16 14:20:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5176.0635, 905.2982, 5221.719, 5204.029, 5228.054, 582.2479, 1585.8032, 5167.439, 5125.541, 5195.729]
2025-09-16 14:20:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 174.0, 1000.0, 1000.0, 1000.0, 117.0, 297.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:20:56,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 29 minutes)
2025-09-16 14:22:57,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:23:07,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3193.08398 ± 2057.056
2025-09-16 14:23:07,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1205.2374, 5214.5103, 5144.2954, 614.1365, 2384.4, 5230.0317, 5234.7427, 683.3505, 1044.289, 5175.845]
2025-09-16 14:23:07,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 1000.0, 1000.0, 110.0, 473.0, 1000.0, 1000.0, 124.0, 190.0, 1000.0]
2025-09-16 14:23:07,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 26 minutes, 29 seconds)
2025-09-16 14:25:05,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:25:20,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4734.62988 ± 1360.975
2025-09-16 14:25:20,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5222.3535, 5154.0264, 5177.079, 5181.145, 5249.346, 5213.918, 652.7368, 5155.8584, 5189.051, 5150.784]
2025-09-16 14:25:20,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 118.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:20,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2025-09-16 14:27:24,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:27:40,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5012.19238 ± 456.378
2025-09-16 14:27:40,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5230.2124, 5137.5107, 5213.5083, 5175.4844, 5114.0493, 3647.5576, 5173.017, 5112.092, 5177.688, 5140.8027]
2025-09-16 14:27:40,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 703.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:27:40,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 22 minutes, 22 seconds)
2025-09-16 14:29:43,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:29:57,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4662.33447 ± 1285.407
2025-09-16 14:29:57,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5277.7427, 5272.186, 5256.01, 5276.268, 5297.2446, 3127.5728, 5265.7544, 5259.373, 1317.2675, 5273.9272]
2025-09-16 14:29:57,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 587.0, 1000.0, 1000.0, 254.0, 1000.0]
2025-09-16 14:29:57,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 19 minutes, 36 seconds)
2025-09-16 14:31:51,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:32:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4511.26611 ± 1567.933
2025-09-16 14:32:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5266.231, 5290.066, 5285.9795, 5274.4233, 5329.646, 1417.7983, 1334.0996, 5322.671, 5266.1025, 5325.6436]
2025-09-16 14:32:04,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 277.0, 260.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:32:04,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 15 minutes, 43 seconds)
2025-09-16 14:34:14,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:34:31,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5126.89160 ± 19.312
2025-09-16 14:34:31,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5120.2075, 5107.6665, 5170.79, 5143.5303, 5117.623, 5113.389, 5125.521, 5146.2524, 5108.7197, 5115.216]
2025-09-16 14:34:31,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:34:31,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 15 minutes, 11 seconds)
2025-09-16 14:36:32,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:36:45,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3995.06689 ± 1735.348
2025-09-16 14:36:45,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [865.3074, 5105.0527, 5062.381, 5014.5527, 5048.0845, 5120.9277, 2914.1284, 5042.6675, 673.1224, 5104.446]
2025-09-16 14:36:45,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 568.0, 1000.0, 129.0, 1000.0]
2025-09-16 14:36:45,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 13 minutes, 2 seconds)
2025-09-16 14:38:46,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:39:01,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4690.01855 ± 775.004
2025-09-16 14:39:01,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5050.717, 5084.288, 5050.913, 3622.0427, 5060.867, 5067.094, 5032.1265, 2756.8853, 5086.82, 5088.433]
2025-09-16 14:39:01,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 719.0, 1000.0, 1000.0, 1000.0, 527.0, 1000.0, 1000.0]
2025-09-16 14:39:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 10 minutes, 23 seconds)
2025-09-16 14:41:00,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:41:17,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5125.13184 ± 48.810
2025-09-16 14:41:17,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5098.899, 5159.53, 5092.9697, 5152.686, 5147.4653, 5002.759, 5111.8984, 5167.8, 5173.5215, 5143.788]
2025-09-16 14:41:17,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 989.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:17,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 7 minutes, 59 seconds)
2025-09-16 14:43:19,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:43:33,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4304.00439 ± 1558.647
2025-09-16 14:43:33,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5100.3403, 5092.5254, 5074.9634, 4996.378, 5090.361, 5095.9556, 5107.4326, 1038.7583, 1341.7244, 5101.606]
2025-09-16 14:43:33,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 201.0, 261.0, 1000.0]
2025-09-16 14:43:33,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 6 minutes, 34 seconds)
2025-09-16 14:45:34,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:45:47,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4659.94189 ± 1285.907
2025-09-16 14:45:47,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5344.838, 3276.0212, 5261.6333, 5335.995, 5354.7656, 4573.7373, 5370.324, 5388.1606, 1302.5593, 5391.384]
2025-09-16 14:45:47,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 610.0, 968.0, 1000.0, 1000.0, 850.0, 1000.0, 1000.0, 239.0, 1000.0]
2025-09-16 14:45:48,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 3 minutes, 9 seconds)
2025-09-16 14:47:49,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:48:04,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4778.85938 ± 1196.097
2025-09-16 14:48:04,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5214.58, 5163.6353, 5185.8086, 5179.585, 5158.2896, 5152.7944, 5166.254, 5186.0503, 1190.943, 5190.6504]
2025-09-16 14:48:04,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 243.0, 1000.0]
2025-09-16 14:48:04,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 1 minute, 6 seconds)
2025-09-16 14:50:09,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:50:24,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4772.35254 ± 1034.491
2025-09-16 14:50:24,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5109.2925, 5089.836, 5119.9062, 5124.78, 5151.0356, 5099.873, 5106.0366, 5152.5894, 5100.719, 1669.4545]
2025-09-16 14:50:24,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 319.0]
2025-09-16 14:50:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 59 minutes, 12 seconds)
2025-09-16 14:52:25,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:52:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4678.84424 ± 1278.496
2025-09-16 14:52:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5064.692, 5143.92, 5090.532, 5108.004, 5125.093, 5138.0903, 844.7866, 5106.4727, 5026.8433, 5140.005]
2025-09-16 14:52:40,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 157.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:52:40,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 56 minutes, 58 seconds)
2025-09-16 14:54:40,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:54:56,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5213.18359 ± 23.385
2025-09-16 14:54:56,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5218.219, 5231.1396, 5228.8237, 5200.6895, 5156.2236, 5217.648, 5195.9136, 5214.7686, 5245.849, 5222.5645]
2025-09-16 14:54:56,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:54:56,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5213.18) for latency 12
2025-09-16 14:54:56,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 54 minutes, 39 seconds)
2025-09-16 14:56:56,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:57:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4868.72119 ± 1200.740
2025-09-16 14:57:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5271.5063, 5243.1445, 5276.4023, 5255.6562, 5271.3887, 5271.7847, 1266.65, 5271.664, 5275.1045, 5283.91]
2025-09-16 14:57:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 228.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:57:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 52 minutes, 22 seconds)
2025-09-16 14:59:07,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:59:22,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4784.93506 ± 1181.257
2025-09-16 14:59:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5188.3228, 5216.5845, 5189.2363, 5142.8496, 5191.1035, 5158.786, 5213.91, 5172.8613, 5133.6714, 1242.026]
2025-09-16 14:59:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 241.0]
2025-09-16 14:59:22,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 49 minutes, 42 seconds)
2025-09-16 15:01:32,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:01:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5303.14746 ± 24.140
2025-09-16 15:01:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5259.003, 5293.1274, 5287.3726, 5274.6743, 5322.026, 5320.062, 5339.199, 5317.1597, 5293.591, 5325.2563]
2025-09-16 15:01:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:01:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5303.15) for latency 12
2025-09-16 15:01:48,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 47 minutes, 52 seconds)
2025-09-16 15:03:40,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:03:53,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4362.41455 ± 1809.513
2025-09-16 15:03:53,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5257.75, 5270.0044, 751.62866, 5211.524, 5273.9, 5273.411, 5284.383, 5284.5933, 5281.3374, 735.6143]
2025-09-16 15:03:53,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 142.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 139.0]
2025-09-16 15:03:53,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 48 seconds)
2025-09-16 15:05:50,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:06:06,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4804.33105 ± 976.406
2025-09-16 15:06:06,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5148.7104, 5116.328, 5154.853, 1875.6206, 5110.7964, 5148.5615, 5098.286, 5147.0103, 5119.933, 5123.212]
2025-09-16 15:06:06,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 383.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:06:06,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 25 seconds)
2025-09-16 15:08:10,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:08:24,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4543.87207 ± 1481.309
2025-09-16 15:08:24,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3102.4485, 5211.1377, 526.72296, 5224.498, 5231.669, 5263.1396, 5191.508, 5236.493, 5216.7803, 5234.3193]
2025-09-16 15:08:24,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [593.0, 1000.0, 101.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:08:24,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 23 seconds)
2025-09-16 15:10:27,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:10:43,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4882.48340 ± 355.550
2025-09-16 15:10:43,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5050.924, 5063.964, 5055.081, 5021.9995, 5070.9966, 4402.5273, 3989.9614, 5042.791, 5060.053, 5066.5347]
2025-09-16 15:10:43,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 896.0, 800.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:10:43,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 36 seconds)
2025-09-16 15:12:46,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:13:02,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5184.11328 ± 27.840
2025-09-16 15:13:02,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5183.0117, 5195.2344, 5172.9136, 5113.2314, 5167.2896, 5183.1436, 5212.52, 5195.7905, 5206.7437, 5211.2593]
2025-09-16 15:13:02,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:13:02,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 55 seconds)
2025-09-16 15:15:00,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:15:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4660.68994 ± 910.898
2025-09-16 15:15:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5246.3633, 3066.0793, 5295.7603, 3243.0518, 5216.312, 3527.2275, 5236.264, 5230.2563, 5278.4976, 5267.0884]
2025-09-16 15:15:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 581.0, 1000.0, 624.0, 1000.0, 676.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:15:14,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 4 seconds)
2025-09-16 15:17:06,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:17:21,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4473.23633 ± 1111.500
2025-09-16 15:17:21,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2078.3816, 5010.9004, 5021.5337, 4996.2227, 5089.386, 5027.685, 5057.316, 5016.719, 2434.7805, 4999.439]
2025-09-16 15:17:21,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [411.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 470.0, 1000.0]
2025-09-16 15:17:21,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 29 seconds)
2025-09-16 15:19:25,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:19:41,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5316.35986 ± 18.344
2025-09-16 15:19:41,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5330.587, 5318.7417, 5323.0225, 5294.6934, 5327.3706, 5272.347, 5322.356, 5332.468, 5332.101, 5309.909]
2025-09-16 15:19:41,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:19:41,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5316.36) for latency 12
2025-09-16 15:19:41,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 20 seconds)
2025-09-16 15:21:36,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:21:52,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4942.50146 ± 422.361
2025-09-16 15:21:52,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5106.9507, 5092.3438, 5108.456, 5058.654, 3677.3743, 5101.041, 5101.824, 5084.9917, 5034.9, 5058.4824]
2025-09-16 15:21:52,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 736.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:21:52,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 46 seconds)
2025-09-16 15:23:52,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:24:08,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5316.35986 ± 12.523
2025-09-16 15:24:08,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5306.0557, 5333.412, 5332.8354, 5309.103, 5329.9116, 5305.349, 5306.3535, 5306.882, 5303.5845, 5330.111]
2025-09-16 15:24:08,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:24:08,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 26 seconds)
2025-09-16 15:26:12,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:26:28,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5177.16504 ± 21.849
2025-09-16 15:26:28,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5189.6973, 5154.4375, 5201.57, 5183.2456, 5169.8237, 5173.1226, 5144.906, 5221.9727, 5158.732, 5174.15]
2025-09-16 15:26:28,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:26:28,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 27 seconds)
2025-09-16 15:28:28,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:28:44,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5128.34131 ± 39.511
2025-09-16 15:28:44,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5057.059, 5142.7227, 5047.864, 5162.075, 5149.3164, 5119.99, 5147.5186, 5154.262, 5158.536, 5144.0713]
2025-09-16 15:28:44,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:28:44,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 30 seconds)
2025-09-16 15:30:43,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:30:58,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5543.45605 ± 19.151
2025-09-16 15:30:58,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5557.582, 5555.0493, 5531.615, 5546.3076, 5516.4346, 5549.1343, 5556.4766, 5572.5513, 5543.462, 5505.9487]
2025-09-16 15:30:58,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:30:58,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5543.46) for latency 12
2025-09-16 15:30:58,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 3 seconds)
2025-09-16 15:32:59,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:33:15,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5149.64453 ± 39.001
2025-09-16 15:33:15,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5141.158, 5131.9136, 5161.857, 5196.1807, 5148.0376, 5149.962, 5170.0117, 5193.9443, 5049.105, 5154.273]
2025-09-16 15:33:15,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:33:15,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 55 seconds)
2025-09-16 15:35:11,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:35:24,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4335.14062 ± 1789.415
2025-09-16 15:35:24,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5244.1616, 583.2518, 5220.327, 936.57214, 5235.423, 5232.3276, 5244.3066, 5191.915, 5227.7607, 5235.3574]
2025-09-16 15:35:24,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 109.0, 1000.0, 188.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:35:24,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 30 seconds)
2025-09-16 15:37:24,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:37:38,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4855.50537 ± 1421.227
2025-09-16 15:37:38,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5346.427, 5360.703, 592.4645, 5333.123, 5327.5747, 5323.233, 5317.743, 5280.1704, 5369.0103, 5304.6064]
2025-09-16 15:37:38,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 136.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:37:38,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 10 seconds)
2025-09-16 15:39:43,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:39:59,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5400.46582 ± 20.659
2025-09-16 15:39:59,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5395.494, 5419.0747, 5411.314, 5356.7896, 5377.339, 5408.191, 5400.4517, 5425.165, 5387.8394, 5422.9946]
2025-09-16 15:39:59,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:39:59,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 59 seconds)
2025-09-16 15:41:59,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:42:15,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5389.73779 ± 11.758
2025-09-16 15:42:15,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5397.3027, 5369.342, 5390.7227, 5389.648, 5371.6743, 5403.3804, 5404.0195, 5383.2466, 5386.5034, 5401.5376]
2025-09-16 15:42:15,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:42:15,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 46 seconds)
2025-09-16 15:44:18,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:44:34,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5218.89014 ± 23.115
2025-09-16 15:44:34,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5202.0386, 5194.9683, 5181.4243, 5205.4653, 5238.5293, 5231.318, 5254.641, 5230.3115, 5246.0435, 5204.164]
2025-09-16 15:44:34,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:44:34,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 31 seconds)
2025-09-16 15:46:25,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:46:37,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4021.37256 ± 2015.094
2025-09-16 15:46:37,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5347.0596, 5344.4927, 5307.412, 5373.44, 1147.0096, 5345.8276, 784.0511, 5322.308, 906.789, 5335.337]
2025-09-16 15:46:37,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 232.0, 1000.0, 148.0, 1000.0, 177.0, 1000.0]
2025-09-16 15:46:37,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 14 seconds)
2025-09-16 15:48:41,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:48:57,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5386.00391 ± 16.209
2025-09-16 15:48:57,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5384.706, 5354.2764, 5392.85, 5391.744, 5393.23, 5366.841, 5410.3677, 5382.024, 5406.807, 5377.1924]
2025-09-16 15:48:57,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:48:57,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
