2025-09-16 14:58:54,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_24
2025-09-16 14:58:54,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_24
2025-09-16 14:58:54,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x145a4354c890>}
2025-09-16 14:58:54,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:58:54,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:58:54,123 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:58:54,123 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:58:55,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:58:55,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 15:00:49,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:00:50,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 297.04730 ± 72.837
2025-09-16 15:00:50,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [299.92645, 264.61533, 324.69476, 140.41663, 394.84778, 213.34328, 364.91425, 365.39386, 281.0102, 321.31052]
2025-09-16 15:00:50,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 51.0, 59.0, 27.0, 75.0, 44.0, 72.0, 69.0, 54.0, 62.0]
2025-09-16 15:00:50,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (297.05) for latency 24
2025-09-16 15:00:50,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 8 minutes, 57 seconds)
2025-09-16 15:02:52,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:02:53,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 431.74756 ± 154.259
2025-09-16 15:02:53,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [581.9227, 197.50308, 552.0871, 527.992, 387.68887, 436.7346, 595.264, 140.3246, 555.2038, 342.75497]
2025-09-16 15:02:53,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 38.0, 105.0, 104.0, 71.0, 81.0, 117.0, 27.0, 106.0, 68.0]
2025-09-16 15:02:53,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (431.75) for latency 24
2025-09-16 15:02:53,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 14 minutes, 17 seconds)
2025-09-16 15:04:53,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:04:54,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 371.39420 ± 138.843
2025-09-16 15:04:54,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [344.09497, 453.66068, 643.6853, 421.39926, 370.9531, 386.4542, 374.07123, 145.27277, 140.43378, 433.91684]
2025-09-16 15:04:54,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 89.0, 123.0, 80.0, 71.0, 77.0, 73.0, 28.0, 27.0, 89.0]
2025-09-16 15:04:55,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 13 minutes, 33 seconds)
2025-09-16 15:06:57,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:06:58,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 337.48898 ± 128.598
2025-09-16 15:06:58,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [158.54482, 363.7477, 392.7178, 343.6316, 248.29509, 574.0402, 483.00732, 314.71475, 129.8014, 366.3891]
2025-09-16 15:06:58,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 79.0, 78.0, 70.0, 50.0, 116.0, 97.0, 61.0, 25.0, 72.0]
2025-09-16 15:06:58,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 13 minutes, 4 seconds)
2025-09-16 15:09:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:09:02,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 307.30157 ± 101.945
2025-09-16 15:09:02,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [185.94496, 159.4057, 130.06267, 369.43848, 346.30075, 355.28848, 375.6532, 393.20114, 324.37555, 433.3447]
2025-09-16 15:09:02,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 31.0, 25.0, 72.0, 69.0, 68.0, 72.0, 76.0, 62.0, 92.0]
2025-09-16 15:09:02,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 12 minutes, 9 seconds)
2025-09-16 15:11:06,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:11:07,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 259.86823 ± 99.952
2025-09-16 15:11:07,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [178.18053, 156.59528, 218.86143, 298.02963, 135.94167, 324.15515, 197.65657, 289.0407, 311.00348, 489.21805]
2025-09-16 15:11:07,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 42.0, 58.0, 26.0, 64.0, 38.0, 55.0, 60.0, 102.0]
2025-09-16 15:11:07,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 13 minutes, 21 seconds)
2025-09-16 15:13:10,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:13:10,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 272.22263 ± 92.472
2025-09-16 15:13:10,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [165.87033, 373.83246, 139.51456, 378.26047, 373.07822, 186.04137, 218.64891, 338.19666, 203.01564, 345.7678]
2025-09-16 15:13:10,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 72.0, 27.0, 76.0, 73.0, 36.0, 42.0, 68.0, 39.0, 67.0]
2025-09-16 15:13:10,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 11 minutes, 19 seconds)
2025-09-16 15:15:14,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:15:15,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 237.40781 ± 102.543
2025-09-16 15:15:15,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [140.9129, 141.19864, 150.79176, 252.28932, 366.23242, 412.74066, 130.49135, 312.74173, 148.53693, 318.1424]
2025-09-16 15:15:15,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 27.0, 29.0, 49.0, 72.0, 81.0, 25.0, 61.0, 29.0, 63.0]
2025-09-16 15:15:15,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 10 minutes, 12 seconds)
2025-09-16 15:17:15,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:17:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 278.57425 ± 55.603
2025-09-16 15:17:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [342.35828, 271.59366, 366.62195, 214.16779, 322.23474, 230.81032, 291.18597, 315.9816, 239.57431, 191.21394]
2025-09-16 15:17:16,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 54.0, 71.0, 41.0, 66.0, 45.0, 57.0, 65.0, 46.0, 37.0]
2025-09-16 15:17:16,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 7 minutes, 29 seconds)
2025-09-16 15:19:16,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:19:17,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 225.54089 ± 74.071
2025-09-16 15:19:17,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [135.6962, 371.00916, 178.2335, 219.1177, 321.84457, 188.10608, 184.2725, 211.61365, 149.39632, 296.1193]
2025-09-16 15:19:17,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 73.0, 35.0, 43.0, 67.0, 37.0, 36.0, 42.0, 29.0, 59.0]
2025-09-16 15:19:17,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 4 minutes, 25 seconds)
2025-09-16 15:21:16,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:21:17,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 199.25330 ± 78.829
2025-09-16 15:21:17,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [361.91144, 135.32304, 154.54884, 144.64568, 166.36607, 338.14532, 164.44987, 222.42581, 145.44653, 159.27028]
2025-09-16 15:21:17,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 26.0, 30.0, 28.0, 33.0, 71.0, 32.0, 44.0, 28.0, 31.0]
2025-09-16 15:21:17,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 49 seconds)
2025-09-16 15:23:15,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:23:15,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 187.55223 ± 80.998
2025-09-16 15:23:15,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [164.62666, 149.22224, 423.27274, 210.22238, 135.22743, 160.02745, 140.29695, 164.09619, 173.52202, 155.00839]
2025-09-16 15:23:15,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 29.0, 88.0, 42.0, 26.0, 31.0, 27.0, 32.0, 34.0, 30.0]
2025-09-16 15:23:15,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 57 minutes, 25 seconds)
2025-09-16 15:25:14,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:25:14,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 160.05670 ± 23.813
2025-09-16 15:25:14,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [135.36131, 155.10832, 160.38983, 225.6923, 144.67818, 163.43237, 163.60316, 139.5225, 153.30821, 159.47087]
2025-09-16 15:25:14,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 31.0, 45.0, 28.0, 32.0, 32.0, 27.0, 30.0, 31.0]
2025-09-16 15:25:14,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 47 seconds)
2025-09-16 15:27:13,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:27:13,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 165.89346 ± 29.922
2025-09-16 15:27:13,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [220.42572, 158.3547, 164.42033, 155.13156, 223.6242, 164.03368, 158.0023, 144.60194, 140.86865, 129.47145]
2025-09-16 15:27:13,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 31.0, 32.0, 30.0, 43.0, 32.0, 31.0, 28.0, 27.0, 25.0]
2025-09-16 15:27:13,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 14 seconds)
2025-09-16 15:29:12,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:29:12,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 160.24130 ± 16.695
2025-09-16 15:29:12,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [135.96529, 203.91005, 163.95944, 154.85008, 159.58784, 168.10725, 149.92366, 153.818, 154.62456, 157.66695]
2025-09-16 15:29:12,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 41.0, 32.0, 30.0, 31.0, 33.0, 29.0, 30.0, 30.0, 31.0]
2025-09-16 15:29:12,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 40 seconds)
2025-09-16 15:31:11,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:31:11,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 164.00290 ± 13.268
2025-09-16 15:31:11,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [165.15625, 154.05553, 145.32524, 173.50716, 140.14426, 158.9172, 173.21143, 176.377, 182.45554, 170.87936]
2025-09-16 15:31:11,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 30.0, 28.0, 34.0, 27.0, 31.0, 34.0, 35.0, 36.0, 33.0]
2025-09-16 15:31:11,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 32 seconds)
2025-09-16 15:33:10,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:33:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 184.17429 ± 43.916
2025-09-16 15:33:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [308.1507, 150.06705, 162.6242, 160.09843, 180.84232, 160.47957, 171.16237, 165.26877, 207.35486, 175.6946]
2025-09-16 15:33:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 29.0, 32.0, 31.0, 35.0, 31.0, 34.0, 32.0, 41.0, 34.0]
2025-09-16 15:33:10,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 37 seconds)
2025-09-16 15:35:09,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:35:10,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 166.95078 ± 23.230
2025-09-16 15:35:10,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [159.92142, 161.64882, 172.88805, 158.57368, 159.73639, 160.40115, 230.4422, 134.77113, 170.45734, 160.6677]
2025-09-16 15:35:10,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 34.0, 31.0, 31.0, 31.0, 46.0, 26.0, 33.0, 31.0]
2025-09-16 15:35:10,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 49 seconds)
2025-09-16 15:37:08,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:37:08,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 174.29416 ± 48.574
2025-09-16 15:37:08,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [228.03575, 150.87976, 160.5533, 139.90964, 150.64848, 163.81941, 129.30687, 170.96338, 298.7738, 150.05106]
2025-09-16 15:37:08,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 29.0, 31.0, 27.0, 29.0, 32.0, 25.0, 34.0, 60.0, 29.0]
2025-09-16 15:37:08,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 38 seconds)
2025-09-16 15:39:07,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:39:07,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 183.07089 ± 46.098
2025-09-16 15:39:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [154.81894, 203.3025, 167.5062, 185.55374, 184.24165, 168.64188, 170.52782, 150.39241, 310.02045, 135.70326]
2025-09-16 15:39:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 41.0, 33.0, 37.0, 37.0, 33.0, 33.0, 29.0, 64.0, 26.0]
2025-09-16 15:39:07,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 40 seconds)
2025-09-16 15:41:05,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:41:06,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 180.18088 ± 42.418
2025-09-16 15:41:06,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [203.47562, 160.43498, 145.05298, 229.47392, 159.29999, 281.88376, 159.54465, 162.84131, 155.2409, 144.56076]
2025-09-16 15:41:06,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 31.0, 28.0, 47.0, 31.0, 57.0, 31.0, 32.0, 30.0, 28.0]
2025-09-16 15:41:06,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 34 seconds)
2025-09-16 15:43:04,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:43:05,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 156.86160 ± 11.671
2025-09-16 15:43:05,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [150.1625, 160.56995, 141.07198, 163.42873, 170.99954, 150.44395, 169.33772, 159.28296, 168.3088, 135.00981]
2025-09-16 15:43:05,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 31.0, 27.0, 32.0, 33.0, 29.0, 33.0, 31.0, 33.0, 26.0]
2025-09-16 15:43:05,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 37 seconds)
2025-09-16 15:45:03,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:45:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 224.14233 ± 107.952
2025-09-16 15:45:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [155.22998, 154.55519, 175.60606, 129.81291, 370.3414, 135.15225, 174.04768, 163.21704, 436.4775, 346.98346]
2025-09-16 15:45:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 34.0, 25.0, 74.0, 26.0, 34.0, 32.0, 90.0, 70.0]
2025-09-16 15:45:04,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 25 seconds)
2025-09-16 15:47:00,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:47:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 175.64697 ± 44.159
2025-09-16 15:47:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [160.74069, 166.52226, 162.67294, 150.47346, 209.0149, 159.01547, 298.07675, 145.42245, 150.48354, 154.04726]
2025-09-16 15:47:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 33.0, 32.0, 29.0, 41.0, 31.0, 62.0, 28.0, 29.0, 30.0]
2025-09-16 15:47:01,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 7 seconds)
2025-09-16 15:48:59,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:48:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 163.50891 ± 17.520
2025-09-16 15:48:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [154.61412, 182.98076, 159.47835, 140.08295, 199.24701, 172.69875, 154.78473, 171.12808, 159.71718, 140.35728]
2025-09-16 15:48:59,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 37.0, 31.0, 27.0, 39.0, 34.0, 30.0, 34.0, 31.0, 27.0]
2025-09-16 15:48:59,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 2 seconds)
2025-09-16 15:50:58,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:50:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 157.83719 ± 12.988
2025-09-16 15:50:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [168.15297, 158.81499, 153.40846, 162.47124, 178.7035, 153.73193, 172.47319, 150.91339, 130.4465, 149.25557]
2025-09-16 15:50:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 30.0, 32.0, 35.0, 30.0, 34.0, 29.0, 25.0, 29.0]
2025-09-16 15:50:59,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 16 seconds)
2025-09-16 15:52:58,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:52:59,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 160.46968 ± 13.818
2025-09-16 15:52:59,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [153.70557, 173.06276, 149.4523, 135.24251, 149.72672, 164.08661, 169.89969, 157.79077, 164.71762, 187.01219]
2025-09-16 15:52:59,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 34.0, 29.0, 26.0, 29.0, 32.0, 34.0, 31.0, 32.0, 37.0]
2025-09-16 15:52:59,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 25 seconds)
2025-09-16 15:54:58,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:54:58,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 152.16919 ± 9.527
2025-09-16 15:54:58,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [159.45154, 155.65268, 170.33397, 140.60603, 139.58826, 154.91573, 153.90132, 158.7289, 148.82413, 139.68939]
2025-09-16 15:54:58,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 34.0, 27.0, 27.0, 30.0, 30.0, 31.0, 29.0, 27.0]
2025-09-16 15:54:58,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 41 seconds)
2025-09-16 15:56:58,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:56:58,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 154.81212 ± 8.896
2025-09-16 15:56:58,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [157.35329, 167.05724, 166.9197, 157.29562, 154.0306, 158.46857, 150.66267, 152.86952, 134.48506, 148.97888]
2025-09-16 15:56:58,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 33.0, 31.0, 30.0, 31.0, 29.0, 30.0, 26.0, 29.0]
2025-09-16 15:56:58,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 21 minutes, 16 seconds)
2025-09-16 15:58:56,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:58:56,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 153.42401 ± 14.554
2025-09-16 15:58:56,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [174.24994, 140.5376, 135.3178, 163.48415, 171.86246, 162.49011, 140.32286, 145.53423, 135.85309, 164.58798]
2025-09-16 15:58:56,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 27.0, 26.0, 32.0, 34.0, 32.0, 27.0, 28.0, 26.0, 32.0]
2025-09-16 15:58:56,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 12 seconds)
2025-09-16 16:00:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:00:53,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 163.90338 ± 14.169
2025-09-16 16:00:53,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [145.84843, 157.64539, 159.4477, 140.61885, 163.14339, 188.2523, 177.93929, 168.87927, 158.12558, 179.1335]
2025-09-16 16:00:53,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 31.0, 31.0, 27.0, 32.0, 37.0, 35.0, 33.0, 31.0, 35.0]
2025-09-16 16:00:53,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 34 seconds)
2025-09-16 16:02:49,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:02:50,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 211.01956 ± 93.301
2025-09-16 16:02:50,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [129.67036, 143.87772, 153.70073, 145.35892, 154.70937, 322.7873, 353.96075, 162.5085, 377.85126, 165.77081]
2025-09-16 16:02:50,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 28.0, 30.0, 28.0, 30.0, 64.0, 70.0, 32.0, 79.0, 32.0]
2025-09-16 16:02:50,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes)
2025-09-16 16:04:46,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:04:47,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 162.56255 ± 19.298
2025-09-16 16:04:47,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [161.61041, 161.31902, 194.84166, 173.18488, 130.20203, 163.34659, 143.87851, 188.10048, 168.53767, 140.60417]
2025-09-16 16:04:47,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 38.0, 34.0, 25.0, 32.0, 28.0, 37.0, 33.0, 27.0]
2025-09-16 16:04:47,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 28 seconds)
2025-09-16 16:06:43,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:06:44,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 186.11467 ± 68.368
2025-09-16 16:06:44,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [165.06822, 144.93216, 149.16881, 135.83513, 159.4114, 300.33563, 155.01996, 160.45003, 149.85655, 341.0689]
2025-09-16 16:06:44,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 28.0, 29.0, 26.0, 31.0, 60.0, 30.0, 31.0, 29.0, 71.0]
2025-09-16 16:06:44,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 50 seconds)
2025-09-16 16:08:40,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:08:40,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 169.18742 ± 15.821
2025-09-16 16:08:40,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [203.02136, 166.33952, 149.01843, 167.66568, 153.87498, 173.37196, 160.62799, 167.87071, 191.55804, 158.52545]
2025-09-16 16:08:40,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 33.0, 29.0, 33.0, 30.0, 34.0, 31.0, 33.0, 38.0, 31.0]
2025-09-16 16:08:40,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 34 seconds)
2025-09-16 16:10:36,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:10:37,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 154.16626 ± 12.477
2025-09-16 16:10:37,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [177.22475, 149.76302, 149.9641, 158.44043, 148.98395, 155.35391, 136.2136, 135.10136, 167.2289, 163.38861]
2025-09-16 16:10:37,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 29.0, 29.0, 31.0, 29.0, 30.0, 26.0, 26.0, 33.0, 32.0]
2025-09-16 16:10:37,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes, 38 seconds)
2025-09-16 16:12:33,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:12:33,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 185.00293 ± 69.362
2025-09-16 16:12:33,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [145.19453, 332.16794, 144.5796, 154.73384, 154.30904, 158.94771, 154.82254, 313.39835, 157.09659, 134.77924]
2025-09-16 16:12:33,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 67.0, 28.0, 30.0, 30.0, 31.0, 30.0, 64.0, 31.0, 26.0]
2025-09-16 16:12:33,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 32 seconds)
2025-09-16 16:14:30,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:14:30,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 152.14200 ± 19.049
2025-09-16 16:14:30,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [140.08304, 163.36972, 140.30966, 195.92012, 159.86313, 141.00409, 165.45097, 130.40643, 154.3751, 130.63773]
2025-09-16 16:14:30,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 32.0, 27.0, 39.0, 31.0, 27.0, 33.0, 25.0, 30.0, 25.0]
2025-09-16 16:14:30,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 31 seconds)
2025-09-16 16:16:26,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:16:26,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 170.93358 ± 40.626
2025-09-16 16:16:26,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [164.81244, 162.76776, 288.6014, 172.66681, 140.00523, 154.24817, 139.15974, 169.9452, 158.24876, 158.8802]
2025-09-16 16:16:26,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 58.0, 34.0, 27.0, 30.0, 27.0, 33.0, 31.0, 31.0]
2025-09-16 16:16:26,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 58 minutes, 30 seconds)
2025-09-16 16:18:23,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:18:24,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 233.43462 ± 109.066
2025-09-16 16:18:24,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [164.07738, 361.1024, 175.11458, 345.5107, 470.77817, 175.10347, 145.20123, 145.5249, 172.12273, 179.81047]
2025-09-16 16:18:24,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 76.0, 34.0, 69.0, 94.0, 34.0, 28.0, 28.0, 34.0, 35.0]
2025-09-16 16:18:24,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 56 minutes, 41 seconds)
2025-09-16 16:20:20,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:20:21,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 173.73076 ± 53.069
2025-09-16 16:20:21,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [154.20108, 325.7233, 197.46854, 159.50993, 143.8342, 148.52832, 135.52855, 148.36763, 159.71307, 164.43307]
2025-09-16 16:20:21,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 67.0, 39.0, 31.0, 28.0, 29.0, 26.0, 29.0, 31.0, 32.0]
2025-09-16 16:20:21,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 48 seconds)
2025-09-16 16:22:17,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:22:17,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 156.62857 ± 15.664
2025-09-16 16:22:17,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [144.86569, 159.73871, 135.00735, 130.01476, 182.86111, 167.53795, 169.96674, 153.58748, 168.097, 154.60884]
2025-09-16 16:22:17,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 31.0, 26.0, 25.0, 36.0, 33.0, 33.0, 30.0, 33.0, 30.0]
2025-09-16 16:22:17,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 52 seconds)
2025-09-16 16:24:13,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:24:14,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 158.50406 ± 6.562
2025-09-16 16:24:14,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [164.0603, 158.75072, 167.0463, 163.30467, 166.06384, 150.44737, 157.52583, 159.75797, 148.93343, 149.15018]
2025-09-16 16:24:14,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 31.0, 33.0, 32.0, 33.0, 29.0, 31.0, 31.0, 29.0, 29.0]
2025-09-16 16:24:14,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 53 seconds)
2025-09-16 16:26:10,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:26:10,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 154.06863 ± 19.155
2025-09-16 16:26:10,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [125.16538, 149.5936, 161.25343, 175.80797, 149.33653, 167.43887, 125.633, 174.06119, 135.33124, 177.06497]
2025-09-16 16:26:10,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 29.0, 31.0, 35.0, 29.0, 33.0, 24.0, 34.0, 26.0, 35.0]
2025-09-16 16:26:10,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 57 seconds)
2025-09-16 16:28:07,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:28:07,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 165.01012 ± 14.342
2025-09-16 16:28:07,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [144.78577, 190.2221, 153.84816, 171.57187, 159.23213, 173.2174, 145.32425, 157.55081, 175.63536, 178.71338]
2025-09-16 16:28:07,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 37.0, 30.0, 34.0, 31.0, 34.0, 28.0, 31.0, 35.0, 35.0]
2025-09-16 16:28:07,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 59 seconds)
2025-09-16 16:30:04,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:30:05,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 160.27371 ± 19.663
2025-09-16 16:30:05,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [159.0708, 161.35982, 166.4043, 203.41383, 173.28206, 150.39809, 140.39436, 145.18576, 173.12564, 130.1025]
2025-09-16 16:30:05,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 32.0, 39.0, 34.0, 29.0, 27.0, 28.0, 34.0, 25.0]
2025-09-16 16:30:05,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 7 seconds)
2025-09-16 16:32:00,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:32:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 169.81711 ± 17.728
2025-09-16 16:32:01,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [150.54024, 179.7708, 144.36757, 169.51053, 159.71141, 166.00902, 204.52306, 155.49197, 190.78812, 177.45839]
2025-09-16 16:32:01,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 35.0, 28.0, 33.0, 31.0, 32.0, 40.0, 30.0, 37.0, 35.0]
2025-09-16 16:32:01,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 43 minutes, 3 seconds)
2025-09-16 16:33:56,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:33:57,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 172.17467 ± 25.290
2025-09-16 16:33:57,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [171.40071, 215.07123, 168.91379, 140.71684, 159.04222, 140.55777, 170.97414, 160.35522, 219.41808, 175.29675]
2025-09-16 16:33:57,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 42.0, 33.0, 27.0, 31.0, 27.0, 33.0, 31.0, 43.0, 34.0]
2025-09-16 16:33:57,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 3 seconds)
2025-09-16 16:35:52,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:35:52,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 187.51672 ± 50.591
2025-09-16 16:35:52,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [203.87663, 169.61304, 329.3461, 185.49226, 158.50807, 169.68753, 150.8357, 148.50685, 198.83781, 160.46327]
2025-09-16 16:35:52,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 33.0, 66.0, 37.0, 31.0, 33.0, 29.0, 29.0, 39.0, 31.0]
2025-09-16 16:35:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 58 seconds)
2025-09-16 16:37:47,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:37:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 148.03268 ± 12.426
2025-09-16 16:37:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [119.11104, 149.23242, 158.77383, 154.5447, 144.66342, 164.18103, 145.26573, 134.80843, 155.11168, 154.6347]
2025-09-16 16:37:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 29.0, 31.0, 30.0, 28.0, 32.0, 28.0, 26.0, 30.0, 30.0]
2025-09-16 16:37:48,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 36 minutes, 43 seconds)
2025-09-16 16:39:42,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:39:43,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 159.83508 ± 12.693
2025-09-16 16:39:43,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [134.45293, 159.69754, 163.47342, 162.96782, 144.53256, 158.27045, 185.83278, 163.19092, 162.50537, 163.42706]
2025-09-16 16:39:43,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 31.0, 32.0, 32.0, 28.0, 31.0, 36.0, 32.0, 32.0, 32.0]
2025-09-16 16:39:43,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 24 seconds)
2025-09-16 16:41:38,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:41:38,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 162.43979 ± 17.578
2025-09-16 16:41:38,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [170.7849, 153.71953, 143.5464, 165.52058, 149.10547, 163.91676, 158.57074, 166.60172, 207.78246, 144.84924]
2025-09-16 16:41:38,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 28.0, 32.0, 29.0, 32.0, 31.0, 33.0, 42.0, 28.0]
2025-09-16 16:41:38,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 24 seconds)
2025-09-16 16:43:33,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:43:33,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 163.29884 ± 20.554
2025-09-16 16:43:33,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [158.75645, 140.95976, 195.46454, 204.65509, 145.443, 154.25816, 145.0681, 175.0256, 157.69794, 155.65982]
2025-09-16 16:43:33,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 38.0, 41.0, 28.0, 30.0, 28.0, 34.0, 31.0, 30.0]
2025-09-16 16:43:33,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 21 seconds)
2025-09-16 16:45:29,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:45:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 242.75723 ± 83.429
2025-09-16 16:45:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [201.0455, 318.35147, 396.91928, 369.1172, 195.0141, 181.51846, 155.72192, 255.78816, 178.90475, 175.19136]
2025-09-16 16:45:30,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 64.0, 78.0, 71.0, 38.0, 35.0, 30.0, 51.0, 34.0, 34.0]
2025-09-16 16:45:30,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 31 seconds)
2025-09-16 16:47:26,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:47:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 207.23177 ± 97.968
2025-09-16 16:47:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [423.72302, 144.65302, 168.48819, 135.68498, 140.46364, 178.06425, 154.15936, 179.6235, 171.64793, 375.80978]
2025-09-16 16:47:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 28.0, 33.0, 26.0, 27.0, 35.0, 30.0, 35.0, 34.0, 75.0]
2025-09-16 16:47:26,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 46 seconds)
2025-09-16 16:49:21,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:49:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 170.77542 ± 47.717
2025-09-16 16:49:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [152.84163, 149.4822, 150.19719, 139.71298, 159.23863, 159.70038, 161.70987, 153.50523, 169.2831, 312.08292]
2025-09-16 16:49:22,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 29.0, 29.0, 27.0, 31.0, 31.0, 32.0, 30.0, 33.0, 63.0]
2025-09-16 16:49:22,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 24 minutes, 55 seconds)
2025-09-16 16:51:17,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:51:18,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 151.35880 ± 8.282
2025-09-16 16:51:18,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [158.93024, 149.48337, 144.29779, 140.08702, 153.00703, 149.87346, 153.5685, 167.83032, 139.66096, 156.8492]
2025-09-16 16:51:18,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 28.0, 27.0, 30.0, 29.0, 30.0, 33.0, 27.0, 31.0]
2025-09-16 16:51:18,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 23 minutes, 3 seconds)
2025-09-16 16:53:13,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:53:13,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 145.36481 ± 7.887
2025-09-16 16:53:13,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [156.90129, 140.27965, 130.38892, 144.45522, 140.09082, 149.56393, 149.68292, 145.0365, 157.3587, 139.89024]
2025-09-16 16:53:13,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 25.0, 28.0, 27.0, 29.0, 29.0, 28.0, 31.0, 27.0]
2025-09-16 16:53:13,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes, 7 seconds)
2025-09-16 16:55:08,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:55:08,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 174.85452 ± 60.267
2025-09-16 16:55:08,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [148.70432, 354.63016, 145.11343, 154.01723, 154.8431, 165.11497, 153.42314, 159.18724, 165.22757, 148.28403]
2025-09-16 16:55:08,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 75.0, 28.0, 30.0, 30.0, 32.0, 30.0, 31.0, 33.0, 29.0]
2025-09-16 16:55:08,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 1 second)
2025-09-16 16:57:03,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:57:04,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 160.12929 ± 10.171
2025-09-16 16:57:04,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [172.27702, 167.67456, 154.2931, 159.00255, 168.1083, 175.28862, 152.53313, 156.95021, 155.28429, 139.88109]
2025-09-16 16:57:04,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 33.0, 30.0, 31.0, 33.0, 34.0, 30.0, 31.0, 30.0, 27.0]
2025-09-16 16:57:04,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 58 seconds)
2025-09-16 16:58:58,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:58:59,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 182.14462 ± 63.532
2025-09-16 16:58:59,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [178.19395, 149.23164, 134.46399, 314.3166, 158.1698, 139.96396, 160.82596, 158.43657, 297.97818, 129.86557]
2025-09-16 16:58:59,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 29.0, 26.0, 63.0, 31.0, 27.0, 31.0, 31.0, 60.0, 25.0]
2025-09-16 16:58:59,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 58 seconds)
2025-09-16 17:00:53,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:00:54,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 160.41797 ± 14.363
2025-09-16 17:00:54,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [157.87415, 153.3825, 174.45726, 150.40894, 135.3309, 183.22087, 163.11972, 166.06587, 176.30238, 144.0171]
2025-09-16 17:00:54,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 35.0, 29.0, 26.0, 36.0, 32.0, 33.0, 35.0, 28.0]
2025-09-16 17:00:54,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 58 seconds)
2025-09-16 17:02:49,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:02:49,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 177.26643 ± 58.543
2025-09-16 17:02:49,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [135.69675, 135.49422, 168.31815, 341.99008, 149.50824, 206.3301, 144.46118, 173.9768, 149.07764, 167.81119]
2025-09-16 17:02:49,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 26.0, 33.0, 70.0, 29.0, 41.0, 28.0, 34.0, 29.0, 33.0]
2025-09-16 17:02:49,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes, 4 seconds)
2025-09-16 17:04:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:04:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 196.07976 ± 84.321
2025-09-16 17:04:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [162.70616, 150.1539, 333.6339, 389.5862, 150.1047, 149.8137, 167.02863, 135.15758, 173.20409, 149.4088]
2025-09-16 17:04:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 29.0, 67.0, 75.0, 29.0, 29.0, 33.0, 26.0, 34.0, 29.0]
2025-09-16 17:04:45,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes, 13 seconds)
2025-09-16 17:06:40,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:06:41,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 193.10457 ± 79.967
2025-09-16 17:06:41,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [287.6624, 400.43784, 149.01367, 157.99226, 158.61, 154.3181, 135.38809, 165.22643, 162.5741, 159.82283]
2025-09-16 17:06:41,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 86.0, 29.0, 31.0, 31.0, 30.0, 26.0, 32.0, 32.0, 31.0]
2025-09-16 17:06:41,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 7 minutes, 20 seconds)
2025-09-16 17:08:36,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:08:36,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 152.58168 ± 12.765
2025-09-16 17:08:36,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [131.10095, 145.61638, 140.1217, 149.06355, 157.81508, 144.31386, 163.04047, 178.79906, 157.21155, 158.73412]
2025-09-16 17:08:36,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 28.0, 27.0, 29.0, 31.0, 28.0, 32.0, 35.0, 31.0, 31.0]
2025-09-16 17:08:36,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 28 seconds)
2025-09-16 17:10:31,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:10:32,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 177.01961 ± 69.680
2025-09-16 17:10:32,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [158.89035, 158.9252, 154.01967, 148.45146, 162.97923, 384.8427, 158.58485, 152.17238, 134.70018, 156.62982]
2025-09-16 17:10:32,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 30.0, 29.0, 32.0, 79.0, 31.0, 30.0, 26.0, 31.0]
2025-09-16 17:10:32,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 33 seconds)
2025-09-16 17:12:26,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:12:27,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 147.75922 ± 9.890
2025-09-16 17:12:27,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [148.89383, 149.31104, 144.70042, 140.17699, 149.743, 144.12643, 125.49478, 159.85562, 162.79396, 152.49608]
2025-09-16 17:12:27,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 28.0, 27.0, 29.0, 28.0, 24.0, 31.0, 32.0, 30.0]
2025-09-16 17:12:27,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 34 seconds)
2025-09-16 17:14:21,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:14:22,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 158.59500 ± 15.214
2025-09-16 17:14:22,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [139.7012, 168.52925, 145.63788, 162.45135, 190.1765, 163.68709, 139.9221, 168.4591, 162.60326, 144.78236]
2025-09-16 17:14:22,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 33.0, 28.0, 32.0, 38.0, 33.0, 27.0, 33.0, 32.0, 28.0]
2025-09-16 17:14:22,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 36 seconds)
2025-09-16 17:16:16,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:16:17,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 170.95767 ± 60.936
2025-09-16 17:16:17,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [159.15468, 167.70692, 158.7999, 155.61496, 134.61995, 159.39407, 350.06723, 125.11364, 154.36186, 144.74347]
2025-09-16 17:16:17,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 31.0, 30.0, 26.0, 31.0, 73.0, 24.0, 30.0, 28.0]
2025-09-16 17:16:17,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 35 seconds)
2025-09-16 17:18:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:18:12,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 147.54323 ± 8.745
2025-09-16 17:18:12,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [139.80353, 144.12248, 149.40878, 167.03151, 139.78406, 153.20984, 152.70267, 134.45296, 144.3349, 150.58138]
2025-09-16 17:18:12,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 28.0, 29.0, 33.0, 27.0, 30.0, 30.0, 26.0, 28.0, 29.0]
2025-09-16 17:18:12,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 38 seconds)
2025-09-16 17:20:06,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:20:07,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 157.81090 ± 13.260
2025-09-16 17:20:07,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [154.39357, 145.11105, 162.2008, 153.90569, 144.61961, 194.10008, 153.55759, 153.78816, 161.76349, 154.6691]
2025-09-16 17:20:07,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 28.0, 32.0, 30.0, 28.0, 39.0, 30.0, 30.0, 32.0, 30.0]
2025-09-16 17:20:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 42 seconds)
2025-09-16 17:22:01,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:22:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 162.76120 ± 12.785
2025-09-16 17:22:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [158.922, 179.35178, 144.8341, 179.97806, 161.54271, 148.67728, 161.81749, 152.10005, 182.4817, 157.90694]
2025-09-16 17:22:02,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 36.0, 28.0, 35.0, 32.0, 29.0, 32.0, 30.0, 37.0, 31.0]
2025-09-16 17:22:02,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 47 seconds)
2025-09-16 17:23:56,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:23:57,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 145.70020 ± 11.855
2025-09-16 17:23:57,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [161.98044, 129.61887, 143.33125, 135.84383, 143.4473, 144.64027, 157.39702, 140.02724, 167.09734, 133.61847]
2025-09-16 17:23:57,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 28.0, 26.0, 28.0, 28.0, 31.0, 27.0, 33.0, 26.0]
2025-09-16 17:23:57,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 50 seconds)
2025-09-16 17:25:51,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:25:51,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 152.32599 ± 10.682
2025-09-16 17:25:51,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [158.82849, 140.13779, 142.9367, 148.58217, 148.30623, 172.03294, 144.26376, 169.55263, 154.94997, 143.66928]
2025-09-16 17:25:51,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 28.0, 29.0, 29.0, 34.0, 28.0, 33.0, 30.0, 28.0]
2025-09-16 17:25:51,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 53 seconds)
2025-09-16 17:27:46,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:27:46,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 155.25301 ± 8.637
2025-09-16 17:27:46,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [157.74529, 157.00424, 149.80794, 140.04036, 160.4556, 173.55441, 161.2673, 154.54295, 149.29282, 148.81923]
2025-09-16 17:27:46,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 29.0, 27.0, 32.0, 34.0, 32.0, 30.0, 29.0, 29.0]
2025-09-16 17:27:46,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 57 seconds)
2025-09-16 17:29:41,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:29:41,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 154.05742 ± 7.904
2025-09-16 17:29:41,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [156.87439, 154.02403, 158.43274, 159.09732, 152.84833, 153.32639, 145.24493, 161.95119, 135.48286, 163.29198]
2025-09-16 17:29:41,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 31.0, 31.0, 30.0, 30.0, 28.0, 32.0, 26.0, 32.0]
2025-09-16 17:29:41,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes)
2025-09-16 17:31:36,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:31:36,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 150.59184 ± 18.307
2025-09-16 17:31:36,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [135.23051, 130.0077, 135.11879, 149.38799, 155.1483, 190.11388, 172.2133, 134.83351, 143.3032, 160.56134]
2025-09-16 17:31:36,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 26.0, 29.0, 30.0, 38.0, 34.0, 26.0, 28.0, 32.0]
2025-09-16 17:31:36,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 5 seconds)
2025-09-16 17:33:31,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:33:31,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 159.95251 ± 20.140
2025-09-16 17:33:31,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [161.80211, 174.75122, 152.36731, 169.1251, 140.3011, 134.60504, 179.69936, 160.8428, 196.58536, 129.44562]
2025-09-16 17:33:31,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 34.0, 30.0, 33.0, 27.0, 26.0, 35.0, 31.0, 39.0, 25.0]
2025-09-16 17:33:31,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 11 seconds)
2025-09-16 17:35:25,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:35:26,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 154.95108 ± 10.717
2025-09-16 17:35:26,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [160.29324, 167.21077, 159.42531, 157.48746, 140.64075, 140.29855, 143.87766, 145.46329, 166.75983, 168.0541]
2025-09-16 17:35:26,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 33.0, 31.0, 31.0, 27.0, 27.0, 28.0, 28.0, 33.0, 33.0]
2025-09-16 17:35:26,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 17 seconds)
2025-09-16 17:37:20,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:37:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 149.07722 ± 12.368
2025-09-16 17:37:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [139.77219, 148.43858, 148.62329, 163.49664, 134.98087, 167.38142, 158.82274, 154.2113, 150.02042, 125.02478]
2025-09-16 17:37:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 29.0, 29.0, 32.0, 26.0, 33.0, 31.0, 30.0, 29.0, 24.0]
2025-09-16 17:37:20,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 21 seconds)
2025-09-16 17:39:15,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:39:15,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 150.55177 ± 11.331
2025-09-16 17:39:15,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [144.39952, 130.72304, 177.01497, 145.3412, 153.14345, 152.46887, 149.1506, 144.79747, 159.37888, 149.0997]
2025-09-16 17:39:15,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 25.0, 35.0, 28.0, 30.0, 30.0, 29.0, 28.0, 31.0, 29.0]
2025-09-16 17:39:15,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 26 seconds)
2025-09-16 17:41:09,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:41:10,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 176.14323 ± 72.506
2025-09-16 17:41:10,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [129.83415, 155.41699, 391.1132, 164.05856, 163.13084, 148.20297, 158.51817, 162.00829, 154.19931, 134.94983]
2025-09-16 17:41:10,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 30.0, 84.0, 32.0, 33.0, 29.0, 31.0, 32.0, 30.0, 26.0]
2025-09-16 17:41:10,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 29 seconds)
2025-09-16 17:43:04,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:43:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 153.90463 ± 13.398
2025-09-16 17:43:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [152.70027, 149.6932, 139.91318, 145.35786, 150.30469, 154.53445, 134.4171, 157.54114, 175.44684, 179.13776]
2025-09-16 17:43:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 29.0, 27.0, 28.0, 29.0, 30.0, 26.0, 31.0, 35.0, 36.0]
2025-09-16 17:43:05,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 35 seconds)
2025-09-16 17:44:59,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:45:00,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 188.50864 ± 87.205
2025-09-16 17:45:00,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [166.67825, 159.97789, 158.41972, 168.07178, 135.08168, 170.24762, 182.06071, 447.14545, 140.27386, 157.1294]
2025-09-16 17:45:00,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 31.0, 33.0, 26.0, 34.0, 36.0, 97.0, 27.0, 31.0]
2025-09-16 17:45:00,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 42 seconds)
2025-09-16 17:46:54,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:46:55,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 150.67328 ± 9.159
2025-09-16 17:46:55,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [139.87915, 159.3726, 157.65854, 166.25589, 143.94185, 161.27063, 143.71324, 145.12297, 149.67365, 139.84433]
2025-09-16 17:46:55,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 31.0, 31.0, 33.0, 28.0, 32.0, 28.0, 28.0, 29.0, 27.0]
2025-09-16 17:46:55,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 48 seconds)
2025-09-16 17:48:49,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:48:49,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 155.15385 ± 10.117
2025-09-16 17:48:49,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [159.89719, 149.38441, 150.21541, 153.3808, 167.85466, 134.47688, 145.62834, 165.63298, 157.96169, 167.10614]
2025-09-16 17:48:49,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 29.0, 30.0, 33.0, 26.0, 28.0, 33.0, 31.0, 33.0]
2025-09-16 17:48:49,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 53 seconds)
2025-09-16 17:50:44,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:50:44,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 154.47223 ± 6.142
2025-09-16 17:50:44,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [149.30156, 159.4886, 149.58374, 144.32944, 157.2557, 157.33939, 155.26619, 148.10129, 165.71194, 158.34439]
2025-09-16 17:50:44,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 31.0, 29.0, 28.0, 31.0, 31.0, 30.0, 29.0, 33.0, 31.0]
2025-09-16 17:50:44,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 58 seconds)
2025-09-16 17:52:38,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:52:39,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 149.18994 ± 8.463
2025-09-16 17:52:39,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [163.9279, 144.0476, 139.68079, 140.31177, 148.4819, 145.12204, 156.87025, 160.34047, 139.88412, 153.2326]
2025-09-16 17:52:39,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 28.0, 27.0, 27.0, 29.0, 28.0, 31.0, 32.0, 27.0, 30.0]
2025-09-16 17:52:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 3 seconds)
2025-09-16 17:54:33,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:54:34,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 214.41663 ± 117.559
2025-09-16 17:54:34,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [162.6112, 144.5594, 158.09637, 159.75806, 294.6105, 209.30563, 169.04332, 149.44893, 543.119, 153.61385]
2025-09-16 17:54:34,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 28.0, 31.0, 31.0, 60.0, 41.0, 33.0, 29.0, 106.0, 30.0]
2025-09-16 17:54:34,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 7 seconds)
2025-09-16 17:56:28,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:56:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 156.50014 ± 21.481
2025-09-16 17:56:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [130.32306, 168.09952, 163.8495, 163.40758, 153.72289, 140.33852, 130.37111, 145.36885, 207.34123, 162.17905]
2025-09-16 17:56:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 33.0, 32.0, 32.0, 30.0, 27.0, 25.0, 28.0, 41.0, 32.0]
2025-09-16 17:56:29,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 13 seconds)
2025-09-16 17:58:24,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:58:24,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 156.11282 ± 10.120
2025-09-16 17:58:24,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [157.0874, 143.6121, 143.55853, 164.33788, 144.77805, 171.19095, 167.32436, 148.87689, 154.07101, 166.29103]
2025-09-16 17:58:24,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 28.0, 28.0, 33.0, 28.0, 34.0, 33.0, 29.0, 30.0, 33.0]
2025-09-16 17:58:24,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 19 seconds)
2025-09-16 18:00:19,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:00:19,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 149.88113 ± 7.933
2025-09-16 18:00:19,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [149.4091, 157.85707, 159.2271, 162.09897, 143.78522, 143.82205, 134.95757, 152.8261, 149.80618, 145.02197]
2025-09-16 18:00:19,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 31.0, 31.0, 32.0, 28.0, 28.0, 26.0, 30.0, 29.0, 28.0]
2025-09-16 18:00:19,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 24 seconds)
2025-09-16 18:02:13,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:02:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 153.09232 ± 7.433
2025-09-16 18:02:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [140.52585, 159.54799, 148.54945, 155.00447, 163.48987, 145.1366, 145.54805, 153.45953, 157.67737, 161.98393]
2025-09-16 18:02:14,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 32.0, 29.0, 30.0, 32.0, 28.0, 28.0, 30.0, 31.0, 32.0]
2025-09-16 18:02:14,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 29 seconds)
2025-09-16 18:04:07,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:04:07,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 153.31384 ± 17.493
2025-09-16 18:04:07,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [145.08308, 154.86676, 135.7154, 160.61351, 146.28073, 180.8772, 140.31879, 153.83849, 129.40906, 186.13539]
2025-09-16 18:04:07,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 30.0, 26.0, 31.0, 28.0, 35.0, 27.0, 30.0, 25.0, 36.0]
2025-09-16 18:04:07,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 33 seconds)
2025-09-16 18:05:59,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:05:59,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 164.19107 ± 44.136
2025-09-16 18:05:59,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [144.36037, 144.52797, 144.53773, 292.3875, 140.23283, 176.4415, 162.8156, 139.35931, 143.50041, 153.74744]
2025-09-16 18:05:59,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 28.0, 57.0, 27.0, 35.0, 32.0, 27.0, 28.0, 30.0]
2025-09-16 18:06:00,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 36 seconds)
2025-09-16 18:07:51,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:07:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 178.96538 ± 68.749
2025-09-16 18:07:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [158.64116, 163.40128, 135.5485, 170.88367, 158.32039, 149.18707, 135.38187, 174.5122, 161.98718, 381.79053]
2025-09-16 18:07:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 26.0, 34.0, 31.0, 29.0, 26.0, 34.0, 32.0, 75.0]
2025-09-16 18:07:51,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 40 seconds)
2025-09-16 18:09:44,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:09:44,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 153.97948 ± 8.490
2025-09-16 18:09:44,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [170.4065, 144.6857, 147.37962, 149.94992, 157.9115, 161.67265, 144.70079, 156.47212, 145.03366, 161.58229]
2025-09-16 18:09:44,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 28.0, 29.0, 29.0, 31.0, 32.0, 28.0, 31.0, 28.0, 32.0]
2025-09-16 18:09:44,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 46 seconds)
2025-09-16 18:11:39,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:11:39,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 173.72171 ± 78.111
2025-09-16 18:11:39,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [149.19682, 153.726, 129.8004, 162.85359, 154.04553, 140.66402, 405.85312, 153.89374, 129.61696, 157.56686]
2025-09-16 18:11:39,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 25.0, 32.0, 30.0, 27.0, 80.0, 30.0, 25.0, 31.0]
2025-09-16 18:11:39,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 53 seconds)
2025-09-16 18:13:33,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:13:33,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 148.25912 ± 12.403
2025-09-16 18:13:33,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [135.46442, 158.25812, 125.021576, 166.69257, 158.0333, 144.50444, 134.51385, 156.11012, 150.15686, 153.83601]
2025-09-16 18:13:33,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 31.0, 24.0, 33.0, 31.0, 28.0, 26.0, 31.0, 29.0, 30.0]
2025-09-16 18:13:33,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
