2025-09-16 12:41:30,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_15
2025-09-16 12:41:30,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_15
2025-09-16 12:41:30,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x1526767b0950>}
2025-09-16 12:41:30,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:41:30,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:41:30,225 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=631, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:41:30,225 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:41:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:41:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:43:17,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:43:18,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 303.76807 ± 109.274
2025-09-16 12:43:18,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [381.7278, 96.350685, 124.39432, 359.64755, 333.5899, 300.90826, 478.83353, 311.199, 360.61276, 290.41687]
2025-09-16 12:43:18,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 19.0, 24.0, 69.0, 61.0, 58.0, 106.0, 60.0, 67.0, 55.0]
2025-09-16 12:43:18,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (303.77) for latency 15
2025-09-16 12:43:18,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 9 seconds)
2025-09-16 12:45:12,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:45:12,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 211.86800 ± 128.850
2025-09-16 12:45:12,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [267.49692, 99.87968, 122.96961, 491.34448, 101.93708, 272.54852, 156.8959, 112.88731, 117.10659, 375.61392]
2025-09-16 12:45:12,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 20.0, 24.0, 92.0, 20.0, 50.0, 30.0, 22.0, 23.0, 76.0]
2025-09-16 12:45:12,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 23 seconds)
2025-09-16 12:47:06,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:47:07,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 275.92041 ± 116.172
2025-09-16 12:47:07,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [157.27652, 440.09494, 333.41388, 320.17288, 194.87169, 342.27722, 102.30196, 112.649284, 374.65063, 381.49524]
2025-09-16 12:47:07,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 84.0, 65.0, 61.0, 41.0, 70.0, 20.0, 22.0, 67.0, 70.0]
2025-09-16 12:47:07,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 47 seconds)
2025-09-16 12:49:01,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:49:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 327.38715 ± 100.234
2025-09-16 12:49:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [397.44885, 381.98776, 384.05167, 158.04065, 347.54697, 430.3391, 242.69624, 418.58414, 366.51923, 146.65707]
2025-09-16 12:49:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 73.0, 71.0, 30.0, 66.0, 83.0, 47.0, 78.0, 69.0, 29.0]
2025-09-16 12:49:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (327.39) for latency 15
2025-09-16 12:49:02,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 17 seconds)
2025-09-16 12:50:57,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:50:58,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 333.54727 ± 72.315
2025-09-16 12:50:58,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [267.54785, 288.4226, 291.33044, 397.68958, 379.25018, 355.19064, 366.22726, 372.39325, 438.39804, 179.02303]
2025-09-16 12:50:58,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 52.0, 54.0, 74.0, 70.0, 75.0, 70.0, 80.0, 90.0, 36.0]
2025-09-16 12:50:58,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (333.55) for latency 15
2025-09-16 12:50:58,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 59 minutes, 30 seconds)
2025-09-16 12:52:53,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:52:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 239.27231 ± 73.721
2025-09-16 12:52:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [268.69366, 179.58434, 257.26855, 264.10876, 383.3222, 134.5256, 280.7974, 282.23624, 124.24701, 217.93953]
2025-09-16 12:52:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 39.0, 51.0, 50.0, 73.0, 26.0, 58.0, 62.0, 24.0, 43.0]
2025-09-16 12:52:54,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 29 seconds)
2025-09-16 12:54:48,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:54:49,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 320.35696 ± 119.096
2025-09-16 12:54:49,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [305.46555, 379.0066, 520.74713, 343.36673, 113.20967, 404.61792, 312.39737, 386.3467, 325.51013, 112.901505]
2025-09-16 12:54:49,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 67.0, 114.0, 66.0, 22.0, 87.0, 58.0, 71.0, 73.0, 22.0]
2025-09-16 12:54:49,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 58 minutes, 44 seconds)
2025-09-16 12:56:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:56:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 375.38467 ± 148.498
2025-09-16 12:56:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [507.42487, 357.31992, 130.7814, 107.71885, 341.57724, 541.9128, 361.60083, 572.42896, 415.70773, 417.3742]
2025-09-16 12:56:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 68.0, 25.0, 21.0, 64.0, 104.0, 72.0, 116.0, 86.0, 77.0]
2025-09-16 12:56:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (375.38) for latency 15
2025-09-16 12:56:45,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 57 minutes, 15 seconds)
2025-09-16 12:58:40,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:58:40,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 295.10977 ± 120.364
2025-09-16 12:58:40,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [271.36243, 309.6149, 379.3308, 508.46088, 432.80695, 307.50024, 95.19697, 125.116936, 284.9826, 236.72481]
2025-09-16 12:58:40,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 63.0, 73.0, 97.0, 96.0, 60.0, 19.0, 24.0, 54.0, 49.0]
2025-09-16 12:58:40,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 55 minutes, 25 seconds)
2025-09-16 13:00:34,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:00:35,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 298.87018 ± 122.570
2025-09-16 13:00:35,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.205605, 176.66634, 397.91376, 374.7962, 325.09055, 424.77295, 96.09041, 446.0682, 314.1398, 331.95804]
2025-09-16 13:00:35,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 38.0, 75.0, 79.0, 60.0, 82.0, 19.0, 93.0, 58.0, 62.0]
2025-09-16 13:00:35,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 53 minutes, 7 seconds)
2025-09-16 13:02:30,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:02:31,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 344.39047 ± 129.535
2025-09-16 13:02:31,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.91059, 436.3855, 354.3198, 367.26627, 546.93274, 286.28885, 390.64404, 377.9655, 144.20233, 442.98917]
2025-09-16 13:02:31,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 91.0, 66.0, 71.0, 104.0, 52.0, 71.0, 83.0, 28.0, 86.0]
2025-09-16 13:02:31,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 51 minutes, 24 seconds)
2025-09-16 13:04:26,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:04:26,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 188.44057 ± 121.222
2025-09-16 13:04:26,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [124.342026, 412.6601, 138.89754, 162.88957, 95.34191, 439.43915, 117.34984, 126.241104, 171.42459, 95.819786]
2025-09-16 13:04:26,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 82.0, 27.0, 31.0, 19.0, 93.0, 23.0, 25.0, 33.0, 19.0]
2025-09-16 13:04:26,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 49 minutes, 19 seconds)
2025-09-16 13:06:21,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:06:22,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 326.78430 ± 190.782
2025-09-16 13:06:22,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [211.61192, 108.213905, 484.95868, 375.47787, 660.638, 247.63991, 356.42685, 139.13373, 95.61905, 588.12305]
2025-09-16 13:06:22,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 21.0, 107.0, 72.0, 125.0, 47.0, 67.0, 27.0, 19.0, 116.0]
2025-09-16 13:06:22,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 47 minutes, 13 seconds)
2025-09-16 13:08:16,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:08:17,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 362.99228 ± 182.690
2025-09-16 13:08:17,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [157.5025, 327.7289, 300.7568, 168.11372, 364.77856, 585.56934, 640.4654, 356.9616, 117.181274, 610.8647]
2025-09-16 13:08:17,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 62.0, 57.0, 32.0, 66.0, 105.0, 139.0, 67.0, 23.0, 117.0]
2025-09-16 13:08:17,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 45 minutes, 16 seconds)
2025-09-16 13:10:12,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:10:13,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 274.64639 ± 122.756
2025-09-16 13:10:13,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [147.9888, 418.55078, 128.44003, 291.91037, 366.6002, 95.04539, 391.80557, 154.81851, 336.51547, 414.7889]
2025-09-16 13:10:13,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 77.0, 25.0, 54.0, 70.0, 19.0, 86.0, 30.0, 65.0, 78.0]
2025-09-16 13:10:13,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 43 minutes, 34 seconds)
2025-09-16 13:12:08,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:12:08,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 302.39716 ± 142.189
2025-09-16 13:12:08,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.70893, 346.4531, 323.0661, 497.3034, 95.18205, 351.68735, 487.3963, 369.76965, 118.912094, 338.49234]
2025-09-16 13:12:08,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 64.0, 60.0, 93.0, 19.0, 76.0, 90.0, 68.0, 23.0, 63.0]
2025-09-16 13:12:08,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 41 minutes, 35 seconds)
2025-09-16 13:14:03,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:14:04,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 298.44443 ± 128.377
2025-09-16 13:14:04,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [486.27402, 122.3591, 299.66095, 368.94778, 132.44019, 381.27612, 89.94856, 339.4899, 380.92017, 383.1274]
2025-09-16 13:14:04,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 24.0, 56.0, 69.0, 26.0, 84.0, 18.0, 63.0, 69.0, 82.0]
2025-09-16 13:14:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 39 minutes, 55 seconds)
2025-09-16 13:15:59,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:16:00,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 304.92462 ± 127.555
2025-09-16 13:16:00,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [501.54053, 419.97876, 145.7711, 429.16943, 341.50278, 398.2511, 264.66327, 95.7631, 259.92853, 192.67757]
2025-09-16 13:16:00,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 76.0, 28.0, 92.0, 61.0, 74.0, 51.0, 19.0, 51.0, 37.0]
2025-09-16 13:16:00,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 6 seconds)
2025-09-16 13:17:55,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:17:56,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 441.01221 ± 182.674
2025-09-16 13:17:56,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [329.95566, 576.1226, 564.65826, 367.88605, 456.69473, 284.08774, 339.05966, 733.9585, 101.66612, 656.03296]
2025-09-16 13:17:56,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 107.0, 103.0, 69.0, 84.0, 61.0, 63.0, 145.0, 20.0, 131.0]
2025-09-16 13:17:56,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (441.01) for latency 15
2025-09-16 13:17:56,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-09-16 13:19:50,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:19:51,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 375.17618 ± 144.744
2025-09-16 13:19:51,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [380.885, 312.06686, 325.01147, 702.5556, 124.640976, 418.11575, 394.68484, 323.01956, 503.45642, 267.32526]
2025-09-16 13:19:51,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 58.0, 64.0, 129.0, 24.0, 90.0, 75.0, 63.0, 111.0, 52.0]
2025-09-16 13:19:51,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 34 minutes, 20 seconds)
2025-09-16 13:21:46,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:21:47,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 315.54718 ± 149.365
2025-09-16 13:21:47,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [367.11063, 484.1962, 573.8217, 96.09229, 96.48397, 166.38837, 291.41138, 375.95093, 350.34958, 353.66644]
2025-09-16 13:21:47,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 89.0, 107.0, 19.0, 19.0, 32.0, 54.0, 75.0, 65.0, 65.0]
2025-09-16 13:21:47,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 32 minutes, 25 seconds)
2025-09-16 13:23:42,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:23:43,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 288.72095 ± 123.241
2025-09-16 13:23:43,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [326.72595, 328.58386, 197.31708, 135.50337, 477.8683, 124.78746, 366.4202, 410.12567, 386.30154, 133.57605]
2025-09-16 13:23:43,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 61.0, 38.0, 26.0, 88.0, 24.0, 72.0, 77.0, 87.0, 26.0]
2025-09-16 13:23:43,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 28 seconds)
2025-09-16 13:25:38,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:25:39,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 305.20917 ± 137.712
2025-09-16 13:25:39,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.45907, 319.96213, 192.35902, 198.53404, 351.16144, 439.97714, 367.03604, 150.56717, 368.6009, 568.4352]
2025-09-16 13:25:39,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 57.0, 37.0, 38.0, 75.0, 80.0, 67.0, 29.0, 67.0, 104.0]
2025-09-16 13:25:39,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 33 seconds)
2025-09-16 13:27:33,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:27:34,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 331.49142 ± 147.039
2025-09-16 13:27:34,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [159.13086, 580.4214, 89.26882, 412.789, 488.94867, 388.20428, 339.4935, 332.73553, 365.65347, 158.26859]
2025-09-16 13:27:34,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 105.0, 18.0, 77.0, 92.0, 71.0, 62.0, 63.0, 66.0, 31.0]
2025-09-16 13:27:34,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 24 seconds)
2025-09-16 13:29:29,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:29:30,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 320.29041 ± 134.722
2025-09-16 13:29:30,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [134.67249, 478.98764, 342.46527, 496.71805, 107.54095, 322.85898, 358.30603, 397.2591, 150.36586, 413.72946]
2025-09-16 13:29:30,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 88.0, 63.0, 91.0, 21.0, 62.0, 65.0, 74.0, 29.0, 89.0]
2025-09-16 13:29:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 39 seconds)
2025-09-16 13:31:25,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:31:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 319.43790 ± 154.398
2025-09-16 13:31:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [107.70584, 175.84805, 518.6644, 505.99924, 546.7484, 307.65982, 241.0267, 374.8027, 287.23187, 128.69196]
2025-09-16 13:31:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 33.0, 100.0, 90.0, 113.0, 56.0, 45.0, 71.0, 54.0, 25.0]
2025-09-16 13:31:26,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 48 seconds)
2025-09-16 13:33:21,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:33:22,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 341.89130 ± 184.967
2025-09-16 13:33:22,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [734.9635, 423.67307, 117.27497, 124.18065, 89.74968, 452.04547, 387.35327, 411.83533, 324.82666, 353.01025]
2025-09-16 13:33:22,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 85.0, 23.0, 24.0, 18.0, 84.0, 72.0, 84.0, 59.0, 65.0]
2025-09-16 13:33:22,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 21 minutes)
2025-09-16 13:35:18,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:35:19,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 338.88425 ± 152.720
2025-09-16 13:35:19,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [451.79132, 449.492, 343.38968, 366.0822, 118.038506, 361.43915, 101.214714, 495.3679, 154.87152, 547.15546]
2025-09-16 13:35:19,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 95.0, 63.0, 69.0, 23.0, 71.0, 20.0, 92.0, 30.0, 114.0]
2025-09-16 13:35:19,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 10 seconds)
2025-09-16 13:37:13,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:37:14,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 403.16620 ± 156.134
2025-09-16 13:37:14,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [365.33777, 320.6668, 376.22574, 364.42087, 324.00217, 776.4248, 129.87993, 408.78754, 494.68948, 471.227]
2025-09-16 13:37:14,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 70.0, 70.0, 69.0, 63.0, 158.0, 25.0, 88.0, 94.0, 87.0]
2025-09-16 13:37:14,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 18 seconds)
2025-09-16 13:39:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:39:10,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 281.62964 ± 173.884
2025-09-16 13:39:10,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [160.29677, 306.50928, 401.3528, 130.22064, 589.59375, 533.68024, 150.18011, 95.774994, 95.890396, 352.7976]
2025-09-16 13:39:10,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 58.0, 73.0, 25.0, 124.0, 97.0, 29.0, 19.0, 19.0, 64.0]
2025-09-16 13:39:10,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 15 minutes, 20 seconds)
2025-09-16 13:41:05,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:41:06,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 426.59210 ± 58.845
2025-09-16 13:41:06,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [305.35767, 435.98355, 439.50616, 530.8401, 412.5069, 426.42596, 476.20584, 469.5775, 364.76453, 404.75262]
2025-09-16 13:41:06,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 94.0, 78.0, 102.0, 76.0, 77.0, 88.0, 85.0, 65.0, 83.0]
2025-09-16 13:41:06,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 22 seconds)
2025-09-16 13:43:02,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:43:03,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 311.02817 ± 155.186
2025-09-16 13:43:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [404.98685, 501.25012, 102.15516, 96.05902, 112.57139, 411.18643, 544.6116, 322.15945, 338.07028, 277.23145]
2025-09-16 13:43:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 109.0, 20.0, 19.0, 22.0, 74.0, 100.0, 60.0, 63.0, 50.0]
2025-09-16 13:43:03,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 32 seconds)
2025-09-16 13:44:57,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:44:58,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 445.86264 ± 168.448
2025-09-16 13:44:58,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [382.8652, 774.59106, 116.818924, 408.48788, 552.35205, 460.99716, 421.17035, 629.14185, 347.0207, 365.1808]
2025-09-16 13:44:58,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 158.0, 23.0, 73.0, 102.0, 85.0, 74.0, 124.0, 65.0, 67.0]
2025-09-16 13:44:58,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (445.86) for latency 15
2025-09-16 13:44:58,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 27 seconds)
2025-09-16 13:46:54,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:46:55,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 392.47772 ± 73.051
2025-09-16 13:46:55,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [310.12393, 557.15753, 378.7633, 394.55182, 386.40668, 323.31406, 488.07108, 386.47217, 321.14853, 378.76837]
2025-09-16 13:46:55,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 104.0, 70.0, 71.0, 73.0, 61.0, 89.0, 69.0, 59.0, 69.0]
2025-09-16 13:46:55,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 43 seconds)
2025-09-16 13:48:50,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:48:51,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 355.04163 ± 181.442
2025-09-16 13:48:51,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [452.59488, 411.1483, 487.95367, 152.31064, 107.75034, 438.58698, 712.3503, 108.66783, 319.54306, 359.5105]
2025-09-16 13:48:51,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 77.0, 88.0, 29.0, 21.0, 82.0, 145.0, 21.0, 60.0, 66.0]
2025-09-16 13:48:51,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 5 minutes, 52 seconds)
2025-09-16 13:50:46,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:50:47,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 352.61420 ± 175.694
2025-09-16 13:50:47,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [438.94623, 328.05588, 108.02867, 90.290115, 119.23307, 391.21774, 470.87628, 443.5254, 590.50305, 545.4655]
2025-09-16 13:50:47,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 59.0, 21.0, 18.0, 23.0, 72.0, 86.0, 79.0, 109.0, 101.0]
2025-09-16 13:50:47,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes)
2025-09-16 13:52:42,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:52:44,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 475.43536 ± 217.330
2025-09-16 13:52:44,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [311.2005, 453.27753, 340.87292, 456.61313, 626.8188, 996.576, 135.4572, 450.16843, 578.6894, 404.67975]
2025-09-16 13:52:44,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 90.0, 62.0, 82.0, 121.0, 208.0, 26.0, 84.0, 108.0, 75.0]
2025-09-16 13:52:44,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (475.44) for latency 15
2025-09-16 13:52:44,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 2 seconds)
2025-09-16 13:54:40,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:54:41,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 348.65219 ± 159.050
2025-09-16 13:54:41,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [315.27634, 134.09103, 323.93085, 155.65857, 401.8166, 113.467445, 529.1055, 495.59497, 462.00198, 555.5786]
2025-09-16 13:54:41,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 26.0, 63.0, 30.0, 72.0, 22.0, 108.0, 89.0, 96.0, 101.0]
2025-09-16 13:54:41,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 23 seconds)
2025-09-16 13:56:36,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:56:37,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 369.31998 ± 169.250
2025-09-16 13:56:37,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.39387, 110.40206, 362.91177, 293.52078, 363.272, 357.4038, 417.23853, 442.20792, 567.9431, 676.906]
2025-09-16 13:56:37,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 69.0, 56.0, 67.0, 66.0, 91.0, 98.0, 113.0, 128.0]
2025-09-16 13:56:37,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 58 minutes, 22 seconds)
2025-09-16 13:58:33,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:58:34,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 327.49335 ± 128.277
2025-09-16 13:58:34,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [429.27774, 352.08252, 119.19178, 392.5909, 95.52978, 238.41217, 411.5329, 375.8599, 347.44754, 513.0085]
2025-09-16 13:58:34,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 68.0, 23.0, 71.0, 19.0, 46.0, 76.0, 81.0, 64.0, 96.0]
2025-09-16 13:58:34,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 56 minutes, 30 seconds)
2025-09-16 14:00:29,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:00:30,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 437.15527 ± 184.553
2025-09-16 14:00:30,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [425.40616, 638.3791, 373.8992, 107.94391, 513.4324, 682.25085, 596.0382, 495.95483, 133.98381, 404.26398]
2025-09-16 14:00:30,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 114.0, 70.0, 21.0, 103.0, 133.0, 108.0, 88.0, 26.0, 75.0]
2025-09-16 14:00:30,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 35 seconds)
2025-09-16 14:02:25,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:02:26,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 382.73517 ± 175.262
2025-09-16 14:02:26,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [424.02383, 363.59384, 358.5214, 384.37982, 784.9251, 453.10345, 356.3524, 150.3335, 449.74506, 102.37309]
2025-09-16 14:02:26,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 67.0, 67.0, 79.0, 141.0, 83.0, 64.0, 29.0, 82.0, 20.0]
2025-09-16 14:02:26,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 33 seconds)
2025-09-16 14:04:21,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:04:22,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 472.98706 ± 230.392
2025-09-16 14:04:22,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [108.24052, 273.3489, 683.05255, 845.4625, 135.29646, 373.7847, 645.1513, 581.6644, 531.7235, 552.1457]
2025-09-16 14:04:22,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 51.0, 138.0, 165.0, 26.0, 71.0, 119.0, 108.0, 100.0, 100.0]
2025-09-16 14:04:22,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 25 seconds)
2025-09-16 14:06:17,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:06:18,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 540.70087 ± 336.786
2025-09-16 14:06:18,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [475.37778, 106.67282, 635.0957, 1199.5807, 1048.186, 380.70483, 605.3082, 360.22076, 472.29105, 123.57087]
2025-09-16 14:06:18,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 21.0, 113.0, 232.0, 201.0, 82.0, 115.0, 66.0, 90.0, 24.0]
2025-09-16 14:06:18,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (540.70) for latency 15
2025-09-16 14:06:18,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 31 seconds)
2025-09-16 14:08:14,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:08:15,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 234.54016 ± 221.678
2025-09-16 14:08:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.97052, 129.86565, 108.214355, 101.09465, 825.22424, 102.10352, 118.024216, 411.9974, 316.60703, 130.30016]
2025-09-16 14:08:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 25.0, 21.0, 20.0, 154.0, 20.0, 23.0, 78.0, 59.0, 25.0]
2025-09-16 14:08:15,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 35 seconds)
2025-09-16 14:10:10,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:10:11,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 354.02853 ± 157.170
2025-09-16 14:10:11,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [368.79095, 396.7339, 284.44095, 465.57068, 135.9638, 555.18994, 108.55559, 542.5968, 193.68051, 488.76248]
2025-09-16 14:10:11,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 73.0, 58.0, 87.0, 26.0, 99.0, 21.0, 100.0, 37.0, 88.0]
2025-09-16 14:10:11,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 44 minutes, 33 seconds)
2025-09-16 14:12:06,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:12:07,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 366.47546 ± 202.839
2025-09-16 14:12:07,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.00953, 310.12186, 107.39091, 472.25174, 398.58072, 591.5718, 748.44867, 366.6199, 439.93454, 128.82487]
2025-09-16 14:12:07,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 60.0, 21.0, 87.0, 74.0, 111.0, 148.0, 79.0, 96.0, 25.0]
2025-09-16 14:12:07,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 40 seconds)
2025-09-16 14:14:02,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:14:03,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 432.99258 ± 249.762
2025-09-16 14:14:03,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [418.26602, 1074.0614, 419.42383, 124.046036, 605.86694, 335.39673, 348.35376, 485.86603, 198.91348, 319.73163]
2025-09-16 14:14:03,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 203.0, 75.0, 24.0, 111.0, 60.0, 62.0, 88.0, 39.0, 58.0]
2025-09-16 14:14:03,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 44 seconds)
2025-09-16 14:16:00,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:16:00,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 297.51266 ± 161.269
2025-09-16 14:16:00,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [608.8505, 418.6839, 132.5555, 444.30826, 102.04321, 127.746704, 378.51096, 322.91183, 137.97662, 301.5394]
2025-09-16 14:16:00,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 78.0, 26.0, 79.0, 20.0, 25.0, 69.0, 60.0, 27.0, 57.0]
2025-09-16 14:16:00,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 55 seconds)
2025-09-16 14:17:56,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:17:57,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 457.66348 ± 262.377
2025-09-16 14:17:57,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [161.84126, 394.50983, 168.07996, 453.15848, 589.26196, 389.12598, 414.75363, 414.67377, 435.81497, 1155.4148]
2025-09-16 14:17:57,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 78.0, 32.0, 84.0, 109.0, 83.0, 75.0, 83.0, 76.0, 240.0]
2025-09-16 14:17:57,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 1 second)
2025-09-16 14:19:52,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:19:53,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 396.54376 ± 137.222
2025-09-16 14:19:53,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [509.1824, 490.93185, 456.23157, 414.57983, 475.9253, 520.2643, 391.9447, 124.231026, 440.71103, 141.43562]
2025-09-16 14:19:53,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 89.0, 85.0, 75.0, 99.0, 98.0, 72.0, 24.0, 96.0, 27.0]
2025-09-16 14:19:53,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 4 seconds)
2025-09-16 14:21:50,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:21:52,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 598.59900 ± 169.666
2025-09-16 14:21:52,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [445.22153, 764.7887, 410.1971, 361.56122, 512.035, 792.65485, 755.40393, 716.4161, 788.4313, 439.28094]
2025-09-16 14:21:52,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 142.0, 74.0, 74.0, 93.0, 152.0, 140.0, 143.0, 149.0, 90.0]
2025-09-16 14:21:52,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (598.60) for latency 15
2025-09-16 14:21:52,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 29 seconds)
2025-09-16 14:23:47,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:23:48,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 453.40787 ± 245.985
2025-09-16 14:23:48,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [426.70575, 505.83682, 404.24933, 705.35736, 429.3011, 743.3042, 869.5406, 124.90154, 143.69759, 181.18417]
2025-09-16 14:23:48,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 110.0, 71.0, 132.0, 80.0, 139.0, 168.0, 24.0, 28.0, 34.0]
2025-09-16 14:23:48,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 31 minutes, 36 seconds)
2025-09-16 14:25:43,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:25:44,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 491.08237 ± 206.790
2025-09-16 14:25:44,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [629.3361, 648.89185, 600.8575, 666.77563, 591.3688, 112.69256, 649.93274, 96.93036, 504.65155, 409.38675]
2025-09-16 14:25:44,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 137.0, 111.0, 125.0, 117.0, 22.0, 135.0, 19.0, 109.0, 78.0]
2025-09-16 14:25:44,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 29 minutes, 32 seconds)
2025-09-16 14:27:40,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:27:41,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 368.12558 ± 238.010
2025-09-16 14:27:41,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [882.50195, 348.40955, 379.05505, 101.723976, 447.50284, 96.31432, 653.9511, 112.295746, 320.98148, 338.52008]
2025-09-16 14:27:41,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 64.0, 72.0, 20.0, 94.0, 19.0, 129.0, 22.0, 59.0, 61.0]
2025-09-16 14:27:41,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 27 minutes, 31 seconds)
2025-09-16 14:29:36,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:29:37,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 480.26373 ± 218.003
2025-09-16 14:29:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [112.430695, 700.3461, 417.01196, 568.485, 701.37366, 779.87305, 112.32997, 415.08588, 518.26416, 477.43652]
2025-09-16 14:29:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 129.0, 74.0, 116.0, 137.0, 144.0, 22.0, 81.0, 91.0, 94.0]
2025-09-16 14:29:37,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 41 seconds)
2025-09-16 14:31:33,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:31:35,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 558.50977 ± 308.249
2025-09-16 14:31:35,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [541.7085, 101.89465, 880.9453, 483.85422, 721.73987, 582.3346, 1184.6552, 479.55176, 500.43536, 107.97815]
2025-09-16 14:31:35,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 20.0, 163.0, 91.0, 130.0, 121.0, 225.0, 92.0, 98.0, 21.0]
2025-09-16 14:31:35,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 23 minutes, 37 seconds)
2025-09-16 14:33:32,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:33:33,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 472.86792 ± 175.905
2025-09-16 14:33:33,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [354.006, 483.4848, 800.68085, 95.427216, 534.7689, 555.322, 386.7353, 538.40533, 384.24722, 595.60156]
2025-09-16 14:33:33,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 89.0, 143.0, 19.0, 93.0, 107.0, 69.0, 101.0, 74.0, 107.0]
2025-09-16 14:33:33,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-09-16 14:35:27,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:35:28,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 411.77203 ± 209.026
2025-09-16 14:35:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [398.66095, 131.2741, 405.88025, 349.22675, 546.70654, 366.55118, 888.079, 568.9686, 323.5661, 138.80699]
2025-09-16 14:35:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 25.0, 75.0, 63.0, 96.0, 67.0, 174.0, 108.0, 74.0, 27.0]
2025-09-16 14:35:28,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 47 seconds)
2025-09-16 14:37:26,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:37:27,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 463.29156 ± 157.370
2025-09-16 14:37:27,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [384.25833, 463.25064, 108.078896, 560.1851, 355.21762, 517.5643, 714.8298, 545.72723, 393.1262, 590.67755]
2025-09-16 14:37:27,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 88.0, 21.0, 121.0, 77.0, 92.0, 131.0, 111.0, 74.0, 110.0]
2025-09-16 14:37:27,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 11 seconds)
2025-09-16 14:39:21,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:39:22,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 383.72687 ± 233.092
2025-09-16 14:39:22,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [432.57925, 112.42027, 880.60443, 101.52801, 326.55194, 406.80124, 317.8522, 431.81583, 160.01656, 667.09894]
2025-09-16 14:39:22,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 22.0, 163.0, 20.0, 60.0, 80.0, 62.0, 81.0, 31.0, 126.0]
2025-09-16 14:39:22,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 3 seconds)
2025-09-16 14:41:20,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:41:21,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 489.53311 ± 211.291
2025-09-16 14:41:21,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [571.51245, 632.0593, 113.94166, 763.6402, 468.95993, 513.2892, 96.108086, 700.4436, 491.5958, 543.7809]
2025-09-16 14:41:21,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 121.0, 22.0, 144.0, 82.0, 90.0, 19.0, 126.0, 88.0, 98.0]
2025-09-16 14:41:21,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 13 seconds)
2025-09-16 14:43:15,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:43:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 480.87262 ± 324.182
2025-09-16 14:43:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [391.81305, 122.5372, 1115.115, 139.2772, 371.81024, 585.7179, 572.95685, 979.4237, 371.25034, 158.82425]
2025-09-16 14:43:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 24.0, 204.0, 27.0, 79.0, 104.0, 103.0, 198.0, 79.0, 30.0]
2025-09-16 14:43:16,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes, 58 seconds)
2025-09-16 14:45:13,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:45:14,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 418.40033 ± 270.445
2025-09-16 14:45:14,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [616.5768, 721.7587, 793.03906, 466.98337, 96.34672, 665.23236, 95.17962, 124.63277, 108.559456, 495.69455]
2025-09-16 14:45:14,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 131.0, 153.0, 84.0, 19.0, 122.0, 19.0, 24.0, 21.0, 89.0]
2025-09-16 14:45:14,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 19 seconds)
2025-09-16 14:47:10,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:47:11,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 417.77304 ± 217.999
2025-09-16 14:47:11,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [613.55, 614.97394, 396.3604, 124.20381, 674.3572, 465.66, 107.70288, 119.15654, 658.05597, 403.70978]
2025-09-16 14:47:11,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 110.0, 73.0, 24.0, 132.0, 84.0, 21.0, 23.0, 115.0, 78.0]
2025-09-16 14:47:11,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 5 seconds)
2025-09-16 14:49:07,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:49:08,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 390.47824 ± 240.240
2025-09-16 14:49:08,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [554.3867, 388.25302, 692.3325, 112.47473, 661.0009, 492.41675, 112.957184, 96.000404, 656.7804, 138.17984]
2025-09-16 14:49:08,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 69.0, 128.0, 22.0, 128.0, 89.0, 22.0, 19.0, 135.0, 27.0]
2025-09-16 14:49:08,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 20 seconds)
2025-09-16 14:51:06,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:51:07,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 472.89209 ± 244.295
2025-09-16 14:51:07,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [351.08432, 108.40891, 489.73422, 112.216225, 881.45447, 468.26443, 641.41907, 818.02313, 413.99976, 444.31638]
2025-09-16 14:51:07,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 21.0, 92.0, 22.0, 156.0, 86.0, 120.0, 163.0, 75.0, 80.0]
2025-09-16 14:51:07,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 29 seconds)
2025-09-16 14:53:02,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:53:04,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 625.82404 ± 245.751
2025-09-16 14:53:04,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [427.4846, 113.31001, 796.01215, 553.033, 885.7886, 945.31024, 676.04596, 854.2426, 585.333, 421.67957]
2025-09-16 14:53:04,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 22.0, 164.0, 117.0, 159.0, 170.0, 126.0, 150.0, 103.0, 79.0]
2025-09-16 14:53:04,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (625.82) for latency 15
2025-09-16 14:53:04,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 37 seconds)
2025-09-16 14:54:59,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:55:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 457.21313 ± 292.026
2025-09-16 14:55:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [823.2124, 700.5077, 129.10506, 107.64546, 853.6489, 509.89227, 130.94827, 561.6937, 113.295784, 642.1817]
2025-09-16 14:55:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 147.0, 25.0, 21.0, 160.0, 92.0, 25.0, 102.0, 22.0, 134.0]
2025-09-16 14:55:00,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 34 seconds)
2025-09-16 14:56:57,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:56:59,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 649.59802 ± 192.265
2025-09-16 14:56:59,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [573.3501, 339.2791, 556.832, 704.64703, 726.04114, 359.6547, 771.3556, 682.75903, 1020.4427, 761.6187]
2025-09-16 14:56:59,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 63.0, 106.0, 139.0, 140.0, 66.0, 146.0, 128.0, 189.0, 144.0]
2025-09-16 14:56:59,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (649.60) for latency 15
2025-09-16 14:56:59,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 48 seconds)
2025-09-16 14:58:55,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:58:56,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 463.79095 ± 270.103
2025-09-16 14:58:56,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [486.4893, 407.19275, 891.42065, 648.1046, 759.8284, 689.1131, 108.10521, 113.20129, 119.0888, 415.36584]
2025-09-16 14:58:56,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 71.0, 161.0, 124.0, 130.0, 127.0, 21.0, 22.0, 23.0, 87.0]
2025-09-16 14:58:56,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 50 seconds)
2025-09-16 15:00:52,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:00:54,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 671.93207 ± 338.916
2025-09-16 15:00:54,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [143.57489, 434.4004, 124.44642, 585.3992, 909.4816, 1050.8385, 1022.7605, 537.2074, 998.37494, 912.8369]
2025-09-16 15:00:54,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 79.0, 24.0, 120.0, 170.0, 194.0, 190.0, 109.0, 183.0, 176.0]
2025-09-16 15:00:54,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (671.93) for latency 15
2025-09-16 15:00:54,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 46 seconds)
2025-09-16 15:02:49,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:02:51,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 540.85632 ± 359.553
2025-09-16 15:02:51,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [172.31013, 1232.7805, 499.92847, 881.0566, 394.627, 108.52539, 897.4951, 692.8167, 107.2296, 421.79376]
2025-09-16 15:02:51,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 233.0, 93.0, 176.0, 69.0, 21.0, 189.0, 132.0, 21.0, 78.0]
2025-09-16 15:02:51,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 51 seconds)
2025-09-16 15:04:47,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:04:49,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 548.76324 ± 380.134
2025-09-16 15:04:49,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [445.20953, 393.62515, 1065.1172, 108.31384, 438.5983, 632.34814, 504.7599, 406.31424, 1390.8473, 102.49915]
2025-09-16 15:04:49,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 74.0, 220.0, 21.0, 95.0, 115.0, 98.0, 72.0, 267.0, 20.0]
2025-09-16 15:04:49,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 58 seconds)
2025-09-16 15:06:45,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:06:46,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 541.28070 ± 415.686
2025-09-16 15:06:46,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [687.6141, 353.91724, 97.36427, 1109.5046, 1382.2402, 762.3155, 279.60535, 113.7682, 171.51045, 454.96765]
2025-09-16 15:06:46,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 67.0, 19.0, 210.0, 239.0, 149.0, 53.0, 22.0, 33.0, 86.0]
2025-09-16 15:06:46,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 48 minutes, 56 seconds)
2025-09-16 15:08:43,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:08:44,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 440.81683 ± 262.804
2025-09-16 15:08:44,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [658.2626, 395.28394, 450.9187, 444.34488, 859.7884, 823.2775, 89.316, 428.62463, 155.97592, 102.375984]
2025-09-16 15:08:44,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 72.0, 80.0, 79.0, 160.0, 153.0, 18.0, 94.0, 30.0, 20.0]
2025-09-16 15:08:44,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 3 seconds)
2025-09-16 15:10:40,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:10:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 443.75781 ± 156.262
2025-09-16 15:10:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [560.1382, 627.2025, 124.93037, 493.7474, 326.19846, 586.6177, 301.34088, 626.67676, 404.78052, 385.94522]
2025-09-16 15:10:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 116.0, 24.0, 89.0, 71.0, 110.0, 57.0, 112.0, 73.0, 68.0]
2025-09-16 15:10:41,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 58 seconds)
2025-09-16 15:12:37,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:12:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 731.22894 ± 327.874
2025-09-16 15:12:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [442.43954, 380.29044, 1181.501, 531.2618, 489.9723, 966.16394, 1360.2937, 484.54404, 904.7314, 571.0915]
2025-09-16 15:12:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 71.0, 221.0, 103.0, 89.0, 183.0, 250.0, 88.0, 166.0, 101.0]
2025-09-16 15:12:39,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (731.23) for latency 15
2025-09-16 15:12:39,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 8 seconds)
2025-09-16 15:14:35,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:14:37,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 575.22058 ± 204.075
2025-09-16 15:14:37,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [303.82346, 557.8287, 535.2366, 379.3975, 749.0037, 1011.44214, 520.62024, 771.06604, 375.15137, 548.6369]
2025-09-16 15:14:37,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 99.0, 99.0, 69.0, 133.0, 190.0, 92.0, 149.0, 69.0, 101.0]
2025-09-16 15:14:37,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 11 seconds)
2025-09-16 15:16:34,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:16:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 586.49731 ± 293.679
2025-09-16 15:16:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [450.61838, 593.0437, 644.03284, 517.821, 832.6854, 1111.8944, 124.1632, 612.5648, 124.46881, 853.6807]
2025-09-16 15:16:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 112.0, 119.0, 109.0, 160.0, 239.0, 24.0, 111.0, 24.0, 155.0]
2025-09-16 15:16:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 15 seconds)
2025-09-16 15:18:30,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:18:32,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 627.80823 ± 519.733
2025-09-16 15:18:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [431.5138, 404.59183, 102.99815, 568.4057, 1457.3951, 564.5064, 119.47821, 1643.869, 96.18815, 889.1358]
2025-09-16 15:18:32,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 72.0, 20.0, 106.0, 266.0, 127.0, 23.0, 307.0, 19.0, 156.0]
2025-09-16 15:18:32,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 12 seconds)
2025-09-16 15:20:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:20:32,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 676.74768 ± 403.695
2025-09-16 15:20:32,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [818.0709, 593.0156, 645.16656, 537.4403, 438.58386, 1741.4159, 844.7133, 119.17848, 532.5049, 497.38705]
2025-09-16 15:20:32,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 107.0, 111.0, 95.0, 88.0, 318.0, 159.0, 23.0, 96.0, 108.0]
2025-09-16 15:20:32,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 26 seconds)
2025-09-16 15:22:28,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:22:29,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 544.57520 ± 312.804
2025-09-16 15:22:29,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [467.94717, 325.8492, 839.25775, 408.59924, 593.8435, 102.486336, 764.6145, 578.75604, 1198.9612, 165.4368]
2025-09-16 15:22:29,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 61.0, 170.0, 75.0, 109.0, 20.0, 149.0, 104.0, 235.0, 32.0]
2025-09-16 15:22:29,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 26 seconds)
2025-09-16 15:24:24,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:24:25,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 659.86145 ± 223.861
2025-09-16 15:24:25,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [441.5205, 706.78296, 688.45654, 472.53717, 957.1502, 532.4469, 358.8496, 1031.4331, 902.2547, 507.18253]
2025-09-16 15:24:25,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 131.0, 140.0, 99.0, 168.0, 93.0, 80.0, 200.0, 168.0, 91.0]
2025-09-16 15:24:25,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 23 seconds)
2025-09-16 15:26:22,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:26:24,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 568.38904 ± 322.142
2025-09-16 15:26:24,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [819.54584, 136.32867, 123.637985, 333.65372, 759.0939, 1037.8303, 496.88733, 919.83527, 799.23193, 257.84546]
2025-09-16 15:26:24,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 26.0, 24.0, 63.0, 147.0, 189.0, 89.0, 182.0, 153.0, 49.0]
2025-09-16 15:26:24,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 26 seconds)
2025-09-16 15:28:20,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:28:22,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 622.26697 ± 267.909
2025-09-16 15:28:22,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [506.51328, 547.39105, 598.94214, 797.69586, 112.43488, 832.2443, 530.53296, 1207.4736, 528.91425, 560.5278]
2025-09-16 15:28:22,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 99.0, 128.0, 149.0, 22.0, 174.0, 92.0, 226.0, 102.0, 99.0]
2025-09-16 15:28:22,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 32 seconds)
2025-09-16 15:30:18,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:30:19,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 573.53992 ± 431.964
2025-09-16 15:30:19,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [124.83402, 107.4318, 536.53925, 156.28813, 742.1063, 550.06805, 834.106, 1644.1058, 627.5454, 412.37424]
2025-09-16 15:30:19,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 21.0, 96.0, 30.0, 137.0, 116.0, 150.0, 305.0, 108.0, 76.0]
2025-09-16 15:30:19,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 27 seconds)
2025-09-16 15:32:17,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:32:20,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 813.48865 ± 490.082
2025-09-16 15:32:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [837.50275, 1186.214, 664.11, 95.77301, 1524.0372, 383.32855, 1732.1722, 641.74976, 478.58484, 591.41364]
2025-09-16 15:32:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 220.0, 117.0, 19.0, 289.0, 67.0, 327.0, 138.0, 89.0, 111.0]
2025-09-16 15:32:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (813.49) for latency 15
2025-09-16 15:32:20,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 37 seconds)
2025-09-16 15:34:16,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:34:18,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 526.65320 ± 282.870
2025-09-16 15:34:18,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [748.30505, 194.78828, 1028.7688, 656.2172, 133.1301, 724.7772, 135.16577, 514.08014, 462.19095, 669.10846]
2025-09-16 15:34:18,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 37.0, 187.0, 141.0, 26.0, 135.0, 26.0, 107.0, 87.0, 121.0]
2025-09-16 15:34:18,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 42 seconds)
2025-09-16 15:36:12,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:36:14,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 641.59869 ± 436.540
2025-09-16 15:36:14,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1027.2252, 186.29536, 494.12183, 409.51605, 615.80035, 106.5254, 919.7087, 464.43317, 519.52655, 1672.8346]
2025-09-16 15:36:14,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 35.0, 91.0, 76.0, 112.0, 21.0, 180.0, 87.0, 95.0, 320.0]
2025-09-16 15:36:14,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 39 seconds)
2025-09-16 15:38:10,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:38:12,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 500.45575 ± 307.538
2025-09-16 15:38:12,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [145.48439, 524.0402, 474.2368, 674.3845, 686.71906, 340.63593, 428.04, 95.542404, 1240.5402, 394.93408]
2025-09-16 15:38:12,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 115.0, 88.0, 145.0, 129.0, 62.0, 78.0, 19.0, 220.0, 71.0]
2025-09-16 15:38:12,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 41 seconds)
2025-09-16 15:40:08,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:40:09,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 465.57260 ± 433.288
2025-09-16 15:40:09,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [584.839, 113.44189, 577.37134, 119.228355, 108.82593, 296.226, 800.05884, 107.32006, 385.51193, 1562.9025]
2025-09-16 15:40:09,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 22.0, 103.0, 23.0, 21.0, 57.0, 162.0, 21.0, 70.0, 282.0]
2025-09-16 15:40:09,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 44 seconds)
2025-09-16 15:42:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:42:08,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 757.54248 ± 405.975
2025-09-16 15:42:08,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [985.59045, 1465.445, 1392.57, 576.6267, 516.9386, 919.7797, 671.0474, 384.5159, 151.04817, 511.86285]
2025-09-16 15:42:08,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 279.0, 261.0, 102.0, 91.0, 165.0, 127.0, 74.0, 29.0, 90.0]
2025-09-16 15:42:08,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 44 seconds)
2025-09-16 15:44:04,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:44:06,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 829.05176 ± 573.201
2025-09-16 15:44:06,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [431.23102, 506.59253, 1043.687, 732.9414, 674.6329, 112.69074, 803.22485, 2384.2695, 871.4851, 729.7629]
2025-09-16 15:44:06,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 94.0, 203.0, 126.0, 120.0, 22.0, 153.0, 469.0, 165.0, 132.0]
2025-09-16 15:44:06,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (829.05) for latency 15
2025-09-16 15:44:06,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 46 seconds)
2025-09-16 15:46:03,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:46:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 486.21402 ± 256.905
2025-09-16 15:46:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [823.8599, 889.1538, 323.07388, 391.36798, 664.5209, 119.61982, 449.79272, 636.7518, 89.06413, 474.93573]
2025-09-16 15:46:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 159.0, 60.0, 84.0, 118.0, 23.0, 84.0, 119.0, 18.0, 100.0]
2025-09-16 15:46:05,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 51 seconds)
2025-09-16 15:48:00,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:48:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1001.00830 ± 383.284
2025-09-16 15:48:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1418.7704, 1253.3092, 638.524, 658.7408, 1120.8452, 504.16263, 455.8145, 1065.1377, 1344.2794, 1550.4989]
2025-09-16 15:48:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 231.0, 124.0, 119.0, 199.0, 89.0, 85.0, 213.0, 247.0, 294.0]
2025-09-16 15:48:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1001.01) for latency 15
2025-09-16 15:48:03,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 52 seconds)
2025-09-16 15:50:00,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:50:01,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 627.06689 ± 355.156
2025-09-16 15:50:01,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [903.4232, 125.19416, 697.11554, 1413.6913, 329.6598, 955.04285, 470.49374, 489.42944, 478.04382, 408.57477]
2025-09-16 15:50:01,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 24.0, 129.0, 266.0, 60.0, 187.0, 84.0, 88.0, 86.0, 71.0]
2025-09-16 15:50:01,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 55 seconds)
2025-09-16 15:51:54,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:51:56,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 680.89044 ± 263.233
2025-09-16 15:51:56,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [669.3013, 417.64813, 833.0119, 786.0493, 157.06494, 734.0622, 800.29333, 1212.305, 632.6593, 566.50854]
2025-09-16 15:51:56,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 77.0, 168.0, 149.0, 30.0, 133.0, 138.0, 228.0, 114.0, 104.0]
2025-09-16 15:51:56,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 55 seconds)
2025-09-16 15:53:49,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:53:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 582.11194 ± 337.517
2025-09-16 15:53:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [306.46014, 1114.9686, 783.247, 762.8736, 438.07947, 1029.4327, 153.90025, 113.87686, 761.1584, 357.12256]
2025-09-16 15:53:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 215.0, 137.0, 138.0, 79.0, 192.0, 30.0, 22.0, 140.0, 65.0]
2025-09-16 15:53:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 56 seconds)
2025-09-16 15:55:48,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:55:49,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 531.60571 ± 200.819
2025-09-16 15:55:49,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [141.50433, 615.0454, 503.4344, 652.63104, 386.44098, 436.67102, 415.54376, 947.17255, 592.55914, 625.05457]
2025-09-16 15:55:49,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 112.0, 89.0, 132.0, 69.0, 97.0, 72.0, 176.0, 103.0, 119.0]
2025-09-16 15:55:49,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
