2025-09-16 13:36:56,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_18
2025-09-16 13:36:56,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_18
2025-09-16 13:36:56,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x14b161fc08d0>}
2025-09-16 13:36:56,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 13:36:56,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 13:36:56,094 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=682, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 13:36:56,094 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 13:36:59,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 13:36:59,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 13:38:46,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:38:47,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 436.16461 ± 115.857
2025-09-16 13:38:47,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [309.2284, 333.0509, 653.87665, 423.7168, 406.62997, 442.2628, 286.24747, 391.2236, 507.80118, 607.60803]
2025-09-16 13:38:47,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 63.0, 131.0, 80.0, 77.0, 83.0, 57.0, 76.0, 97.0, 118.0]
2025-09-16 13:38:47,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (436.16) for latency 18
2025-09-16 13:38:47,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 30 seconds)
2025-09-16 13:40:43,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:40:44,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 362.00412 ± 58.908
2025-09-16 13:40:44,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [312.52768, 308.564, 447.69193, 401.6451, 355.0516, 340.66074, 479.64926, 299.63998, 359.9993, 314.61182]
2025-09-16 13:40:44,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 66.0, 89.0, 78.0, 80.0, 66.0, 94.0, 64.0, 70.0, 68.0]
2025-09-16 13:40:44,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 4 minutes, 7 seconds)
2025-09-16 13:42:40,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:42:42,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 464.20752 ± 73.931
2025-09-16 13:42:42,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [358.95828, 603.9291, 436.54803, 371.17325, 465.8459, 449.2109, 430.56253, 438.57864, 542.2269, 545.0415]
2025-09-16 13:42:42,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 113.0, 82.0, 70.0, 85.0, 86.0, 79.0, 83.0, 101.0, 114.0]
2025-09-16 13:42:42,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (464.21) for latency 18
2025-09-16 13:42:42,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 57 seconds)
2025-09-16 13:44:39,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:44:40,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 337.53656 ± 81.421
2025-09-16 13:44:40,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [357.56607, 377.27667, 372.86206, 373.8292, 372.7961, 425.0861, 267.0503, 431.97134, 210.11621, 186.8116]
2025-09-16 13:44:40,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 82.0, 79.0, 71.0, 74.0, 87.0, 57.0, 91.0, 41.0, 36.0]
2025-09-16 13:44:40,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 4 minutes, 26 seconds)
2025-09-16 13:46:37,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:46:38,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 372.65353 ± 121.365
2025-09-16 13:46:38,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [385.42688, 353.72574, 449.456, 413.99426, 185.45117, 135.51897, 350.78745, 462.9621, 425.65375, 563.559]
2025-09-16 13:46:38,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 67.0, 87.0, 80.0, 36.0, 26.0, 68.0, 97.0, 81.0, 103.0]
2025-09-16 13:46:38,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 3 minutes, 29 seconds)
2025-09-16 13:48:34,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:48:35,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 405.96265 ± 71.864
2025-09-16 13:48:35,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [321.83908, 422.1631, 429.31894, 560.18524, 415.23047, 338.07504, 415.0819, 449.36148, 417.23853, 291.13248]
2025-09-16 13:48:35,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 80.0, 81.0, 105.0, 88.0, 64.0, 79.0, 87.0, 76.0, 55.0]
2025-09-16 13:48:35,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 4 minutes, 12 seconds)
2025-09-16 13:50:33,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:50:34,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 399.99286 ± 92.729
2025-09-16 13:50:34,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [377.68054, 408.04678, 360.71738, 200.41818, 510.1834, 375.77908, 574.4526, 421.6135, 385.3756, 385.66132]
2025-09-16 13:50:34,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 76.0, 77.0, 39.0, 96.0, 70.0, 107.0, 80.0, 77.0, 75.0]
2025-09-16 13:50:34,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 2 minutes, 54 seconds)
2025-09-16 13:52:30,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:52:32,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 446.77499 ± 82.529
2025-09-16 13:52:32,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [482.3746, 421.13852, 358.44254, 337.21994, 628.6366, 393.94733, 514.7613, 504.05145, 427.91205, 399.26538]
2025-09-16 13:52:32,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 84.0, 67.0, 71.0, 121.0, 82.0, 96.0, 101.0, 87.0, 74.0]
2025-09-16 13:52:32,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 52 seconds)
2025-09-16 13:54:28,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:54:29,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 401.40533 ± 76.287
2025-09-16 13:54:29,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [375.0158, 469.0537, 408.60373, 493.30716, 203.43382, 446.64807, 369.14883, 426.49847, 385.4159, 436.92783]
2025-09-16 13:54:29,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 87.0, 77.0, 92.0, 39.0, 84.0, 71.0, 80.0, 72.0, 81.0]
2025-09-16 13:54:29,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 51 seconds)
2025-09-16 13:56:26,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:56:28,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 438.34039 ± 43.546
2025-09-16 13:56:28,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [484.01547, 438.87732, 417.34137, 432.7925, 495.2155, 473.481, 390.33606, 351.47443, 419.80704, 480.06293]
2025-09-16 13:56:28,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 85.0, 77.0, 82.0, 105.0, 89.0, 71.0, 66.0, 83.0, 89.0]
2025-09-16 13:56:28,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 55 seconds)
2025-09-16 13:58:25,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:58:27,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 463.49799 ± 119.289
2025-09-16 13:58:27,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [186.70087, 567.14154, 442.81885, 445.6004, 514.4984, 618.11346, 606.0362, 426.1118, 390.90057, 437.05792]
2025-09-16 13:58:27,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 106.0, 83.0, 83.0, 95.0, 124.0, 114.0, 80.0, 75.0, 81.0]
2025-09-16 13:58:27,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 28 seconds)
2025-09-16 14:00:23,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:00:24,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 429.84302 ± 170.696
2025-09-16 14:00:24,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [430.54245, 174.81688, 458.69727, 474.69138, 344.72473, 657.5763, 425.8852, 727.6799, 155.4073, 448.40857]
2025-09-16 14:00:24,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 34.0, 92.0, 93.0, 67.0, 141.0, 88.0, 149.0, 30.0, 90.0]
2025-09-16 14:00:24,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 8 seconds)
2025-09-16 14:02:22,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:02:23,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 494.33633 ± 158.877
2025-09-16 14:02:23,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [171.34602, 549.23956, 826.09705, 529.0844, 422.09497, 391.2752, 569.56146, 408.7704, 513.5944, 562.2997]
2025-09-16 14:02:23,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 115.0, 162.0, 98.0, 79.0, 76.0, 109.0, 76.0, 96.0, 110.0]
2025-09-16 14:02:23,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (494.34) for latency 18
2025-09-16 14:02:23,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 39 seconds)
2025-09-16 14:04:20,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:04:21,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 412.81061 ± 131.519
2025-09-16 14:04:21,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [150.9689, 345.86874, 647.5114, 360.35135, 466.17343, 341.25134, 341.91483, 475.3391, 570.8032, 427.92358]
2025-09-16 14:04:21,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 73.0, 125.0, 68.0, 94.0, 71.0, 64.0, 101.0, 105.0, 86.0]
2025-09-16 14:04:21,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 41 seconds)
2025-09-16 14:06:18,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:06:19,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 416.13144 ± 147.356
2025-09-16 14:06:19,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [181.95758, 509.7133, 640.6007, 399.20792, 135.0954, 382.26215, 427.38803, 469.35687, 469.45972, 546.2729]
2025-09-16 14:06:19,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 100.0, 121.0, 74.0, 26.0, 72.0, 79.0, 87.0, 89.0, 102.0]
2025-09-16 14:06:19,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 38 seconds)
2025-09-16 14:08:17,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:08:19,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 490.38736 ± 50.966
2025-09-16 14:08:19,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [502.3996, 504.69205, 502.25208, 482.53918, 448.3957, 563.8848, 584.7296, 458.44592, 410.65543, 445.8795]
2025-09-16 14:08:19,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 104.0, 95.0, 90.0, 95.0, 108.0, 111.0, 86.0, 78.0, 86.0]
2025-09-16 14:08:19,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes, 49 seconds)
2025-09-16 14:10:16,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:10:17,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 460.42618 ± 102.115
2025-09-16 14:10:17,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [460.67545, 440.02213, 423.3176, 732.2619, 435.36188, 299.30133, 453.25967, 421.2696, 480.6258, 458.1658]
2025-09-16 14:10:17,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 82.0, 79.0, 147.0, 83.0, 58.0, 86.0, 89.0, 89.0, 87.0]
2025-09-16 14:10:17,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 3 seconds)
2025-09-16 14:12:15,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:12:16,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 466.34894 ± 160.267
2025-09-16 14:12:16,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [618.436, 569.8146, 540.69275, 501.2231, 634.4577, 446.24844, 470.5564, 171.04047, 550.20886, 160.811]
2025-09-16 14:12:16,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 108.0, 111.0, 91.0, 136.0, 96.0, 91.0, 33.0, 114.0, 31.0]
2025-09-16 14:12:16,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 1 second)
2025-09-16 14:14:13,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:14:14,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 436.70956 ± 114.563
2025-09-16 14:14:14,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [161.83603, 506.82034, 394.09393, 577.33215, 489.62305, 397.61362, 348.57333, 530.7611, 527.41956, 433.02255]
2025-09-16 14:14:14,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 100.0, 76.0, 108.0, 91.0, 74.0, 66.0, 100.0, 99.0, 94.0]
2025-09-16 14:14:14,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 3 seconds)
2025-09-16 14:16:12,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:16:14,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 524.99640 ± 87.592
2025-09-16 14:16:14,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [582.7138, 519.6569, 414.6424, 492.96643, 463.07626, 590.4011, 491.9166, 447.4612, 510.62338, 736.50586]
2025-09-16 14:16:14,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 98.0, 79.0, 93.0, 90.0, 112.0, 98.0, 84.0, 95.0, 139.0]
2025-09-16 14:16:14,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (525.00) for latency 18
2025-09-16 14:16:14,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 32 seconds)
2025-09-16 14:18:11,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:18:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 531.01794 ± 153.503
2025-09-16 14:18:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [748.57965, 409.51294, 291.39673, 845.8034, 546.6426, 472.45377, 574.38727, 503.66623, 471.3955, 446.34186]
2025-09-16 14:18:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 86.0, 56.0, 165.0, 108.0, 90.0, 113.0, 94.0, 90.0, 81.0]
2025-09-16 14:18:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (531.02) for latency 18
2025-09-16 14:18:13,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 27 seconds)
2025-09-16 14:20:10,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:20:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 529.07617 ± 115.730
2025-09-16 14:20:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [437.78632, 436.81113, 393.47095, 581.59625, 419.90036, 442.35022, 534.3957, 631.43823, 667.57916, 745.4338]
2025-09-16 14:20:12,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 81.0, 73.0, 109.0, 80.0, 96.0, 100.0, 116.0, 126.0, 152.0]
2025-09-16 14:20:12,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 33 seconds)
2025-09-16 14:22:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:22:11,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 561.42102 ± 127.954
2025-09-16 14:22:11,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [524.95465, 571.5075, 511.64752, 441.65906, 488.6799, 914.2732, 495.98904, 544.27576, 486.1184, 635.106]
2025-09-16 14:22:11,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 111.0, 96.0, 91.0, 105.0, 195.0, 95.0, 103.0, 98.0, 126.0]
2025-09-16 14:22:11,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (561.42) for latency 18
2025-09-16 14:22:11,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 47 seconds)
2025-09-16 14:24:09,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:24:10,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 412.19775 ± 164.056
2025-09-16 14:24:10,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [406.5955, 391.09998, 384.01056, 119.895424, 560.3371, 598.91296, 542.3312, 406.42538, 582.4379, 129.9318]
2025-09-16 14:24:10,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 75.0, 72.0, 23.0, 110.0, 112.0, 100.0, 76.0, 123.0, 25.0]
2025-09-16 14:24:10,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 55 seconds)
2025-09-16 14:26:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:26:09,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 544.95477 ± 224.310
2025-09-16 14:26:09,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [703.5659, 385.75864, 555.107, 755.3797, 1011.69684, 424.53812, 150.87814, 563.9949, 417.9792, 480.64966]
2025-09-16 14:26:09,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 72.0, 103.0, 139.0, 197.0, 89.0, 29.0, 109.0, 78.0, 88.0]
2025-09-16 14:26:09,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 39 seconds)
2025-09-16 14:28:06,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:28:07,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 460.89252 ± 119.386
2025-09-16 14:28:07,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [164.76395, 560.49567, 514.70374, 397.61008, 457.1693, 513.7299, 399.23898, 635.1315, 477.17825, 488.9036]
2025-09-16 14:28:07,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 121.0, 102.0, 74.0, 87.0, 96.0, 76.0, 123.0, 103.0, 107.0]
2025-09-16 14:28:07,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 33 seconds)
2025-09-16 14:30:05,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:30:06,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 537.07727 ± 135.224
2025-09-16 14:30:06,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [381.30957, 658.50775, 426.9604, 447.511, 519.7456, 447.49768, 513.12146, 599.23157, 868.87836, 508.00986]
2025-09-16 14:30:06,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 120.0, 90.0, 84.0, 112.0, 85.0, 94.0, 120.0, 168.0, 98.0]
2025-09-16 14:30:06,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 42 seconds)
2025-09-16 14:32:04,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:32:05,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 489.73691 ± 209.170
2025-09-16 14:32:05,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [503.82202, 557.75946, 144.46693, 406.98285, 440.55826, 759.53674, 165.81357, 823.2229, 482.076, 613.13043]
2025-09-16 14:32:05,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 113.0, 28.0, 80.0, 81.0, 144.0, 32.0, 162.0, 103.0, 125.0]
2025-09-16 14:32:05,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 32 seconds)
2025-09-16 14:34:03,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:34:05,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 473.11411 ± 148.156
2025-09-16 14:34:05,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [309.9022, 482.96283, 457.3362, 638.205, 509.78012, 422.3751, 543.46796, 588.96014, 642.89685, 135.25475]
2025-09-16 14:34:05,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 96.0, 96.0, 120.0, 106.0, 90.0, 102.0, 103.0, 138.0, 26.0]
2025-09-16 14:34:05,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 51 seconds)
2025-09-16 14:36:03,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:36:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 615.68793 ± 106.185
2025-09-16 14:36:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [690.72455, 497.97394, 773.08124, 700.6873, 571.36926, 594.30524, 587.4391, 465.2025, 769.2014, 506.89514]
2025-09-16 14:36:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 107.0, 152.0, 147.0, 105.0, 108.0, 122.0, 88.0, 152.0, 96.0]
2025-09-16 14:36:05,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (615.69) for latency 18
2025-09-16 14:36:05,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 7 seconds)
2025-09-16 14:38:01,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:38:03,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 459.05914 ± 205.805
2025-09-16 14:38:03,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [482.07672, 155.56662, 605.93585, 589.3551, 424.1108, 125.22396, 872.85583, 517.6962, 408.1806, 409.59]
2025-09-16 14:38:03,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 30.0, 126.0, 108.0, 89.0, 24.0, 182.0, 102.0, 82.0, 80.0]
2025-09-16 14:38:03,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 56 seconds)
2025-09-16 14:40:00,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:40:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 629.55396 ± 142.951
2025-09-16 14:40:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [825.47, 602.7981, 462.44397, 866.27325, 626.23193, 560.8866, 428.44357, 597.32764, 534.6446, 791.0201]
2025-09-16 14:40:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 119.0, 97.0, 170.0, 133.0, 105.0, 82.0, 112.0, 100.0, 146.0]
2025-09-16 14:40:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (629.55) for latency 18
2025-09-16 14:40:02,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 4 seconds)
2025-09-16 14:42:01,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:42:02,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 621.76215 ± 144.947
2025-09-16 14:42:02,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [541.15814, 595.6836, 787.89246, 707.7759, 878.3688, 442.10004, 526.1859, 754.9139, 559.35834, 424.1841]
2025-09-16 14:42:02,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 119.0, 149.0, 134.0, 172.0, 82.0, 114.0, 136.0, 102.0, 77.0]
2025-09-16 14:42:02,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 18 seconds)
2025-09-16 14:43:59,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:44:01,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 619.24353 ± 225.013
2025-09-16 14:44:01,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [678.551, 390.94724, 878.9779, 1076.9882, 400.42877, 526.74756, 517.58734, 425.44162, 832.3002, 464.46527]
2025-09-16 14:44:01,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 74.0, 165.0, 221.0, 77.0, 95.0, 98.0, 81.0, 156.0, 89.0]
2025-09-16 14:44:01,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 12 seconds)
2025-09-16 14:46:00,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:46:02,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 570.58875 ± 166.244
2025-09-16 14:46:02,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [570.4123, 956.52374, 392.93588, 470.61917, 555.3383, 407.34253, 544.279, 648.9251, 740.98785, 418.52383]
2025-09-16 14:46:02,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 189.0, 87.0, 101.0, 102.0, 86.0, 101.0, 134.0, 133.0, 91.0]
2025-09-16 14:46:02,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 21 seconds)
2025-09-16 14:48:00,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:48:02,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 618.92889 ± 304.296
2025-09-16 14:48:02,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [872.96106, 842.57874, 643.45105, 381.0737, 417.129, 494.91415, 508.944, 1306.7695, 161.25731, 560.2106]
2025-09-16 14:48:02,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 164.0, 131.0, 81.0, 86.0, 90.0, 98.0, 251.0, 31.0, 102.0]
2025-09-16 14:48:02,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 51 seconds)
2025-09-16 14:49:58,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:50:00,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 566.63092 ± 143.538
2025-09-16 14:50:00,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [608.1575, 651.62616, 564.0844, 679.6886, 495.26102, 679.9522, 603.2844, 585.65094, 632.3851, 166.21857]
2025-09-16 14:50:00,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 117.0, 100.0, 130.0, 94.0, 142.0, 114.0, 119.0, 133.0, 32.0]
2025-09-16 14:50:00,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 5 minutes, 33 seconds)
2025-09-16 14:51:58,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:52:00,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 642.77148 ± 124.120
2025-09-16 14:52:00,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [481.7941, 574.5934, 910.6755, 548.3666, 746.4244, 586.4289, 500.21548, 673.4736, 717.0671, 688.6764]
2025-09-16 14:52:00,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 116.0, 187.0, 114.0, 135.0, 109.0, 101.0, 122.0, 127.0, 125.0]
2025-09-16 14:52:00,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (642.77) for latency 18
2025-09-16 14:52:00,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 3 minutes, 30 seconds)
2025-09-16 14:53:57,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:53:59,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 584.94781 ± 205.578
2025-09-16 14:53:59,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [955.163, 160.28165, 558.80237, 504.85507, 537.0532, 448.67935, 593.09546, 638.01984, 857.43994, 596.0885]
2025-09-16 14:53:59,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 31.0, 118.0, 93.0, 99.0, 96.0, 118.0, 137.0, 163.0, 125.0]
2025-09-16 14:53:59,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 28 seconds)
2025-09-16 14:55:58,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:56:00,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 649.85577 ± 272.755
2025-09-16 14:56:00,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [715.6787, 412.424, 144.93456, 1250.5399, 723.9148, 589.3615, 853.7671, 653.4074, 632.9509, 521.5779]
2025-09-16 14:56:00,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 81.0, 28.0, 255.0, 129.0, 108.0, 158.0, 141.0, 131.0, 112.0]
2025-09-16 14:56:00,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (649.86) for latency 18
2025-09-16 14:56:00,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 34 seconds)
2025-09-16 14:57:58,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:58:00,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 693.25989 ± 197.791
2025-09-16 14:58:00,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [554.122, 715.84283, 665.29565, 431.25394, 634.98285, 816.80676, 1206.4329, 559.7312, 648.1684, 699.96295]
2025-09-16 14:58:00,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 139.0, 127.0, 78.0, 113.0, 150.0, 232.0, 106.0, 121.0, 137.0]
2025-09-16 14:58:00,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (693.26) for latency 18
2025-09-16 14:58:00,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 35 seconds)
2025-09-16 14:59:58,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:00:00,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 625.75330 ± 181.251
2025-09-16 15:00:00,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1003.94556, 372.08957, 491.55957, 480.9961, 484.78192, 638.5517, 564.2054, 795.7357, 797.3815, 628.28564]
2025-09-16 15:00:00,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 71.0, 103.0, 85.0, 87.0, 116.0, 103.0, 151.0, 161.0, 132.0]
2025-09-16 15:00:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 53 seconds)
2025-09-16 15:01:58,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:01:59,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 648.52167 ± 140.100
2025-09-16 15:01:59,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [798.784, 542.9895, 462.5054, 649.1817, 849.1846, 588.5704, 484.35858, 851.4804, 541.88684, 716.2757]
2025-09-16 15:01:59,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 104.0, 93.0, 123.0, 151.0, 107.0, 86.0, 157.0, 105.0, 133.0]
2025-09-16 15:01:59,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 55 seconds)
2025-09-16 15:03:57,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:03:59,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 690.76587 ± 138.101
2025-09-16 15:03:59,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [758.0579, 822.9677, 672.118, 868.1584, 413.70132, 771.1368, 719.6221, 661.04694, 750.15533, 470.69446]
2025-09-16 15:03:59,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 156.0, 130.0, 154.0, 75.0, 146.0, 134.0, 134.0, 139.0, 85.0]
2025-09-16 15:03:59,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 58 seconds)
2025-09-16 15:05:58,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:06:00,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 647.12036 ± 321.738
2025-09-16 15:06:00,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1471.8286, 754.74927, 579.9025, 860.167, 601.85785, 259.0615, 515.2424, 359.03143, 447.48816, 621.8747]
2025-09-16 15:06:00,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [273.0, 156.0, 111.0, 160.0, 126.0, 49.0, 109.0, 68.0, 79.0, 121.0]
2025-09-16 15:06:00,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 58 seconds)
2025-09-16 15:07:57,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:07:59,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 690.42700 ± 53.093
2025-09-16 15:07:59,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [654.1728, 720.35345, 699.04376, 641.94714, 632.4294, 747.14526, 591.6275, 733.41473, 745.986, 738.1505]
2025-09-16 15:07:59,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 136.0, 122.0, 134.0, 115.0, 141.0, 108.0, 141.0, 131.0, 135.0]
2025-09-16 15:07:59,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 50 seconds)
2025-09-16 15:09:57,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:09:59,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 700.82904 ± 189.698
2025-09-16 15:09:59,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [514.65955, 955.40796, 622.9557, 676.3694, 407.79462, 580.62805, 601.66095, 711.9874, 979.4203, 957.4061]
2025-09-16 15:09:59,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 196.0, 127.0, 139.0, 76.0, 126.0, 116.0, 128.0, 181.0, 170.0]
2025-09-16 15:09:59,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (700.83) for latency 18
2025-09-16 15:09:59,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 50 seconds)
2025-09-16 15:11:57,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:11:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 688.98169 ± 256.250
2025-09-16 15:11:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1016.9417, 864.6391, 145.03542, 902.8327, 432.12372, 475.8041, 772.84296, 861.6645, 591.60864, 826.32385]
2025-09-16 15:11:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 163.0, 28.0, 181.0, 83.0, 94.0, 139.0, 168.0, 114.0, 169.0]
2025-09-16 15:11:59,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 52 seconds)
2025-09-16 15:13:57,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:13:59,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 711.06091 ± 211.930
2025-09-16 15:13:59,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [552.5977, 670.35016, 929.92535, 530.9041, 801.19995, 497.37976, 1219.644, 560.4016, 628.56824, 719.6381]
2025-09-16 15:13:59,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 135.0, 168.0, 112.0, 165.0, 89.0, 228.0, 114.0, 135.0, 139.0]
2025-09-16 15:13:59,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (711.06) for latency 18
2025-09-16 15:13:59,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 2 seconds)
2025-09-16 15:15:58,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:16:01,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 754.23730 ± 163.316
2025-09-16 15:16:01,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [575.45557, 687.81384, 748.9311, 682.69684, 501.09647, 1041.954, 718.5456, 747.42053, 812.98785, 1025.4713]
2025-09-16 15:16:01,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 128.0, 140.0, 121.0, 107.0, 188.0, 149.0, 134.0, 144.0, 213.0]
2025-09-16 15:16:01,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (754.24) for latency 18
2025-09-16 15:16:01,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 9 seconds)
2025-09-16 15:17:58,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:18:00,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 771.27496 ± 168.872
2025-09-16 15:18:00,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [523.3841, 629.51874, 555.0112, 992.7938, 802.92816, 919.1762, 835.0358, 677.4437, 1037.8182, 739.63904]
2025-09-16 15:18:00,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 114.0, 119.0, 183.0, 148.0, 178.0, 161.0, 124.0, 175.0, 137.0]
2025-09-16 15:18:00,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (771.27) for latency 18
2025-09-16 15:18:00,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 14 seconds)
2025-09-16 15:20:00,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:20:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 883.14221 ± 497.718
2025-09-16 15:20:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [819.66077, 574.6779, 2166.59, 327.8945, 680.7619, 726.9983, 1240.3796, 723.2572, 1089.7129, 481.4887]
2025-09-16 15:20:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 100.0, 411.0, 64.0, 134.0, 147.0, 241.0, 134.0, 196.0, 104.0]
2025-09-16 15:20:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (883.14) for latency 18
2025-09-16 15:20:02,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 33 seconds)
2025-09-16 15:21:59,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:22:01,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 624.95032 ± 191.668
2025-09-16 15:22:01,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [670.1042, 644.7008, 571.7132, 471.53494, 713.1746, 727.0568, 657.77545, 150.59277, 913.7877, 729.06305]
2025-09-16 15:22:01,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 127.0, 123.0, 85.0, 129.0, 130.0, 134.0, 29.0, 178.0, 153.0]
2025-09-16 15:22:01,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 20 seconds)
2025-09-16 15:24:00,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:24:02,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 726.69592 ± 319.792
2025-09-16 15:24:02,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [496.93143, 849.71796, 479.66318, 512.3768, 408.48276, 839.99335, 1214.0605, 1336.71, 753.5055, 375.51788]
2025-09-16 15:24:02,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 154.0, 90.0, 92.0, 82.0, 153.0, 219.0, 249.0, 141.0, 81.0]
2025-09-16 15:24:02,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 25 seconds)
2025-09-16 15:26:00,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:26:02,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 675.35333 ± 235.302
2025-09-16 15:26:02,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [594.4219, 154.65535, 840.3072, 856.00714, 854.5645, 431.85867, 675.06256, 538.3414, 878.9758, 929.3391]
2025-09-16 15:26:02,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 30.0, 160.0, 162.0, 163.0, 91.0, 123.0, 110.0, 174.0, 178.0]
2025-09-16 15:26:02,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 14 seconds)
2025-09-16 15:28:01,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:28:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 744.66547 ± 284.423
2025-09-16 15:28:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [823.8045, 481.96124, 508.43634, 644.4555, 1034.4874, 513.6349, 717.6374, 779.8763, 509.72498, 1432.6359]
2025-09-16 15:28:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 105.0, 91.0, 138.0, 190.0, 105.0, 130.0, 155.0, 107.0, 285.0]
2025-09-16 15:28:03,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 24 seconds)
2025-09-16 15:30:02,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:30:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 879.13721 ± 360.472
2025-09-16 15:30:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1481.0111, 841.50946, 754.115, 917.5006, 785.3876, 639.95233, 423.32736, 753.5594, 1614.0933, 580.91565]
2025-09-16 15:30:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [291.0, 166.0, 161.0, 174.0, 159.0, 120.0, 77.0, 155.0, 300.0, 103.0]
2025-09-16 15:30:05,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 20 seconds)
2025-09-16 15:32:02,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:32:04,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 781.25470 ± 224.378
2025-09-16 15:32:04,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [713.92664, 629.1416, 909.3178, 716.6479, 548.18976, 1138.0471, 1132.5913, 717.57886, 891.7542, 415.35138]
2025-09-16 15:32:04,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 111.0, 172.0, 147.0, 119.0, 221.0, 200.0, 146.0, 162.0, 76.0]
2025-09-16 15:32:04,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 26 seconds)
2025-09-16 15:34:04,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:34:06,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 899.82336 ± 576.415
2025-09-16 15:34:06,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2252.1123, 689.197, 665.50183, 563.92847, 673.4583, 305.768, 670.83, 1752.3441, 831.3833, 593.70953]
2025-09-16 15:34:06,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [422.0, 122.0, 134.0, 118.0, 119.0, 58.0, 142.0, 321.0, 169.0, 102.0]
2025-09-16 15:34:06,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (899.82) for latency 18
2025-09-16 15:34:06,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 38 seconds)
2025-09-16 15:36:04,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:36:07,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 986.81671 ± 458.630
2025-09-16 15:36:07,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [242.05914, 1194.2664, 1102.7019, 698.22876, 1563.7594, 1699.427, 591.14514, 755.9949, 587.569, 1433.0143]
2025-09-16 15:36:07,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 228.0, 200.0, 124.0, 319.0, 315.0, 106.0, 130.0, 111.0, 258.0]
2025-09-16 15:36:07,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (986.82) for latency 18
2025-09-16 15:36:07,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 40 seconds)
2025-09-16 15:38:07,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:38:10,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 899.32648 ± 387.135
2025-09-16 15:38:10,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1888.4896, 460.60693, 776.2275, 1025.8337, 672.9717, 907.31805, 811.38983, 538.71136, 1180.7952, 730.92004]
2025-09-16 15:38:10,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [344.0, 86.0, 143.0, 192.0, 112.0, 168.0, 140.0, 118.0, 227.0, 132.0]
2025-09-16 15:38:10,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 49 seconds)
2025-09-16 15:40:06,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:40:09,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 812.24280 ± 249.860
2025-09-16 15:40:09,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [939.4181, 724.8353, 1335.6393, 1189.3367, 674.6362, 709.9632, 595.79694, 540.4096, 629.6737, 782.7193]
2025-09-16 15:40:09,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 144.0, 238.0, 230.0, 146.0, 126.0, 130.0, 94.0, 136.0, 156.0]
2025-09-16 15:40:09,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 30 seconds)
2025-09-16 15:42:09,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:42:12,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1137.86597 ± 482.177
2025-09-16 15:42:12,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [875.33124, 815.98175, 544.9363, 2029.8818, 1645.3809, 981.587, 878.0997, 547.8496, 1516.9952, 1542.6168]
2025-09-16 15:42:12,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 167.0, 111.0, 389.0, 290.0, 199.0, 166.0, 115.0, 276.0, 305.0]
2025-09-16 15:42:12,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1137.87) for latency 18
2025-09-16 15:42:12,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes)
2025-09-16 15:44:11,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:44:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 667.32800 ± 343.922
2025-09-16 15:44:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [681.9062, 398.74026, 1529.8541, 474.25113, 447.55777, 726.24915, 746.81696, 762.15295, 161.1733, 744.57806]
2025-09-16 15:44:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 87.0, 279.0, 105.0, 95.0, 144.0, 147.0, 142.0, 31.0, 135.0]
2025-09-16 15:44:13,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 46 seconds)
2025-09-16 15:46:10,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:46:13,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1003.14807 ± 267.028
2025-09-16 15:46:13,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [802.764, 1108.9546, 831.62335, 1311.4458, 684.2368, 1295.3925, 914.59216, 555.64557, 1206.2437, 1320.5815]
2025-09-16 15:46:13,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 209.0, 150.0, 270.0, 124.0, 256.0, 171.0, 113.0, 211.0, 249.0]
2025-09-16 15:46:13,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 39 seconds)
2025-09-16 15:48:12,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:48:15,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 866.44073 ± 446.103
2025-09-16 15:48:15,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1229.2228, 794.42786, 798.61725, 565.0158, 609.7299, 145.23407, 1168.3007, 347.29642, 1506.4178, 1500.1448]
2025-09-16 15:48:15,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 144.0, 148.0, 100.0, 113.0, 28.0, 218.0, 66.0, 279.0, 284.0]
2025-09-16 15:48:15,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2025-09-16 15:50:15,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:50:17,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 671.27026 ± 281.164
2025-09-16 15:50:17,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [671.4671, 403.74472, 971.2176, 1065.115, 789.51624, 784.93585, 457.09857, 483.65192, 951.0144, 134.94087]
2025-09-16 15:50:17,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 76.0, 200.0, 196.0, 160.0, 143.0, 84.0, 103.0, 202.0, 26.0]
2025-09-16 15:50:17,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 58 seconds)
2025-09-16 15:52:14,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:52:16,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 720.76575 ± 154.589
2025-09-16 15:52:16,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [730.5151, 662.10486, 894.48975, 1004.19025, 680.9754, 674.55566, 388.15997, 691.40796, 807.4751, 673.78394]
2025-09-16 15:52:16,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 119.0, 169.0, 209.0, 131.0, 140.0, 72.0, 144.0, 143.0, 143.0]
2025-09-16 15:52:16,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 26 seconds)
2025-09-16 15:54:14,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:54:16,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 933.73145 ± 382.612
2025-09-16 15:54:16,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1295.1581, 1599.1439, 943.2886, 572.1602, 464.92142, 1454.9213, 959.03296, 616.1108, 526.30994, 906.26733]
2025-09-16 15:54:16,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [237.0, 296.0, 187.0, 117.0, 101.0, 291.0, 178.0, 109.0, 112.0, 189.0]
2025-09-16 15:54:16,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 22 seconds)
2025-09-16 15:56:16,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:56:19,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1141.27942 ± 456.040
2025-09-16 15:56:19,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1170.4501, 1064.775, 2126.8887, 1745.2957, 1061.0376, 1208.1434, 891.16614, 611.58655, 540.7369, 992.7142]
2025-09-16 15:56:19,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 194.0, 392.0, 319.0, 215.0, 221.0, 160.0, 111.0, 114.0, 171.0]
2025-09-16 15:56:19,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1141.28) for latency 18
2025-09-16 15:56:19,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 36 seconds)
2025-09-16 15:58:17,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:58:20,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1034.72595 ± 358.704
2025-09-16 15:58:20,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1177.5105, 868.57623, 708.1556, 607.72107, 945.3645, 820.49817, 1193.0221, 866.00714, 1931.782, 1228.622]
2025-09-16 15:58:20,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 175.0, 147.0, 130.0, 167.0, 145.0, 216.0, 166.0, 357.0, 233.0]
2025-09-16 15:58:20,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 31 seconds)
2025-09-16 16:00:20,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:00:23,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 807.10028 ± 298.260
2025-09-16 16:00:23,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [466.76013, 1035.7335, 519.78314, 532.5866, 637.4096, 1250.978, 550.94257, 1031.1884, 768.9813, 1276.6398]
2025-09-16 16:00:23,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 204.0, 90.0, 113.0, 116.0, 224.0, 98.0, 193.0, 154.0, 246.0]
2025-09-16 16:00:23,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 29 seconds)
2025-09-16 16:02:22,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:02:25,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1205.37695 ± 731.798
2025-09-16 16:02:25,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1313.117, 458.63513, 663.5652, 2440.7295, 691.86505, 1815.3422, 2341.818, 471.8178, 511.52124, 1345.3574]
2025-09-16 16:02:25,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [253.0, 98.0, 134.0, 446.0, 120.0, 350.0, 448.0, 99.0, 107.0, 259.0]
2025-09-16 16:02:25,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1205.38) for latency 18
2025-09-16 16:02:25,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 47 seconds)
2025-09-16 16:04:23,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:04:25,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1038.66174 ± 383.574
2025-09-16 16:04:25,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1747.7998, 670.9566, 1641.7135, 1249.0941, 647.9619, 892.4825, 916.1019, 874.7703, 1150.6989, 595.03705]
2025-09-16 16:04:25,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [309.0, 123.0, 305.0, 229.0, 121.0, 167.0, 163.0, 167.0, 211.0, 124.0]
2025-09-16 16:04:25,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 46 seconds)
2025-09-16 16:06:27,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:06:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1094.64575 ± 471.919
2025-09-16 16:06:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [794.7238, 954.5944, 862.17883, 1344.4125, 1003.27045, 1217.1952, 447.24258, 1089.7834, 904.5462, 2328.51]
2025-09-16 16:06:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 172.0, 171.0, 274.0, 180.0, 219.0, 83.0, 207.0, 168.0, 436.0]
2025-09-16 16:06:30,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 54 seconds)
2025-09-16 16:08:28,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:08:31,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1084.13367 ± 542.768
2025-09-16 16:08:31,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [624.8356, 1071.5934, 2503.5054, 1512.1359, 531.66425, 817.0867, 726.29254, 911.1731, 1038.072, 1104.9778]
2025-09-16 16:08:31,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 187.0, 472.0, 295.0, 112.0, 155.0, 129.0, 190.0, 205.0, 194.0]
2025-09-16 16:08:31,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 51 seconds)
2025-09-16 16:10:30,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:10:33,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1159.30298 ± 608.714
2025-09-16 16:10:33,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [654.1931, 1916.4346, 939.4142, 570.97925, 825.6997, 1080.2115, 1336.2058, 1299.4581, 2510.29, 460.1448]
2025-09-16 16:10:33,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 362.0, 166.0, 102.0, 151.0, 197.0, 277.0, 229.0, 478.0, 101.0]
2025-09-16 16:10:33,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 46 seconds)
2025-09-16 16:12:31,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:12:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1174.56799 ± 589.796
2025-09-16 16:12:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [891.2784, 605.9011, 2259.6846, 1582.0961, 763.6254, 454.81628, 1293.9702, 729.2715, 2082.0647, 1082.9714]
2025-09-16 16:12:35,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 124.0, 409.0, 293.0, 128.0, 84.0, 250.0, 130.0, 395.0, 211.0]
2025-09-16 16:12:35,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 42 seconds)
2025-09-16 16:14:33,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:14:37,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1234.42944 ± 709.974
2025-09-16 16:14:37,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [581.1268, 655.64246, 2888.6882, 488.7832, 1569.471, 913.1589, 589.00433, 1435.4517, 1732.8967, 1490.071]
2025-09-16 16:14:37,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 133.0, 540.0, 107.0, 304.0, 166.0, 120.0, 274.0, 331.0, 272.0]
2025-09-16 16:14:37,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1234.43) for latency 18
2025-09-16 16:14:37,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 47 seconds)
2025-09-16 16:16:37,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:16:39,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 947.58429 ± 377.844
2025-09-16 16:16:39,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [286.76105, 913.9727, 1385.9152, 927.8598, 1711.8474, 888.4809, 1013.00256, 721.718, 1056.3363, 569.94965]
2025-09-16 16:16:39,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 185.0, 242.0, 173.0, 313.0, 163.0, 187.0, 128.0, 178.0, 126.0]
2025-09-16 16:16:39,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 37 seconds)
2025-09-16 16:18:42,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:18:44,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 947.14172 ± 345.607
2025-09-16 16:18:44,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1603.2655, 1371.134, 850.1863, 644.3993, 1098.9242, 700.9148, 1097.5693, 630.99976, 1044.0706, 429.9532]
2025-09-16 16:18:44,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [304.0, 275.0, 178.0, 132.0, 188.0, 121.0, 217.0, 131.0, 181.0, 81.0]
2025-09-16 16:18:44,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 50 seconds)
2025-09-16 16:20:40,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:20:43,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1183.58533 ± 432.533
2025-09-16 16:20:43,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1375.248, 894.6543, 1766.8506, 2005.2819, 639.16693, 1143.5587, 545.77203, 1018.47424, 1226.0109, 1220.836]
2025-09-16 16:20:43,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 176.0, 337.0, 387.0, 136.0, 226.0, 96.0, 177.0, 243.0, 211.0]
2025-09-16 16:20:43,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 37 seconds)
2025-09-16 16:22:44,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:22:50,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1930.60681 ± 1103.560
2025-09-16 16:22:50,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3881.5293, 565.90094, 1942.1754, 1992.1428, 1577.9846, 2349.324, 934.2351, 1481.9142, 3832.072, 748.7913]
2025-09-16 16:22:50,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [752.0, 98.0, 384.0, 393.0, 307.0, 426.0, 183.0, 286.0, 740.0, 161.0]
2025-09-16 16:22:50,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1930.61) for latency 18
2025-09-16 16:22:50,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 50 seconds)
2025-09-16 16:24:54,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:24:59,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1654.80920 ± 1266.216
2025-09-16 16:24:59,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2132.4329, 1206.7008, 1034.0632, 479.39825, 577.00977, 595.97894, 3366.1638, 823.58606, 4463.5024, 1869.2563]
2025-09-16 16:24:59,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [398.0, 216.0, 213.0, 88.0, 101.0, 125.0, 623.0, 171.0, 811.0, 346.0]
2025-09-16 16:24:59,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 10 seconds)
2025-09-16 16:26:54,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:26:58,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1467.76196 ± 1369.984
2025-09-16 16:26:58,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1646.4249, 1643.9413, 599.6596, 2381.731, 440.28964, 418.76303, 658.92596, 5151.697, 956.86804, 779.3199]
2025-09-16 16:26:58,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [321.0, 324.0, 129.0, 445.0, 91.0, 79.0, 122.0, 970.0, 198.0, 145.0]
2025-09-16 16:26:58,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 56 seconds)
2025-09-16 16:29:08,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:29:13,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1577.60620 ± 1369.668
2025-09-16 16:29:13,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4378.069, 1466.7063, 464.21097, 566.5216, 612.64166, 1081.2156, 1394.8021, 4103.433, 800.6554, 907.80707]
2025-09-16 16:29:13,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [856.0, 267.0, 85.0, 103.0, 112.0, 203.0, 247.0, 797.0, 147.0, 171.0]
2025-09-16 16:29:13,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 29 minutes, 19 seconds)
2025-09-16 16:31:03,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:31:08,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1862.27441 ± 1102.314
2025-09-16 16:31:08,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2326.8645, 855.9087, 886.8943, 4249.179, 2412.72, 3029.883, 384.2384, 1445.0927, 1582.7046, 1449.257]
2025-09-16 16:31:08,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [397.0, 151.0, 161.0, 831.0, 432.0, 536.0, 70.0, 258.0, 296.0, 251.0]
2025-09-16 16:31:08,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 4 seconds)
2025-09-16 16:33:09,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:33:14,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1539.82007 ± 1305.855
2025-09-16 16:33:14,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [821.3682, 2157.4624, 1089.1367, 754.7906, 1067.5248, 2164.4548, 492.91055, 724.529, 5102.7866, 1023.2369]
2025-09-16 16:33:14,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 383.0, 200.0, 158.0, 203.0, 398.0, 93.0, 128.0, 1000.0, 175.0]
2025-09-16 16:33:14,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 58 seconds)
2025-09-16 16:35:21,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:35:26,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1655.55737 ± 1087.330
2025-09-16 16:35:26,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3943.0378, 3154.5613, 1005.5795, 1029.744, 2185.3284, 776.3345, 701.75195, 2101.2354, 847.9697, 810.03107]
2025-09-16 16:35:26,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [734.0, 597.0, 187.0, 199.0, 415.0, 141.0, 122.0, 374.0, 178.0, 142.0]
2025-09-16 16:35:26,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 59 seconds)
2025-09-16 16:37:16,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:37:22,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1881.77380 ± 1424.180
2025-09-16 16:37:22,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1595.8524, 1709.4325, 3294.2788, 1331.4121, 858.1157, 1377.1298, 675.60126, 1075.5728, 5634.9224, 1265.4209]
2025-09-16 16:37:22,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [272.0, 337.0, 602.0, 231.0, 175.0, 270.0, 126.0, 188.0, 1000.0, 255.0]
2025-09-16 16:37:22,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 46 seconds)
2025-09-16 16:39:27,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:39:36,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2925.60156 ± 1894.087
2025-09-16 16:39:36,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5343.0737, 791.15924, 1968.2119, 5483.9326, 917.0571, 4330.3896, 2557.0105, 5304.2583, 1912.7106, 648.2128]
2025-09-16 16:39:36,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 138.0, 356.0, 1000.0, 185.0, 812.0, 495.0, 1000.0, 391.0, 132.0]
2025-09-16 16:39:36,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2925.60) for latency 18
2025-09-16 16:39:36,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 42 seconds)
2025-09-16 16:41:36,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:41:45,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2746.28833 ± 1832.016
2025-09-16 16:41:45,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3705.5208, 2382.5994, 5018.255, 2849.7163, 4832.287, 1108.8776, 5493.542, 688.89014, 893.6014, 489.59366]
2025-09-16 16:41:45,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [696.0, 447.0, 911.0, 506.0, 937.0, 207.0, 1000.0, 147.0, 155.0, 85.0]
2025-09-16 16:41:45,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 58 seconds)
2025-09-16 16:43:47,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:43:54,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1983.84119 ± 1769.201
2025-09-16 16:43:54,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [493.38766, 5271.239, 1403.9977, 907.65027, 822.4498, 5299.047, 1618.488, 624.55096, 2769.9734, 627.6317]
2025-09-16 16:43:54,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 1000.0, 243.0, 164.0, 150.0, 1000.0, 298.0, 122.0, 534.0, 130.0]
2025-09-16 16:43:54,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 55 seconds)
2025-09-16 16:45:46,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:45:55,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2990.12646 ± 2030.610
2025-09-16 16:45:55,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1314.4059, 5522.107, 958.6555, 665.1633, 4377.255, 540.32776, 5221.759, 5403.0664, 4318.759, 1579.7631]
2025-09-16 16:45:55,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [239.0, 1000.0, 178.0, 137.0, 799.0, 117.0, 975.0, 999.0, 801.0, 301.0]
2025-09-16 16:45:55,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2990.13) for latency 18
2025-09-16 16:45:55,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 34 seconds)
2025-09-16 16:47:56,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:48:02,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2113.27930 ± 1824.064
2025-09-16 16:48:02,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5376.9414, 1562.8733, 614.8136, 901.4434, 1127.06, 883.26196, 998.18854, 4766.2427, 4401.4224, 500.54773]
2025-09-16 16:48:02,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 304.0, 114.0, 186.0, 223.0, 161.0, 172.0, 935.0, 824.0, 88.0]
2025-09-16 16:48:02,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 40 seconds)
2025-09-16 16:50:05,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:50:13,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2573.07422 ± 1960.774
2025-09-16 16:50:13,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [517.9005, 2096.3481, 5483.9097, 5383.6865, 783.7429, 5498.1294, 1546.353, 965.73444, 2358.8857, 1096.0538]
2025-09-16 16:50:13,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 365.0, 1000.0, 1000.0, 152.0, 1000.0, 274.0, 176.0, 447.0, 193.0]
2025-09-16 16:50:13,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 29 seconds)
2025-09-16 16:52:14,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:52:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3873.11060 ± 2007.650
2025-09-16 16:52:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5245.3696, 5640.4194, 3252.7866, 5456.7812, 5300.229, 879.62274, 437.0175, 5329.623, 5508.5737, 1680.6835]
2025-09-16 16:52:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 581.0, 1000.0, 1000.0, 148.0, 80.0, 1000.0, 1000.0, 331.0]
2025-09-16 16:52:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3873.11) for latency 18
2025-09-16 16:52:26,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 24 seconds)
2025-09-16 16:54:23,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:54:34,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3702.64600 ± 2154.446
2025-09-16 16:54:34,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5397.281, 5494.8706, 3298.8613, 5698.607, 133.98146, 5484.854, 845.68005, 5453.075, 785.0263, 4434.2256]
2025-09-16 16:54:34,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 575.0, 1000.0, 26.0, 1000.0, 155.0, 1000.0, 139.0, 800.0]
2025-09-16 16:54:34,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 16 seconds)
2025-09-16 16:56:34,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:56:41,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2156.49365 ± 1283.303
2025-09-16 16:56:41,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1151.2059, 3465.0867, 2419.5095, 975.9961, 5285.6123, 1049.6394, 1198.8513, 1587.3546, 2120.0686, 2311.6118]
2025-09-16 16:56:41,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 675.0, 485.0, 180.0, 1000.0, 194.0, 246.0, 322.0, 431.0, 449.0]
2025-09-16 16:56:41,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 9 seconds)
2025-09-16 16:58:48,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:58:59,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3864.20435 ± 1700.851
2025-09-16 16:58:59,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5587.623, 5454.0586, 5343.7275, 692.8422, 4747.916, 1334.6415, 5397.675, 4037.154, 3346.436, 2699.968]
2025-09-16 16:58:59,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 121.0, 860.0, 235.0, 1000.0, 744.0, 595.0, 525.0]
2025-09-16 16:58:59,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
