2025-09-16 11:57:00,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_6
2025-09-16 11:57:00,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_6
2025-09-16 11:57:00,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x15354009c790>}
2025-09-16 11:57:00,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:57:00,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:57:00,706 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=478, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:57:00,706 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:57:02,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:57:02,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:58:46,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:58:47,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 337.05368 ± 28.469
2025-09-16 11:58:47,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [305.19482, 363.9374, 336.2764, 324.35977, 389.20178, 302.55524, 322.3923, 371.88098, 345.81357, 308.92474]
2025-09-16 11:58:47,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 67.0, 61.0, 59.0, 81.0, 55.0, 59.0, 66.0, 63.0, 56.0]
2025-09-16 11:58:47,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (337.05) for latency 6
2025-09-16 11:58:47,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 53 minutes, 34 seconds)
2025-09-16 12:00:41,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:00:42,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 412.58807 ± 69.606
2025-09-16 12:00:42,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [491.15817, 394.8226, 389.5439, 401.77905, 448.6157, 402.00568, 378.0615, 532.2699, 257.41986, 430.20468]
2025-09-16 12:00:42,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 75.0, 76.0, 84.0, 85.0, 75.0, 73.0, 100.0, 50.0, 83.0]
2025-09-16 12:00:42,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (412.59) for latency 6
2025-09-16 12:00:42,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 59 minutes, 57 seconds)
2025-09-16 12:02:35,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:02:36,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 445.41803 ± 94.103
2025-09-16 12:02:36,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [438.02356, 517.54236, 450.44565, 582.9905, 315.1198, 329.86673, 313.96838, 568.95404, 454.52054, 482.7485]
2025-09-16 12:02:36,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 107.0, 86.0, 112.0, 67.0, 71.0, 66.0, 108.0, 85.0, 99.0]
2025-09-16 12:02:36,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (445.42) for latency 6
2025-09-16 12:02:36,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 19 seconds)
2025-09-16 12:04:31,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:04:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 361.03574 ± 58.980
2025-09-16 12:04:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [415.22168, 287.77158, 290.57028, 312.964, 464.8662, 323.9376, 404.4529, 374.77176, 417.3057, 318.4957]
2025-09-16 12:04:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 55.0, 56.0, 68.0, 87.0, 62.0, 89.0, 72.0, 79.0, 60.0]
2025-09-16 12:04:32,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 5 seconds)
2025-09-16 12:06:25,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:06:26,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 395.41733 ± 69.585
2025-09-16 12:06:26,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [405.00385, 434.28412, 439.75934, 376.55173, 457.6767, 528.46295, 326.86893, 334.53592, 374.1973, 276.83264]
2025-09-16 12:06:26,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 83.0, 85.0, 69.0, 86.0, 101.0, 64.0, 71.0, 74.0, 55.0]
2025-09-16 12:06:26,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 58 minutes, 47 seconds)
2025-09-16 12:08:21,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:08:22,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 384.08768 ± 80.045
2025-09-16 12:08:22,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [237.34418, 383.95453, 539.553, 397.6042, 339.42917, 340.7236, 359.1978, 461.29587, 452.68607, 329.08835]
2025-09-16 12:08:22,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 70.0, 104.0, 74.0, 69.0, 70.0, 71.0, 89.0, 91.0, 64.0]
2025-09-16 12:08:22,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 seconds)
2025-09-16 12:10:16,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:10:17,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 407.94202 ± 118.353
2025-09-16 12:10:17,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [349.35892, 448.3482, 464.1195, 567.7662, 333.3386, 658.9881, 355.991, 284.4647, 310.2297, 306.81528]
2025-09-16 12:10:17,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 85.0, 102.0, 118.0, 62.0, 132.0, 79.0, 54.0, 59.0, 64.0]
2025-09-16 12:10:17,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 58 minutes, 9 seconds)
2025-09-16 12:12:11,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:12:12,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 473.46420 ± 153.679
2025-09-16 12:12:12,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [512.66376, 802.63135, 559.14764, 209.68515, 434.72305, 320.68417, 419.9858, 459.93124, 409.7458, 605.44415]
2025-09-16 12:12:12,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 165.0, 109.0, 41.0, 96.0, 61.0, 91.0, 99.0, 79.0, 115.0]
2025-09-16 12:12:12,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (473.46) for latency 6
2025-09-16 12:12:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 56 minutes, 32 seconds)
2025-09-16 12:14:07,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:14:08,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 418.87689 ± 73.943
2025-09-16 12:14:08,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [425.46515, 377.4194, 498.9332, 424.13895, 362.98697, 489.54163, 291.0464, 525.7287, 327.39017, 466.11884]
2025-09-16 12:14:08,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 70.0, 107.0, 78.0, 70.0, 95.0, 57.0, 116.0, 70.0, 99.0]
2025-09-16 12:14:08,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 54 minutes, 45 seconds)
2025-09-16 12:16:02,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:16:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 448.49957 ± 94.002
2025-09-16 12:16:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [562.20013, 372.36722, 426.6034, 455.1583, 426.3399, 498.9981, 309.68222, 580.4113, 307.35526, 545.87946]
2025-09-16 12:16:03,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 79.0, 90.0, 98.0, 80.0, 96.0, 59.0, 125.0, 61.0, 105.0]
2025-09-16 12:16:03,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 59 seconds)
2025-09-16 12:17:57,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:17:58,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 443.69711 ± 121.462
2025-09-16 12:17:58,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [319.06613, 411.81638, 584.62256, 495.51636, 258.42657, 341.4187, 667.85095, 533.3848, 371.16473, 453.70407]
2025-09-16 12:17:58,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 91.0, 115.0, 94.0, 49.0, 65.0, 124.0, 99.0, 68.0, 83.0]
2025-09-16 12:17:58,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 51 minutes, 6 seconds)
2025-09-16 12:19:52,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:19:53,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 433.15350 ± 81.133
2025-09-16 12:19:53,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [418.9776, 303.18054, 346.65106, 341.37555, 518.99854, 481.32532, 405.52582, 550.4555, 525.75916, 439.28638]
2025-09-16 12:19:53,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 68.0, 77.0, 64.0, 99.0, 99.0, 77.0, 103.0, 111.0, 99.0]
2025-09-16 12:19:53,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 49 minutes, 7 seconds)
2025-09-16 12:21:48,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:21:49,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 445.23544 ± 125.004
2025-09-16 12:21:49,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [489.7172, 343.00403, 759.1328, 439.6786, 490.414, 395.66434, 515.63654, 376.5872, 307.84875, 334.67084]
2025-09-16 12:21:49,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 66.0, 149.0, 87.0, 92.0, 77.0, 110.0, 71.0, 66.0, 62.0]
2025-09-16 12:21:49,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 47 minutes, 13 seconds)
2025-09-16 12:23:44,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:23:45,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 421.15103 ± 87.301
2025-09-16 12:23:45,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [402.62674, 314.82916, 354.63687, 565.1771, 464.58582, 354.19168, 377.0003, 366.00693, 423.87906, 588.5765]
2025-09-16 12:23:45,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 68.0, 72.0, 113.0, 99.0, 81.0, 85.0, 77.0, 94.0, 114.0]
2025-09-16 12:23:45,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 45 minutes, 29 seconds)
2025-09-16 12:25:40,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:25:41,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 515.15198 ± 114.601
2025-09-16 12:25:41,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [357.2049, 788.5561, 529.7922, 486.0835, 572.3582, 400.21378, 550.951, 404.54953, 537.3271, 524.48376]
2025-09-16 12:25:41,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 164.0, 100.0, 92.0, 122.0, 74.0, 118.0, 86.0, 105.0, 110.0]
2025-09-16 12:25:41,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (515.15) for latency 6
2025-09-16 12:25:41,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 43 minutes, 50 seconds)
2025-09-16 12:27:36,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:27:38,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 508.99820 ± 173.262
2025-09-16 12:27:38,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [398.11758, 493.02026, 443.39325, 449.12564, 528.6981, 374.14975, 445.0221, 987.7222, 601.12836, 369.60464]
2025-09-16 12:27:38,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 93.0, 84.0, 85.0, 99.0, 84.0, 92.0, 211.0, 130.0, 68.0]
2025-09-16 12:27:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 8 seconds)
2025-09-16 12:29:33,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:29:34,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 603.09210 ± 119.206
2025-09-16 12:29:34,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [749.54254, 421.39325, 513.661, 775.5485, 487.0259, 501.10147, 628.1768, 541.70953, 718.452, 694.30975]
2025-09-16 12:29:34,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 79.0, 97.0, 149.0, 93.0, 93.0, 122.0, 120.0, 140.0, 130.0]
2025-09-16 12:29:34,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (603.09) for latency 6
2025-09-16 12:29:34,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 40 minutes, 38 seconds)
2025-09-16 12:31:29,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:31:31,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 535.71228 ± 185.669
2025-09-16 12:31:31,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [324.31662, 452.65363, 566.6009, 747.68005, 384.96817, 256.40634, 846.1175, 436.12964, 628.80396, 713.446]
2025-09-16 12:31:31,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 92.0, 109.0, 139.0, 82.0, 53.0, 165.0, 81.0, 133.0, 147.0]
2025-09-16 12:31:31,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 39 minutes, 6 seconds)
2025-09-16 12:33:26,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:33:27,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 511.60516 ± 147.462
2025-09-16 12:33:27,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [313.6688, 577.1948, 532.5371, 718.92004, 497.85504, 800.3098, 442.90106, 376.63828, 499.08298, 356.9436]
2025-09-16 12:33:27,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 104.0, 108.0, 133.0, 108.0, 152.0, 93.0, 84.0, 93.0, 67.0]
2025-09-16 12:33:27,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 37 minutes, 2 seconds)
2025-09-16 12:35:22,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:35:23,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 364.87314 ± 91.785
2025-09-16 12:35:23,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [372.36996, 253.28134, 434.07907, 185.74387, 444.80722, 512.1152, 430.26794, 324.0208, 337.84763, 354.19818]
2025-09-16 12:35:23,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 49.0, 96.0, 36.0, 86.0, 97.0, 95.0, 70.0, 63.0, 67.0]
2025-09-16 12:35:23,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 1 second)
2025-09-16 12:37:18,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:37:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 518.06317 ± 127.955
2025-09-16 12:37:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [389.78635, 463.47974, 585.64355, 512.7877, 533.90155, 340.03522, 747.4303, 446.8365, 725.8051, 434.92593]
2025-09-16 12:37:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 84.0, 111.0, 94.0, 116.0, 61.0, 143.0, 81.0, 133.0, 83.0]
2025-09-16 12:37:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 14 seconds)
2025-09-16 12:39:13,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:39:15,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 494.52515 ± 158.929
2025-09-16 12:39:15,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [631.7112, 442.37683, 339.28787, 382.18832, 896.3923, 507.41275, 530.9823, 439.75848, 439.80743, 335.3335]
2025-09-16 12:39:15,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 98.0, 74.0, 83.0, 175.0, 111.0, 113.0, 94.0, 84.0, 69.0]
2025-09-16 12:39:15,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 55 seconds)
2025-09-16 12:41:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:41:10,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 539.66302 ± 103.436
2025-09-16 12:41:10,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [623.43585, 423.0618, 777.1554, 501.35934, 490.8999, 579.2034, 499.09036, 574.54297, 535.47516, 392.40628]
2025-09-16 12:41:10,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 79.0, 150.0, 96.0, 92.0, 112.0, 108.0, 125.0, 109.0, 86.0]
2025-09-16 12:41:10,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 45 seconds)
2025-09-16 12:43:06,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:43:07,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 537.33557 ± 111.713
2025-09-16 12:43:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [359.54172, 759.85596, 492.5629, 406.3705, 594.85175, 429.91568, 571.74066, 557.56274, 607.64923, 593.3047]
2025-09-16 12:43:07,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 146.0, 91.0, 83.0, 114.0, 78.0, 107.0, 126.0, 115.0, 130.0]
2025-09-16 12:43:07,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 27 minutes, 2 seconds)
2025-09-16 12:45:02,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:45:04,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 599.16125 ± 166.567
2025-09-16 12:45:04,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [446.6052, 509.35352, 607.8071, 582.43365, 636.82794, 553.12854, 409.83453, 973.59827, 457.89404, 814.1298]
2025-09-16 12:45:04,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 96.0, 112.0, 113.0, 116.0, 122.0, 91.0, 185.0, 87.0, 156.0]
2025-09-16 12:45:04,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 25 minutes, 17 seconds)
2025-09-16 12:46:59,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:47:00,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 521.91101 ± 193.432
2025-09-16 12:47:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [557.43396, 938.455, 777.5578, 375.196, 280.53558, 444.74875, 591.8688, 501.77164, 363.94427, 387.59796]
2025-09-16 12:47:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 179.0, 146.0, 83.0, 55.0, 87.0, 127.0, 108.0, 69.0, 73.0]
2025-09-16 12:47:00,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 23 minutes, 10 seconds)
2025-09-16 12:48:57,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:48:59,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 688.55487 ± 184.348
2025-09-16 12:48:59,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [612.09656, 815.0331, 530.078, 976.0262, 867.9573, 935.5642, 533.8838, 605.79895, 600.8966, 408.2144]
2025-09-16 12:48:59,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 174.0, 100.0, 194.0, 166.0, 175.0, 101.0, 114.0, 129.0, 83.0]
2025-09-16 12:48:59,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (688.55) for latency 6
2025-09-16 12:48:59,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 22 minutes, 9 seconds)
2025-09-16 12:50:52,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:50:53,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 458.81439 ± 99.829
2025-09-16 12:50:53,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [363.59775, 467.92947, 459.80466, 438.38138, 538.2619, 553.5374, 596.9967, 228.88655, 434.47537, 506.27286]
2025-09-16 12:50:53,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 88.0, 86.0, 83.0, 100.0, 107.0, 119.0, 44.0, 81.0, 98.0]
2025-09-16 12:50:53,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 53 seconds)
2025-09-16 12:52:49,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:52:51,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 753.33704 ± 196.290
2025-09-16 12:52:51,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [674.0267, 926.306, 757.58527, 527.2089, 974.3052, 490.55432, 1149.8562, 701.31616, 616.5968, 715.61536]
2025-09-16 12:52:51,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 191.0, 148.0, 112.0, 206.0, 98.0, 231.0, 133.0, 135.0, 140.0]
2025-09-16 12:52:51,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (753.34) for latency 6
2025-09-16 12:52:52,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 18 minutes, 12 seconds)
2025-09-16 12:54:46,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:54:48,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 710.82709 ± 182.712
2025-09-16 12:54:48,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [615.0498, 662.663, 766.35516, 374.03537, 817.1051, 1027.5217, 658.3494, 666.8737, 965.9358, 554.38116]
2025-09-16 12:54:48,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 127.0, 143.0, 80.0, 169.0, 195.0, 124.0, 142.0, 189.0, 122.0]
2025-09-16 12:54:48,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 17 seconds)
2025-09-16 12:56:43,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:56:45,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 700.38586 ± 159.914
2025-09-16 12:56:45,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [503.49634, 830.82666, 901.7796, 952.1871, 550.2703, 606.66846, 587.0854, 752.71844, 810.62836, 508.19833]
2025-09-16 12:56:45,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 154.0, 182.0, 188.0, 113.0, 124.0, 115.0, 147.0, 165.0, 94.0]
2025-09-16 12:56:45,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 14 minutes, 36 seconds)
2025-09-16 12:58:41,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:58:43,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 812.35162 ± 331.995
2025-09-16 12:58:43,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [615.2342, 722.62054, 709.81885, 696.85034, 350.84573, 1012.81226, 896.5759, 1680.4479, 729.6106, 708.6997]
2025-09-16 12:58:43,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 140.0, 147.0, 132.0, 67.0, 196.0, 163.0, 337.0, 130.0, 158.0]
2025-09-16 12:58:43,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (812.35) for latency 6
2025-09-16 12:58:43,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 27 seconds)
2025-09-16 13:00:39,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:00:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 602.46405 ± 200.223
2025-09-16 13:00:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [525.9721, 735.308, 147.96007, 582.58777, 658.63873, 902.40485, 471.9557, 763.77856, 747.95166, 488.08334]
2025-09-16 13:00:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 147.0, 29.0, 117.0, 131.0, 179.0, 92.0, 145.0, 138.0, 99.0]
2025-09-16 13:00:40,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 6 seconds)
2025-09-16 13:02:35,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:02:37,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 689.42786 ± 224.254
2025-09-16 13:02:37,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [538.9574, 1250.0245, 597.3951, 805.6665, 695.94666, 634.48083, 856.87805, 589.4052, 454.18546, 471.3387]
2025-09-16 13:02:37,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 232.0, 113.0, 149.0, 142.0, 121.0, 180.0, 121.0, 87.0, 89.0]
2025-09-16 13:02:37,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 45 seconds)
2025-09-16 13:04:34,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:04:36,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 728.67902 ± 468.833
2025-09-16 13:04:36,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [570.29626, 418.6849, 479.83624, 1981.0647, 512.2663, 585.07214, 521.25214, 438.64883, 573.6646, 1206.0045]
2025-09-16 13:04:36,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 82.0, 88.0, 401.0, 99.0, 118.0, 106.0, 83.0, 109.0, 237.0]
2025-09-16 13:04:36,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 19 seconds)
2025-09-16 13:06:31,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:06:33,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 743.38757 ± 249.781
2025-09-16 13:06:33,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [564.049, 1150.018, 757.72864, 464.39923, 756.19403, 684.4974, 706.9654, 609.8084, 487.0697, 1253.1462]
2025-09-16 13:06:33,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 225.0, 156.0, 91.0, 145.0, 133.0, 137.0, 119.0, 103.0, 251.0]
2025-09-16 13:06:33,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 22 seconds)
2025-09-16 13:08:27,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:08:29,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 725.13971 ± 201.401
2025-09-16 13:08:29,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [858.388, 873.61053, 722.0981, 582.527, 521.48224, 878.1222, 923.8297, 843.3132, 253.59608, 794.4297]
2025-09-16 13:08:29,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 167.0, 139.0, 108.0, 111.0, 161.0, 172.0, 165.0, 48.0, 159.0]
2025-09-16 13:08:29,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 6 seconds)
2025-09-16 13:10:26,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:10:28,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 754.33020 ± 275.663
2025-09-16 13:10:28,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [808.5785, 810.90076, 666.1842, 944.46893, 688.9762, 695.22974, 868.3803, 122.719185, 654.2699, 1283.5942]
2025-09-16 13:10:28,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 160.0, 138.0, 187.0, 144.0, 132.0, 167.0, 24.0, 131.0, 240.0]
2025-09-16 13:10:28,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 30 seconds)
2025-09-16 13:12:24,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:12:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 952.80389 ± 287.930
2025-09-16 13:12:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1185.69, 901.19464, 858.51624, 737.2543, 749.9057, 1149.8419, 1510.4717, 1230.0746, 644.7296, 560.3605]
2025-09-16 13:12:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [238.0, 170.0, 169.0, 144.0, 142.0, 225.0, 298.0, 234.0, 124.0, 104.0]
2025-09-16 13:12:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (952.80) for latency 6
2025-09-16 13:12:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 50 seconds)
2025-09-16 13:14:23,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:14:25,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 707.87622 ± 349.185
2025-09-16 13:14:25,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [729.2334, 239.3877, 626.8831, 827.86755, 1057.952, 229.80998, 1319.6298, 1049.6184, 656.0002, 342.38028]
2025-09-16 13:14:25,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 48.0, 125.0, 164.0, 200.0, 45.0, 247.0, 198.0, 125.0, 65.0]
2025-09-16 13:14:25,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 50 seconds)
2025-09-16 13:16:20,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:16:22,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 907.13574 ± 277.133
2025-09-16 13:16:22,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1045.6523, 653.545, 1042.7971, 939.1144, 1490.8275, 798.1395, 461.27536, 898.8842, 634.9833, 1106.1383]
2025-09-16 13:16:22,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 123.0, 197.0, 181.0, 292.0, 152.0, 87.0, 174.0, 124.0, 225.0]
2025-09-16 13:16:22,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 53 seconds)
2025-09-16 13:18:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:18:21,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 974.51477 ± 377.809
2025-09-16 13:18:21,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [775.75195, 1297.6084, 1942.9762, 682.8275, 712.77606, 1040.9934, 859.0788, 909.6536, 562.3687, 961.11255]
2025-09-16 13:18:21,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 254.0, 390.0, 130.0, 137.0, 195.0, 178.0, 176.0, 119.0, 185.0]
2025-09-16 13:18:21,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (974.51) for latency 6
2025-09-16 13:18:21,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 20 seconds)
2025-09-16 13:20:17,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:20:19,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 959.40625 ± 304.425
2025-09-16 13:20:19,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [718.42395, 568.02997, 1048.3427, 559.63257, 1574.7302, 1024.4749, 1019.2, 716.34186, 1219.6327, 1145.2544]
2025-09-16 13:20:19,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 108.0, 204.0, 109.0, 311.0, 196.0, 204.0, 129.0, 263.0, 242.0]
2025-09-16 13:20:19,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 17 seconds)
2025-09-16 13:22:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:22:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1115.59302 ± 374.642
2025-09-16 13:22:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [904.8795, 1226.3978, 646.42645, 706.5377, 1457.7125, 1552.7893, 1334.3516, 860.4939, 1738.4832, 727.8577]
2025-09-16 13:22:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 245.0, 120.0, 139.0, 303.0, 312.0, 269.0, 189.0, 338.0, 157.0]
2025-09-16 13:22:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1115.59) for latency 6
2025-09-16 13:22:21,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 58 seconds)
2025-09-16 13:24:17,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:24:21,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1298.60840 ± 635.974
2025-09-16 13:24:21,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [508.32895, 1474.5045, 1353.2544, 2673.3342, 1171.1299, 830.89435, 1051.3418, 783.8861, 2197.4832, 941.92633]
2025-09-16 13:24:21,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 285.0, 268.0, 508.0, 219.0, 159.0, 203.0, 148.0, 434.0, 181.0]
2025-09-16 13:24:21,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1298.61) for latency 6
2025-09-16 13:24:21,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 14 seconds)
2025-09-16 13:26:13,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:26:16,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1220.30884 ± 446.417
2025-09-16 13:26:16,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [875.04645, 1531.6167, 540.7063, 677.8583, 1899.1661, 1035.6692, 1518.02, 1702.2393, 891.49023, 1531.2751]
2025-09-16 13:26:16,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 291.0, 103.0, 136.0, 368.0, 212.0, 285.0, 324.0, 169.0, 308.0]
2025-09-16 13:26:16,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 55 seconds)
2025-09-16 13:28:15,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:28:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 999.30713 ± 267.208
2025-09-16 13:28:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [740.48004, 933.09204, 1091.795, 841.863, 1013.4698, 954.2748, 696.31415, 793.10095, 1606.1418, 1322.5398]
2025-09-16 13:28:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 197.0, 239.0, 163.0, 216.0, 178.0, 129.0, 172.0, 310.0, 255.0]
2025-09-16 13:28:18,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 25 seconds)
2025-09-16 13:30:13,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:30:15,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1048.46753 ± 301.477
2025-09-16 13:30:15,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1182.2233, 1110.6505, 852.15436, 1062.0553, 1195.6199, 741.55396, 1757.8597, 1031.5259, 575.489, 975.5428]
2025-09-16 13:30:15,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 242.0, 161.0, 201.0, 224.0, 143.0, 340.0, 186.0, 114.0, 202.0]
2025-09-16 13:30:15,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 19 seconds)
2025-09-16 13:32:12,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:32:15,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1159.18359 ± 431.520
2025-09-16 13:32:15,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1734.5249, 913.88806, 702.1128, 765.18756, 1256.6765, 1375.5569, 954.56793, 2067.0876, 736.1774, 1086.0565]
2025-09-16 13:32:15,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 168.0, 144.0, 146.0, 266.0, 267.0, 183.0, 404.0, 140.0, 220.0]
2025-09-16 13:32:15,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 2 seconds)
2025-09-16 13:34:11,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:34:14,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1030.28430 ± 295.261
2025-09-16 13:34:14,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1255.6342, 783.1863, 717.20044, 941.9137, 1201.3807, 1326.4805, 708.1468, 780.42865, 953.96277, 1634.5089]
2025-09-16 13:34:14,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 153.0, 136.0, 180.0, 257.0, 256.0, 145.0, 150.0, 177.0, 305.0]
2025-09-16 13:34:14,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 52 seconds)
2025-09-16 13:36:09,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:36:12,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1022.86749 ± 270.072
2025-09-16 13:36:12,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [765.5664, 1566.8005, 1141.7814, 652.8872, 782.06573, 1091.0076, 1180.9879, 1156.8816, 1167.4813, 723.2152]
2025-09-16 13:36:12,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 303.0, 240.0, 147.0, 147.0, 218.0, 240.0, 219.0, 220.0, 155.0]
2025-09-16 13:36:12,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 17 seconds)
2025-09-16 13:38:09,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:38:11,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 938.51965 ± 389.309
2025-09-16 13:38:11,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [863.43085, 801.0258, 772.165, 400.1435, 747.1066, 1166.5509, 1277.2412, 966.7883, 548.28705, 1842.4569]
2025-09-16 13:38:11,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 158.0, 150.0, 77.0, 153.0, 218.0, 257.0, 185.0, 108.0, 369.0]
2025-09-16 13:38:11,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 1 second)
2025-09-16 13:40:09,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:40:13,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1666.71057 ± 1028.969
2025-09-16 13:40:13,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2185.0115, 1440.3954, 1392.7738, 751.9615, 1144.005, 859.4344, 2850.8066, 1426.17, 4061.5613, 554.98517]
2025-09-16 13:40:13,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [447.0, 278.0, 257.0, 168.0, 251.0, 155.0, 591.0, 272.0, 786.0, 106.0]
2025-09-16 13:40:13,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1666.71) for latency 6
2025-09-16 13:40:13,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 38 seconds)
2025-09-16 13:42:08,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:42:12,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1657.16968 ± 722.947
2025-09-16 13:42:12,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1765.8159, 749.18915, 2965.0942, 2208.3247, 1382.9773, 663.7991, 1675.1375, 1230.0518, 2651.5115, 1279.7961]
2025-09-16 13:42:12,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [355.0, 138.0, 576.0, 437.0, 262.0, 126.0, 315.0, 251.0, 540.0, 235.0]
2025-09-16 13:42:12,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 31 seconds)
2025-09-16 13:44:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:44:13,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1902.24353 ± 1251.985
2025-09-16 13:44:13,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1128.4532, 4310.609, 2975.3745, 308.01617, 977.01135, 1438.5187, 700.8839, 3202.2246, 2816.0745, 1165.2695]
2025-09-16 13:44:13,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 853.0, 581.0, 70.0, 186.0, 287.0, 151.0, 615.0, 538.0, 222.0]
2025-09-16 13:44:13,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1902.24) for latency 6
2025-09-16 13:44:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 55 seconds)
2025-09-16 13:46:09,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:46:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1369.07458 ± 652.862
2025-09-16 13:46:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1649.3911, 1492.9167, 1134.3473, 970.90045, 1837.3591, 1089.0359, 746.5518, 975.23285, 767.04205, 3027.9685]
2025-09-16 13:46:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [340.0, 297.0, 229.0, 199.0, 357.0, 237.0, 156.0, 188.0, 165.0, 611.0]
2025-09-16 13:46:13,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 10 seconds)
2025-09-16 13:48:09,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:48:14,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1815.23511 ± 828.921
2025-09-16 13:48:14,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2193.3555, 2583.0034, 2380.435, 896.0225, 1031.089, 1487.9017, 730.4013, 1171.3904, 2308.7708, 3369.9832]
2025-09-16 13:48:14,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [438.0, 510.0, 454.0, 196.0, 201.0, 304.0, 142.0, 234.0, 476.0, 652.0]
2025-09-16 13:48:14,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 23 seconds)
2025-09-16 13:50:15,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:50:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1642.10974 ± 832.935
2025-09-16 13:50:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1139.3229, 1417.7826, 3394.1401, 1605.6935, 2822.3413, 655.0684, 2020.1553, 1376.8534, 725.204, 1264.5365]
2025-09-16 13:50:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [208.0, 267.0, 665.0, 300.0, 540.0, 123.0, 389.0, 266.0, 131.0, 240.0]
2025-09-16 13:50:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 48 seconds)
2025-09-16 13:52:15,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:52:20,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1829.79749 ± 1328.766
2025-09-16 13:52:20,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1055.8762, 5099.919, 2979.0767, 2800.61, 735.17413, 740.551, 1236.3901, 829.93756, 1486.4083, 1334.0304]
2025-09-16 13:52:20,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [211.0, 991.0, 585.0, 571.0, 159.0, 154.0, 260.0, 160.0, 316.0, 284.0]
2025-09-16 13:52:20,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 23 minutes, 3 seconds)
2025-09-16 13:54:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:54:19,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1422.16772 ± 661.147
2025-09-16 13:54:19,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [895.1945, 1176.9353, 896.24225, 1418.6947, 1236.8099, 1298.7498, 836.66187, 3235.0098, 1566.0261, 1661.3525]
2025-09-16 13:54:19,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 257.0, 180.0, 272.0, 230.0, 235.0, 181.0, 620.0, 302.0, 319.0]
2025-09-16 13:54:19,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 47 seconds)
2025-09-16 13:56:15,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:56:25,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3246.82593 ± 1476.433
2025-09-16 13:56:25,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [4779.131, 3702.8037, 1476.0535, 2439.3972, 4743.446, 1025.8385, 1537.819, 4507.7534, 3074.7185, 5181.299]
2025-09-16 13:56:25,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 728.0, 275.0, 484.0, 1000.0, 202.0, 312.0, 943.0, 587.0, 1000.0]
2025-09-16 13:56:25,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (3246.83) for latency 6
2025-09-16 13:56:25,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 19 minutes, 29 seconds)
2025-09-16 13:58:25,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:58:30,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2010.54199 ± 1651.723
2025-09-16 13:58:30,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [4981.7812, 377.0562, 2103.0676, 1468.432, 2745.279, 646.0633, 733.90454, 5066.914, 1119.8915, 863.03284]
2025-09-16 13:58:30,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 71.0, 410.0, 287.0, 508.0, 125.0, 147.0, 1000.0, 209.0, 166.0]
2025-09-16 13:58:30,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 18 minutes, 2 seconds)
2025-09-16 14:00:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:00:35,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2077.50220 ± 1254.255
2025-09-16 14:00:35,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1044.8229, 790.6577, 3180.2607, 3409.157, 1240.8389, 986.94904, 1237.5073, 2366.1355, 4767.3345, 1751.3578]
2025-09-16 14:00:35,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 139.0, 615.0, 679.0, 246.0, 204.0, 245.0, 443.0, 898.0, 325.0]
2025-09-16 14:00:35,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 58 seconds)
2025-09-16 14:02:28,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:02:36,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3202.64600 ± 1748.712
2025-09-16 14:02:36,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1910.7362, 1094.6655, 4196.713, 1464.9841, 1525.2804, 1434.9603, 4799.812, 5199.591, 5249.719, 5149.998]
2025-09-16 14:02:36,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [369.0, 205.0, 797.0, 279.0, 295.0, 334.0, 923.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:36,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 13 minutes, 59 seconds)
2025-09-16 14:04:34,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:04:39,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1937.98438 ± 1135.359
2025-09-16 14:04:39,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3743.1636, 1262.92, 1288.8265, 1152.2441, 2554.47, 408.0987, 4073.366, 1240.9979, 1411.507, 2244.251]
2025-09-16 14:04:39,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [721.0, 256.0, 246.0, 244.0, 481.0, 78.0, 800.0, 240.0, 288.0, 421.0]
2025-09-16 14:04:39,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 12 minutes, 19 seconds)
2025-09-16 14:06:40,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:06:47,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2762.33057 ± 1271.654
2025-09-16 14:06:47,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1311.4321, 2982.4941, 1641.8455, 2175.888, 3768.2485, 5262.694, 1910.8147, 1230.5859, 3202.4353, 4136.866]
2025-09-16 14:06:47,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 568.0, 313.0, 419.0, 732.0, 1000.0, 369.0, 226.0, 615.0, 782.0]
2025-09-16 14:06:47,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 10 minutes, 34 seconds)
2025-09-16 14:08:40,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:08:51,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3941.88721 ± 1337.212
2025-09-16 14:08:51,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3022.7654, 4680.821, 5196.6416, 1147.4022, 3993.3804, 4645.937, 4531.2207, 5152.115, 2032.9996, 5015.5903]
2025-09-16 14:08:51,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [604.0, 904.0, 1000.0, 227.0, 780.0, 947.0, 886.0, 1000.0, 393.0, 1000.0]
2025-09-16 14:08:51,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (3941.89) for latency 6
2025-09-16 14:08:51,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 8 minutes, 17 seconds)
2025-09-16 14:10:54,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:11:04,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3844.62183 ± 1368.960
2025-09-16 14:11:04,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3230.6174, 4126.197, 3596.2798, 5207.304, 3248.8877, 2954.8567, 660.46466, 5221.0503, 5227.368, 4973.1934]
2025-09-16 14:11:04,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [609.0, 785.0, 683.0, 1000.0, 639.0, 563.0, 132.0, 1000.0, 1000.0, 942.0]
2025-09-16 14:11:04,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 7 minutes, 7 seconds)
2025-09-16 14:12:54,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:13:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4350.61133 ± 1086.988
2025-09-16 14:13:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [4197.555, 3567.2932, 5339.6533, 4538.4375, 4310.6333, 1500.5736, 4569.0977, 5092.2485, 5284.262, 5106.359]
2025-09-16 14:13:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [768.0, 672.0, 1000.0, 838.0, 802.0, 280.0, 856.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:05,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (4350.61) for latency 6
2025-09-16 14:13:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 4 minutes, 58 seconds)
2025-09-16 14:15:09,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:15:17,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3007.78149 ± 1600.463
2025-09-16 14:15:17,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [4882.638, 2247.7156, 3963.6997, 1089.8368, 5259.788, 5019.0283, 3335.0386, 1795.996, 1272.3723, 1211.6992]
2025-09-16 14:15:17,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [938.0, 427.0, 774.0, 208.0, 1000.0, 927.0, 636.0, 344.0, 267.0, 230.0]
2025-09-16 14:15:17,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 46 seconds)
2025-09-16 14:17:09,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:17:22,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4694.85840 ± 776.577
2025-09-16 14:17:22,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3252.4795, 5214.0566, 5042.425, 4492.9526, 3145.4316, 5072.8403, 5254.2705, 5167.9517, 5250.0977, 5056.0796]
2025-09-16 14:17:22,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [611.0, 1000.0, 1000.0, 829.0, 591.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:17:22,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (4694.86) for latency 6
2025-09-16 14:17:22,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 1 minute, 19 seconds)
2025-09-16 14:19:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:19:37,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3527.24878 ± 1765.245
2025-09-16 14:19:37,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1843.2427, 5153.3384, 5271.696, 1205.3075, 2808.148, 5239.849, 745.74133, 5232.506, 5089.1753, 2683.484]
2025-09-16 14:19:37,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [348.0, 1000.0, 1000.0, 233.0, 551.0, 1000.0, 141.0, 1000.0, 1000.0, 495.0]
2025-09-16 14:19:37,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 19 seconds)
2025-09-16 14:21:26,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:21:38,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4099.28809 ± 837.148
2025-09-16 14:21:38,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3724.4602, 4567.241, 2903.214, 3054.1433, 4038.1982, 3242.9358, 5132.7446, 5045.945, 3995.6855, 5288.3076]
2025-09-16 14:21:38,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [723.0, 884.0, 542.0, 588.0, 778.0, 622.0, 1000.0, 1000.0, 767.0, 1000.0]
2025-09-16 14:21:38,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes)
2025-09-16 14:23:40,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:23:50,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3909.45898 ± 1401.016
2025-09-16 14:23:50,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5263.487, 2709.5115, 5127.4565, 5259.2188, 3599.8752, 5090.893, 3924.962, 2301.3796, 1042.1116, 4775.69]
2025-09-16 14:23:50,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 528.0, 1000.0, 1000.0, 668.0, 1000.0, 728.0, 437.0, 202.0, 1000.0]
2025-09-16 14:23:50,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 55 minutes, 54 seconds)
2025-09-16 14:25:51,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:26:02,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3969.11011 ± 1440.541
2025-09-16 14:26:02,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5285.0254, 2442.2153, 4758.466, 5295.878, 1607.7803, 2600.9028, 2304.9097, 5239.4224, 4995.7085, 5160.792]
2025-09-16 14:26:02,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 480.0, 915.0, 1000.0, 292.0, 481.0, 455.0, 1000.0, 936.0, 990.0]
2025-09-16 14:26:02,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 53 minutes, 45 seconds)
2025-09-16 14:28:03,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:28:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3318.37451 ± 1950.018
2025-09-16 14:28:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5243.577, 2464.9062, 5115.732, 389.66647, 5135.0474, 603.75806, 5122.5527, 5154.398, 2643.867, 1310.2411]
2025-09-16 14:28:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 481.0, 1000.0, 72.0, 1000.0, 124.0, 1000.0, 1000.0, 537.0, 277.0]
2025-09-16 14:28:12,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 2 seconds)
2025-09-16 14:30:07,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:30:18,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4116.97510 ± 1338.207
2025-09-16 14:30:18,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [4793.174, 5287.9653, 4631.0674, 2833.515, 1718.2169, 3951.9927, 2065.024, 5274.9697, 5266.024, 5347.8003]
2025-09-16 14:30:18,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [909.0, 1000.0, 876.0, 528.0, 344.0, 740.0, 387.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:30:18,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 8 seconds)
2025-09-16 14:32:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:32:19,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3478.90234 ± 1615.273
2025-09-16 14:32:19,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5336.2295, 5365.809, 5433.1016, 3099.304, 2452.4824, 5418.4097, 2357.5732, 2208.6362, 1728.6754, 1388.7999]
2025-09-16 14:32:19,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 585.0, 461.0, 1000.0, 435.0, 403.0, 315.0, 263.0]
2025-09-16 14:32:19,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 2 seconds)
2025-09-16 14:34:20,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:34:32,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4447.75879 ± 1486.566
2025-09-16 14:34:32,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3929.8713, 5333.3384, 5234.5615, 705.10675, 5312.574, 5386.9424, 5311.923, 5227.746, 2779.842, 5255.6787]
2025-09-16 14:34:32,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [736.0, 1000.0, 1000.0, 156.0, 1000.0, 1000.0, 1000.0, 1000.0, 533.0, 1000.0]
2025-09-16 14:34:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 44 minutes, 57 seconds)
2025-09-16 14:36:25,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:36:37,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4539.21826 ± 1464.158
2025-09-16 14:36:37,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5330.89, 5210.911, 4207.2705, 5236.14, 5313.3115, 5167.633, 4359.0273, 298.04517, 5277.585, 4991.3716]
2025-09-16 14:36:37,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 790.0, 1000.0, 1000.0, 1000.0, 829.0, 59.0, 1000.0, 1000.0]
2025-09-16 14:36:37,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 42 minutes, 21 seconds)
2025-09-16 14:38:34,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:38:46,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4261.42969 ± 1651.561
2025-09-16 14:38:46,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5237.207, 5308.4004, 5240.8057, 5132.3022, 1539.2643, 5297.4585, 3690.1074, 5232.855, 5249.491, 686.40753]
2025-09-16 14:38:46,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 287.0, 1000.0, 688.0, 1000.0, 1000.0, 147.0]
2025-09-16 14:38:46,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 7 seconds)
2025-09-16 14:40:50,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:40:58,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2947.95435 ± 1618.056
2025-09-16 14:40:58,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1984.849, 5190.6953, 1880.8093, 5330.9487, 866.41534, 2318.7444, 2367.6672, 1217.2704, 3065.1423, 5257.0005]
2025-09-16 14:40:58,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 1000.0, 351.0, 1000.0, 177.0, 439.0, 469.0, 241.0, 604.0, 1000.0]
2025-09-16 14:40:58,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 22 seconds)
2025-09-16 14:42:54,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:43:09,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5124.47070 ± 114.747
2025-09-16 14:43:09,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5164.485, 4963.883, 5214.1455, 5209.05, 5238.366, 5189.456, 5093.186, 5220.3022, 5069.6987, 4882.1333]
2025-09-16 14:43:09,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 944.0]
2025-09-16 14:43:09,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5124.47) for latency 6
2025-09-16 14:43:09,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 48 seconds)
2025-09-16 14:45:03,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:45:13,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3758.67139 ± 1626.270
2025-09-16 14:45:13,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3932.8198, 3839.3237, 5358.452, 3434.5916, 404.38428, 3336.1323, 1440.7991, 5272.9976, 5288.0107, 5279.203]
2025-09-16 14:45:13,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [745.0, 719.0, 1000.0, 658.0, 89.0, 632.0, 270.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:45:13,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 9 seconds)
2025-09-16 14:47:10,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:47:22,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4391.74951 ± 1515.355
2025-09-16 14:47:22,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1289.513, 5221.9263, 5318.47, 1611.0295, 4072.9763, 5342.0635, 5276.133, 5255.004, 5286.139, 5244.2373]
2025-09-16 14:47:22,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [237.0, 1000.0, 1000.0, 295.0, 760.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:47:22,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 12 seconds)
2025-09-16 14:49:26,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:49:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3650.32568 ± 1604.965
2025-09-16 14:49:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1821.0692, 2090.0203, 5208.1646, 5279.667, 4875.844, 4933.1543, 3553.23, 542.73016, 5015.8735, 3183.503]
2025-09-16 14:49:36,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 393.0, 1000.0, 1000.0, 1000.0, 951.0, 655.0, 106.0, 1000.0, 618.0]
2025-09-16 14:49:36,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 20 seconds)
2025-09-16 14:51:28,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:51:42,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5059.83545 ± 280.463
2025-09-16 14:51:42,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5159.103, 5218.8994, 5117.748, 5062.437, 5113.001, 5263.0845, 4235.2803, 5154.819, 5176.5176, 5097.4644]
2025-09-16 14:51:42,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 809.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:51:42,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 55 seconds)
2025-09-16 14:53:40,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:53:52,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4267.51416 ± 1528.646
2025-09-16 14:53:52,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5261.04, 5147.1147, 4987.273, 5273.1333, 3548.3452, 5187.6304, 2564.6216, 476.26083, 5068.9385, 5160.786]
2025-09-16 14:53:52,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 679.0, 1000.0, 531.0, 87.0, 1000.0, 1000.0]
2025-09-16 14:53:52,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 44 seconds)
2025-09-16 14:55:59,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:56:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5220.09082 ± 100.633
2025-09-16 14:56:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5299.512, 5258.558, 5263.609, 5162.58, 4942.6143, 5300.459, 5270.165, 5251.612, 5200.8022, 5251.001]
2025-09-16 14:56:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:56:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5220.09) for latency 6
2025-09-16 14:56:13,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 12 seconds)
2025-09-16 14:58:07,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:58:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4718.51416 ± 1323.596
2025-09-16 14:58:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5239.7285, 5305.851, 769.5919, 5179.498, 5228.8535, 5243.296, 4802.237, 5014.0845, 5192.2505, 5209.7515]
2025-09-16 14:58:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 142.0, 1000.0, 1000.0, 1000.0, 923.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:58:20,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 57 seconds)
2025-09-16 15:00:15,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:00:29,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4743.35107 ± 1318.973
2025-09-16 15:00:29,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5229.5063, 5192.207, 787.60913, 5176.9214, 5184.2886, 5184.508, 5118.5737, 5215.856, 5207.9873, 5136.056]
2025-09-16 15:00:29,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 152.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:00:29,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 34 seconds)
2025-09-16 15:02:32,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:02:46,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5086.86865 ± 538.296
2025-09-16 15:02:46,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3480.0496, 5342.6245, 5277.625, 5235.137, 5293.6562, 5288.412, 5286.8423, 5279.5356, 5260.2646, 5124.5405]
2025-09-16 15:02:46,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [647.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:02:46,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 42 seconds)
2025-09-16 15:04:37,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:04:50,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4620.69141 ± 1273.643
2025-09-16 15:04:50,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5314.809, 5330.6113, 5214.9375, 2353.2732, 5034.9277, 5263.7017, 1825.8157, 5248.582, 5322.0464, 5298.2114]
2025-09-16 15:04:50,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 434.0, 1000.0, 1000.0, 344.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:04:50,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 20 seconds)
2025-09-16 15:06:54,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:07:08,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4701.05322 ± 1050.221
2025-09-16 15:07:08,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5129.271, 4393.641, 5141.4336, 5121.31, 5194.9624, 5071.5103, 1621.8138, 5083.0957, 5197.218, 5056.273]
2025-09-16 15:07:08,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 871.0, 1000.0, 1000.0, 1000.0, 1000.0, 314.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:07:08,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 5 seconds)
2025-09-16 15:09:01,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:09:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5175.51172 ± 57.763
2025-09-16 15:09:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5175.6377, 5201.892, 5186.084, 5189.9214, 5148.8647, 5014.0684, 5197.3774, 5216.4854, 5231.081, 5193.7017]
2025-09-16 15:09:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:09:15,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 55 seconds)
2025-09-16 15:11:10,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:11:24,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4902.41846 ± 1006.080
2025-09-16 15:11:24,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5264.8115, 5217.458, 5202.689, 5229.7334, 5195.427, 5203.234, 5323.84, 5226.372, 1886.3114, 5274.307]
2025-09-16 15:11:24,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 365.0, 1000.0]
2025-09-16 15:11:24,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 44 seconds)
2025-09-16 15:13:27,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:13:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4698.14453 ± 985.309
2025-09-16 15:13:41,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5164.847, 2179.429, 5094.431, 5191.8984, 5207.186, 5061.5044, 5239.7495, 5219.745, 5174.4507, 3448.2063]
2025-09-16 15:13:41,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 435.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 664.0]
2025-09-16 15:13:41,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 32 seconds)
2025-09-16 15:15:34,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:15:48,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 5323.94824 ± 58.413
2025-09-16 15:15:48,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5276.794, 5402.7153, 5420.6787, 5359.8267, 5320.426, 5230.028, 5336.646, 5340.2197, 5300.6216, 5251.5264]
2025-09-16 15:15:48,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:15:48,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (5323.95) for latency 6
2025-09-16 15:15:48,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 23 seconds)
2025-09-16 15:17:57,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:18:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4894.32129 ± 1101.573
2025-09-16 15:18:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5268.285, 5261.1357, 5319.4536, 5245.5605, 5289.449, 5133.966, 5244.8096, 5282.019, 5305.7524, 1592.7863]
2025-09-16 15:18:10,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 295.0]
2025-09-16 15:18:10,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 12 seconds)
2025-09-16 15:20:08,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:20:22,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4970.68408 ± 685.549
2025-09-16 15:20:22,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5104.714, 2919.9875, 5251.201, 5266.6045, 5215.0137, 5188.518, 5248.5537, 5217.233, 5182.936, 5112.075]
2025-09-16 15:20:22,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 570.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:20:22,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
