2025-09-16 13:42:10,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_18
2025-09-16 13:42:10,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_18
2025-09-16 13:42:10,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x150e1dc3c850>}
2025-09-16 13:42:10,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 13:42:10,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 13:42:10,597 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=682, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 13:42:10,597 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 13:42:12,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 13:42:12,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 13:44:01,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:44:02,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 209.96445 ± 51.613
2025-09-16 13:44:02,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [175.02275, 208.61435, 262.99316, 344.3893, 189.46013, 188.64499, 185.63556, 195.23293, 188.62431, 161.02698]
2025-09-16 13:44:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 42.0, 51.0, 65.0, 40.0, 40.0, 39.0, 42.0, 40.0, 35.0]
2025-09-16 13:44:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (209.96) for latency 18
2025-09-16 13:44:02,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 1 minute, 24 seconds)
2025-09-16 13:46:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:46:01,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 264.13998 ± 87.822
2025-09-16 13:46:01,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [346.60263, 370.85025, 170.85698, 239.50095, 176.34459, 341.12463, 293.74414, 349.00522, 256.12225, 97.2483]
2025-09-16 13:46:01,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 69.0, 33.0, 48.0, 34.0, 67.0, 57.0, 73.0, 49.0, 19.0]
2025-09-16 13:46:01,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (264.14) for latency 18
2025-09-16 13:46:01,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 7 minutes, 14 seconds)
2025-09-16 13:48:00,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:48:00,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 230.30205 ± 143.208
2025-09-16 13:48:00,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [158.933, 197.98618, 377.46582, 95.986206, 484.1407, 461.89142, 124.79404, 107.049774, 123.838806, 170.9347]
2025-09-16 13:48:00,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 38.0, 72.0, 19.0, 102.0, 93.0, 24.0, 21.0, 24.0, 33.0]
2025-09-16 13:48:00,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 7 minutes, 46 seconds)
2025-09-16 13:49:58,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:49:59,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 359.32364 ± 155.968
2025-09-16 13:49:59,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [403.52682, 135.45602, 182.54295, 399.89853, 351.75964, 424.24295, 417.30594, 712.87683, 199.29683, 366.32974]
2025-09-16 13:49:59,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 26.0, 35.0, 73.0, 75.0, 82.0, 87.0, 144.0, 38.0, 77.0]
2025-09-16 13:49:59,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (359.32) for latency 18
2025-09-16 13:49:59,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 6 minutes, 53 seconds)
2025-09-16 13:51:57,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:51:57,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 212.50729 ± 79.104
2025-09-16 13:51:57,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [130.30925, 137.13405, 245.97891, 313.75482, 307.58563, 140.13895, 347.25275, 157.55743, 164.29442, 181.06664]
2025-09-16 13:51:57,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 48.0, 67.0, 68.0, 27.0, 67.0, 31.0, 32.0, 35.0]
2025-09-16 13:51:57,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 5 minutes, 19 seconds)
2025-09-16 13:53:54,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:53:55,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 180.93259 ± 85.673
2025-09-16 13:53:55,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [108.720856, 363.28348, 135.18524, 113.74845, 139.21165, 312.1156, 118.4261, 159.85648, 129.29308, 229.485]
2025-09-16 13:53:55,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 72.0, 26.0, 22.0, 27.0, 63.0, 23.0, 31.0, 25.0, 45.0]
2025-09-16 13:53:55,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes, 51 seconds)
2025-09-16 13:55:52,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:55:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 193.72458 ± 67.504
2025-09-16 13:55:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [227.77417, 119.26656, 118.72768, 147.59853, 309.23987, 259.1227, 154.82683, 132.77773, 183.8206, 284.09125]
2025-09-16 13:55:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 23.0, 23.0, 29.0, 66.0, 52.0, 30.0, 26.0, 36.0, 57.0]
2025-09-16 13:55:53,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 25 seconds)
2025-09-16 13:57:49,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:57:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 151.28082 ± 17.001
2025-09-16 13:57:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [159.52116, 115.69681, 179.98224, 154.60228, 137.56302, 143.54678, 141.46628, 152.57411, 169.20181, 158.65385]
2025-09-16 13:57:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 23.0, 36.0, 32.0, 27.0, 29.0, 28.0, 32.0, 35.0, 32.0]
2025-09-16 13:57:50,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 41 seconds)
2025-09-16 13:59:49,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:59:50,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 328.84247 ± 85.152
2025-09-16 13:59:50,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [303.92896, 268.02856, 397.21838, 406.1092, 392.7458, 108.7015, 330.60568, 387.98428, 325.69122, 367.41147]
2025-09-16 13:59:50,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 52.0, 77.0, 78.0, 76.0, 21.0, 63.0, 75.0, 62.0, 70.0]
2025-09-16 13:59:50,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 18 seconds)
2025-09-16 14:01:48,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:01:49,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 249.61313 ± 128.831
2025-09-16 14:01:49,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [315.54877, 101.65887, 347.55972, 136.13594, 102.28815, 390.3345, 468.78552, 145.84554, 152.08904, 335.88522]
2025-09-16 14:01:49,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 20.0, 65.0, 26.0, 20.0, 72.0, 85.0, 28.0, 29.0, 64.0]
2025-09-16 14:01:49,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 57 minutes, 31 seconds)
2025-09-16 14:03:48,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:03:48,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 284.45825 ± 164.301
2025-09-16 14:03:48,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [148.71742, 423.24014, 149.43796, 95.52023, 118.4166, 363.99863, 341.07916, 155.48192, 455.40094, 593.2897]
2025-09-16 14:03:48,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 86.0, 29.0, 19.0, 23.0, 67.0, 63.0, 30.0, 85.0, 114.0]
2025-09-16 14:03:48,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 1 second)
2025-09-16 14:05:47,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:05:48,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 265.18109 ± 113.816
2025-09-16 14:05:48,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [218.3494, 328.4826, 416.03098, 161.14888, 159.72733, 398.886, 113.38194, 360.13165, 129.2939, 366.37817]
2025-09-16 14:05:48,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 62.0, 76.0, 31.0, 31.0, 77.0, 22.0, 67.0, 25.0, 68.0]
2025-09-16 14:05:48,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 38 seconds)
2025-09-16 14:07:47,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:07:48,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 301.08301 ± 127.402
2025-09-16 14:07:48,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [414.29974, 414.22324, 150.52039, 415.79602, 329.90204, 134.95506, 182.4102, 337.7804, 483.68854, 147.25452]
2025-09-16 14:07:48,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 79.0, 29.0, 88.0, 63.0, 26.0, 36.0, 64.0, 90.0, 29.0]
2025-09-16 14:07:48,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 32 seconds)
2025-09-16 14:09:47,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:09:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 300.44928 ± 123.803
2025-09-16 14:09:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [113.26131, 396.5529, 265.81595, 487.43744, 334.69064, 151.03061, 357.88458, 134.42244, 433.01508, 330.38147]
2025-09-16 14:09:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 75.0, 51.0, 107.0, 64.0, 29.0, 76.0, 26.0, 91.0, 61.0]
2025-09-16 14:09:48,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 14 seconds)
2025-09-16 14:11:46,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:11:47,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 230.35762 ± 144.918
2025-09-16 14:11:47,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [151.55322, 155.54922, 357.15857, 140.16843, 118.77278, 119.18164, 363.1602, 128.36635, 193.2354, 576.4303]
2025-09-16 14:11:47,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 68.0, 27.0, 23.0, 23.0, 67.0, 25.0, 37.0, 124.0]
2025-09-16 14:11:47,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 29 seconds)
2025-09-16 14:13:46,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:13:47,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 294.18237 ± 129.740
2025-09-16 14:13:47,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [368.49353, 119.138664, 261.83215, 202.61432, 144.76646, 476.87408, 396.37106, 342.06354, 151.37155, 478.2983]
2025-09-16 14:13:47,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 23.0, 51.0, 40.0, 28.0, 87.0, 77.0, 62.0, 29.0, 89.0]
2025-09-16 14:13:47,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 41 seconds)
2025-09-16 14:15:46,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:15:47,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 238.16031 ± 155.338
2025-09-16 14:15:47,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [125.696014, 151.68947, 135.90254, 551.5153, 149.53271, 123.670555, 134.38359, 441.2832, 151.03781, 416.89172]
2025-09-16 14:15:47,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 29.0, 26.0, 120.0, 29.0, 24.0, 26.0, 83.0, 29.0, 76.0]
2025-09-16 14:15:47,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 40 seconds)
2025-09-16 14:17:46,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:17:47,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 366.36142 ± 153.748
2025-09-16 14:17:47,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [419.83514, 181.2414, 124.942375, 479.1088, 406.62787, 144.82867, 538.6309, 347.28677, 456.06946, 565.0431]
2025-09-16 14:17:47,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 35.0, 24.0, 91.0, 75.0, 28.0, 99.0, 66.0, 98.0, 114.0]
2025-09-16 14:17:47,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (366.36) for latency 18
2025-09-16 14:17:47,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 50 seconds)
2025-09-16 14:19:46,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:19:47,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 337.49188 ± 200.871
2025-09-16 14:19:47,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [287.6565, 118.551216, 429.78268, 360.14526, 822.8044, 455.08145, 308.72845, 108.36495, 134.4625, 349.34167]
2025-09-16 14:19:47,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 23.0, 81.0, 69.0, 160.0, 86.0, 59.0, 21.0, 26.0, 63.0]
2025-09-16 14:19:47,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 49 seconds)
2025-09-16 14:21:46,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:21:47,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 345.17249 ± 161.033
2025-09-16 14:21:47,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.90765, 102.02992, 463.4729, 374.83405, 472.03778, 326.634, 439.25333, 392.15216, 609.34796, 152.05524]
2025-09-16 14:21:47,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 88.0, 80.0, 87.0, 62.0, 85.0, 74.0, 129.0, 29.0]
2025-09-16 14:21:47,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 56 seconds)
2025-09-16 14:23:45,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:23:47,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 396.55539 ± 151.602
2025-09-16 14:23:47,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [538.1026, 552.4523, 428.60455, 481.0181, 134.64618, 446.18408, 109.01278, 538.8011, 338.6364, 398.09567]
2025-09-16 14:23:47,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 119.0, 89.0, 90.0, 26.0, 80.0, 21.0, 102.0, 64.0, 76.0]
2025-09-16 14:23:47,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (396.56) for latency 18
2025-09-16 14:23:47,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 50 seconds)
2025-09-16 14:25:47,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:25:48,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 249.88977 ± 152.685
2025-09-16 14:25:48,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [337.56857, 163.18393, 112.87099, 459.04947, 560.56195, 168.12111, 319.76132, 123.64703, 102.50328, 151.63002]
2025-09-16 14:25:48,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 31.0, 22.0, 85.0, 117.0, 32.0, 64.0, 24.0, 20.0, 29.0]
2025-09-16 14:25:48,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 13 seconds)
2025-09-16 14:27:46,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:27:48,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 410.06201 ± 112.789
2025-09-16 14:27:48,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [509.13138, 442.82785, 509.69882, 134.15242, 380.69562, 404.07846, 500.5699, 483.6313, 446.07675, 289.7577]
2025-09-16 14:27:48,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 83.0, 110.0, 26.0, 70.0, 73.0, 93.0, 91.0, 96.0, 55.0]
2025-09-16 14:27:48,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (410.06) for latency 18
2025-09-16 14:27:48,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 2 seconds)
2025-09-16 14:29:47,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:29:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 278.71875 ± 108.371
2025-09-16 14:29:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [381.85037, 165.6049, 361.73892, 130.30568, 412.7319, 392.6659, 198.045, 341.2346, 124.66254, 278.34766]
2025-09-16 14:29:48,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 32.0, 71.0, 25.0, 77.0, 75.0, 38.0, 66.0, 24.0, 52.0]
2025-09-16 14:29:48,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 11 seconds)
2025-09-16 14:31:47,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:31:48,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 392.79340 ± 208.559
2025-09-16 14:31:48,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [611.9647, 727.50793, 97.24307, 494.3298, 470.65707, 114.004745, 108.156364, 404.47983, 395.9479, 503.6425]
2025-09-16 14:31:48,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 140.0, 19.0, 94.0, 86.0, 22.0, 21.0, 74.0, 73.0, 94.0]
2025-09-16 14:31:48,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 12 seconds)
2025-09-16 14:33:47,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:33:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 377.25287 ± 190.066
2025-09-16 14:33:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [498.797, 751.0487, 305.93158, 119.45156, 161.0769, 354.67358, 527.152, 157.00839, 506.9191, 390.46967]
2025-09-16 14:33:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 139.0, 56.0, 23.0, 31.0, 66.0, 97.0, 30.0, 95.0, 72.0]
2025-09-16 14:33:48,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 19 seconds)
2025-09-16 14:35:47,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:35:48,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 360.18011 ± 132.594
2025-09-16 14:35:48,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [499.16223, 423.98123, 184.7564, 387.29138, 482.1471, 459.51282, 247.74133, 275.40173, 129.51678, 512.2901]
2025-09-16 14:35:48,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 77.0, 35.0, 72.0, 91.0, 84.0, 47.0, 53.0, 25.0, 95.0]
2025-09-16 14:35:48,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 7 seconds)
2025-09-16 14:37:49,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:37:50,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 399.47632 ± 152.435
2025-09-16 14:37:50,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [445.10605, 585.82104, 467.73245, 391.86475, 482.7084, 360.41977, 134.50017, 551.11005, 108.55955, 466.94098]
2025-09-16 14:37:50,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 119.0, 89.0, 86.0, 88.0, 74.0, 26.0, 102.0, 21.0, 90.0]
2025-09-16 14:37:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 31 seconds)
2025-09-16 14:39:49,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:39:50,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 396.06348 ± 142.889
2025-09-16 14:39:50,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [554.6286, 459.5506, 367.28943, 363.84653, 406.38644, 188.18832, 113.891266, 579.51135, 524.4946, 402.84763]
2025-09-16 14:39:50,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 86.0, 67.0, 67.0, 87.0, 36.0, 22.0, 107.0, 99.0, 77.0]
2025-09-16 14:39:50,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 38 seconds)
2025-09-16 14:41:49,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:41:50,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 357.00827 ± 174.581
2025-09-16 14:41:50,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [429.9984, 313.8213, 120.02974, 569.4187, 449.16455, 386.64648, 411.23285, 106.8719, 635.4414, 147.45747]
2025-09-16 14:41:50,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 59.0, 23.0, 120.0, 85.0, 71.0, 76.0, 21.0, 121.0, 29.0]
2025-09-16 14:41:50,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 32 seconds)
2025-09-16 14:43:49,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:43:51,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 475.67715 ± 96.625
2025-09-16 14:43:51,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [419.7488, 551.7008, 395.6516, 460.65137, 720.3344, 416.9612, 506.11932, 378.48898, 413.74243, 493.3725]
2025-09-16 14:43:51,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 101.0, 84.0, 95.0, 137.0, 76.0, 109.0, 71.0, 78.0, 89.0]
2025-09-16 14:43:51,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (475.68) for latency 18
2025-09-16 14:43:51,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 36 seconds)
2025-09-16 14:45:50,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:45:51,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 368.87634 ± 226.648
2025-09-16 14:45:51,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.32621, 119.08014, 536.04614, 158.40707, 119.69236, 422.4156, 396.02664, 530.1269, 451.60962, 836.03265]
2025-09-16 14:45:51,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 99.0, 30.0, 23.0, 79.0, 72.0, 99.0, 83.0, 162.0]
2025-09-16 14:45:51,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 37 seconds)
2025-09-16 14:47:51,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:47:52,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 336.61823 ± 217.856
2025-09-16 14:47:52,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [383.8529, 765.99023, 154.73776, 472.54614, 108.80348, 245.0144, 147.94162, 108.48146, 634.04663, 344.76767]
2025-09-16 14:47:52,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 146.0, 30.0, 88.0, 21.0, 50.0, 28.0, 21.0, 138.0, 65.0]
2025-09-16 14:47:52,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 31 seconds)
2025-09-16 14:49:53,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:49:53,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 263.16385 ± 158.555
2025-09-16 14:49:53,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [124.33098, 432.508, 144.47954, 175.57402, 548.54596, 135.22473, 140.11131, 364.064, 113.494995, 453.30487]
2025-09-16 14:49:53,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 83.0, 28.0, 34.0, 103.0, 26.0, 27.0, 65.0, 22.0, 85.0]
2025-09-16 14:49:53,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 40 seconds)
2025-09-16 14:51:52,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:51:53,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 422.98639 ± 205.685
2025-09-16 14:51:53,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [144.04428, 124.517426, 409.53305, 674.87506, 442.62225, 821.4386, 334.95605, 413.56317, 319.49405, 544.8197]
2025-09-16 14:51:53,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 24.0, 75.0, 142.0, 82.0, 156.0, 63.0, 75.0, 59.0, 118.0]
2025-09-16 14:51:53,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 38 seconds)
2025-09-16 14:53:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:53:53,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 431.30963 ± 145.641
2025-09-16 14:53:53,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [555.33417, 508.83865, 506.96722, 166.86792, 449.32925, 553.2775, 542.1103, 409.29364, 480.73352, 140.34364]
2025-09-16 14:53:53,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 95.0, 92.0, 32.0, 83.0, 103.0, 116.0, 91.0, 90.0, 27.0]
2025-09-16 14:53:53,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 35 seconds)
2025-09-16 14:55:54,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:55:55,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 428.33789 ± 164.533
2025-09-16 14:55:55,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [171.30057, 535.3556, 442.365, 639.8841, 134.99655, 416.97467, 613.73035, 575.75183, 393.68497, 359.33563]
2025-09-16 14:55:55,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 113.0, 81.0, 117.0, 26.0, 79.0, 116.0, 123.0, 72.0, 68.0]
2025-09-16 14:55:55,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 47 seconds)
2025-09-16 14:57:54,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:57:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 498.35898 ± 155.965
2025-09-16 14:57:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [498.969, 467.86945, 385.30328, 444.7502, 467.20996, 715.5859, 712.7174, 580.97516, 149.32855, 560.8807]
2025-09-16 14:57:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 88.0, 70.0, 83.0, 96.0, 137.0, 150.0, 107.0, 29.0, 104.0]
2025-09-16 14:57:56,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (498.36) for latency 18
2025-09-16 14:57:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 44 seconds)
2025-09-16 14:59:56,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:59:57,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 330.42804 ± 192.382
2025-09-16 14:59:57,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [340.45135, 119.26162, 108.37609, 119.621025, 496.24518, 392.6722, 716.7237, 423.40024, 152.7992, 434.7297]
2025-09-16 14:59:57,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 23.0, 21.0, 23.0, 91.0, 83.0, 151.0, 79.0, 29.0, 81.0]
2025-09-16 14:59:57,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 39 seconds)
2025-09-16 15:01:56,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:01:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 500.72528 ± 157.777
2025-09-16 15:01:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [514.5979, 623.3075, 433.88684, 129.95163, 631.9477, 410.0121, 722.13696, 450.69904, 618.9064, 471.8071]
2025-09-16 15:01:57,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 117.0, 94.0, 25.0, 118.0, 89.0, 141.0, 83.0, 120.0, 87.0]
2025-09-16 15:01:57,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (500.73) for latency 18
2025-09-16 15:01:57,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 49 seconds)
2025-09-16 15:03:57,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:03:58,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 390.20023 ± 201.992
2025-09-16 15:03:58,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [323.94647, 165.9476, 729.9084, 541.8425, 444.14496, 129.62277, 130.14604, 332.42032, 450.06274, 653.96045]
2025-09-16 15:03:58,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 32.0, 141.0, 102.0, 94.0, 25.0, 25.0, 62.0, 87.0, 138.0]
2025-09-16 15:03:58,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 57 seconds)
2025-09-16 15:05:58,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:05:59,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 467.70093 ± 199.141
2025-09-16 15:05:59,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [108.81952, 491.2516, 717.03156, 420.13547, 598.68756, 716.9809, 518.5047, 522.4546, 458.71085, 124.43263]
2025-09-16 15:05:59,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 92.0, 136.0, 76.0, 110.0, 148.0, 104.0, 113.0, 97.0, 24.0]
2025-09-16 15:05:59,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 51 seconds)
2025-09-16 15:07:59,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:08:00,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 453.89062 ± 222.496
2025-09-16 15:08:00,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [478.4193, 751.317, 97.057465, 487.51343, 125.32438, 296.52652, 794.044, 394.5834, 603.37714, 510.7436]
2025-09-16 15:08:00,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 134.0, 19.0, 93.0, 24.0, 56.0, 157.0, 73.0, 126.0, 94.0]
2025-09-16 15:08:00,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 54 minutes, 48 seconds)
2025-09-16 15:09:59,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:10:00,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 423.22754 ± 260.102
2025-09-16 15:10:00,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [527.9298, 426.7763, 479.37628, 108.54126, 351.82883, 102.50217, 937.045, 759.7043, 393.36713, 145.20413]
2025-09-16 15:10:00,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 91.0, 100.0, 21.0, 66.0, 20.0, 170.0, 142.0, 73.0, 28.0]
2025-09-16 15:10:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 41 seconds)
2025-09-16 15:12:01,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:12:02,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 461.97861 ± 217.240
2025-09-16 15:12:02,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [657.48553, 124.52225, 885.9237, 422.4051, 573.30566, 528.56, 406.5097, 118.63348, 435.14352, 467.29672]
2025-09-16 15:12:02,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 24.0, 171.0, 77.0, 106.0, 97.0, 78.0, 23.0, 92.0, 88.0]
2025-09-16 15:12:02,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 52 seconds)
2025-09-16 15:14:01,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:14:03,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 480.42343 ± 166.107
2025-09-16 15:14:03,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [637.4258, 691.76105, 141.14993, 400.894, 540.30975, 559.92944, 514.057, 602.0734, 479.79623, 236.8381]
2025-09-16 15:14:03,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 128.0, 27.0, 81.0, 116.0, 105.0, 94.0, 113.0, 93.0, 46.0]
2025-09-16 15:14:03,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 46 seconds)
2025-09-16 15:16:02,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:16:03,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 301.20239 ± 178.845
2025-09-16 15:16:03,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [423.9705, 139.53163, 448.19092, 502.67123, 107.90274, 596.7128, 124.70217, 154.3812, 124.4302, 389.53058]
2025-09-16 15:16:03,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 27.0, 81.0, 91.0, 21.0, 112.0, 24.0, 30.0, 24.0, 70.0]
2025-09-16 15:16:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 40 seconds)
2025-09-16 15:18:04,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:18:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 377.34970 ± 224.035
2025-09-16 15:18:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [631.91907, 161.7961, 754.1239, 114.06661, 125.16073, 388.67267, 140.34186, 612.1751, 411.86823, 433.37314]
2025-09-16 15:18:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 31.0, 142.0, 22.0, 24.0, 70.0, 27.0, 116.0, 75.0, 84.0]
2025-09-16 15:18:05,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 50 seconds)
2025-09-16 15:20:04,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:20:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 449.09946 ± 239.915
2025-09-16 15:20:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [716.84015, 854.7771, 467.24564, 167.53212, 139.88402, 697.3461, 474.88666, 136.11575, 386.99588, 449.37106]
2025-09-16 15:20:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 175.0, 86.0, 32.0, 27.0, 133.0, 88.0, 26.0, 79.0, 84.0]
2025-09-16 15:20:05,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 47 seconds)
2025-09-16 15:22:05,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:22:06,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 475.47778 ± 174.165
2025-09-16 15:22:06,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [588.4657, 534.73895, 475.27628, 580.4347, 421.9859, 440.04962, 382.01428, 828.8734, 389.1996, 113.73933]
2025-09-16 15:22:06,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 99.0, 87.0, 106.0, 78.0, 80.0, 71.0, 169.0, 71.0, 22.0]
2025-09-16 15:22:06,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 41 seconds)
2025-09-16 15:24:06,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:24:07,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 355.42368 ± 265.159
2025-09-16 15:24:07,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [102.93895, 352.60938, 146.67484, 668.92334, 140.9616, 470.91736, 514.95935, 908.19586, 129.09383, 118.962395]
2025-09-16 15:24:07,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 65.0, 28.0, 119.0, 27.0, 85.0, 92.0, 168.0, 25.0, 23.0]
2025-09-16 15:24:07,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 40 seconds)
2025-09-16 15:26:08,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:26:10,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 470.13223 ± 207.864
2025-09-16 15:26:10,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [363.47726, 687.63446, 460.52725, 515.698, 350.2174, 725.09753, 114.52002, 664.27106, 164.54341, 655.33606]
2025-09-16 15:26:10,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 144.0, 88.0, 100.0, 68.0, 136.0, 22.0, 123.0, 32.0, 124.0]
2025-09-16 15:26:10,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes)
2025-09-16 15:28:07,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:28:09,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 446.94891 ± 245.628
2025-09-16 15:28:09,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.85978, 660.2188, 460.3239, 545.3306, 114.44751, 878.39435, 494.3123, 419.58423, 647.81476, 129.20311]
2025-09-16 15:28:09,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 126.0, 86.0, 100.0, 22.0, 190.0, 94.0, 76.0, 136.0, 25.0]
2025-09-16 15:28:09,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 36 seconds)
2025-09-16 15:30:09,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:30:10,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 389.92477 ± 184.199
2025-09-16 15:30:10,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [166.5046, 125.507866, 470.2069, 713.09796, 415.01083, 551.51434, 459.79886, 124.43518, 404.085, 469.08615]
2025-09-16 15:30:10,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 24.0, 92.0, 148.0, 78.0, 100.0, 85.0, 24.0, 73.0, 87.0]
2025-09-16 15:30:10,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 47 seconds)
2025-09-16 15:32:10,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:32:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 356.18765 ± 237.245
2025-09-16 15:32:11,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [118.867004, 119.3367, 540.9246, 117.53994, 589.7076, 647.96704, 527.2238, 145.755, 107.77682, 646.77795]
2025-09-16 15:32:11,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 97.0, 23.0, 108.0, 119.0, 94.0, 28.0, 21.0, 117.0]
2025-09-16 15:32:11,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 43 seconds)
2025-09-16 15:34:11,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:34:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 557.27362 ± 176.367
2025-09-16 15:34:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [613.5766, 763.94214, 140.03566, 480.95663, 654.71204, 753.08545, 686.988, 478.11374, 437.72736, 563.5984]
2025-09-16 15:34:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 136.0, 27.0, 90.0, 121.0, 148.0, 141.0, 88.0, 84.0, 102.0]
2025-09-16 15:34:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (557.27) for latency 18
2025-09-16 15:34:13,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 56 seconds)
2025-09-16 15:36:13,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:36:14,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 370.25955 ± 201.742
2025-09-16 15:36:14,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [599.319, 114.876564, 178.94408, 113.97108, 135.12166, 370.47446, 543.8004, 490.27875, 555.74304, 600.0662]
2025-09-16 15:36:14,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 22.0, 35.0, 22.0, 26.0, 67.0, 97.0, 90.0, 106.0, 114.0]
2025-09-16 15:36:14,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 36 seconds)
2025-09-16 15:38:14,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:38:15,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 474.88818 ± 229.275
2025-09-16 15:38:15,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [113.726166, 662.2718, 431.2415, 927.6713, 629.99976, 456.50296, 582.52155, 369.6343, 153.43866, 421.87402]
2025-09-16 15:38:15,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 123.0, 80.0, 170.0, 126.0, 102.0, 121.0, 77.0, 30.0, 86.0]
2025-09-16 15:38:15,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 56 seconds)
2025-09-16 15:40:14,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:40:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 290.44403 ± 190.321
2025-09-16 15:40:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [145.2657, 96.12687, 319.22324, 108.25882, 332.3877, 544.1559, 431.0975, 125.01847, 657.8133, 145.0927]
2025-09-16 15:40:15,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 19.0, 59.0, 21.0, 64.0, 102.0, 77.0, 24.0, 126.0, 28.0]
2025-09-16 15:40:15,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 39 seconds)
2025-09-16 15:42:16,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:42:18,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 526.18054 ± 161.079
2025-09-16 15:42:18,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [749.51587, 677.79315, 540.66907, 618.2235, 491.84348, 546.56494, 124.73917, 595.2838, 483.62772, 433.54514]
2025-09-16 15:42:18,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 128.0, 111.0, 111.0, 90.0, 103.0, 24.0, 108.0, 98.0, 77.0]
2025-09-16 15:42:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 53 seconds)
2025-09-16 15:44:18,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:44:19,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 529.67371 ± 202.077
2025-09-16 15:44:19,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [674.36554, 507.85803, 681.3917, 423.68668, 630.1844, 901.1575, 420.26236, 118.72465, 378.71094, 560.3953]
2025-09-16 15:44:19,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 93.0, 122.0, 89.0, 122.0, 164.0, 82.0, 23.0, 70.0, 100.0]
2025-09-16 15:44:19,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 47 seconds)
2025-09-16 15:46:21,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:46:22,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 462.67593 ± 180.839
2025-09-16 15:46:22,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [495.6664, 449.24268, 555.756, 576.8995, 590.21094, 135.75577, 102.233826, 621.0651, 476.95868, 622.9703]
2025-09-16 15:46:22,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 96.0, 103.0, 109.0, 108.0, 26.0, 20.0, 111.0, 101.0, 115.0]
2025-09-16 15:46:22,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 3 seconds)
2025-09-16 15:48:20,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:48:21,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 447.18878 ± 210.174
2025-09-16 15:48:21,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [624.9409, 640.22235, 706.6963, 518.9722, 593.0216, 429.30383, 178.36227, 520.3051, 129.73607, 130.32727]
2025-09-16 15:48:21,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 126.0, 143.0, 105.0, 104.0, 94.0, 34.0, 93.0, 25.0, 25.0]
2025-09-16 15:48:21,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 41 seconds)
2025-09-16 15:50:21,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:50:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 474.92050 ± 192.237
2025-09-16 15:50:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [187.8621, 609.9501, 439.15125, 583.13495, 456.75888, 507.81717, 108.96957, 601.9692, 797.5582, 456.03394]
2025-09-16 15:50:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 113.0, 81.0, 107.0, 97.0, 95.0, 21.0, 122.0, 156.0, 84.0]
2025-09-16 15:50:22,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 52 seconds)
2025-09-16 15:52:25,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:52:27,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 611.33630 ± 252.736
2025-09-16 15:52:27,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [145.35115, 585.18616, 453.53516, 989.97687, 528.2767, 520.1985, 770.168, 932.02576, 827.5656, 361.07944]
2025-09-16 15:52:27,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 118.0, 83.0, 203.0, 94.0, 93.0, 151.0, 162.0, 168.0, 65.0]
2025-09-16 15:52:27,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (611.34) for latency 18
2025-09-16 15:52:27,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 11 minutes, 6 seconds)
2025-09-16 15:54:26,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:54:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 567.73236 ± 375.169
2025-09-16 15:54:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1381.4011, 897.8298, 124.36797, 731.3535, 721.1884, 518.0729, 580.115, 140.68983, 130.49216, 451.81317]
2025-09-16 15:54:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 184.0, 24.0, 133.0, 126.0, 92.0, 107.0, 27.0, 25.0, 81.0]
2025-09-16 15:54:27,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 57 seconds)
2025-09-16 15:56:28,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:56:30,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 493.14420 ± 243.807
2025-09-16 15:56:30,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [701.35986, 590.782, 560.3039, 890.0316, 598.16864, 529.6591, 610.24976, 170.51303, 149.56372, 130.81024]
2025-09-16 15:56:30,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 128.0, 105.0, 168.0, 130.0, 102.0, 109.0, 33.0, 29.0, 25.0]
2025-09-16 15:56:30,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 49 seconds)
2025-09-16 15:58:28,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:58:30,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 715.56750 ± 447.738
2025-09-16 15:58:30,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [535.41895, 1309.4623, 434.26926, 623.7989, 1091.9705, 865.8828, 124.154755, 130.55511, 1515.1752, 524.98737]
2025-09-16 15:58:30,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 245.0, 77.0, 115.0, 221.0, 163.0, 24.0, 25.0, 296.0, 110.0]
2025-09-16 15:58:30,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (715.57) for latency 18
2025-09-16 15:58:30,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 59 seconds)
2025-09-16 16:00:30,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:00:32,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 640.81488 ± 234.431
2025-09-16 16:00:32,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [600.12604, 1019.1703, 511.21783, 603.8164, 1137.8792, 606.28674, 637.7809, 368.56555, 464.1172, 459.1888]
2025-09-16 16:00:32,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 192.0, 93.0, 128.0, 217.0, 111.0, 118.0, 72.0, 84.0, 82.0]
2025-09-16 16:00:32,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 59 seconds)
2025-09-16 16:02:32,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:02:33,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 493.15155 ± 254.642
2025-09-16 16:02:33,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [800.716, 664.53094, 107.88347, 166.39417, 173.66534, 448.95807, 441.19363, 729.24634, 595.387, 803.54047]
2025-09-16 16:02:33,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 114.0, 21.0, 32.0, 34.0, 85.0, 82.0, 133.0, 126.0, 145.0]
2025-09-16 16:02:33,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 38 seconds)
2025-09-16 16:04:34,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:04:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 595.41687 ± 282.078
2025-09-16 16:04:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [569.2653, 119.92263, 696.0419, 670.65765, 559.0308, 145.10341, 1018.9875, 533.60394, 641.60815, 999.94696]
2025-09-16 16:04:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 23.0, 139.0, 119.0, 127.0, 28.0, 197.0, 100.0, 115.0, 193.0]
2025-09-16 16:04:36,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 47 seconds)
2025-09-16 16:06:36,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:06:39,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 881.01270 ± 526.838
2025-09-16 16:06:39,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [618.8544, 992.5401, 1111.7915, 2068.0073, 165.03313, 867.10565, 731.3435, 1197.4131, 111.24636, 946.79144]
2025-09-16 16:06:39,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 209.0, 207.0, 382.0, 32.0, 155.0, 135.0, 230.0, 22.0, 188.0]
2025-09-16 16:06:39,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (881.01) for latency 18
2025-09-16 16:06:39,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 50 seconds)
2025-09-16 16:08:39,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:08:41,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 661.53888 ± 306.262
2025-09-16 16:08:41,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [560.0146, 613.9649, 813.5313, 436.5522, 119.39081, 747.2826, 471.3687, 1227.0173, 540.6425, 1085.6243]
2025-09-16 16:08:41,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 120.0, 151.0, 80.0, 23.0, 132.0, 102.0, 252.0, 98.0, 203.0]
2025-09-16 16:08:41,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 56 seconds)
2025-09-16 16:10:42,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:10:43,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 507.77008 ± 315.484
2025-09-16 16:10:43,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [101.85012, 753.9912, 585.74304, 906.60516, 504.87204, 102.716095, 1033.9745, 145.69394, 348.64584, 593.60834]
2025-09-16 16:10:43,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 155.0, 104.0, 160.0, 89.0, 20.0, 204.0, 28.0, 66.0, 106.0]
2025-09-16 16:10:43,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 57 seconds)
2025-09-16 16:12:42,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:12:44,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 668.55109 ± 274.420
2025-09-16 16:12:44,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [805.0932, 432.34253, 129.2085, 917.16095, 1123.0016, 790.3227, 411.61618, 767.88043, 535.4971, 773.38763]
2025-09-16 16:12:44,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 81.0, 25.0, 171.0, 219.0, 137.0, 74.0, 159.0, 107.0, 158.0]
2025-09-16 16:12:44,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 54 seconds)
2025-09-16 16:14:45,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:14:46,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 626.36707 ± 298.083
2025-09-16 16:14:46,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [891.94006, 134.84375, 507.6794, 519.9623, 1177.3026, 852.856, 410.438, 872.08417, 561.74164, 334.82263]
2025-09-16 16:14:46,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 26.0, 90.0, 97.0, 222.0, 173.0, 76.0, 168.0, 100.0, 62.0]
2025-09-16 16:14:46,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 51 seconds)
2025-09-16 16:16:48,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:16:49,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 470.16000 ± 332.549
2025-09-16 16:16:49,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [348.6134, 120.642944, 932.0299, 595.8469, 119.39376, 114.433815, 618.0654, 763.1419, 975.657, 113.77504]
2025-09-16 16:16:49,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 23.0, 178.0, 123.0, 23.0, 22.0, 127.0, 142.0, 191.0, 22.0]
2025-09-16 16:16:49,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 47 seconds)
2025-09-16 16:18:49,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:18:50,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 521.97986 ± 264.827
2025-09-16 16:18:50,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [539.7293, 423.1552, 435.73203, 795.1659, 458.65924, 102.56917, 948.20514, 665.183, 102.77935, 748.61957]
2025-09-16 16:18:50,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 84.0, 77.0, 141.0, 85.0, 20.0, 188.0, 141.0, 20.0, 131.0]
2025-09-16 16:18:50,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 42 seconds)
2025-09-16 16:20:50,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:20:51,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 461.48578 ± 234.961
2025-09-16 16:20:51,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [361.13602, 827.3876, 451.10623, 102.86883, 743.76373, 108.71042, 722.7458, 478.8766, 384.38028, 433.8829]
2025-09-16 16:20:51,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 156.0, 84.0, 20.0, 132.0, 21.0, 131.0, 85.0, 70.0, 77.0]
2025-09-16 16:20:51,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 35 seconds)
2025-09-16 16:22:52,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:22:54,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 706.59125 ± 406.353
2025-09-16 16:22:54,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [515.7833, 622.4258, 665.50836, 509.2254, 1101.4529, 119.702324, 811.03625, 120.18363, 1241.1974, 1359.3973]
2025-09-16 16:22:54,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 122.0, 121.0, 90.0, 217.0, 23.0, 176.0, 23.0, 213.0, 262.0]
2025-09-16 16:22:54,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 37 seconds)
2025-09-16 16:24:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:24:58,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 762.29309 ± 179.847
2025-09-16 16:24:58,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [567.1946, 975.8453, 698.697, 824.03253, 809.2259, 605.7442, 556.1592, 757.9326, 1155.5508, 672.54913]
2025-09-16 16:24:58,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 201.0, 132.0, 162.0, 151.0, 123.0, 104.0, 145.0, 234.0, 130.0]
2025-09-16 16:24:58,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 43 seconds)
2025-09-16 16:26:58,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:27:00,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 538.64838 ± 332.958
2025-09-16 16:27:00,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1025.9681, 495.722, 165.35129, 636.7533, 1159.8197, 411.43494, 124.99933, 493.94467, 685.4442, 187.04579]
2025-09-16 16:27:00,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 93.0, 32.0, 110.0, 209.0, 76.0, 24.0, 91.0, 118.0, 36.0]
2025-09-16 16:27:00,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 38 seconds)
2025-09-16 16:29:00,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:29:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 652.24695 ± 353.508
2025-09-16 16:29:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [137.23633, 760.2707, 1259.5315, 443.7062, 1075.1351, 880.45953, 448.89484, 639.28033, 125.06153, 752.89325]
2025-09-16 16:29:02,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 156.0, 236.0, 80.0, 200.0, 174.0, 80.0, 116.0, 24.0, 132.0]
2025-09-16 16:29:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 38 seconds)
2025-09-16 16:31:02,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:31:04,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 764.57129 ± 299.423
2025-09-16 16:31:04,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [590.3922, 661.6256, 145.78624, 775.0283, 736.28564, 1136.4221, 707.86804, 1318.1504, 893.96716, 680.187]
2025-09-16 16:31:04,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 127.0, 28.0, 149.0, 138.0, 209.0, 130.0, 236.0, 179.0, 129.0]
2025-09-16 16:31:04,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 41 seconds)
2025-09-16 16:33:04,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:33:06,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 613.49957 ± 346.829
2025-09-16 16:33:06,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [607.20056, 616.77936, 119.202675, 784.0789, 911.0115, 752.22687, 1214.106, 145.48152, 818.4863, 166.42172]
2025-09-16 16:33:06,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 127.0, 23.0, 147.0, 175.0, 136.0, 225.0, 28.0, 150.0, 32.0]
2025-09-16 16:33:06,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 36 seconds)
2025-09-16 16:35:06,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:35:08,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 729.16907 ± 466.226
2025-09-16 16:35:08,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [357.0599, 636.28217, 159.94363, 1024.4701, 1376.0637, 154.94998, 1042.352, 446.10513, 1540.4945, 553.969]
2025-09-16 16:35:08,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 118.0, 31.0, 169.0, 251.0, 30.0, 195.0, 84.0, 289.0, 120.0]
2025-09-16 16:35:08,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 29 seconds)
2025-09-16 16:37:09,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:37:10,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 489.04019 ± 513.572
2025-09-16 16:37:10,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [482.2686, 124.816185, 107.87723, 617.6682, 395.71472, 156.971, 102.79479, 1922.3738, 388.92267, 590.9947]
2025-09-16 16:37:10,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 24.0, 21.0, 124.0, 70.0, 30.0, 20.0, 388.0, 69.0, 117.0]
2025-09-16 16:37:10,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 27 seconds)
2025-09-16 16:39:11,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:39:13,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 650.64490 ± 310.044
2025-09-16 16:39:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [809.63934, 292.87534, 536.0623, 849.9386, 568.1915, 1235.7676, 821.55035, 868.19025, 144.7812, 379.45227]
2025-09-16 16:39:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 57.0, 110.0, 162.0, 107.0, 234.0, 176.0, 170.0, 28.0, 70.0]
2025-09-16 16:39:13,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 25 seconds)
2025-09-16 16:41:13,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:41:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 552.36871 ± 370.290
2025-09-16 16:41:15,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [144.44148, 118.766594, 1094.6125, 842.5178, 150.53235, 170.75581, 997.72015, 884.7499, 665.83466, 453.7559]
2025-09-16 16:41:15,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 196.0, 149.0, 29.0, 33.0, 182.0, 182.0, 145.0, 83.0]
2025-09-16 16:41:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 23 seconds)
2025-09-16 16:43:15,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:43:17,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 704.14813 ± 349.245
2025-09-16 16:43:17,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [572.7619, 774.81604, 299.53946, 799.5085, 1291.2367, 1043.8793, 1106.4056, 546.5756, 135.00734, 471.75085]
2025-09-16 16:43:17,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 144.0, 55.0, 148.0, 234.0, 184.0, 193.0, 94.0, 26.0, 87.0]
2025-09-16 16:43:17,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 22 seconds)
2025-09-16 16:45:18,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:45:20,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 640.89191 ± 400.897
2025-09-16 16:45:20,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [298.81384, 493.14224, 1264.7306, 1322.1187, 118.563286, 770.1754, 723.8649, 818.14496, 464.60995, 134.75552]
2025-09-16 16:45:20,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 86.0, 243.0, 254.0, 23.0, 142.0, 129.0, 145.0, 100.0, 26.0]
2025-09-16 16:45:20,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 20 seconds)
2025-09-16 16:47:20,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:47:23,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 772.28400 ± 435.292
2025-09-16 16:47:23,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [810.91266, 766.3375, 165.69228, 452.31537, 1473.5756, 180.48875, 687.7449, 1369.5793, 1208.1832, 608.0104]
2025-09-16 16:47:23,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 146.0, 32.0, 83.0, 292.0, 35.0, 122.0, 252.0, 232.0, 125.0]
2025-09-16 16:47:23,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 20 seconds)
2025-09-16 16:49:22,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:49:25,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 920.08789 ± 496.873
2025-09-16 16:49:25,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [454.0599, 1017.6487, 1320.8514, 280.8877, 1886.5553, 1082.6813, 743.57336, 1264.1195, 978.4239, 172.07678]
2025-09-16 16:49:25,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 194.0, 254.0, 54.0, 332.0, 203.0, 139.0, 247.0, 184.0, 33.0]
2025-09-16 16:49:25,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (920.09) for latency 18
2025-09-16 16:49:25,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 17 seconds)
2025-09-16 16:51:27,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:51:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 599.75311 ± 336.118
2025-09-16 16:51:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1260.2761, 885.421, 108.00481, 812.643, 753.82666, 550.66327, 412.56583, 443.64627, 114.01266, 656.47144]
2025-09-16 16:51:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [239.0, 160.0, 21.0, 140.0, 145.0, 100.0, 73.0, 96.0, 22.0, 113.0]
2025-09-16 16:51:28,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 16 seconds)
2025-09-16 16:53:28,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:53:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 607.43030 ± 414.010
2025-09-16 16:53:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [608.7023, 430.88422, 119.15951, 963.5525, 102.79261, 1320.501, 480.98602, 102.62566, 1064.931, 880.1686]
2025-09-16 16:53:30,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 80.0, 23.0, 185.0, 20.0, 239.0, 87.0, 20.0, 188.0, 157.0]
2025-09-16 16:53:30,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 12 seconds)
2025-09-16 16:55:30,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:55:32,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 802.88104 ± 499.315
2025-09-16 16:55:32,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1749.2345, 130.33394, 472.36737, 1334.9175, 144.94058, 1136.3035, 764.9718, 865.22437, 408.49045, 1022.0261]
2025-09-16 16:55:32,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [311.0, 25.0, 90.0, 246.0, 28.0, 215.0, 137.0, 160.0, 78.0, 186.0]
2025-09-16 16:55:32,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 10 seconds)
2025-09-16 16:57:33,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:57:36,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1039.16382 ± 747.357
2025-09-16 16:57:36,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [570.8158, 650.4362, 1136.0498, 1178.777, 936.5766, 155.11296, 539.529, 428.64746, 2198.3228, 2597.3704]
2025-09-16 16:57:36,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 128.0, 227.0, 221.0, 186.0, 30.0, 95.0, 78.0, 379.0, 471.0]
2025-09-16 16:57:36,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1039.16) for latency 18
2025-09-16 16:57:36,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 8 seconds)
2025-09-16 16:59:36,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:59:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 741.45715 ± 569.496
2025-09-16 16:59:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [645.5606, 130.77411, 1479.2557, 580.38336, 551.3578, 584.1024, 128.79314, 573.7696, 650.70374, 2089.8708]
2025-09-16 16:59:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 25.0, 307.0, 106.0, 117.0, 109.0, 25.0, 108.0, 115.0, 404.0]
2025-09-16 16:59:39,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 5 seconds)
2025-09-16 17:01:37,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:01:39,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 677.32166 ± 414.008
2025-09-16 17:01:39,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.41067, 974.9319, 119.06902, 935.7781, 139.73557, 420.3951, 855.6995, 1288.7682, 871.5058, 1047.9226]
2025-09-16 17:01:39,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 192.0, 23.0, 179.0, 27.0, 79.0, 156.0, 243.0, 164.0, 203.0]
2025-09-16 17:01:39,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 2 seconds)
2025-09-16 17:03:40,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:03:41,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 538.38159 ± 534.134
2025-09-16 17:03:41,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [433.93307, 500.1293, 901.88525, 130.37717, 1332.5022, 124.06109, 119.30001, 108.87916, 103.33125, 1629.4174]
2025-09-16 17:03:41,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 85.0, 160.0, 25.0, 254.0, 24.0, 23.0, 21.0, 20.0, 305.0]
2025-09-16 17:03:41,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
