2025-09-16 15:13:12,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_24
2025-09-16 15:13:12,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_24
2025-09-16 15:13:12,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x147cfd698a50>}
2025-09-16 15:13:12,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 15:13:12,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 15:13:12,517 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 15:13:12,517 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 15:13:14,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 15:13:14,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 15:15:05,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:15:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 141.38307 ± 30.446
2025-09-16 15:15:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.613106, 159.5224, 184.83788, 124.49498, 173.93071, 142.57704, 107.177284, 178.0928, 108.37939, 139.20511]
2025-09-16 15:15:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 31.0, 37.0, 24.0, 36.0, 27.0, 21.0, 35.0, 21.0, 29.0]
2025-09-16 15:15:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (141.38) for latency 24
2025-09-16 15:15:06,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 4 minutes, 38 seconds)
2025-09-16 15:17:06,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:17:06,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 247.76108 ± 135.521
2025-09-16 15:17:06,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [431.97638, 363.1715, 101.67844, 134.47911, 362.1564, 89.93361, 366.1052, 384.10138, 124.53664, 119.472244]
2025-09-16 15:17:06,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 67.0, 20.0, 26.0, 71.0, 18.0, 66.0, 70.0, 24.0, 23.0]
2025-09-16 15:17:06,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (247.76) for latency 24
2025-09-16 15:17:06,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 9 minutes, 54 seconds)
2025-09-16 15:19:05,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:19:06,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 202.38419 ± 109.926
2025-09-16 15:19:06,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [103.30224, 108.28478, 162.27989, 113.52567, 360.04526, 356.09555, 230.38525, 114.17297, 109.24507, 366.50528]
2025-09-16 15:19:06,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 31.0, 22.0, 68.0, 71.0, 46.0, 22.0, 21.0, 68.0]
2025-09-16 15:19:06,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 9 minutes, 47 seconds)
2025-09-16 15:21:05,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:21:06,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 220.14180 ± 137.502
2025-09-16 15:21:06,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [124.864685, 128.56001, 400.53043, 196.2929, 138.49792, 107.595955, 502.60532, 140.47531, 102.94801, 359.0475]
2025-09-16 15:21:06,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 76.0, 37.0, 27.0, 21.0, 95.0, 27.0, 20.0, 66.0]
2025-09-16 15:21:06,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 8 minutes, 42 seconds)
2025-09-16 15:23:05,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:23:05,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 175.75554 ± 125.395
2025-09-16 15:23:05,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [293.39615, 114.709526, 124.76148, 114.01212, 123.64181, 113.61882, 119.66662, 95.136826, 516.2325, 142.37958]
2025-09-16 15:23:05,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 22.0, 24.0, 22.0, 24.0, 22.0, 23.0, 19.0, 104.0, 27.0]
2025-09-16 15:23:05,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 7 minutes, 20 seconds)
2025-09-16 15:25:04,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:25:05,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 232.06082 ± 101.726
2025-09-16 15:25:05,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [185.05809, 262.90732, 131.3425, 293.66638, 378.64432, 101.38336, 193.91745, 196.8059, 149.2224, 427.66055]
2025-09-16 15:25:05,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 50.0, 25.0, 55.0, 74.0, 20.0, 38.0, 38.0, 29.0, 80.0]
2025-09-16 15:25:05,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 7 minutes, 47 seconds)
2025-09-16 15:27:03,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:27:04,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 165.33687 ± 73.321
2025-09-16 15:27:04,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [274.0255, 96.56573, 138.64958, 129.75809, 108.22593, 139.65703, 336.51108, 155.16585, 150.6973, 124.11256]
2025-09-16 15:27:04,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 19.0, 27.0, 25.0, 21.0, 27.0, 65.0, 30.0, 29.0, 24.0]
2025-09-16 15:27:04,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 5 minutes, 15 seconds)
2025-09-16 15:29:02,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:29:03,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 186.57181 ± 105.378
2025-09-16 15:29:03,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [108.758354, 114.18664, 96.928185, 428.28726, 130.42427, 111.88876, 163.35512, 301.9237, 134.14648, 275.81918]
2025-09-16 15:29:03,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 19.0, 81.0, 25.0, 22.0, 31.0, 57.0, 26.0, 53.0]
2025-09-16 15:29:03,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 3 minutes, 5 seconds)
2025-09-16 15:31:01,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:31:02,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 176.56393 ± 88.103
2025-09-16 15:31:02,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [377.40326, 162.27245, 146.53178, 150.79233, 142.79935, 101.669975, 145.87032, 102.76138, 120.35187, 315.18668]
2025-09-16 15:31:02,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 32.0, 28.0, 29.0, 28.0, 20.0, 28.0, 20.0, 23.0, 60.0]
2025-09-16 15:31:02,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 53 seconds)
2025-09-16 15:33:00,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:33:01,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 148.46451 ± 69.263
2025-09-16 15:33:01,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [348.0505, 97.021675, 153.86847, 143.60051, 107.70742, 141.9656, 124.85676, 126.26919, 144.52548, 96.77939]
2025-09-16 15:33:01,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 19.0, 30.0, 28.0, 21.0, 28.0, 24.0, 25.0, 28.0, 19.0]
2025-09-16 15:33:01,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 36 seconds)
2025-09-16 15:34:59,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:34:59,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 183.68768 ± 75.313
2025-09-16 15:34:59,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [113.364456, 123.92651, 247.87411, 145.37633, 128.66751, 139.38075, 353.76865, 158.08595, 269.33743, 157.0951]
2025-09-16 15:34:59,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 50.0, 28.0, 25.0, 27.0, 79.0, 31.0, 52.0, 30.0]
2025-09-16 15:34:59,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 21 seconds)
2025-09-16 15:36:57,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:36:57,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 129.41930 ± 25.732
2025-09-16 15:36:57,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [108.1068, 108.0006, 136.45227, 160.68805, 182.94806, 102.39309, 142.89445, 133.9502, 103.04888, 115.71072]
2025-09-16 15:36:57,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 26.0, 31.0, 35.0, 20.0, 28.0, 26.0, 20.0, 23.0]
2025-09-16 15:36:57,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 1 second)
2025-09-16 15:38:53,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:38:53,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 183.55338 ± 75.953
2025-09-16 15:38:53,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [246.94798, 190.60641, 119.56861, 100.970894, 285.75006, 158.53313, 324.75613, 192.08809, 119.46033, 96.85222]
2025-09-16 15:38:53,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 37.0, 23.0, 20.0, 57.0, 31.0, 61.0, 36.0, 23.0, 19.0]
2025-09-16 15:38:53,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 12 seconds)
2025-09-16 15:40:48,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:40:49,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 145.27701 ± 50.355
2025-09-16 15:40:49,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [108.1454, 95.95737, 268.75064, 135.16046, 127.85841, 177.22446, 96.71742, 120.33033, 187.56657, 135.05904]
2025-09-16 15:40:49,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 19.0, 51.0, 26.0, 25.0, 34.0, 19.0, 23.0, 36.0, 26.0]
2025-09-16 15:40:49,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 14 seconds)
2025-09-16 15:42:43,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:42:44,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 128.10776 ± 40.077
2025-09-16 15:42:44,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [229.16403, 177.28537, 123.70658, 113.98051, 103.14336, 102.12951, 114.43648, 102.80131, 96.88994, 117.5405]
2025-09-16 15:42:44,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 34.0, 24.0, 22.0, 20.0, 20.0, 22.0, 20.0, 19.0, 23.0]
2025-09-16 15:42:44,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 45 minutes, 12 seconds)
2025-09-16 15:44:39,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:44:39,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 190.75308 ± 98.253
2025-09-16 15:44:39,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [139.15347, 90.38918, 114.05376, 243.12195, 240.32259, 107.64194, 132.33458, 360.84662, 120.14958, 359.51715]
2025-09-16 15:44:39,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 18.0, 22.0, 47.0, 47.0, 21.0, 26.0, 68.0, 23.0, 74.0]
2025-09-16 15:44:39,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 22 seconds)
2025-09-16 15:46:35,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:46:36,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 156.03125 ± 69.919
2025-09-16 15:46:36,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [112.529465, 316.38007, 236.64691, 156.98122, 119.404396, 95.6409, 201.12561, 124.37168, 89.52463, 107.70746]
2025-09-16 15:46:36,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 62.0, 47.0, 30.0, 23.0, 19.0, 40.0, 24.0, 18.0, 21.0]
2025-09-16 15:46:36,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 40 minutes, 2 seconds)
2025-09-16 15:48:32,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:48:33,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 166.64920 ± 74.723
2025-09-16 15:48:33,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [161.7203, 139.3818, 125.42135, 227.77608, 108.95468, 113.66627, 269.77975, 101.701225, 97.13923, 320.9512]
2025-09-16 15:48:33,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 24.0, 44.0, 21.0, 22.0, 51.0, 20.0, 19.0, 65.0]
2025-09-16 15:48:33,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 19 seconds)
2025-09-16 15:50:29,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:50:29,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 155.38222 ± 77.539
2025-09-16 15:50:29,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [150.50522, 96.87047, 351.9492, 102.91196, 157.86069, 170.81413, 90.33835, 227.25558, 109.097496, 96.21896]
2025-09-16 15:50:29,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 19.0, 67.0, 20.0, 31.0, 34.0, 18.0, 46.0, 21.0, 19.0]
2025-09-16 15:50:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 46 seconds)
2025-09-16 15:52:25,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:52:25,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 194.76912 ± 101.523
2025-09-16 15:52:25,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [120.13175, 261.09872, 236.63264, 95.93951, 101.48347, 109.442726, 377.17456, 359.1277, 146.40276, 140.25743]
2025-09-16 15:52:25,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 50.0, 47.0, 19.0, 20.0, 21.0, 78.0, 70.0, 28.0, 27.0]
2025-09-16 15:52:25,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 1 second)
2025-09-16 15:54:28,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:54:28,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 145.00217 ± 48.756
2025-09-16 15:54:28,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.859, 134.46165, 219.21579, 222.59746, 102.48367, 124.37829, 108.51464, 124.16806, 107.615105, 210.72795]
2025-09-16 15:54:28,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 27.0, 44.0, 44.0, 20.0, 24.0, 21.0, 24.0, 21.0, 42.0]
2025-09-16 15:54:28,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 35 minutes)
2025-09-16 15:56:31,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:56:31,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 197.85742 ± 90.253
2025-09-16 15:56:31,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [124.82929, 146.09026, 246.52715, 400.5304, 114.56289, 215.84663, 307.84537, 118.062904, 158.93214, 145.34714]
2025-09-16 15:56:31,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 28.0, 48.0, 78.0, 22.0, 41.0, 61.0, 23.0, 30.0, 28.0]
2025-09-16 15:56:31,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 50 seconds)
2025-09-16 15:58:29,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:58:29,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 117.34397 ± 30.569
2025-09-16 15:58:29,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [97.04389, 200.10316, 108.03442, 95.737045, 106.950134, 124.56713, 96.06949, 103.78003, 139.44597, 101.70852]
2025-09-16 15:58:29,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 39.0, 21.0, 19.0, 21.0, 24.0, 19.0, 20.0, 27.0, 20.0]
2025-09-16 15:58:29,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 3 seconds)
2025-09-16 16:00:23,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:00:24,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 143.75645 ± 63.819
2025-09-16 16:00:24,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [123.58425, 118.78683, 123.03421, 127.58725, 118.0442, 329.60123, 101.89799, 106.60004, 125.84071, 162.58788]
2025-09-16 16:00:24,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 24.0, 25.0, 23.0, 65.0, 20.0, 21.0, 24.0, 31.0]
2025-09-16 16:00:24,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 32 seconds)
2025-09-16 16:02:18,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:02:19,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 168.62289 ± 71.030
2025-09-16 16:02:19,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [113.25695, 127.5444, 152.15714, 167.40791, 161.39261, 252.74907, 107.721016, 140.60658, 118.01499, 345.37817]
2025-09-16 16:02:19,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 31.0, 32.0, 31.0, 52.0, 21.0, 27.0, 23.0, 70.0]
2025-09-16 16:02:19,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 26 seconds)
2025-09-16 16:04:14,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:04:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 192.91190 ± 55.811
2025-09-16 16:04:14,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [195.87059, 113.294334, 294.84314, 155.92664, 183.64134, 148.47792, 279.71008, 214.22028, 139.29349, 203.8411]
2025-09-16 16:04:14,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 22.0, 57.0, 31.0, 35.0, 29.0, 55.0, 43.0, 27.0, 39.0]
2025-09-16 16:04:14,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 37 seconds)
2025-09-16 16:06:09,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:06:10,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 159.11839 ± 72.561
2025-09-16 16:06:10,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.731476, 108.82582, 254.05267, 155.8353, 180.53409, 96.091484, 140.13925, 108.03205, 327.93842, 123.00328]
2025-09-16 16:06:10,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 49.0, 30.0, 35.0, 19.0, 27.0, 21.0, 65.0, 24.0]
2025-09-16 16:06:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 20 minutes, 45 seconds)
2025-09-16 16:08:05,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:08:05,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 208.33540 ± 118.406
2025-09-16 16:08:05,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [113.43257, 346.79605, 109.075096, 216.86884, 123.826164, 152.43152, 132.60976, 450.35834, 335.79904, 102.156395]
2025-09-16 16:08:05,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 67.0, 21.0, 42.0, 24.0, 30.0, 26.0, 84.0, 71.0, 20.0]
2025-09-16 16:08:05,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 18 minutes, 21 seconds)
2025-09-16 16:10:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:10:01,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 192.77252 ± 76.331
2025-09-16 16:10:01,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [150.5664, 154.56398, 145.61218, 325.0306, 190.1669, 354.08224, 185.13214, 159.29066, 113.95822, 149.32185]
2025-09-16 16:10:01,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 28.0, 68.0, 37.0, 74.0, 36.0, 31.0, 22.0, 29.0]
2025-09-16 16:10:01,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 16 minutes, 32 seconds)
2025-09-16 16:11:56,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:11:57,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 193.86276 ± 74.542
2025-09-16 16:11:57,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [136.84653, 177.23366, 154.12314, 266.0149, 156.23567, 291.108, 155.90619, 123.0224, 348.234, 129.90324]
2025-09-16 16:11:57,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 34.0, 30.0, 53.0, 31.0, 61.0, 30.0, 24.0, 67.0, 25.0]
2025-09-16 16:11:57,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 52 seconds)
2025-09-16 16:13:53,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:13:53,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 143.11725 ± 25.866
2025-09-16 16:13:53,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [186.72699, 120.518936, 184.81335, 114.46502, 144.0797, 129.68391, 158.12865, 107.72266, 147.90913, 137.12411]
2025-09-16 16:13:53,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 23.0, 37.0, 22.0, 28.0, 25.0, 31.0, 21.0, 28.0, 27.0]
2025-09-16 16:13:53,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 12 seconds)
2025-09-16 16:15:50,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:15:50,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 153.85258 ± 64.599
2025-09-16 16:15:50,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [102.45306, 321.2247, 123.44491, 103.24711, 108.67316, 198.42542, 176.07974, 119.784836, 113.843025, 171.35002]
2025-09-16 16:15:50,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 66.0, 24.0, 20.0, 21.0, 38.0, 33.0, 23.0, 22.0, 33.0]
2025-09-16 16:15:50,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 36 seconds)
2025-09-16 16:17:46,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:17:47,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 115.05017 ± 15.050
2025-09-16 16:17:47,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [118.29579, 102.06062, 114.23826, 108.07146, 90.89503, 114.14905, 145.64417, 119.15921, 103.69625, 134.29198]
2025-09-16 16:17:47,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 22.0, 21.0, 18.0, 22.0, 28.0, 23.0, 20.0, 26.0]
2025-09-16 16:17:47,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 51 seconds)
2025-09-16 16:19:43,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:19:44,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 157.69308 ± 57.223
2025-09-16 16:19:44,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [129.06145, 137.6762, 183.83638, 106.991425, 172.06503, 312.45218, 119.47517, 125.85575, 170.30121, 119.21607]
2025-09-16 16:19:44,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 35.0, 21.0, 33.0, 63.0, 23.0, 24.0, 33.0, 23.0]
2025-09-16 16:19:44,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 17 seconds)
2025-09-16 16:21:40,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:21:40,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 147.01985 ± 38.595
2025-09-16 16:21:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [185.1233, 156.10432, 177.1309, 95.729866, 211.05779, 144.5121, 119.05748, 109.17678, 176.56897, 95.73696]
2025-09-16 16:21:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 30.0, 34.0, 19.0, 42.0, 28.0, 23.0, 21.0, 34.0, 19.0]
2025-09-16 16:21:40,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 23 seconds)
2025-09-16 16:23:37,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:23:37,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 142.62326 ± 45.769
2025-09-16 16:23:37,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [90.50925, 89.50761, 135.41805, 134.67929, 158.42049, 242.31853, 165.97664, 106.29245, 112.29759, 190.8127]
2025-09-16 16:23:37,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 26.0, 26.0, 31.0, 46.0, 32.0, 21.0, 22.0, 37.0]
2025-09-16 16:23:37,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes, 32 seconds)
2025-09-16 16:25:34,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:25:34,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 181.74089 ± 82.869
2025-09-16 16:25:34,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [192.40237, 376.92514, 117.65683, 150.46452, 292.28082, 114.02808, 172.75847, 106.077675, 161.71071, 133.1044]
2025-09-16 16:25:34,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 75.0, 23.0, 29.0, 61.0, 22.0, 33.0, 21.0, 31.0, 26.0]
2025-09-16 16:25:34,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 39 seconds)
2025-09-16 16:27:30,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:27:31,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 169.82056 ± 72.389
2025-09-16 16:27:31,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [365.25934, 156.67046, 116.88704, 165.91986, 106.66483, 210.15025, 108.184044, 183.92096, 148.80562, 135.74316]
2025-09-16 16:27:31,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 30.0, 23.0, 32.0, 21.0, 41.0, 21.0, 37.0, 29.0, 26.0]
2025-09-16 16:27:31,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 44 seconds)
2025-09-16 16:29:27,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:29:28,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 150.09744 ± 52.450
2025-09-16 16:29:28,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [155.73, 108.34749, 123.8259, 112.90279, 296.5578, 160.99445, 108.90423, 158.1278, 134.02885, 141.55528]
2025-09-16 16:29:28,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 24.0, 22.0, 60.0, 31.0, 21.0, 31.0, 26.0, 27.0]
2025-09-16 16:29:28,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 58 minutes, 43 seconds)
2025-09-16 16:31:24,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:31:24,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 149.61064 ± 29.962
2025-09-16 16:31:24,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [125.14907, 184.34784, 129.29695, 114.294205, 124.19509, 129.2985, 145.11685, 152.01662, 189.58195, 202.80943]
2025-09-16 16:31:24,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 35.0, 25.0, 22.0, 24.0, 25.0, 28.0, 29.0, 36.0, 39.0]
2025-09-16 16:31:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 56 minutes, 45 seconds)
2025-09-16 16:33:20,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:33:20,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 148.58258 ± 28.324
2025-09-16 16:33:20,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.68678, 148.51515, 113.36596, 175.36935, 141.87752, 175.65472, 160.57762, 200.73828, 141.42325, 113.61722]
2025-09-16 16:33:20,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 29.0, 22.0, 33.0, 28.0, 34.0, 31.0, 40.0, 27.0, 22.0]
2025-09-16 16:33:20,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 40 seconds)
2025-09-16 16:35:16,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:35:17,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 157.51041 ± 61.717
2025-09-16 16:35:17,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [139.93976, 144.5545, 102.8457, 153.60953, 96.183395, 187.77344, 185.11537, 317.0161, 151.30588, 96.76039]
2025-09-16 16:35:17,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 28.0, 20.0, 30.0, 19.0, 36.0, 36.0, 64.0, 29.0, 19.0]
2025-09-16 16:35:17,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 37 seconds)
2025-09-16 16:37:13,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:37:14,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 148.31680 ± 31.718
2025-09-16 16:37:14,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [112.60902, 135.04387, 148.55281, 205.7783, 146.60767, 127.32336, 129.47128, 176.69475, 107.57282, 193.51405]
2025-09-16 16:37:14,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 29.0, 40.0, 29.0, 25.0, 25.0, 34.0, 21.0, 38.0]
2025-09-16 16:37:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 40 seconds)
2025-09-16 16:39:10,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:39:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 128.19630 ± 29.992
2025-09-16 16:39:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [119.24891, 119.413284, 116.5217, 161.44171, 101.73205, 113.83087, 112.121605, 119.949524, 206.20076, 111.50252]
2025-09-16 16:39:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 23.0, 31.0, 20.0, 22.0, 22.0, 23.0, 42.0, 22.0]
2025-09-16 16:39:10,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 44 seconds)
2025-09-16 16:41:06,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:41:07,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 177.25980 ± 85.137
2025-09-16 16:41:07,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [130.428, 400.01007, 170.92546, 118.12316, 107.38075, 179.67152, 154.48196, 123.52887, 129.68768, 258.3605]
2025-09-16 16:41:07,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 76.0, 33.0, 23.0, 21.0, 35.0, 30.0, 24.0, 25.0, 50.0]
2025-09-16 16:41:07,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 46 seconds)
2025-09-16 16:43:03,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:43:03,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 177.12788 ± 71.773
2025-09-16 16:43:03,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [276.1361, 123.89364, 96.26404, 136.73997, 148.02382, 228.31876, 112.305855, 293.48114, 246.96931, 109.14613]
2025-09-16 16:43:03,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 25.0, 19.0, 27.0, 29.0, 45.0, 22.0, 58.0, 49.0, 21.0]
2025-09-16 16:43:03,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 44 minutes, 57 seconds)
2025-09-16 16:44:59,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:45:00,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 165.28061 ± 71.531
2025-09-16 16:45:00,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [89.87028, 108.574684, 157.41705, 159.39685, 204.20665, 96.31646, 344.40594, 158.2239, 210.52394, 123.87034]
2025-09-16 16:45:00,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 31.0, 31.0, 39.0, 19.0, 67.0, 31.0, 45.0, 24.0]
2025-09-16 16:45:00,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 59 seconds)
2025-09-16 16:46:56,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:46:57,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 161.13737 ± 64.329
2025-09-16 16:46:57,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [150.75574, 101.83601, 177.10573, 271.59122, 123.17998, 108.858894, 112.15725, 119.58899, 154.91432, 291.38565]
2025-09-16 16:46:57,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 20.0, 34.0, 53.0, 24.0, 21.0, 22.0, 23.0, 30.0, 56.0]
2025-09-16 16:46:57,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 4 seconds)
2025-09-16 16:48:53,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:48:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 172.52151 ± 69.785
2025-09-16 16:48:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [358.60068, 112.74738, 133.13849, 148.69617, 208.34373, 179.47865, 132.48436, 130.22003, 201.86838, 119.63724]
2025-09-16 16:48:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 22.0, 26.0, 29.0, 40.0, 35.0, 26.0, 25.0, 39.0, 23.0]
2025-09-16 16:48:53,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 5 seconds)
2025-09-16 16:50:49,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:50:50,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 194.28259 ± 61.990
2025-09-16 16:50:50,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [179.3677, 134.6947, 198.33221, 258.17053, 204.99347, 346.8556, 143.46147, 161.7735, 137.20992, 177.96675]
2025-09-16 16:50:50,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 26.0, 38.0, 50.0, 42.0, 70.0, 28.0, 31.0, 26.0, 35.0]
2025-09-16 16:50:50,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 14 seconds)
2025-09-16 16:52:45,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:52:46,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 160.86484 ± 49.214
2025-09-16 16:52:46,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [119.170135, 235.76064, 139.00455, 196.13731, 141.60628, 127.503914, 108.598175, 246.47702, 187.39282, 106.9976]
2025-09-16 16:52:46,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 45.0, 27.0, 39.0, 27.0, 25.0, 21.0, 50.0, 36.0, 21.0]
2025-09-16 16:52:46,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 7 seconds)
2025-09-16 16:54:41,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:54:42,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 147.82706 ± 43.902
2025-09-16 16:54:42,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [125.260635, 159.77592, 146.54417, 108.32959, 118.866135, 152.62901, 138.25645, 112.95848, 269.8394, 145.81093]
2025-09-16 16:54:42,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 31.0, 28.0, 21.0, 23.0, 30.0, 27.0, 22.0, 54.0, 28.0]
2025-09-16 16:54:42,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 4 seconds)
2025-09-16 16:56:36,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:56:37,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 141.79349 ± 36.632
2025-09-16 16:56:37,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [209.25572, 129.77808, 111.29211, 116.7282, 130.5646, 113.24548, 112.66603, 167.29329, 120.09475, 207.01657]
2025-09-16 16:56:37,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 25.0, 22.0, 23.0, 25.0, 22.0, 22.0, 32.0, 23.0, 41.0]
2025-09-16 16:56:37,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 53 seconds)
2025-09-16 16:58:32,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:58:32,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 181.88306 ± 67.644
2025-09-16 16:58:32,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [148.9089, 182.30685, 236.64851, 107.861626, 123.61336, 97.01991, 210.21413, 140.24834, 259.77908, 312.23004]
2025-09-16 16:58:32,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 35.0, 46.0, 21.0, 24.0, 19.0, 43.0, 27.0, 53.0, 64.0]
2025-09-16 16:58:32,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 46 seconds)
2025-09-16 17:00:26,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:00:27,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 169.36382 ± 58.695
2025-09-16 17:00:27,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [172.32437, 259.01675, 102.5515, 129.146, 184.59557, 280.869, 183.3659, 96.89283, 122.940735, 161.93553]
2025-09-16 17:00:27,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 55.0, 20.0, 25.0, 38.0, 54.0, 35.0, 19.0, 24.0, 31.0]
2025-09-16 17:00:27,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 33 seconds)
2025-09-16 17:02:22,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:02:22,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 196.62622 ± 62.981
2025-09-16 17:02:22,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [142.57436, 269.52637, 182.6128, 108.20425, 160.91501, 304.56052, 175.3596, 287.47556, 174.19139, 160.84244]
2025-09-16 17:02:22,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 52.0, 35.0, 21.0, 31.0, 58.0, 34.0, 61.0, 34.0, 31.0]
2025-09-16 17:02:22,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 24 minutes, 32 seconds)
2025-09-16 17:04:17,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:04:17,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 143.45901 ± 61.233
2025-09-16 17:04:17,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [130.00502, 90.534874, 113.61244, 198.03917, 113.808334, 304.26367, 101.78766, 97.10778, 139.60997, 145.82123]
2025-09-16 17:04:17,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 18.0, 22.0, 38.0, 22.0, 63.0, 20.0, 19.0, 27.0, 28.0]
2025-09-16 17:04:17,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes, 29 seconds)
2025-09-16 17:06:11,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:06:12,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 155.06038 ± 40.423
2025-09-16 17:06:12,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [239.53183, 153.6741, 96.9242, 126.63897, 201.85672, 170.45459, 137.41702, 147.94614, 166.7979, 109.362305]
2025-09-16 17:06:12,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 30.0, 19.0, 25.0, 42.0, 34.0, 27.0, 29.0, 32.0, 21.0]
2025-09-16 17:06:12,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 30 seconds)
2025-09-16 17:08:06,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:08:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 188.56819 ± 78.855
2025-09-16 17:08:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [113.957855, 245.8531, 103.393616, 182.02292, 370.90347, 147.3167, 102.66766, 166.4001, 239.37823, 213.78835]
2025-09-16 17:08:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 48.0, 20.0, 35.0, 74.0, 29.0, 20.0, 32.0, 48.0, 41.0]
2025-09-16 17:08:07,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 32 seconds)
2025-09-16 17:10:02,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:10:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 166.66148 ± 51.804
2025-09-16 17:10:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.757126, 168.4498, 299.3393, 153.64328, 156.47777, 171.88432, 174.59985, 134.35748, 191.90578, 120.20025]
2025-09-16 17:10:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 34.0, 58.0, 30.0, 30.0, 33.0, 34.0, 26.0, 37.0, 23.0]
2025-09-16 17:10:02,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 40 seconds)
2025-09-16 17:11:56,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:11:57,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 153.75717 ± 65.434
2025-09-16 17:11:57,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [155.54707, 328.3396, 102.08646, 181.50456, 107.862564, 113.48517, 150.97552, 177.52368, 90.307884, 129.93922]
2025-09-16 17:11:57,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 68.0, 20.0, 35.0, 21.0, 22.0, 29.0, 35.0, 18.0, 25.0]
2025-09-16 17:11:57,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 41 seconds)
2025-09-16 17:13:52,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:13:52,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 171.31790 ± 60.167
2025-09-16 17:13:52,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [214.5237, 251.17021, 192.75621, 286.5298, 147.19225, 95.67733, 132.99843, 106.34768, 160.68306, 125.30047]
2025-09-16 17:13:52,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 50.0, 37.0, 56.0, 29.0, 19.0, 26.0, 21.0, 32.0, 24.0]
2025-09-16 17:13:52,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 50 seconds)
2025-09-16 17:15:47,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:15:48,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 156.98410 ± 34.477
2025-09-16 17:15:48,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [102.10661, 163.26944, 234.87804, 151.8742, 155.10344, 123.23775, 166.22115, 183.65735, 128.87976, 160.61333]
2025-09-16 17:15:48,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 31.0, 46.0, 29.0, 31.0, 24.0, 34.0, 36.0, 25.0, 31.0]
2025-09-16 17:15:48,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes)
2025-09-16 17:17:42,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:17:42,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 142.42444 ± 36.867
2025-09-16 17:17:42,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [117.37165, 195.06012, 179.08264, 113.753586, 149.28877, 96.815414, 150.66576, 123.55798, 201.19612, 97.452225]
2025-09-16 17:17:42,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 38.0, 34.0, 22.0, 29.0, 19.0, 29.0, 24.0, 41.0, 19.0]
2025-09-16 17:17:42,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes, 1 second)
2025-09-16 17:19:36,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:19:37,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 151.73915 ± 23.127
2025-09-16 17:19:37,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [150.96153, 181.3083, 183.66386, 170.42465, 133.3446, 123.60799, 151.13219, 119.75172, 173.4177, 129.77892]
2025-09-16 17:19:37,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 35.0, 36.0, 33.0, 26.0, 24.0, 30.0, 23.0, 34.0, 25.0]
2025-09-16 17:19:37,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 7 minutes, 4 seconds)
2025-09-16 17:21:31,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:21:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 120.32378 ± 19.882
2025-09-16 17:21:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [165.18552, 95.99838, 132.05528, 141.38066, 124.78535, 108.14513, 108.990036, 115.084915, 109.12682, 102.48576]
2025-09-16 17:21:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 19.0, 26.0, 28.0, 24.0, 21.0, 21.0, 22.0, 21.0, 20.0]
2025-09-16 17:21:32,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 8 seconds)
2025-09-16 17:23:26,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:23:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 127.09816 ± 22.380
2025-09-16 17:23:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [102.95391, 155.74464, 120.19646, 163.46056, 131.04622, 102.874115, 155.40877, 123.73689, 113.84772, 101.712234]
2025-09-16 17:23:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 30.0, 23.0, 33.0, 25.0, 20.0, 30.0, 24.0, 22.0, 20.0]
2025-09-16 17:23:26,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 9 seconds)
2025-09-16 17:25:21,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:25:21,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 144.80649 ± 41.634
2025-09-16 17:25:21,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [155.6195, 190.70685, 160.37479, 220.10634, 102.8574, 118.007256, 102.80752, 114.86727, 186.2818, 96.43622]
2025-09-16 17:25:21,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 37.0, 31.0, 46.0, 20.0, 23.0, 20.0, 22.0, 37.0, 19.0]
2025-09-16 17:25:21,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 12 seconds)
2025-09-16 17:27:15,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:27:16,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 158.25626 ± 46.185
2025-09-16 17:27:16,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [148.40178, 175.95757, 97.52896, 183.61407, 160.60017, 272.0789, 168.929, 126.30273, 113.98998, 135.15953]
2025-09-16 17:27:16,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 34.0, 19.0, 35.0, 31.0, 53.0, 32.0, 25.0, 22.0, 26.0]
2025-09-16 17:27:16,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 18 seconds)
2025-09-16 17:29:10,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:29:11,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 152.94785 ± 51.260
2025-09-16 17:29:11,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [223.86357, 192.46437, 118.7211, 167.53517, 255.6498, 118.56949, 123.66985, 108.252556, 103.58622, 117.16624]
2025-09-16 17:29:11,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 37.0, 23.0, 33.0, 52.0, 23.0, 24.0, 21.0, 20.0, 23.0]
2025-09-16 17:29:11,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 23 seconds)
2025-09-16 17:31:05,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:31:05,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 153.26321 ± 60.797
2025-09-16 17:31:05,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [155.95078, 235.3744, 166.13887, 109.095955, 295.36554, 114.132805, 105.127716, 124.8059, 118.53285, 108.10731]
2025-09-16 17:31:05,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 46.0, 32.0, 21.0, 61.0, 22.0, 21.0, 24.0, 23.0, 21.0]
2025-09-16 17:31:05,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 26 seconds)
2025-09-16 17:33:00,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:33:00,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 154.28642 ± 32.603
2025-09-16 17:33:00,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [119.53539, 156.7221, 141.81247, 180.4649, 166.5445, 102.61322, 130.76797, 139.78816, 216.08232, 188.53319]
2025-09-16 17:33:00,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 30.0, 28.0, 35.0, 32.0, 20.0, 25.0, 27.0, 42.0, 37.0]
2025-09-16 17:33:00,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 33 seconds)
2025-09-16 17:34:55,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:34:55,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 180.94530 ± 83.690
2025-09-16 17:34:55,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [130.91217, 90.11595, 101.37503, 203.59969, 348.65057, 319.99365, 158.01285, 188.6795, 133.52768, 134.58595]
2025-09-16 17:34:55,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 18.0, 20.0, 39.0, 71.0, 65.0, 30.0, 36.0, 26.0, 26.0]
2025-09-16 17:34:55,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 37 seconds)
2025-09-16 17:36:49,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:36:50,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 160.37756 ± 49.354
2025-09-16 17:36:50,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [244.06946, 128.17816, 220.89148, 135.54016, 130.67369, 133.72337, 103.299065, 226.57977, 108.72874, 172.0918]
2025-09-16 17:36:50,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 25.0, 46.0, 26.0, 25.0, 26.0, 20.0, 48.0, 21.0, 33.0]
2025-09-16 17:36:50,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 44 seconds)
2025-09-16 17:38:44,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:38:45,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 136.42572 ± 28.763
2025-09-16 17:38:45,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [169.63635, 114.24481, 114.2824, 129.5248, 96.744125, 170.92186, 102.47728, 168.609, 127.98022, 169.83626]
2025-09-16 17:38:45,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 22.0, 22.0, 25.0, 19.0, 33.0, 20.0, 33.0, 25.0, 33.0]
2025-09-16 17:38:45,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 50 seconds)
2025-09-16 17:40:39,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:40:40,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 149.54933 ± 48.859
2025-09-16 17:40:40,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [167.1162, 130.58064, 122.34565, 213.05876, 129.98695, 101.84338, 252.36995, 174.15575, 108.17107, 95.865005]
2025-09-16 17:40:40,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 24.0, 41.0, 25.0, 20.0, 49.0, 34.0, 21.0, 19.0]
2025-09-16 17:40:40,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 58 seconds)
2025-09-16 17:42:34,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:42:35,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 155.31044 ± 52.000
2025-09-16 17:42:35,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [159.72041, 124.75539, 107.54875, 135.98482, 107.07314, 123.797844, 248.42651, 139.28973, 260.64536, 145.86235]
2025-09-16 17:42:35,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 24.0, 21.0, 26.0, 21.0, 24.0, 49.0, 27.0, 53.0, 29.0]
2025-09-16 17:42:35,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 2 seconds)
2025-09-16 17:44:29,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:44:29,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 149.03691 ± 45.052
2025-09-16 17:44:29,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [113.36194, 95.92898, 250.58755, 160.74033, 176.35526, 112.870186, 169.74297, 96.07209, 144.56253, 170.14737]
2025-09-16 17:44:29,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 50.0, 31.0, 34.0, 22.0, 33.0, 19.0, 28.0, 33.0]
2025-09-16 17:44:29,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 6 seconds)
2025-09-16 17:46:24,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:46:24,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 168.49634 ± 63.301
2025-09-16 17:46:24,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [125.232834, 132.11763, 186.55486, 112.0298, 128.31734, 265.10153, 135.56992, 306.49567, 172.67677, 120.867004]
2025-09-16 17:46:24,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 26.0, 36.0, 22.0, 25.0, 52.0, 27.0, 62.0, 34.0, 23.0]
2025-09-16 17:46:24,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 12 seconds)
2025-09-16 17:48:19,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:48:19,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 148.43530 ± 52.289
2025-09-16 17:48:19,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [129.54802, 108.618675, 142.01608, 129.40456, 143.11488, 187.26582, 101.93536, 135.94577, 290.2356, 116.26826]
2025-09-16 17:48:19,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 28.0, 25.0, 28.0, 37.0, 20.0, 26.0, 59.0, 23.0]
2025-09-16 17:48:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 16 seconds)
2025-09-16 17:50:13,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:50:14,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 129.68106 ± 27.348
2025-09-16 17:50:14,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [103.48269, 108.38489, 107.49626, 148.14122, 130.50967, 140.83516, 116.344696, 198.4618, 108.5108, 134.64328]
2025-09-16 17:50:14,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 21.0, 29.0, 25.0, 27.0, 23.0, 39.0, 21.0, 26.0]
2025-09-16 17:50:14,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 20 seconds)
2025-09-16 17:52:08,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:52:08,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 141.07025 ± 21.213
2025-09-16 17:52:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [154.5357, 134.75563, 123.5538, 129.77728, 136.4112, 133.1961, 178.37807, 106.9507, 137.54362, 175.60043]
2025-09-16 17:52:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 26.0, 24.0, 25.0, 27.0, 26.0, 34.0, 21.0, 27.0, 34.0]
2025-09-16 17:52:08,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 23 seconds)
2025-09-16 17:54:02,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:54:03,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 150.88635 ± 51.118
2025-09-16 17:54:03,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.589455, 157.36655, 107.171005, 198.92068, 229.01874, 102.897766, 148.04492, 129.18765, 102.62079, 238.04594]
2025-09-16 17:54:03,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 30.0, 21.0, 39.0, 45.0, 20.0, 29.0, 25.0, 20.0, 47.0]
2025-09-16 17:54:03,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 29 seconds)
2025-09-16 17:55:57,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:55:57,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 121.42991 ± 23.083
2025-09-16 17:55:57,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [89.94161, 107.543274, 151.92249, 152.66684, 146.50163, 118.609726, 89.80017, 136.14192, 119.2318, 101.93952]
2025-09-16 17:55:57,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 30.0, 30.0, 28.0, 23.0, 18.0, 27.0, 23.0, 20.0]
2025-09-16 17:55:57,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 32 seconds)
2025-09-16 17:57:51,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:57:52,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 137.13718 ± 31.086
2025-09-16 17:57:52,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [123.869774, 169.83865, 134.66193, 125.6896, 122.03169, 120.700714, 114.804146, 217.20912, 107.68692, 134.87927]
2025-09-16 17:57:52,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 33.0, 26.0, 25.0, 24.0, 23.0, 22.0, 43.0, 21.0, 26.0]
2025-09-16 17:57:52,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 37 seconds)
2025-09-16 17:59:46,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:59:46,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 164.59856 ± 37.755
2025-09-16 17:59:46,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [127.6167, 252.27258, 158.6227, 210.65468, 157.87299, 168.06198, 124.40303, 163.79047, 129.34047, 153.35016]
2025-09-16 17:59:46,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 53.0, 31.0, 43.0, 31.0, 32.0, 24.0, 31.0, 25.0, 30.0]
2025-09-16 17:59:46,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 43 seconds)
2025-09-16 18:01:41,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:01:41,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 142.11398 ± 48.606
2025-09-16 18:01:41,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [102.58988, 114.693985, 192.09062, 239.24294, 113.89508, 109.932144, 118.5324, 96.17345, 124.31679, 209.67267]
2025-09-16 18:01:41,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 39.0, 47.0, 22.0, 21.0, 23.0, 19.0, 24.0, 41.0]
2025-09-16 18:01:41,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 49 seconds)
2025-09-16 18:03:35,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:03:36,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 144.85725 ± 34.270
2025-09-16 18:03:36,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [190.97784, 200.05354, 140.98155, 101.84206, 113.737625, 186.50542, 125.99044, 107.10342, 150.33134, 131.04935]
2025-09-16 18:03:36,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 39.0, 27.0, 20.0, 22.0, 36.0, 24.0, 21.0, 29.0, 25.0]
2025-09-16 18:03:36,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 55 seconds)
2025-09-16 18:05:30,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:05:30,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 138.01273 ± 26.695
2025-09-16 18:05:30,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [140.52522, 178.27017, 163.52802, 114.268906, 144.50122, 113.44922, 108.581764, 96.03645, 159.82076, 161.14534]
2025-09-16 18:05:30,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 35.0, 33.0, 22.0, 29.0, 22.0, 21.0, 19.0, 32.0, 32.0]
2025-09-16 18:05:30,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes)
2025-09-16 18:07:24,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:07:25,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 145.48534 ± 22.152
2025-09-16 18:07:25,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [140.72847, 178.87668, 144.62883, 182.1575, 124.163635, 130.35161, 157.62263, 108.73801, 154.67715, 132.90897]
2025-09-16 18:07:25,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 35.0, 28.0, 36.0, 24.0, 25.0, 31.0, 21.0, 30.0, 26.0]
2025-09-16 18:07:25,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 5 seconds)
2025-09-16 18:09:18,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:09:19,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 134.90207 ± 32.984
2025-09-16 18:09:19,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.902855, 125.46981, 113.471405, 210.53735, 102.50543, 135.96277, 135.80525, 175.77322, 114.12008, 138.4726]
2025-09-16 18:09:19,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 22.0, 44.0, 20.0, 26.0, 26.0, 34.0, 22.0, 27.0]
2025-09-16 18:09:19,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 10 seconds)
2025-09-16 18:11:13,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:11:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 160.46718 ± 54.350
2025-09-16 18:11:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [135.15552, 219.89214, 209.6651, 130.47252, 159.48743, 119.2308, 109.40909, 276.46326, 96.062706, 148.83318]
2025-09-16 18:11:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 47.0, 40.0, 25.0, 31.0, 23.0, 21.0, 52.0, 19.0, 29.0]
2025-09-16 18:11:14,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 16 seconds)
2025-09-16 18:13:08,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:13:08,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 139.98386 ± 33.989
2025-09-16 18:13:08,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [119.79652, 108.207016, 96.618515, 145.94829, 197.27165, 178.6445, 185.52525, 108.948524, 139.00485, 119.87344]
2025-09-16 18:13:08,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 21.0, 19.0, 28.0, 41.0, 35.0, 36.0, 21.0, 27.0, 23.0]
2025-09-16 18:13:08,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 21 seconds)
2025-09-16 18:15:02,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:15:03,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 150.46333 ± 59.233
2025-09-16 18:15:03,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [124.37206, 305.70963, 109.35392, 212.39285, 130.38597, 107.17942, 125.88481, 150.03209, 119.626, 119.69655]
2025-09-16 18:15:03,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 62.0, 21.0, 43.0, 25.0, 21.0, 24.0, 29.0, 23.0, 23.0]
2025-09-16 18:15:03,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 27 seconds)
2025-09-16 18:16:57,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:16:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 141.65115 ± 36.617
2025-09-16 18:16:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.28369, 123.05133, 170.3308, 222.08989, 108.13898, 107.21905, 116.49021, 149.86592, 142.79294, 175.24866]
2025-09-16 18:16:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 33.0, 43.0, 21.0, 21.0, 23.0, 29.0, 27.0, 35.0]
2025-09-16 18:16:58,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 33 seconds)
2025-09-16 18:18:52,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:18:52,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 162.16554 ± 59.409
2025-09-16 18:18:52,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [97.029175, 294.88498, 226.47977, 102.467964, 167.50597, 140.01942, 204.59879, 139.81789, 130.30003, 118.55142]
2025-09-16 18:18:52,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 57.0, 46.0, 20.0, 32.0, 27.0, 40.0, 27.0, 25.0, 23.0]
2025-09-16 18:18:52,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 38 seconds)
2025-09-16 18:20:46,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:20:47,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 150.02896 ± 53.580
2025-09-16 18:20:47,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [152.89268, 130.59059, 135.29674, 160.69907, 101.237724, 286.89355, 102.92324, 111.85832, 197.95877, 119.93905]
2025-09-16 18:20:47,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 25.0, 26.0, 32.0, 20.0, 58.0, 20.0, 22.0, 38.0, 23.0]
2025-09-16 18:20:47,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 43 seconds)
2025-09-16 18:22:41,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:22:41,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 167.27365 ± 66.276
2025-09-16 18:22:41,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [136.2287, 243.081, 113.3974, 278.65228, 146.46678, 139.51443, 123.85831, 108.22544, 107.5646, 275.7477]
2025-09-16 18:22:41,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 48.0, 22.0, 54.0, 28.0, 27.0, 24.0, 21.0, 21.0, 53.0]
2025-09-16 18:22:41,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 49 seconds)
2025-09-16 18:24:36,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:24:36,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 175.90715 ± 62.236
2025-09-16 18:24:36,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [138.82603, 191.11444, 299.1792, 122.92504, 196.0008, 129.43707, 160.82288, 277.8831, 121.94712, 120.93595]
2025-09-16 18:24:36,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 37.0, 61.0, 24.0, 38.0, 25.0, 32.0, 56.0, 24.0, 24.0]
2025-09-16 18:24:36,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 54 seconds)
2025-09-16 18:26:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:26:31,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 165.20810 ± 69.077
2025-09-16 18:26:31,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [178.2489, 157.70146, 139.51553, 163.99046, 89.813774, 102.340576, 125.225525, 148.65045, 350.60425, 195.99013]
2025-09-16 18:26:31,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 30.0, 27.0, 32.0, 18.0, 20.0, 24.0, 29.0, 68.0, 38.0]
2025-09-16 18:26:31,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
