2025-09-16 15:08:05,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_24
2025-09-16 15:08:05,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_24
2025-09-16 15:08:05,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x1462dfa14850>}
2025-09-16 15:08:05,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 15:08:05,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 15:08:05,860 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 15:08:05,860 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 15:08:07,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 15:08:07,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 15:09:59,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:10:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 198.93013 ± 99.982
2025-09-16 15:10:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [155.49487, 138.1408, 143.87134, 335.69952, 130.13863, 134.54001, 293.52002, 407.87424, 114.18754, 135.83426]
2025-09-16 15:10:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 27.0, 28.0, 69.0, 25.0, 26.0, 57.0, 78.0, 22.0, 26.0]
2025-09-16 15:10:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (198.93) for latency 24
2025-09-16 15:10:00,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 6 minutes)
2025-09-16 15:12:01,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:12:01,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 187.27986 ± 110.567
2025-09-16 15:12:01,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [177.42966, 108.68791, 476.45926, 157.69327, 125.47705, 305.15698, 120.36809, 136.20932, 151.50668, 113.81041]
2025-09-16 15:12:01,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 21.0, 98.0, 31.0, 24.0, 60.0, 23.0, 26.0, 29.0, 22.0]
2025-09-16 15:12:01,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 11 minutes, 8 seconds)
2025-09-16 15:14:01,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:14:02,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 198.12032 ± 131.038
2025-09-16 15:14:02,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [562.2786, 184.17943, 119.17063, 289.79697, 125.70727, 125.69479, 140.81787, 178.86008, 134.61395, 120.08351]
2025-09-16 15:14:02,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 36.0, 23.0, 57.0, 24.0, 24.0, 27.0, 34.0, 26.0, 23.0]
2025-09-16 15:14:02,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 11 minutes, 18 seconds)
2025-09-16 15:16:00,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:16:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 241.72476 ± 72.613
2025-09-16 15:16:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [230.48795, 145.25618, 319.5652, 291.2819, 274.6615, 114.25741, 297.63605, 182.0875, 342.96445, 219.04965]
2025-09-16 15:16:01,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 28.0, 63.0, 60.0, 54.0, 22.0, 59.0, 35.0, 69.0, 42.0]
2025-09-16 15:16:01,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (241.72) for latency 24
2025-09-16 15:16:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 9 minutes, 34 seconds)
2025-09-16 15:18:01,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:18:02,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 258.84808 ± 116.496
2025-09-16 15:18:02,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [393.52274, 439.40015, 191.81497, 164.19627, 215.72124, 397.9337, 185.26976, 140.53531, 350.97635, 109.1107]
2025-09-16 15:18:02,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 84.0, 37.0, 31.0, 41.0, 80.0, 36.0, 27.0, 78.0, 21.0]
2025-09-16 15:18:02,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (258.85) for latency 24
2025-09-16 15:18:02,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 8 minutes, 15 seconds)
2025-09-16 15:20:01,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:20:01,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 175.42317 ± 41.487
2025-09-16 15:20:01,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [151.15843, 182.82114, 169.49953, 151.36348, 130.88878, 145.19542, 155.61667, 161.51233, 238.80693, 267.3691]
2025-09-16 15:20:01,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 35.0, 33.0, 29.0, 25.0, 28.0, 30.0, 31.0, 46.0, 53.0]
2025-09-16 15:20:01,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 8 minutes, 30 seconds)
2025-09-16 15:22:02,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:22:03,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 179.91563 ± 68.899
2025-09-16 15:22:03,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [152.11717, 343.18277, 124.327324, 217.45409, 128.63277, 261.14056, 129.33597, 168.08096, 139.49594, 135.38892]
2025-09-16 15:22:03,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 73.0, 24.0, 41.0, 25.0, 55.0, 25.0, 33.0, 27.0, 26.0]
2025-09-16 15:22:03,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 6 minutes, 28 seconds)
2025-09-16 15:24:05,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:24:06,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 159.86778 ± 59.861
2025-09-16 15:24:06,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.83838, 170.09554, 160.85832, 186.0656, 324.93515, 146.20622, 141.06784, 114.39561, 120.091446, 115.123795]
2025-09-16 15:24:06,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 33.0, 31.0, 36.0, 67.0, 28.0, 27.0, 22.0, 23.0, 22.0]
2025-09-16 15:24:06,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 5 minutes, 7 seconds)
2025-09-16 15:26:08,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:26:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 184.70193 ± 77.042
2025-09-16 15:26:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [130.48921, 146.36989, 108.25779, 145.41756, 125.27223, 356.6134, 261.96695, 135.61678, 174.86548, 262.1501]
2025-09-16 15:26:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 28.0, 21.0, 28.0, 24.0, 74.0, 50.0, 26.0, 34.0, 49.0]
2025-09-16 15:26:09,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 4 minutes, 22 seconds)
2025-09-16 15:28:12,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:28:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 173.39268 ± 76.792
2025-09-16 15:28:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [109.09225, 169.00003, 156.4132, 113.01644, 161.63503, 157.58952, 385.10037, 210.67345, 162.57771, 108.82882]
2025-09-16 15:28:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 32.0, 30.0, 22.0, 31.0, 30.0, 77.0, 41.0, 31.0, 21.0]
2025-09-16 15:28:12,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 3 minutes, 15 seconds)
2025-09-16 15:30:14,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:30:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 167.76756 ± 44.642
2025-09-16 15:30:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [250.25098, 117.466805, 197.61151, 170.65988, 219.89282, 115.08674, 103.54854, 160.43672, 171.04123, 171.68045]
2025-09-16 15:30:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 23.0, 38.0, 33.0, 43.0, 22.0, 20.0, 31.0, 33.0, 33.0]
2025-09-16 15:30:15,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 1 minute, 53 seconds)
2025-09-16 15:32:14,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:32:15,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 188.05843 ± 56.931
2025-09-16 15:32:15,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [176.86073, 314.03687, 162.85344, 196.70578, 236.44125, 119.3694, 137.47113, 135.43896, 161.98247, 239.42435]
2025-09-16 15:32:15,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 63.0, 31.0, 38.0, 46.0, 23.0, 27.0, 26.0, 31.0, 49.0]
2025-09-16 15:32:15,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 59 minutes, 38 seconds)
2025-09-16 15:34:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:34:14,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 168.57397 ± 65.676
2025-09-16 15:34:14,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [267.9164, 118.40328, 158.85722, 129.80878, 125.46068, 161.09978, 140.31471, 319.20813, 108.66544, 156.00537]
2025-09-16 15:34:14,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 23.0, 31.0, 25.0, 24.0, 31.0, 27.0, 65.0, 21.0, 30.0]
2025-09-16 15:34:14,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 56 minutes, 31 seconds)
2025-09-16 15:36:14,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:36:14,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 158.81270 ± 56.446
2025-09-16 15:36:14,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [113.4155, 139.90324, 136.00023, 113.9153, 187.91582, 160.65208, 137.92603, 124.45207, 315.14328, 158.80336]
2025-09-16 15:36:14,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 27.0, 26.0, 22.0, 36.0, 31.0, 27.0, 24.0, 63.0, 31.0]
2025-09-16 15:36:14,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 53 minutes, 33 seconds)
2025-09-16 15:38:13,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:38:14,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 244.42368 ± 97.790
2025-09-16 15:38:14,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [241.07367, 294.15094, 197.97906, 355.59616, 128.73949, 173.12991, 155.20757, 129.9268, 381.95468, 386.47855]
2025-09-16 15:38:14,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 57.0, 38.0, 68.0, 25.0, 33.0, 30.0, 25.0, 72.0, 79.0]
2025-09-16 15:38:14,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 50 minutes, 21 seconds)
2025-09-16 15:40:14,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:40:14,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 144.92076 ± 24.347
2025-09-16 15:40:14,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [137.65054, 171.10858, 166.21512, 119.242905, 160.5839, 132.17815, 167.12096, 96.09169, 168.86339, 130.1523]
2025-09-16 15:40:14,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 33.0, 32.0, 23.0, 31.0, 25.0, 32.0, 19.0, 33.0, 25.0]
2025-09-16 15:40:14,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 55 seconds)
2025-09-16 15:42:14,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:42:14,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 245.28194 ± 102.966
2025-09-16 15:42:14,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [338.1184, 129.5591, 377.3582, 367.09003, 141.28857, 296.2588, 125.18759, 135.04933, 199.44525, 343.4641]
2025-09-16 15:42:14,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 25.0, 72.0, 71.0, 27.0, 58.0, 24.0, 26.0, 38.0, 68.0]
2025-09-16 15:42:14,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 47 seconds)
2025-09-16 15:44:13,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:44:14,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 197.15227 ± 103.456
2025-09-16 15:44:14,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [125.60226, 408.5187, 139.55293, 395.92978, 154.69179, 175.22366, 158.76912, 134.56161, 144.11528, 134.5575]
2025-09-16 15:44:14,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 81.0, 27.0, 84.0, 30.0, 34.0, 31.0, 26.0, 28.0, 26.0]
2025-09-16 15:44:14,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 53 seconds)
2025-09-16 15:46:13,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:46:14,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 158.91357 ± 20.275
2025-09-16 15:46:14,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [140.56831, 157.0192, 184.59113, 149.0442, 124.4499, 180.93626, 176.98679, 145.63733, 145.38974, 184.51286]
2025-09-16 15:46:14,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 30.0, 37.0, 29.0, 24.0, 35.0, 34.0, 28.0, 28.0, 36.0]
2025-09-16 15:46:14,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 52 seconds)
2025-09-16 15:48:12,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:48:13,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 228.92053 ± 123.470
2025-09-16 15:48:13,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [194.94687, 108.67025, 365.8673, 135.55846, 125.479645, 133.66176, 493.78244, 255.77568, 334.06354, 141.39934]
2025-09-16 15:48:13,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 21.0, 72.0, 26.0, 24.0, 26.0, 96.0, 51.0, 67.0, 27.0]
2025-09-16 15:48:13,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 45 seconds)
2025-09-16 15:50:11,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:50:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 149.91640 ± 29.752
2025-09-16 15:50:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [153.24913, 158.03545, 108.4009, 139.32259, 134.68294, 114.05054, 130.46812, 208.5631, 186.72997, 165.66125]
2025-09-16 15:50:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 31.0, 21.0, 27.0, 26.0, 22.0, 25.0, 41.0, 36.0, 32.0]
2025-09-16 15:50:11,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 9 seconds)
2025-09-16 15:52:09,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:52:10,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 181.38451 ± 56.149
2025-09-16 15:52:10,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [169.78452, 260.5629, 206.83311, 133.65186, 157.61836, 303.89966, 129.24841, 140.39651, 177.46582, 134.3839]
2025-09-16 15:52:10,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 54.0, 41.0, 26.0, 31.0, 60.0, 25.0, 27.0, 35.0, 26.0]
2025-09-16 15:52:10,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 46 seconds)
2025-09-16 15:54:08,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:54:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 170.12912 ± 50.565
2025-09-16 15:54:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [124.544266, 183.19724, 185.31331, 174.21909, 125.34987, 119.74474, 299.32983, 133.95389, 192.10675, 163.53217]
2025-09-16 15:54:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 36.0, 36.0, 34.0, 24.0, 23.0, 59.0, 26.0, 37.0, 32.0]
2025-09-16 15:54:08,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 33 seconds)
2025-09-16 15:56:07,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:56:07,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 150.25854 ± 26.723
2025-09-16 15:56:07,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.290955, 119.05438, 150.17474, 159.8092, 159.11943, 137.43167, 180.03758, 124.15229, 145.92514, 207.59003]
2025-09-16 15:56:07,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 29.0, 31.0, 31.0, 27.0, 35.0, 24.0, 28.0, 41.0]
2025-09-16 15:56:07,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 19 seconds)
2025-09-16 15:58:05,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:58:06,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 160.71281 ± 56.076
2025-09-16 15:58:06,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [129.55507, 120.27759, 168.7269, 119.65786, 141.24443, 165.54922, 158.95576, 164.18156, 120.08467, 318.89502]
2025-09-16 15:58:06,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 23.0, 33.0, 23.0, 27.0, 33.0, 31.0, 32.0, 23.0, 66.0]
2025-09-16 15:58:06,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 15 seconds)
2025-09-16 16:00:03,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:00:04,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 158.60501 ± 35.024
2025-09-16 16:00:04,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [134.8675, 246.73544, 123.983925, 176.8981, 167.49335, 140.67036, 163.16281, 150.74834, 115.42772, 166.0625]
2025-09-16 16:00:04,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 49.0, 24.0, 35.0, 32.0, 27.0, 32.0, 29.0, 22.0, 33.0]
2025-09-16 16:00:04,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 14 seconds)
2025-09-16 16:02:02,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:02:03,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 160.64746 ± 48.741
2025-09-16 16:02:03,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [165.63516, 134.0581, 177.72943, 108.13375, 181.93503, 164.6746, 129.64886, 286.09436, 149.78285, 108.78243]
2025-09-16 16:02:03,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 26.0, 35.0, 21.0, 37.0, 32.0, 25.0, 60.0, 29.0, 21.0]
2025-09-16 16:02:03,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 20 seconds)
2025-09-16 16:04:00,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:04:00,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 146.32185 ± 21.936
2025-09-16 16:04:00,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [151.1491, 150.16046, 113.472595, 131.43146, 167.04767, 160.6843, 169.28717, 109.06445, 136.0358, 174.88554]
2025-09-16 16:04:00,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 22.0, 25.0, 32.0, 32.0, 33.0, 21.0, 26.0, 34.0]
2025-09-16 16:04:00,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 5 seconds)
2025-09-16 16:05:56,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:05:56,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 148.17928 ± 21.702
2025-09-16 16:05:56,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [161.9351, 175.34512, 148.04858, 153.17508, 108.62869, 119.39177, 140.25352, 183.36621, 150.95238, 140.69624]
2025-09-16 16:05:56,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 35.0, 29.0, 30.0, 21.0, 23.0, 27.0, 36.0, 29.0, 27.0]
2025-09-16 16:05:56,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 28 seconds)
2025-09-16 16:07:52,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:07:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 161.85892 ± 21.551
2025-09-16 16:07:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [155.48543, 124.352295, 148.25099, 169.8832, 161.99811, 155.9935, 158.52705, 163.3709, 215.61046, 165.11716]
2025-09-16 16:07:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 24.0, 29.0, 33.0, 31.0, 30.0, 31.0, 32.0, 43.0, 33.0]
2025-09-16 16:07:52,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 48 seconds)
2025-09-16 16:09:46,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:09:47,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 149.57460 ± 19.181
2025-09-16 16:09:47,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [149.7621, 186.48305, 156.0764, 125.01157, 125.04095, 146.0262, 125.28833, 166.7632, 162.19043, 153.10376]
2025-09-16 16:09:47,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 37.0, 31.0, 24.0, 24.0, 28.0, 24.0, 33.0, 32.0, 30.0]
2025-09-16 16:09:47,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 14 minutes, 3 seconds)
2025-09-16 16:11:42,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:11:42,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 140.57834 ± 16.522
2025-09-16 16:11:42,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [149.9352, 135.31909, 158.04445, 134.49184, 124.525665, 167.32906, 152.74133, 129.38766, 145.13608, 108.87295]
2025-09-16 16:11:42,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 32.0, 26.0, 24.0, 32.0, 30.0, 25.0, 28.0, 21.0]
2025-09-16 16:11:42,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 16 seconds)
2025-09-16 16:13:37,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:13:37,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 152.33049 ± 22.510
2025-09-16 16:13:37,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [150.78476, 163.71089, 142.07556, 149.665, 143.20705, 134.59454, 158.35689, 151.1959, 119.72586, 209.98857]
2025-09-16 16:13:37,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 32.0, 27.0, 29.0, 28.0, 26.0, 31.0, 29.0, 23.0, 42.0]
2025-09-16 16:13:37,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 8 minutes, 51 seconds)
2025-09-16 16:15:35,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:15:36,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 158.59641 ± 19.166
2025-09-16 16:15:36,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [163.98964, 123.0505, 145.42744, 163.64291, 197.47906, 135.9421, 167.82803, 163.09167, 159.86607, 165.64662]
2025-09-16 16:15:36,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 24.0, 28.0, 32.0, 39.0, 26.0, 33.0, 32.0, 31.0, 32.0]
2025-09-16 16:15:36,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 24 seconds)
2025-09-16 16:17:30,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:17:31,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 164.64774 ± 26.135
2025-09-16 16:17:31,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [170.77893, 119.30239, 168.93097, 120.18102, 173.66385, 156.39322, 193.83125, 168.51743, 168.92175, 205.95636]
2025-09-16 16:17:31,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 23.0, 33.0, 23.0, 34.0, 31.0, 38.0, 33.0, 33.0, 40.0]
2025-09-16 16:17:31,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 5 minutes, 24 seconds)
2025-09-16 16:19:25,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:19:26,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 178.38347 ± 50.443
2025-09-16 16:19:26,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [162.13435, 171.11546, 140.152, 160.28459, 140.80951, 255.13008, 173.9964, 178.89548, 113.98265, 287.33423]
2025-09-16 16:19:26,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 34.0, 27.0, 31.0, 27.0, 50.0, 34.0, 35.0, 22.0, 59.0]
2025-09-16 16:19:26,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 3 minutes, 33 seconds)
2025-09-16 16:21:21,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:21:22,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 144.64691 ± 18.003
2025-09-16 16:21:22,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.1351, 130.65611, 145.16751, 161.79808, 155.1362, 139.16646, 173.32243, 113.91197, 151.36646, 156.80869]
2025-09-16 16:21:22,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 25.0, 28.0, 32.0, 30.0, 27.0, 34.0, 22.0, 29.0, 30.0]
2025-09-16 16:21:22,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 1 minute, 43 seconds)
2025-09-16 16:23:17,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:23:18,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 163.60191 ± 34.545
2025-09-16 16:23:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [155.45741, 158.86893, 144.21803, 158.2284, 117.99688, 254.0036, 161.28902, 171.51337, 180.00604, 134.43733]
2025-09-16 16:23:18,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 31.0, 28.0, 31.0, 23.0, 51.0, 32.0, 34.0, 36.0, 26.0]
2025-09-16 16:23:18,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 59 minutes, 55 seconds)
2025-09-16 16:25:13,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:25:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 138.43442 ± 19.062
2025-09-16 16:25:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [155.43881, 164.7139, 136.75203, 139.87599, 130.05151, 103.71469, 139.50945, 134.69203, 113.77464, 165.8211]
2025-09-16 16:25:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 26.0, 27.0, 25.0, 20.0, 27.0, 26.0, 22.0, 33.0]
2025-09-16 16:25:13,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 57 minutes, 27 seconds)
2025-09-16 16:27:09,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:27:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 175.92441 ± 73.653
2025-09-16 16:27:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [112.09047, 358.86246, 148.30524, 269.34836, 155.16646, 129.14153, 125.43273, 171.55064, 138.57004, 150.77596]
2025-09-16 16:27:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 74.0, 29.0, 59.0, 30.0, 25.0, 24.0, 34.0, 27.0, 30.0]
2025-09-16 16:27:09,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 55 minutes, 37 seconds)
2025-09-16 16:29:06,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:29:07,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 148.24174 ± 28.127
2025-09-16 16:29:07,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [170.80296, 130.24231, 119.40478, 154.8564, 114.51905, 186.36043, 108.07081, 144.77303, 190.25456, 163.1332]
2025-09-16 16:29:07,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 25.0, 23.0, 30.0, 22.0, 36.0, 21.0, 28.0, 37.0, 32.0]
2025-09-16 16:29:07,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 13 seconds)
2025-09-16 16:31:05,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:31:05,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 167.29185 ± 64.550
2025-09-16 16:31:05,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [113.777626, 144.33612, 156.13863, 165.5717, 156.45934, 155.80313, 145.86523, 130.25137, 148.44371, 356.2717]
2025-09-16 16:31:05,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 28.0, 30.0, 33.0, 31.0, 30.0, 28.0, 25.0, 29.0, 75.0]
2025-09-16 16:31:05,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 48 seconds)
2025-09-16 16:33:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:33:03,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 175.34390 ± 48.547
2025-09-16 16:33:03,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [168.84644, 150.4826, 160.26808, 164.26086, 183.06425, 135.84116, 299.13907, 220.13896, 120.49749, 150.90028]
2025-09-16 16:33:03,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 29.0, 31.0, 33.0, 36.0, 26.0, 59.0, 48.0, 23.0, 29.0]
2025-09-16 16:33:03,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 17 seconds)
2025-09-16 16:34:59,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:35:00,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 149.48175 ± 12.756
2025-09-16 16:35:00,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [137.92624, 139.65585, 177.415, 153.4132, 134.30861, 150.56331, 145.9486, 166.9189, 144.74261, 143.92513]
2025-09-16 16:35:00,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 27.0, 35.0, 30.0, 26.0, 29.0, 28.0, 32.0, 28.0, 28.0]
2025-09-16 16:35:00,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 25 seconds)
2025-09-16 16:36:56,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:36:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 207.89883 ± 86.237
2025-09-16 16:36:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [138.97461, 214.12175, 283.92038, 114.35684, 135.80057, 155.7945, 410.15717, 181.82182, 171.27974, 272.76083]
2025-09-16 16:36:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 43.0, 56.0, 22.0, 26.0, 31.0, 82.0, 37.0, 34.0, 53.0]
2025-09-16 16:36:57,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 42 seconds)
2025-09-16 16:38:52,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:38:52,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 152.99197 ± 14.326
2025-09-16 16:38:52,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [158.57133, 153.8815, 166.27934, 145.39874, 165.13654, 162.90738, 124.630356, 157.32741, 166.13033, 129.65675]
2025-09-16 16:38:52,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 33.0, 28.0, 32.0, 32.0, 24.0, 31.0, 33.0, 25.0]
2025-09-16 16:38:52,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 22 seconds)
2025-09-16 16:40:47,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:40:47,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.10971 ± 19.192
2025-09-16 16:40:47,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [171.22665, 144.0788, 151.94511, 159.90944, 135.65154, 113.92892, 108.268936, 155.05777, 155.79314, 155.23691]
2025-09-16 16:40:47,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 28.0, 30.0, 31.0, 26.0, 22.0, 21.0, 30.0, 30.0, 30.0]
2025-09-16 16:40:47,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 48 seconds)
2025-09-16 16:42:41,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:42:42,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 144.33101 ± 23.376
2025-09-16 16:42:42,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [138.87708, 113.86411, 169.9828, 178.3724, 160.42772, 114.1277, 118.81958, 134.39359, 171.99516, 142.44994]
2025-09-16 16:42:42,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 33.0, 35.0, 31.0, 22.0, 23.0, 26.0, 34.0, 28.0]
2025-09-16 16:42:42,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 12 seconds)
2025-09-16 16:44:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:44:37,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 149.72733 ± 25.431
2025-09-16 16:44:37,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [108.946236, 165.49403, 166.14285, 124.89097, 156.81978, 113.9422, 172.05789, 135.20079, 185.63988, 168.13863]
2025-09-16 16:44:37,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 33.0, 32.0, 24.0, 31.0, 22.0, 35.0, 26.0, 37.0, 33.0]
2025-09-16 16:44:37,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 6 seconds)
2025-09-16 16:46:31,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:46:32,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.46735 ± 16.352
2025-09-16 16:46:32,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [114.745445, 150.51602, 163.66809, 156.17836, 125.38625, 158.6614, 160.01852, 152.43675, 125.10288, 147.95967]
2025-09-16 16:46:32,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 29.0, 32.0, 31.0, 24.0, 31.0, 32.0, 29.0, 24.0, 29.0]
2025-09-16 16:46:32,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 35 minutes, 51 seconds)
2025-09-16 16:48:27,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:48:27,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 138.67595 ± 18.783
2025-09-16 16:48:27,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [144.4949, 146.56213, 107.66442, 153.80785, 157.04959, 119.00056, 138.3782, 161.2433, 149.65988, 108.89866]
2025-09-16 16:48:27,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 21.0, 30.0, 31.0, 23.0, 27.0, 31.0, 29.0, 21.0]
2025-09-16 16:48:27,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 33 minutes, 55 seconds)
2025-09-16 16:50:22,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:50:22,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 146.84787 ± 23.850
2025-09-16 16:50:22,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [151.9746, 123.121796, 140.04858, 150.74352, 150.0794, 206.42433, 119.64816, 163.10472, 134.21046, 129.12318]
2025-09-16 16:50:22,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 24.0, 27.0, 30.0, 29.0, 43.0, 23.0, 33.0, 26.0, 25.0]
2025-09-16 16:50:22,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 4 seconds)
2025-09-16 16:52:17,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:52:18,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.08435 ± 23.890
2025-09-16 16:52:18,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [149.2934, 113.641945, 174.9339, 128.74239, 102.93551, 170.10446, 173.74094, 139.85805, 137.29938, 160.29375]
2025-09-16 16:52:18,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 22.0, 35.0, 25.0, 20.0, 33.0, 34.0, 27.0, 27.0, 31.0]
2025-09-16 16:52:18,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 15 seconds)
2025-09-16 16:54:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:54:13,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.80113 ± 22.665
2025-09-16 16:54:13,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [160.94127, 114.583115, 148.55571, 140.06706, 168.20798, 185.06332, 140.65363, 159.73914, 107.701775, 132.49828]
2025-09-16 16:54:13,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 22.0, 29.0, 27.0, 33.0, 36.0, 27.0, 31.0, 21.0, 26.0]
2025-09-16 16:54:13,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 19 seconds)
2025-09-16 16:56:08,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:56:08,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 147.83511 ± 13.497
2025-09-16 16:56:08,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [163.2034, 119.35172, 155.19064, 148.03009, 143.19975, 149.89325, 139.01112, 135.4759, 166.60114, 158.39418]
2025-09-16 16:56:08,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 23.0, 30.0, 29.0, 28.0, 29.0, 27.0, 26.0, 33.0, 31.0]
2025-09-16 16:56:08,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 29 seconds)
2025-09-16 16:58:03,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:58:04,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 148.31572 ± 24.436
2025-09-16 16:58:04,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [153.0043, 151.84361, 164.25613, 159.38818, 125.2612, 158.09428, 118.61957, 102.635086, 191.41795, 158.6369]
2025-09-16 16:58:04,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 33.0, 31.0, 24.0, 31.0, 23.0, 20.0, 37.0, 31.0]
2025-09-16 16:58:04,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 24 minutes, 32 seconds)
2025-09-16 16:59:58,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:59:59,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 142.26567 ± 17.898
2025-09-16 16:59:59,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [144.04738, 145.14276, 119.24123, 125.43854, 190.05522, 138.58972, 134.55298, 141.40958, 145.33514, 138.84428]
2025-09-16 16:59:59,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 23.0, 24.0, 39.0, 27.0, 26.0, 27.0, 28.0, 27.0]
2025-09-16 16:59:59,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes, 34 seconds)
2025-09-16 17:01:53,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:01:53,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 139.09547 ± 21.011
2025-09-16 17:01:53,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [164.54022, 152.3598, 154.24503, 108.547386, 114.01787, 114.60353, 169.02592, 124.58195, 148.66098, 140.37212]
2025-09-16 17:01:53,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 30.0, 30.0, 21.0, 22.0, 22.0, 33.0, 24.0, 29.0, 27.0]
2025-09-16 17:01:53,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 35 seconds)
2025-09-16 17:03:48,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:03:49,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 142.61055 ± 18.461
2025-09-16 17:03:49,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [146.06851, 154.67247, 159.45865, 158.62712, 125.364624, 138.1624, 114.6466, 144.52078, 170.63954, 113.944664]
2025-09-16 17:03:49,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 30.0, 31.0, 31.0, 24.0, 27.0, 22.0, 28.0, 34.0, 22.0]
2025-09-16 17:03:49,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 42 seconds)
2025-09-16 17:05:43,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:05:44,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 138.83871 ± 14.265
2025-09-16 17:05:44,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [154.52267, 151.83794, 148.89615, 153.67613, 114.057396, 123.517105, 151.66588, 125.14854, 135.55876, 129.50664]
2025-09-16 17:05:44,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 29.0, 30.0, 22.0, 24.0, 30.0, 24.0, 26.0, 25.0]
2025-09-16 17:05:44,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 41 seconds)
2025-09-16 17:07:39,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:07:39,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 141.28772 ± 20.215
2025-09-16 17:07:39,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [163.15633, 170.06346, 114.433754, 108.38707, 135.78592, 119.65495, 154.45511, 140.09468, 151.31294, 155.53299]
2025-09-16 17:07:39,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 33.0, 22.0, 21.0, 26.0, 23.0, 30.0, 27.0, 29.0, 30.0]
2025-09-16 17:07:39,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 46 seconds)
2025-09-16 17:09:33,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:09:34,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 150.02837 ± 29.227
2025-09-16 17:09:34,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [162.31499, 159.76804, 138.87315, 119.641266, 213.29404, 164.60577, 128.79755, 124.34578, 113.842575, 174.80057]
2025-09-16 17:09:34,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 27.0, 23.0, 42.0, 32.0, 25.0, 24.0, 22.0, 34.0]
2025-09-16 17:09:34,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 51 seconds)
2025-09-16 17:11:29,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:11:29,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.93716 ± 20.466
2025-09-16 17:11:29,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [144.19144, 108.66335, 163.36844, 134.55553, 176.25302, 139.76797, 120.54513, 145.16704, 154.81177, 172.04785]
2025-09-16 17:11:29,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 21.0, 33.0, 26.0, 35.0, 27.0, 23.0, 28.0, 30.0, 34.0]
2025-09-16 17:11:29,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes, 1 second)
2025-09-16 17:13:23,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:13:24,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 139.17593 ± 24.203
2025-09-16 17:13:24,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [155.81392, 171.10783, 163.19475, 119.2618, 108.79501, 125.13408, 96.43585, 152.7366, 138.7287, 160.55093]
2025-09-16 17:13:24,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 35.0, 32.0, 23.0, 21.0, 24.0, 19.0, 31.0, 27.0, 32.0]
2025-09-16 17:13:24,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes)
2025-09-16 17:15:18,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:15:19,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 148.29514 ± 22.608
2025-09-16 17:15:19,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [190.31743, 130.68759, 114.001114, 159.97363, 169.57048, 167.37973, 134.69283, 129.75407, 156.07657, 130.49808]
2025-09-16 17:15:19,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 25.0, 22.0, 31.0, 35.0, 33.0, 26.0, 25.0, 30.0, 25.0]
2025-09-16 17:15:19,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 7 minutes, 5 seconds)
2025-09-16 17:17:13,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:17:14,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 134.21645 ± 20.168
2025-09-16 17:17:14,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [108.73818, 129.5041, 162.13252, 143.66429, 119.0849, 149.17491, 114.40349, 164.87714, 142.36652, 108.21841]
2025-09-16 17:17:14,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 25.0, 32.0, 28.0, 23.0, 29.0, 22.0, 33.0, 28.0, 21.0]
2025-09-16 17:17:14,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 9 seconds)
2025-09-16 17:19:08,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:19:08,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 143.29785 ± 18.775
2025-09-16 17:19:08,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [148.58766, 169.27596, 124.108055, 123.571686, 141.07893, 114.45562, 178.22827, 143.27135, 144.67339, 145.72772]
2025-09-16 17:19:08,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 34.0, 24.0, 24.0, 27.0, 22.0, 35.0, 28.0, 28.0, 28.0]
2025-09-16 17:19:08,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 11 seconds)
2025-09-16 17:21:03,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:21:04,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 144.76450 ± 17.523
2025-09-16 17:21:04,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [145.15474, 134.5585, 175.46996, 114.73997, 160.58278, 147.19827, 129.82289, 129.13684, 164.65767, 146.3233]
2025-09-16 17:21:04,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 26.0, 36.0, 22.0, 31.0, 29.0, 25.0, 25.0, 32.0, 28.0]
2025-09-16 17:21:04,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 16 seconds)
2025-09-16 17:22:58,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:22:58,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.78023 ± 24.236
2025-09-16 17:22:58,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [169.23138, 144.63011, 125.10439, 129.38686, 186.46045, 178.74774, 128.18669, 113.9605, 124.95781, 157.13634]
2025-09-16 17:22:58,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 28.0, 24.0, 25.0, 37.0, 36.0, 25.0, 22.0, 24.0, 31.0]
2025-09-16 17:22:58,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 21 seconds)
2025-09-16 17:24:53,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:24:53,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.23349 ± 29.061
2025-09-16 17:24:53,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [154.5495, 130.21025, 157.81743, 217.92352, 113.80123, 151.90254, 138.56128, 129.74413, 149.42157, 108.40362]
2025-09-16 17:24:53,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 25.0, 31.0, 44.0, 22.0, 30.0, 27.0, 25.0, 29.0, 21.0]
2025-09-16 17:24:53,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 29 seconds)
2025-09-16 17:26:48,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:26:48,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 129.81923 ± 18.062
2025-09-16 17:26:48,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [123.358635, 154.91368, 114.7503, 113.66589, 125.67772, 150.58333, 149.36497, 148.87929, 108.604, 108.39444]
2025-09-16 17:26:48,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 30.0, 22.0, 22.0, 24.0, 29.0, 29.0, 29.0, 21.0, 21.0]
2025-09-16 17:26:48,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 30 seconds)
2025-09-16 17:28:43,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:28:44,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 156.16887 ± 60.120
2025-09-16 17:28:44,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [149.45007, 145.9853, 101.885025, 135.79358, 120.483604, 191.46593, 134.49193, 145.72868, 113.7073, 322.6973]
2025-09-16 17:28:44,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 28.0, 20.0, 26.0, 23.0, 38.0, 26.0, 29.0, 22.0, 66.0]
2025-09-16 17:28:44,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 43 seconds)
2025-09-16 17:30:39,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:30:39,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 141.20578 ± 19.800
2025-09-16 17:30:39,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [168.6682, 160.43457, 124.408966, 114.33898, 135.64435, 134.16882, 113.62997, 165.40018, 159.56258, 135.80103]
2025-09-16 17:30:39,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 31.0, 24.0, 22.0, 26.0, 26.0, 22.0, 32.0, 31.0, 26.0]
2025-09-16 17:30:39,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 48 seconds)
2025-09-16 17:32:33,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:32:34,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 183.74551 ± 85.839
2025-09-16 17:32:34,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [157.67969, 167.40356, 144.08055, 139.99603, 164.8734, 138.54588, 177.09305, 438.82166, 158.3528, 150.6085]
2025-09-16 17:32:34,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 28.0, 27.0, 32.0, 27.0, 35.0, 86.0, 31.0, 29.0]
2025-09-16 17:32:34,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 54 seconds)
2025-09-16 17:34:29,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:34:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 156.85925 ± 56.034
2025-09-16 17:34:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [312.06955, 139.43463, 163.5556, 125.31165, 102.31001, 144.21005, 118.809975, 172.95596, 165.44833, 124.48663]
2025-09-16 17:34:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 27.0, 33.0, 24.0, 20.0, 28.0, 23.0, 34.0, 32.0, 24.0]
2025-09-16 17:34:29,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 58 seconds)
2025-09-16 17:36:24,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:36:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 144.32309 ± 18.600
2025-09-16 17:36:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [144.01845, 170.96619, 156.53856, 135.839, 163.42328, 108.94061, 160.64081, 124.95503, 128.63882, 149.27017]
2025-09-16 17:36:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 33.0, 31.0, 26.0, 32.0, 21.0, 32.0, 24.0, 25.0, 29.0]
2025-09-16 17:36:25,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 7 seconds)
2025-09-16 17:38:19,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:38:19,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 163.69009 ± 46.101
2025-09-16 17:38:19,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [114.55393, 165.46167, 276.6906, 161.04103, 119.505775, 148.74313, 161.53217, 162.89558, 207.27345, 119.20377]
2025-09-16 17:38:19,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 32.0, 54.0, 32.0, 23.0, 29.0, 32.0, 32.0, 41.0, 23.0]
2025-09-16 17:38:19,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 7 seconds)
2025-09-16 17:40:14,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:40:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 154.04395 ± 47.499
2025-09-16 17:40:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [143.47299, 142.99069, 119.613304, 123.89854, 119.23581, 170.19424, 159.67155, 124.33055, 149.47414, 287.55765]
2025-09-16 17:40:14,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 23.0, 24.0, 23.0, 34.0, 31.0, 24.0, 29.0, 58.0]
2025-09-16 17:40:14,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 10 seconds)
2025-09-16 17:42:10,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:42:10,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 158.90996 ± 11.024
2025-09-16 17:42:10,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [154.58669, 153.64381, 154.68715, 170.64009, 172.52512, 170.37825, 151.73167, 168.17952, 157.41685, 135.31047]
2025-09-16 17:42:10,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 32.0, 33.0, 34.0, 33.0, 30.0, 33.0, 31.0, 26.0]
2025-09-16 17:42:10,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 20 seconds)
2025-09-16 17:44:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:44:06,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 148.78796 ± 18.918
2025-09-16 17:44:06,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [155.93446, 167.3626, 134.57834, 151.08864, 128.63805, 163.91039, 125.29899, 129.76, 186.36926, 144.9388]
2025-09-16 17:44:06,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 33.0, 26.0, 29.0, 25.0, 33.0, 24.0, 25.0, 39.0, 28.0]
2025-09-16 17:44:06,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 26 seconds)
2025-09-16 17:46:01,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:46:02,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 160.61305 ± 48.650
2025-09-16 17:46:02,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [171.88058, 293.53452, 117.65382, 150.06398, 158.21486, 153.02757, 157.96132, 102.46587, 162.21313, 139.11491]
2025-09-16 17:46:02,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 60.0, 23.0, 29.0, 31.0, 30.0, 31.0, 20.0, 32.0, 27.0]
2025-09-16 17:46:02,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 32 seconds)
2025-09-16 17:47:57,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:47:58,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 145.39035 ± 19.373
2025-09-16 17:47:58,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [143.823, 129.82701, 125.151794, 143.38925, 122.6774, 171.32692, 123.694565, 164.99692, 177.16179, 151.85497]
2025-09-16 17:47:58,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 25.0, 24.0, 28.0, 24.0, 34.0, 24.0, 33.0, 35.0, 30.0]
2025-09-16 17:47:58,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 41 seconds)
2025-09-16 17:49:52,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:49:53,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 159.50818 ± 35.584
2025-09-16 17:49:53,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [154.66003, 135.40114, 124.83897, 149.6704, 164.01991, 152.71025, 129.67125, 155.0721, 171.33331, 257.7044]
2025-09-16 17:49:53,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 26.0, 24.0, 29.0, 33.0, 30.0, 25.0, 30.0, 33.0, 52.0]
2025-09-16 17:49:53,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 46 seconds)
2025-09-16 17:51:47,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:51:48,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 152.38826 ± 19.136
2025-09-16 17:51:48,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [153.37497, 120.19708, 134.3166, 188.17348, 163.193, 162.96329, 152.11351, 169.74664, 130.52504, 149.27899]
2025-09-16 17:51:48,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 23.0, 26.0, 38.0, 32.0, 32.0, 30.0, 33.0, 25.0, 30.0]
2025-09-16 17:51:48,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 47 seconds)
2025-09-16 17:53:42,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:53:43,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 139.93803 ± 19.827
2025-09-16 17:53:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [124.40967, 114.07164, 149.17444, 111.73312, 159.21304, 134.38055, 124.42362, 170.56877, 161.25496, 150.15057]
2025-09-16 17:53:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 29.0, 22.0, 31.0, 26.0, 24.0, 34.0, 32.0, 29.0]
2025-09-16 17:53:43,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 50 seconds)
2025-09-16 17:55:38,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:55:39,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 143.23389 ± 24.456
2025-09-16 17:55:39,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [102.98166, 139.11403, 145.26717, 153.27608, 118.90808, 158.62383, 123.640305, 138.78209, 196.79965, 154.94594]
2025-09-16 17:55:39,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 27.0, 28.0, 30.0, 23.0, 31.0, 24.0, 27.0, 38.0, 30.0]
2025-09-16 17:55:39,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 56 seconds)
2025-09-16 17:57:33,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:57:34,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 143.31860 ± 14.863
2025-09-16 17:57:34,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [166.01138, 124.30509, 129.39526, 130.46425, 125.31295, 152.03012, 140.68549, 155.77121, 163.4543, 145.75581]
2025-09-16 17:57:34,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 24.0, 25.0, 25.0, 24.0, 30.0, 27.0, 30.0, 34.0, 28.0]
2025-09-16 17:57:34,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 58 seconds)
2025-09-16 17:59:28,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:59:29,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 138.42485 ± 24.505
2025-09-16 17:59:29,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [133.13544, 157.1911, 150.82965, 166.71439, 114.39231, 103.141365, 108.71745, 118.91541, 172.94049, 158.27103]
2025-09-16 17:59:29,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 29.0, 33.0, 22.0, 20.0, 21.0, 23.0, 34.0, 31.0]
2025-09-16 17:59:29,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 2 seconds)
2025-09-16 18:01:23,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:01:24,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 195.46498 ± 69.549
2025-09-16 18:01:24,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [172.46976, 152.13138, 206.03566, 325.43582, 124.80311, 211.52736, 153.33705, 130.40608, 322.51987, 155.98367]
2025-09-16 18:01:24,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 43.0, 65.0, 24.0, 43.0, 30.0, 25.0, 67.0, 30.0]
2025-09-16 18:01:24,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 7 seconds)
2025-09-16 18:03:19,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:03:19,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 162.97299 ± 79.669
2025-09-16 18:03:19,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [125.16174, 137.99248, 398.8551, 148.88467, 135.82649, 142.13397, 107.995544, 130.44543, 156.24283, 146.19179]
2025-09-16 18:03:19,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 82.0, 29.0, 26.0, 28.0, 21.0, 25.0, 31.0, 29.0]
2025-09-16 18:03:19,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 13 seconds)
2025-09-16 18:05:14,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:05:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 141.19058 ± 22.749
2025-09-16 18:05:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [158.84534, 145.57396, 144.5505, 158.3457, 114.91548, 146.09529, 163.75125, 108.454056, 168.72392, 102.65031]
2025-09-16 18:05:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 28.0, 28.0, 31.0, 22.0, 29.0, 32.0, 21.0, 33.0, 20.0]
2025-09-16 18:05:15,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 16 seconds)
2025-09-16 18:07:07,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:07:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 139.60855 ± 12.579
2025-09-16 18:07:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [118.61212, 152.74898, 125.796974, 130.70079, 162.5335, 135.20117, 144.45187, 139.60568, 135.99963, 150.43465]
2025-09-16 18:07:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 30.0, 24.0, 25.0, 32.0, 26.0, 28.0, 27.0, 26.0, 29.0]
2025-09-16 18:07:08,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 18 seconds)
2025-09-16 18:09:01,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:09:01,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 148.56119 ± 16.223
2025-09-16 18:09:01,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [165.81851, 159.60992, 129.39555, 153.65553, 128.94705, 163.21346, 157.43115, 136.0176, 123.76525, 167.75795]
2025-09-16 18:09:01,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 25.0, 30.0, 25.0, 32.0, 31.0, 26.0, 24.0, 33.0]
2025-09-16 18:09:01,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 21 seconds)
2025-09-16 18:10:57,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:10:58,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 146.33107 ± 19.627
2025-09-16 18:10:58,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.68538, 150.71852, 150.47906, 181.26276, 164.54181, 148.66592, 144.2132, 108.45469, 140.36598, 154.92331]
2025-09-16 18:10:58,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 29.0, 29.0, 36.0, 32.0, 29.0, 28.0, 21.0, 27.0, 30.0]
2025-09-16 18:10:58,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 28 seconds)
2025-09-16 18:12:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:12:56,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 155.37697 ± 51.677
2025-09-16 18:12:56,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.20867, 125.46337, 114.16383, 171.85323, 156.16214, 160.35863, 108.37676, 118.083664, 192.25882, 287.8406]
2025-09-16 18:12:56,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 22.0, 34.0, 31.0, 31.0, 21.0, 23.0, 39.0, 59.0]
2025-09-16 18:12:56,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 36 seconds)
2025-09-16 18:14:52,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:14:52,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 162.76643 ± 16.655
2025-09-16 18:14:52,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [158.10876, 141.14519, 197.23256, 140.44756, 163.14436, 161.96338, 175.55403, 177.95549, 148.52106, 163.59201]
2025-09-16 18:14:52,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 39.0, 27.0, 32.0, 32.0, 34.0, 36.0, 29.0, 32.0]
2025-09-16 18:14:52,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 41 seconds)
2025-09-16 18:16:47,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:16:47,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 140.75032 ± 20.132
2025-09-16 18:16:47,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [156.32248, 145.39386, 157.68272, 150.06511, 119.69941, 114.24843, 155.83768, 129.15286, 108.68852, 170.41205]
2025-09-16 18:16:47,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 28.0, 31.0, 29.0, 23.0, 22.0, 31.0, 25.0, 21.0, 34.0]
2025-09-16 18:16:47,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 47 seconds)
2025-09-16 18:18:42,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:18:42,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 134.36162 ± 11.159
2025-09-16 18:18:42,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [114.19101, 130.03311, 133.51418, 124.27801, 157.8163, 144.71706, 140.94032, 130.36467, 134.08333, 133.67816]
2025-09-16 18:18:42,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 26.0, 24.0, 31.0, 28.0, 27.0, 25.0, 26.0, 26.0]
2025-09-16 18:18:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 52 seconds)
2025-09-16 18:20:35,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:20:36,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 149.09958 ± 15.342
2025-09-16 18:20:36,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [154.5979, 163.60371, 155.32475, 130.23303, 144.5014, 163.23967, 143.8762, 164.81422, 114.81115, 155.99371]
2025-09-16 18:20:36,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 33.0, 30.0, 25.0, 28.0, 32.0, 28.0, 33.0, 22.0, 31.0]
2025-09-16 18:20:36,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 55 seconds)
2025-09-16 18:22:28,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:22:29,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 142.33981 ± 12.249
2025-09-16 18:22:29,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [148.68483, 146.46202, 147.09981, 138.0016, 158.8868, 148.75699, 119.37932, 148.81345, 120.11423, 147.19893]
2025-09-16 18:22:29,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 29.0, 27.0, 31.0, 29.0, 23.0, 29.0, 23.0, 29.0]
2025-09-16 18:22:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
