2025-09-16 14:46:38,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_21
2025-09-16 14:46:38,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_21
2025-09-16 14:46:38,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x151d16320950>}
2025-09-16 14:46:38,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:46:38,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:46:38,029 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=733, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:46:38,029 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
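The layer widths in the two printouts follow from simple arithmetic, assuming a Humanoid observation of 376 dimensions. That figure is inferred as 733 - 21 × 17 and is not printed in the log; 17 is the `mu_head` width. Under that assumption, the policy input of 733 equals the last observed state plus the 21 actions still in flight. The critic input of 393 = 376 + 17 fits a critic that sees one un-augmented observation concatenated with one action. The `refit` helper is a hypothetical reading of `NNTanhRefit`: it takes the printed scale 0.8 and shift -0.4 to rescale a tanh output from [-1, 1] onto Humanoid's action bounds [-0.4, 0.4]. The real module may compute this differently.

```python
# Hypothetical sizes: OBS_DIM is inferred (733 - 21 * 17), not printed in the log.
OBS_DIM = 376      # assumed Humanoid observation size
ACT_DIM = 17       # width of mu_head / log_std_head
DELAY = 21         # constant delay in environment steps (ConstantDelay, '21')


def augmented_dim(obs_dim: int, act_dim: int, delay: int) -> int:
    """Last observed state plus the `delay` actions issued since it was observed."""
    return obs_dim + delay * act_dim


def critic_dim(obs_dim: int, act_dim: int) -> int:
    """Observation and action concatenated on the last axis (NNLayerConcat2, dim=-1)."""
    return obs_dim + act_dim


def refit(t: float, scale: float = 0.8, shift: float = -0.4) -> float:
    """Hypothetical NNTanhRefit: map a tanh output t in [-1, 1] onto [shift, shift + scale]."""
    return scale * (t + 1.0) / 2.0 + shift


print(augmented_dim(OBS_DIM, ACT_DIM, DELAY))  # compare with pi Linear in_features
print(critic_dim(OBS_DIM, ACT_DIM))            # compare with q Linear in_features
print(refit(-1.0), refit(1.0))                 # endpoints of the action range
```

If the environment's observation size differed, both `in_features` values would shift by the same amount. That is a quick way to check which observation variant a run actually used.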
2025-09-16 14:46:39,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:46:39,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:48:32,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:48:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 291.28915 ± 136.953
2025-09-16 14:48:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [461.2072, 505.6819, 166.47267, 379.10727, 240.11034, 119.264, 403.96133, 139.97212, 153.74733, 343.36752]
2025-09-16 14:48:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 97.0, 33.0, 73.0, 47.0, 23.0, 74.0, 27.0, 30.0, 66.0]
2025-09-16 14:48:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (291.29) for latency 21
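The `Total Reward: mean ± spread` summary can be checked against the `All rewards` line printed beneath it. Recomputing it for iteration 1 shows that the spread is the population standard deviation, which divides by N (ddof=0), and not the sample standard deviation. This sketch uses only Python's `statistics` module and is a check on the logged numbers, not the trainer's own code.

```python
from statistics import fmean, pstdev, stdev

# Per-episode returns copied from the iteration-1 "All rewards" line.
rewards = [461.2072, 505.6819, 166.47267, 379.10727, 240.11034,
           119.264, 403.96133, 139.97212, 153.74733, 343.36752]

mean = fmean(rewards)
pop_std = pstdev(rewards)    # divides by N (ddof=0)
sample_std = stdev(rewards)  # divides by N - 1 (ddof=1)

print(f"Total Reward: {mean:.5f} ± {pop_std:.3f}")
print(f"sample std for comparison: {sample_std:.3f}")
```

The recomputed mean agrees with the logged 291.28915 to about four decimals. The small difference likely comes from float32 accumulation in the trainer. The population spread matches the logged 136.953. The sample standard deviation comes out near 144, so it is not the quantity being logged.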
2025-09-16 14:48:33,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 7 minutes, 32 seconds)
2025-09-16 14:50:34,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:50:35,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 257.99481 ± 124.865
2025-09-16 14:50:35,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [130.73065, 317.30173, 155.2056, 114.19974, 380.54572, 371.54028, 393.6517, 151.39674, 434.4436, 130.9323]
2025-09-16 14:50:35,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 61.0, 30.0, 22.0, 72.0, 75.0, 72.0, 29.0, 85.0, 25.0]
2025-09-16 14:50:35,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 12 minutes, 16 seconds)
2025-09-16 14:52:34,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:52:34,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 211.62387 ± 120.955
2025-09-16 14:52:34,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [139.89906, 114.18498, 226.38306, 129.73479, 374.5601, 357.95566, 108.297485, 431.14584, 120.01702, 114.06079]
2025-09-16 14:52:34,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 44.0, 25.0, 72.0, 71.0, 21.0, 81.0, 23.0, 22.0]
2025-09-16 14:52:34,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 11 minutes, 19 seconds)
2025-09-16 14:54:34,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:54:34,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 261.60236 ± 122.846
2025-09-16 14:54:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [354.87704, 186.64758, 492.43268, 247.84602, 287.32184, 135.33138, 144.87842, 204.51146, 437.40884, 124.76834]
2025-09-16 14:54:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 36.0, 102.0, 49.0, 56.0, 26.0, 28.0, 39.0, 89.0, 24.0]
2025-09-16 14:54:34,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 10 minutes, 3 seconds)
2025-09-16 14:56:33,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:56:33,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 185.97946 ± 42.003
2025-09-16 14:56:33,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [190.60081, 257.50604, 166.42055, 135.3739, 223.41321, 177.52208, 136.35086, 251.59082, 157.07701, 163.93921]
2025-09-16 14:56:33,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 51.0, 33.0, 26.0, 43.0, 34.0, 26.0, 50.0, 30.0, 32.0]
2025-09-16 14:56:33,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 8 minutes, 3 seconds)
2025-09-16 14:58:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:58:34,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 206.95317 ± 49.748
2025-09-16 14:58:34,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [177.33362, 181.26283, 170.53374, 257.53387, 134.67989, 214.54343, 190.9441, 278.24527, 171.28232, 293.1725]
2025-09-16 14:58:34,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 35.0, 33.0, 52.0, 26.0, 42.0, 37.0, 57.0, 33.0, 59.0]
2025-09-16 14:58:34,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 8 minutes, 16 seconds)
2025-09-16 15:00:36,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:00:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 219.51913 ± 103.049
2025-09-16 15:00:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [147.5043, 150.04625, 376.00446, 356.58862, 236.32646, 119.820465, 377.8362, 128.78409, 155.0903, 147.19044]
2025-09-16 15:00:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 74.0, 76.0, 47.0, 23.0, 78.0, 25.0, 30.0, 28.0]
2025-09-16 15:00:36,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 6 minutes, 26 seconds)
2025-09-16 15:02:37,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:02:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 184.16687 ± 99.016
2025-09-16 15:02:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [370.27344, 119.84388, 130.19792, 191.71545, 382.12085, 108.78423, 108.9436, 130.8267, 165.6323, 133.33028]
2025-09-16 15:02:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 23.0, 25.0, 37.0, 76.0, 21.0, 21.0, 25.0, 32.0, 26.0]
2025-09-16 15:02:38,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 5 minutes, 6 seconds)
2025-09-16 15:04:37,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:04:37,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 159.82440 ± 37.687
2025-09-16 15:04:37,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [183.56786, 144.92542, 238.20308, 113.18539, 203.46843, 149.09267, 143.72707, 165.76242, 147.88864, 108.42298]
2025-09-16 15:04:37,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 28.0, 48.0, 22.0, 40.0, 29.0, 28.0, 33.0, 29.0, 21.0]
2025-09-16 15:04:37,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 2 minutes, 52 seconds)
2025-09-16 15:06:35,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:06:36,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 187.68982 ± 77.175
2025-09-16 15:06:36,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [120.427055, 162.66345, 201.38994, 317.0672, 140.47597, 124.75503, 267.0956, 119.57241, 309.68546, 113.76595]
2025-09-16 15:06:36,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 31.0, 39.0, 62.0, 27.0, 24.0, 52.0, 23.0, 61.0, 22.0]
2025-09-16 15:06:36,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 47 seconds)
2025-09-16 15:08:33,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:08:34,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 162.16452 ± 64.019
2025-09-16 15:08:34,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [187.03426, 137.94748, 145.81294, 341.70477, 138.61926, 171.94023, 113.81828, 113.734024, 151.69814, 119.335686]
2025-09-16 15:08:34,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 27.0, 28.0, 68.0, 27.0, 33.0, 22.0, 22.0, 29.0, 23.0]
2025-09-16 15:08:34,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2025-09-16 15:10:32,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:10:32,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 217.15813 ± 69.844
2025-09-16 15:10:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [319.48825, 166.61546, 210.42789, 149.7866, 130.16695, 167.68385, 299.65924, 315.39636, 158.96823, 253.38852]
2025-09-16 15:10:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 33.0, 41.0, 29.0, 25.0, 32.0, 60.0, 61.0, 31.0, 50.0]
2025-09-16 15:10:32,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 49 seconds)
2025-09-16 15:12:30,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:12:30,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 161.33124 ± 45.843
2025-09-16 15:12:30,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.27468, 137.82587, 150.64043, 123.346664, 278.7602, 208.5432, 154.40836, 142.4837, 134.65579, 163.3735]
2025-09-16 15:12:30,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 27.0, 29.0, 24.0, 55.0, 41.0, 30.0, 28.0, 26.0, 33.0]
2025-09-16 15:12:30,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 45 seconds)
2025-09-16 15:14:27,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:14:28,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 134.47806 ± 21.788
2025-09-16 15:14:28,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [134.75084, 139.21199, 108.67455, 107.73892, 124.84649, 125.45528, 153.20653, 140.7588, 186.14558, 123.99141]
2025-09-16 15:14:28,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 21.0, 21.0, 24.0, 24.0, 30.0, 27.0, 36.0, 24.0]
2025-09-16 15:14:28,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 14 seconds)
2025-09-16 15:16:24,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:16:24,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 132.39297 ± 9.930
2025-09-16 15:16:24,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [126.53076, 113.44277, 144.857, 139.50017, 142.79323, 123.30492, 143.39412, 124.26106, 130.88905, 134.95683]
2025-09-16 15:16:24,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 28.0, 27.0, 28.0, 24.0, 28.0, 24.0, 26.0, 26.0]
2025-09-16 15:16:24,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 43 seconds)
2025-09-16 15:18:20,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:18:21,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 139.14243 ± 10.370
2025-09-16 15:18:21,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [142.2849, 129.20375, 138.64905, 148.0303, 141.64082, 141.06392, 118.71777, 127.912025, 150.29605, 153.62578]
2025-09-16 15:18:21,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 27.0, 29.0, 28.0, 28.0, 23.0, 25.0, 30.0, 30.0]
2025-09-16 15:18:21,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 25 seconds)
2025-09-16 15:20:17,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:20:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 141.38211 ± 24.621
2025-09-16 15:20:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [153.22461, 95.903534, 157.31184, 108.84288, 114.77614, 176.63281, 159.0129, 147.1237, 156.83498, 144.15787]
2025-09-16 15:20:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 31.0, 21.0, 22.0, 35.0, 31.0, 30.0, 31.0, 28.0]
2025-09-16 15:20:17,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 41 minutes, 55 seconds)
2025-09-16 15:22:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:22:19,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 274.49762 ± 133.924
2025-09-16 15:22:19,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [113.24119, 317.73538, 340.50742, 124.89724, 135.4173, 401.4836, 302.23288, 542.1162, 316.00055, 151.34448]
2025-09-16 15:22:19,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 61.0, 66.0, 24.0, 26.0, 77.0, 61.0, 111.0, 62.0, 29.0]
2025-09-16 15:22:19,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 48 seconds)
2025-09-16 15:24:19,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:24:20,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 324.63556 ± 102.770
2025-09-16 15:24:20,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [435.35663, 396.10135, 135.19492, 309.01617, 256.53506, 153.3795, 381.76413, 401.23148, 408.448, 369.32837]
2025-09-16 15:24:20,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 74.0, 26.0, 57.0, 52.0, 30.0, 72.0, 76.0, 77.0, 69.0]
2025-09-16 15:24:20,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (324.64) for latency 21
2025-09-16 15:24:20,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 50 seconds)
2025-09-16 15:26:20,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:26:21,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 269.96515 ± 145.058
2025-09-16 15:26:21,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [140.28648, 114.36179, 124.080795, 389.48584, 566.20123, 140.67203, 359.9697, 275.1016, 394.73062, 194.7611]
2025-09-16 15:26:21,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 24.0, 72.0, 112.0, 27.0, 66.0, 53.0, 72.0, 37.0]
2025-09-16 15:26:21,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 9 seconds)
2025-09-16 15:28:22,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:28:22,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 239.42007 ± 113.109
2025-09-16 15:28:22,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [269.2206, 147.00047, 120.045654, 146.07677, 117.75428, 124.01239, 369.93924, 348.48813, 350.57852, 401.08463]
2025-09-16 15:28:22,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 28.0, 23.0, 28.0, 23.0, 24.0, 68.0, 65.0, 65.0, 75.0]
2025-09-16 15:28:22,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 23 seconds)
2025-09-16 15:30:23,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:30:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 343.37180 ± 129.417
2025-09-16 15:30:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [540.1188, 282.41397, 440.52707, 286.5663, 141.40659, 417.18512, 113.6862, 369.90546, 439.12683, 402.7815]
2025-09-16 15:30:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 54.0, 86.0, 54.0, 27.0, 79.0, 22.0, 68.0, 83.0, 76.0]
2025-09-16 15:30:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (343.37) for latency 21
2025-09-16 15:30:24,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 37 minutes, 41 seconds)
2025-09-16 15:32:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:32:25,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 310.38379 ± 189.071
2025-09-16 15:32:25,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [162.01363, 103.18861, 553.40753, 507.84583, 557.19806, 404.85007, 153.09174, 107.72198, 108.98289, 445.53772]
2025-09-16 15:32:25,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 20.0, 104.0, 96.0, 103.0, 76.0, 29.0, 21.0, 21.0, 82.0]
2025-09-16 15:32:25,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 35 minutes, 38 seconds)
2025-09-16 15:34:27,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:34:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 328.24808 ± 116.545
2025-09-16 15:34:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [381.33487, 262.9003, 409.88458, 337.37527, 130.15804, 444.67554, 330.14667, 435.05563, 113.89627, 437.05338]
2025-09-16 15:34:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 50.0, 79.0, 63.0, 25.0, 95.0, 61.0, 81.0, 22.0, 81.0]
2025-09-16 15:34:28,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 33 minutes, 59 seconds)
2025-09-16 15:36:29,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:36:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 263.27722 ± 109.171
2025-09-16 15:36:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [319.53226, 113.94358, 308.19962, 130.00243, 383.01556, 165.71678, 316.74612, 128.27876, 385.979, 381.3581]
2025-09-16 15:36:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 22.0, 59.0, 25.0, 72.0, 32.0, 63.0, 25.0, 69.0, 71.0]
2025-09-16 15:36:30,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 32 minutes, 17 seconds)
2025-09-16 15:38:34,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:38:35,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 323.98764 ± 169.844
2025-09-16 15:38:35,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [345.4633, 343.1765, 543.2999, 130.69438, 488.05157, 604.06915, 165.05246, 131.26248, 348.67535, 140.13127]
2025-09-16 15:38:35,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 63.0, 102.0, 25.0, 93.0, 114.0, 32.0, 25.0, 64.0, 27.0]
2025-09-16 15:38:35,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 30 minutes, 59 seconds)
2025-09-16 15:40:39,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:40:40,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 377.14526 ± 182.457
2025-09-16 15:40:40,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [234.5582, 349.6372, 102.92755, 375.94446, 161.9769, 516.34894, 783.3914, 392.53616, 408.52713, 445.60477]
2025-09-16 15:40:40,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 66.0, 20.0, 69.0, 31.0, 101.0, 148.0, 72.0, 74.0, 82.0]
2025-09-16 15:40:40,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (377.15) for latency 21
2025-09-16 15:40:40,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 29 minutes, 59 seconds)
2025-09-16 15:42:43,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:42:44,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 268.03085 ± 162.892
2025-09-16 15:42:44,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [124.63137, 133.77434, 398.40277, 135.50078, 466.67953, 114.63574, 564.14813, 400.1493, 222.68394, 119.702736]
2025-09-16 15:42:44,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 26.0, 72.0, 26.0, 99.0, 22.0, 104.0, 76.0, 43.0, 23.0]
2025-09-16 15:42:44,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 28 minutes, 34 seconds)
2025-09-16 15:44:48,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:44:49,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 386.48920 ± 165.065
2025-09-16 15:44:49,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [108.58165, 558.6562, 412.77075, 382.409, 119.08227, 627.90454, 317.11554, 533.9863, 465.57367, 338.81207]
2025-09-16 15:44:49,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 104.0, 76.0, 69.0, 23.0, 118.0, 60.0, 101.0, 86.0, 63.0]
2025-09-16 15:44:49,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (386.49) for latency 21
2025-09-16 15:44:50,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 27 minutes, 11 seconds)
2025-09-16 15:46:50,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:46:51,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 398.37006 ± 114.433
2025-09-16 15:46:51,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [400.52405, 336.53757, 113.87473, 519.82465, 481.43484, 545.7132, 397.54483, 421.13724, 353.01212, 414.09744]
2025-09-16 15:46:51,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 61.0, 22.0, 94.0, 89.0, 101.0, 74.0, 79.0, 66.0, 77.0]
2025-09-16 15:46:51,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (398.37) for latency 21
2025-09-16 15:46:51,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 24 minutes, 54 seconds)
2025-09-16 15:48:51,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:48:52,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 234.33765 ± 134.161
2025-09-16 15:48:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [167.76996, 441.3979, 103.331406, 119.619255, 146.16873, 433.67236, 172.29546, 227.72194, 422.21744, 109.18191]
2025-09-16 15:48:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 81.0, 20.0, 23.0, 28.0, 82.0, 33.0, 44.0, 78.0, 21.0]
2025-09-16 15:48:52,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 21 minutes, 58 seconds)
2025-09-16 15:50:53,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:50:54,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 326.17636 ± 150.294
2025-09-16 15:50:54,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [256.08142, 481.2669, 384.7612, 531.1614, 151.09631, 513.95197, 193.23967, 140.90651, 432.37646, 176.92189]
2025-09-16 15:50:54,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 92.0, 72.0, 111.0, 29.0, 97.0, 37.0, 27.0, 79.0, 34.0]
2025-09-16 15:50:54,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 19 minutes)
2025-09-16 15:52:56,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:52:57,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 402.44830 ± 95.074
2025-09-16 15:52:57,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [453.32285, 411.90033, 396.74115, 433.51776, 453.73395, 401.16922, 125.20164, 462.32556, 455.4883, 431.0823]
2025-09-16 15:52:57,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 78.0, 82.0, 80.0, 84.0, 74.0, 24.0, 87.0, 83.0, 94.0]
2025-09-16 15:52:57,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (402.45) for latency 21
2025-09-16 15:52:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 54 seconds)
2025-09-16 15:54:57,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:54:58,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 329.04544 ± 138.600
2025-09-16 15:54:58,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [130.00078, 341.60553, 481.40186, 344.46448, 144.5516, 443.4927, 345.54932, 500.77386, 432.6459, 125.96826]
2025-09-16 15:54:58,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 63.0, 91.0, 64.0, 28.0, 82.0, 65.0, 92.0, 80.0, 24.0]
2025-09-16 15:54:58,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 52 seconds)
2025-09-16 15:56:58,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:56:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 299.23175 ± 157.855
2025-09-16 15:56:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [457.2022, 417.25598, 125.78896, 536.1147, 177.47601, 494.87244, 135.79535, 129.05656, 181.42247, 337.33267]
2025-09-16 15:56:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 76.0, 24.0, 110.0, 34.0, 96.0, 26.0, 25.0, 35.0, 62.0]
2025-09-16 15:56:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 35 seconds)
2025-09-16 15:58:59,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:59:00,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 223.04439 ± 134.885
2025-09-16 15:59:00,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [484.8082, 119.10416, 429.07187, 129.93881, 135.05278, 129.41821, 140.72908, 177.13744, 355.1587, 130.0245]
2025-09-16 15:59:00,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 23.0, 79.0, 25.0, 26.0, 25.0, 27.0, 34.0, 65.0, 25.0]
2025-09-16 15:59:00,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 44 seconds)
2025-09-16 16:01:00,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:01:02,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 517.13892 ± 68.774
2025-09-16 16:01:02,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [658.84674, 519.82336, 498.56848, 499.95633, 562.19116, 462.09195, 538.179, 377.18396, 502.02103, 552.5275]
2025-09-16 16:01:02,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 97.0, 91.0, 93.0, 106.0, 86.0, 101.0, 70.0, 95.0, 102.0]
2025-09-16 16:01:02,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (517.14) for latency 21
2025-09-16 16:01:02,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 44 seconds)
2025-09-16 16:03:03,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:03:04,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 320.44061 ± 146.196
2025-09-16 16:03:04,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [368.39377, 385.58313, 114.31347, 151.93405, 108.500336, 347.55078, 420.74744, 596.3829, 387.02884, 323.97168]
2025-09-16 16:03:04,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 73.0, 22.0, 29.0, 21.0, 66.0, 82.0, 113.0, 72.0, 60.0]
2025-09-16 16:03:04,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 23 seconds)
2025-09-16 16:05:04,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:05:05,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 237.14957 ± 165.543
2025-09-16 16:05:05,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [520.70544, 449.41025, 156.02353, 114.46685, 114.16517, 162.55367, 124.0599, 135.51762, 492.1187, 102.47424]
2025-09-16 16:05:05,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 88.0, 30.0, 22.0, 22.0, 31.0, 24.0, 26.0, 94.0, 20.0]
2025-09-16 16:05:05,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 25 seconds)
2025-09-16 16:07:06,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:07:07,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 381.01907 ± 215.974
2025-09-16 16:07:07,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [725.51935, 651.398, 166.16039, 505.17935, 103.22794, 187.96344, 436.58472, 336.5359, 562.709, 134.91249]
2025-09-16 16:07:07,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 123.0, 32.0, 94.0, 20.0, 36.0, 81.0, 64.0, 105.0, 26.0]
2025-09-16 16:07:07,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 39 seconds)
2025-09-16 16:09:08,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:09:09,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 325.52234 ± 140.183
2025-09-16 16:09:09,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [135.46715, 378.97006, 420.08582, 108.15826, 427.60065, 118.79648, 441.8924, 502.9219, 379.4917, 341.8391]
2025-09-16 16:09:09,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 70.0, 81.0, 21.0, 91.0, 23.0, 97.0, 92.0, 69.0, 64.0]
2025-09-16 16:09:09,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 50 seconds)
2025-09-16 16:11:10,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:11:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 363.52353 ± 197.375
2025-09-16 16:11:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [145.97794, 572.8503, 519.0306, 113.03807, 618.6274, 129.01678, 478.4317, 468.05566, 124.90625, 465.30063]
2025-09-16 16:11:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 107.0, 95.0, 22.0, 118.0, 25.0, 88.0, 101.0, 24.0, 86.0]
2025-09-16 16:11:11,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes, 42 seconds)
2025-09-16 16:13:11,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:13:12,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 388.25586 ± 95.503
2025-09-16 16:13:12,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [464.3716, 339.06677, 368.74588, 449.70398, 412.8027, 406.21368, 393.06668, 431.42514, 130.63396, 486.52798]
2025-09-16 16:13:12,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 63.0, 67.0, 94.0, 78.0, 77.0, 73.0, 82.0, 25.0, 91.0]
2025-09-16 16:13:13,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 39 seconds)
2025-09-16 16:15:14,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:15:15,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 358.32220 ± 177.978
2025-09-16 16:15:15,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [545.9687, 113.87146, 502.58206, 524.03046, 425.56787, 432.3195, 140.24632, 564.6088, 177.45708, 156.57018]
2025-09-16 16:15:15,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 22.0, 97.0, 99.0, 79.0, 79.0, 27.0, 110.0, 34.0, 30.0]
2025-09-16 16:15:15,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 53 seconds)
2025-09-16 16:17:16,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:17:17,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 402.26169 ± 249.867
2025-09-16 16:17:17,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [407.43222, 531.80695, 362.82703, 465.7256, 996.5586, 435.77216, 125.66532, 108.32705, 119.04372, 469.45834]
2025-09-16 16:17:17,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 99.0, 70.0, 87.0, 193.0, 78.0, 24.0, 21.0, 23.0, 87.0]
2025-09-16 16:17:17,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 48 seconds)
2025-09-16 16:19:14,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:19:15,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 438.75552 ± 133.202
2025-09-16 16:19:15,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [565.5112, 421.55038, 352.25366, 141.84631, 498.90848, 392.13672, 357.8798, 597.6126, 604.8777, 454.9784]
2025-09-16 16:19:15,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 78.0, 65.0, 27.0, 93.0, 73.0, 66.0, 111.0, 114.0, 95.0]
2025-09-16 16:19:15,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 3 seconds)
2025-09-16 16:21:13,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:21:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 395.55316 ± 176.261
2025-09-16 16:21:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [383.7022, 734.95764, 392.70126, 433.5492, 140.06792, 510.28665, 515.9477, 454.8437, 103.1124, 286.3627]
2025-09-16 16:21:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 139.0, 71.0, 83.0, 27.0, 97.0, 100.0, 84.0, 20.0, 53.0]
2025-09-16 16:21:14,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 34 seconds)
2025-09-16 16:23:12,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:23:14,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 478.49225 ± 191.619
2025-09-16 16:23:14,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [723.563, 830.0114, 331.40848, 332.60455, 546.8585, 587.0511, 418.56226, 397.8647, 476.26544, 140.73296]
2025-09-16 16:23:14,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 158.0, 61.0, 63.0, 105.0, 108.0, 90.0, 84.0, 89.0, 27.0]
2025-09-16 16:23:14,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 11 seconds)
2025-09-16 16:25:12,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:25:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 394.13953 ± 301.899
2025-09-16 16:25:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [444.43866, 889.3186, 103.33218, 193.09343, 134.69275, 543.3239, 130.02219, 458.37918, 108.66153, 936.13275]
2025-09-16 16:25:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 179.0, 20.0, 37.0, 26.0, 100.0, 25.0, 85.0, 21.0, 182.0]
2025-09-16 16:25:13,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 36 seconds)
2025-09-16 16:27:11,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:27:12,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 258.97473 ± 147.621
2025-09-16 16:27:12,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [163.20862, 416.22348, 145.74203, 146.2102, 520.6972, 162.881, 359.00262, 130.26585, 108.74802, 436.76825]
2025-09-16 16:27:12,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 76.0, 28.0, 28.0, 96.0, 31.0, 68.0, 25.0, 21.0, 80.0]
2025-09-16 16:27:12,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 9 seconds)
2025-09-16 16:29:10,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:29:11,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 275.59167 ± 178.682
2025-09-16 16:29:11,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [412.7641, 157.37126, 102.95702, 122.95048, 364.7505, 114.29982, 608.8723, 157.42763, 182.56801, 531.9556]
2025-09-16 16:29:11,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 30.0, 20.0, 24.0, 66.0, 22.0, 126.0, 30.0, 35.0, 103.0]
2025-09-16 16:29:11,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 16 seconds)
2025-09-16 16:31:08,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:31:09,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 307.49417 ± 119.927
2025-09-16 16:31:09,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [386.58273, 140.02469, 401.05252, 359.1869, 396.85928, 398.76904, 366.63757, 103.089584, 134.39174, 388.3476]
2025-09-16 16:31:09,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 27.0, 74.0, 67.0, 73.0, 73.0, 69.0, 20.0, 26.0, 70.0]
2025-09-16 16:31:09,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 14 seconds)
2025-09-16 16:33:06,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:33:07,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 296.73843 ± 198.128
2025-09-16 16:33:07,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [518.5589, 177.93294, 469.63623, 478.55298, 140.32956, 103.22154, 128.78525, 130.97202, 657.8492, 161.54588]
2025-09-16 16:33:07,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 34.0, 85.0, 90.0, 27.0, 20.0, 25.0, 25.0, 121.0, 31.0]
2025-09-16 16:33:07,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 1 second)
2025-09-16 16:35:10,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:35:11,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 250.61125 ± 148.422
2025-09-16 16:35:11,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [479.31302, 413.0278, 124.63164, 129.88496, 169.9272, 364.29785, 109.24564, 456.5481, 134.81422, 124.422195]
2025-09-16 16:35:11,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 78.0, 24.0, 25.0, 33.0, 66.0, 21.0, 96.0, 26.0, 24.0]
2025-09-16 16:35:11,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 44 seconds)
2025-09-16 16:37:26,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:37:27,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 378.89249 ± 219.786
2025-09-16 16:37:27,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [425.0751, 779.4205, 563.41486, 112.577774, 125.5252, 180.31024, 547.68884, 419.02548, 510.7751, 125.11155]
2025-09-16 16:37:27,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 163.0, 119.0, 22.0, 24.0, 34.0, 119.0, 76.0, 107.0, 24.0]
2025-09-16 16:37:27,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 32 minutes, 15 seconds)
2025-09-16 16:39:45,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:39:46,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 423.92682 ± 268.599
2025-09-16 16:39:46,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [690.8808, 408.3239, 145.57059, 400.97845, 178.78249, 119.27732, 585.6888, 673.0931, 122.76538, 913.90717]
2025-09-16 16:39:46,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 76.0, 28.0, 74.0, 34.0, 23.0, 106.0, 134.0, 24.0, 182.0]
2025-09-16 16:39:46,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 33 minutes, 12 seconds)
2025-09-16 16:42:03,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:42:04,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 381.41998 ± 145.441
2025-09-16 16:42:04,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [311.63538, 407.3363, 529.57605, 349.25455, 152.81479, 393.5448, 502.74985, 490.3822, 562.4058, 114.50016]
2025-09-16 16:42:04,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 87.0, 99.0, 69.0, 29.0, 72.0, 112.0, 94.0, 102.0, 22.0]
2025-09-16 16:42:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 33 minutes, 53 seconds)
2025-09-16 16:44:21,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:44:22,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 367.04025 ± 164.064
2025-09-16 16:44:22,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [473.89273, 373.28964, 125.15484, 442.22296, 458.86804, 146.83351, 573.70355, 559.49347, 387.52032, 129.42352]
2025-09-16 16:44:22,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 68.0, 24.0, 80.0, 101.0, 28.0, 105.0, 110.0, 72.0, 25.0]
2025-09-16 16:44:22,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 34 minutes, 27 seconds)
2025-09-16 16:46:42,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:46:44,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 357.64441 ± 225.879
2025-09-16 16:46:44,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [629.5389, 129.56508, 498.61508, 107.165344, 128.20535, 175.91405, 525.35974, 705.3138, 156.39098, 520.3759]
2025-09-16 16:46:44,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 25.0, 108.0, 21.0, 25.0, 34.0, 97.0, 130.0, 30.0, 93.0]
2025-09-16 16:46:44,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 34 minutes, 38 seconds)
2025-09-16 16:48:58,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:48:59,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 211.02637 ± 123.631
2025-09-16 16:48:59,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [135.70512, 124.846344, 156.78899, 136.24738, 145.61906, 97.0734, 377.9702, 398.87222, 119.96014, 417.1808]
2025-09-16 16:48:59,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 24.0, 30.0, 26.0, 28.0, 19.0, 69.0, 76.0, 23.0, 75.0]
2025-09-16 16:48:59,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 32 minutes, 16 seconds)
2025-09-16 16:51:20,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:51:21,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 359.25885 ± 214.883
2025-09-16 16:51:21,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [114.4056, 478.36032, 108.25251, 130.13069, 612.8006, 341.38434, 613.73004, 674.94464, 366.76654, 151.8129]
2025-09-16 16:51:21,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 88.0, 21.0, 25.0, 129.0, 62.0, 116.0, 124.0, 68.0, 29.0]
2025-09-16 16:51:21,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 30 minutes, 20 seconds)
2025-09-16 16:53:38,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:53:39,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 317.72052 ± 193.838
2025-09-16 16:53:39,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [387.05847, 146.97762, 124.1099, 684.41077, 167.6145, 108.89381, 364.62695, 479.87665, 166.90854, 546.728]
2025-09-16 16:53:39,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 28.0, 24.0, 125.0, 32.0, 21.0, 65.0, 87.0, 32.0, 114.0]
2025-09-16 16:53:39,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 27 minutes, 59 seconds)
2025-09-16 16:55:59,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:56:00,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 438.81543 ± 161.304
2025-09-16 16:56:00,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [160.67393, 540.8642, 563.85516, 152.2791, 684.6941, 414.6457, 405.00995, 477.25262, 445.31036, 543.569]
2025-09-16 16:56:00,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 100.0, 104.0, 29.0, 131.0, 77.0, 74.0, 86.0, 82.0, 103.0]
2025-09-16 16:56:00,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 26 minutes, 7 seconds)
2025-09-16 16:58:19,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:58:20,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 366.35248 ± 228.998
2025-09-16 16:58:20,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [96.69617, 125.7711, 441.53088, 555.8348, 498.6766, 328.25253, 577.9426, 119.04838, 135.95497, 783.8167]
2025-09-16 16:58:20,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 82.0, 106.0, 100.0, 59.0, 109.0, 23.0, 26.0, 168.0]
2025-09-16 16:58:20,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 23 minutes, 34 seconds)
2025-09-16 17:00:39,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:00:40,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 404.87134 ± 196.514
2025-09-16 17:00:40,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [621.0473, 586.4449, 475.0483, 542.3373, 644.47614, 389.16354, 395.95737, 118.52468, 125.027626, 150.6863]
2025-09-16 17:00:40,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 112.0, 88.0, 100.0, 135.0, 70.0, 74.0, 23.0, 24.0, 29.0]
2025-09-16 17:00:40,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 21 minutes, 47 seconds)
2025-09-16 17:02:59,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:03:01,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 423.41611 ± 220.464
2025-09-16 17:03:01,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [498.6139, 836.7264, 160.51913, 398.30475, 124.7405, 476.84964, 129.52274, 478.12268, 470.29697, 660.4642]
2025-09-16 17:03:01,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 157.0, 31.0, 78.0, 24.0, 103.0, 25.0, 90.0, 91.0, 123.0]
2025-09-16 17:03:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 19 minutes, 17 seconds)
2025-09-16 17:05:18,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:05:20,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 479.09503 ± 279.159
2025-09-16 17:05:20,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [760.2656, 699.6713, 353.15366, 410.26312, 114.68382, 703.2936, 490.423, 958.8002, 124.34277, 176.05278]
2025-09-16 17:05:20,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 147.0, 65.0, 74.0, 22.0, 145.0, 89.0, 188.0, 24.0, 34.0]
2025-09-16 17:05:20,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 17 minutes, 4 seconds)
2025-09-16 17:07:39,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:07:41,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 483.16486 ± 226.330
2025-09-16 17:07:41,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [113.18436, 615.0481, 457.2559, 550.4063, 765.4628, 166.09299, 335.4901, 853.05444, 404.50485, 571.1484]
2025-09-16 17:07:41,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 114.0, 99.0, 104.0, 165.0, 32.0, 61.0, 154.0, 83.0, 110.0]
2025-09-16 17:07:41,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 14 minutes, 41 seconds)
2025-09-16 17:10:00,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:10:01,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 350.59998 ± 208.679
2025-09-16 17:10:01,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [664.2819, 437.7971, 346.76495, 119.72596, 368.8354, 126.112686, 619.7278, 574.16046, 141.46307, 107.13061]
2025-09-16 17:10:01,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 84.0, 65.0, 23.0, 68.0, 24.0, 128.0, 124.0, 27.0, 21.0]
2025-09-16 17:10:01,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 12 minutes, 28 seconds)
2025-09-16 17:12:20,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:12:21,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 278.29544 ± 175.543
2025-09-16 17:12:21,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [448.02847, 420.2195, 134.60077, 108.671005, 130.1531, 145.57982, 147.13492, 503.14996, 578.418, 166.99884]
2025-09-16 17:12:21,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 77.0, 26.0, 21.0, 25.0, 28.0, 28.0, 92.0, 120.0, 32.0]
2025-09-16 17:12:21,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 10 minutes, 4 seconds)
2025-09-16 17:14:40,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:14:41,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 319.61313 ± 174.678
2025-09-16 17:14:41,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [102.642845, 411.23874, 393.2307, 450.40384, 130.74333, 130.80704, 114.45973, 612.26154, 478.97977, 371.36398]
2025-09-16 17:14:41,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 77.0, 70.0, 85.0, 25.0, 25.0, 22.0, 112.0, 88.0, 67.0]
2025-09-16 17:14:41,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 7 minutes, 39 seconds)
2025-09-16 17:16:57,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:16:58,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 436.10767 ± 247.861
2025-09-16 17:16:58,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [175.65831, 124.58702, 439.93933, 482.15878, 354.3965, 131.0305, 983.3518, 510.65805, 553.7197, 605.5768]
2025-09-16 17:16:58,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 24.0, 84.0, 93.0, 70.0, 25.0, 180.0, 94.0, 103.0, 111.0]
2025-09-16 17:16:58,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 5 minutes, 10 seconds)
2025-09-16 17:19:20,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:19:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 354.64322 ± 148.247
2025-09-16 17:19:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [417.86017, 389.15268, 484.31375, 473.88116, 468.75247, 468.37662, 102.50291, 444.94983, 129.23752, 167.40495]
2025-09-16 17:19:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 71.0, 89.0, 86.0, 85.0, 85.0, 20.0, 86.0, 25.0, 32.0]
2025-09-16 17:19:21,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 3 minutes, 2 seconds)
2025-09-16 17:21:38,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:21:39,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 402.53479 ± 114.682
2025-09-16 17:21:39,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [469.05844, 425.9065, 485.56967, 332.5574, 416.9644, 405.5616, 440.42688, 570.7569, 364.7645, 113.78171]
2025-09-16 17:21:39,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 77.0, 88.0, 63.0, 88.0, 75.0, 80.0, 101.0, 67.0, 22.0]
2025-09-16 17:21:39,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 28 seconds)
2025-09-16 17:23:59,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:24:00,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 397.71591 ± 194.981
2025-09-16 17:24:00,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [642.498, 503.2533, 141.31253, 603.17975, 385.16864, 433.0119, 440.57364, 595.20056, 97.33224, 135.62865]
2025-09-16 17:24:00,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 92.0, 27.0, 108.0, 71.0, 79.0, 96.0, 107.0, 19.0, 26.0]
2025-09-16 17:24:00,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 58 minutes, 18 seconds)
2025-09-16 17:26:19,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:26:20,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 272.89435 ± 226.694
2025-09-16 17:26:20,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [145.08525, 124.82189, 181.03073, 593.7664, 428.66757, 114.38848, 120.24427, 113.548195, 136.84972, 770.5409]
2025-09-16 17:26:20,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 24.0, 35.0, 110.0, 77.0, 22.0, 23.0, 22.0, 26.0, 140.0]
2025-09-16 17:26:20,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 55 minutes, 57 seconds)
2025-09-16 17:28:40,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:28:41,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 504.53741 ± 296.315
2025-09-16 17:28:41,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [499.47977, 802.07733, 123.192535, 510.63806, 824.7466, 651.79376, 972.48834, 417.56757, 114.13328, 129.25688]
2025-09-16 17:28:41,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 146.0, 24.0, 105.0, 153.0, 127.0, 184.0, 78.0, 22.0, 25.0]
2025-09-16 17:28:41,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 53 minutes, 55 seconds)
2025-09-16 17:31:00,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:31:02,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 354.69452 ± 190.607
2025-09-16 17:31:02,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [147.18405, 136.16078, 363.59995, 598.8079, 123.403336, 130.48923, 467.82376, 505.31097, 596.427, 477.73828]
2025-09-16 17:31:02,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 26.0, 68.0, 112.0, 24.0, 25.0, 88.0, 93.0, 122.0, 90.0]
2025-09-16 17:31:02,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 51 minutes, 22 seconds)
2025-09-16 17:33:21,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:33:22,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 467.86334 ± 238.803
2025-09-16 17:33:22,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [413.10168, 161.94632, 700.394, 878.5051, 580.1617, 510.7481, 581.3368, 567.07947, 108.568825, 176.79105]
2025-09-16 17:33:22,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 31.0, 133.0, 179.0, 125.0, 96.0, 106.0, 110.0, 21.0, 34.0]
2025-09-16 17:33:22,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 49 minutes, 12 seconds)
2025-09-16 17:35:41,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:35:43,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 414.74854 ± 196.062
2025-09-16 17:35:43,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [135.02216, 408.29666, 647.163, 640.2473, 469.5766, 581.94104, 474.9303, 130.97853, 140.37059, 518.9588]
2025-09-16 17:35:43,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 76.0, 133.0, 128.0, 86.0, 110.0, 92.0, 25.0, 27.0, 96.0]
2025-09-16 17:35:43,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 46 minutes, 49 seconds)
2025-09-16 17:38:02,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:38:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 500.80411 ± 146.769
2025-09-16 17:38:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [602.9074, 473.62842, 114.49354, 672.87146, 552.4884, 626.1501, 454.69876, 551.5087, 506.37674, 452.91772]
2025-09-16 17:38:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 86.0, 22.0, 137.0, 97.0, 127.0, 83.0, 98.0, 93.0, 84.0]
2025-09-16 17:38:03,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 44 minutes, 31 seconds)
2025-09-16 17:40:23,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:40:25,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 438.28189 ± 234.181
2025-09-16 17:40:25,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [633.7139, 114.215836, 427.6455, 411.73587, 583.0225, 860.4731, 151.77997, 554.12805, 119.18829, 526.91595]
2025-09-16 17:40:25,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 22.0, 79.0, 75.0, 114.0, 168.0, 29.0, 112.0, 23.0, 96.0]
2025-09-16 17:40:25,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 42 minutes, 12 seconds)
2025-09-16 17:42:44,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:42:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 299.72031 ± 183.709
2025-09-16 17:42:45,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [562.79346, 173.31616, 130.39334, 364.11606, 119.59527, 119.95739, 123.98379, 472.90704, 325.12518, 605.0154]
2025-09-16 17:42:45,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 33.0, 25.0, 76.0, 23.0, 23.0, 24.0, 87.0, 66.0, 112.0]
2025-09-16 17:42:45,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 39 minutes, 50 seconds)
2025-09-16 17:45:04,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:45:05,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 374.99542 ± 239.365
2025-09-16 17:45:05,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [502.64047, 900.92645, 513.3216, 112.576744, 119.196594, 367.6376, 509.78287, 135.59792, 432.01117, 156.26305]
2025-09-16 17:45:05,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 160.0, 97.0, 22.0, 23.0, 72.0, 110.0, 26.0, 83.0, 30.0]
2025-09-16 17:45:05,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 37 minutes, 30 seconds)
2025-09-16 17:47:23,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:47:24,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 428.65680 ± 286.655
2025-09-16 17:47:24,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [167.42917, 672.4047, 151.7813, 699.0365, 481.6082, 553.14624, 161.96883, 118.550674, 273.51114, 1007.1312]
2025-09-16 17:47:24,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 134.0, 29.0, 136.0, 89.0, 102.0, 31.0, 23.0, 55.0, 195.0]
2025-09-16 17:47:24,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 35 minutes, 4 seconds)
2025-09-16 17:49:44,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:49:44,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 315.43167 ± 203.313
2025-09-16 17:49:44,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [423.90564, 120.15149, 134.49518, 606.2013, 481.05618, 119.75109, 108.69332, 495.3137, 103.48292, 561.2659]
2025-09-16 17:49:44,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 23.0, 26.0, 111.0, 91.0, 23.0, 21.0, 92.0, 20.0, 100.0]
2025-09-16 17:49:45,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 32 minutes, 43 seconds)
2025-09-16 17:52:04,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:52:06,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 460.25287 ± 194.088
2025-09-16 17:52:06,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [496.11554, 488.45807, 113.63856, 594.4843, 693.27844, 114.16898, 664.8491, 377.11053, 486.48392, 573.94104]
2025-09-16 17:52:06,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 89.0, 22.0, 115.0, 127.0, 22.0, 124.0, 69.0, 88.0, 111.0]
2025-09-16 17:52:06,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 30 minutes, 22 seconds)
2025-09-16 17:54:23,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:54:24,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 435.38043 ± 233.607
2025-09-16 17:54:24,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [437.84033, 764.0395, 590.0015, 457.26926, 705.69965, 129.36452, 410.53253, 631.8831, 125.0517, 102.12207]
2025-09-16 17:54:24,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 147.0, 106.0, 83.0, 131.0, 25.0, 73.0, 117.0, 24.0, 20.0]
2025-09-16 17:54:24,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 58 seconds)
2025-09-16 17:56:44,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:56:46,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 538.60669 ± 268.882
2025-09-16 17:56:46,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [497.04297, 114.01759, 971.4086, 532.4189, 750.40955, 464.29053, 698.01984, 840.4878, 382.08994, 135.88116]
2025-09-16 17:56:46,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 22.0, 172.0, 96.0, 135.0, 88.0, 130.0, 155.0, 70.0, 26.0]
2025-09-16 17:56:46,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (538.61) for latency 21
2025-09-16 17:56:46,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 41 seconds)
2025-09-16 17:59:04,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:59:05,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 364.96109 ± 203.279
2025-09-16 17:59:05,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [409.7903, 568.67474, 523.6074, 452.9469, 123.80166, 130.95186, 119.9244, 134.73505, 532.5425, 652.6361]
2025-09-16 17:59:05,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 102.0, 97.0, 86.0, 24.0, 25.0, 23.0, 26.0, 98.0, 121.0]
2025-09-16 17:59:05,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 21 seconds)
2025-09-16 18:01:25,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:01:26,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 545.63184 ± 162.014
2025-09-16 18:01:26,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [706.8851, 680.6838, 509.49942, 369.1073, 645.2722, 529.4502, 599.0212, 572.93524, 155.94084, 687.5232]
2025-09-16 18:01:26,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 142.0, 93.0, 69.0, 136.0, 101.0, 108.0, 110.0, 30.0, 123.0]
2025-09-16 18:01:26,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (545.63) for latency 21
2025-09-16 18:01:26,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 3 seconds)
2025-09-16 18:03:45,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:03:46,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 429.39713 ± 316.633
2025-09-16 18:03:46,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [138.97914, 1177.2767, 119.60063, 130.10696, 702.6795, 439.57004, 443.38916, 486.70718, 520.76886, 134.8932]
2025-09-16 18:03:46,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 241.0, 23.0, 25.0, 126.0, 81.0, 83.0, 103.0, 93.0, 26.0]
2025-09-16 18:03:46,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 40 seconds)
2025-09-16 18:06:06,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:06:08,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 474.88681 ± 227.064
2025-09-16 18:06:08,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [349.44812, 581.19727, 809.994, 618.19324, 103.11933, 606.8138, 389.8151, 728.9753, 118.108215, 443.20343]
2025-09-16 18:06:08,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 112.0, 175.0, 112.0, 20.0, 116.0, 84.0, 135.0, 23.0, 87.0]
2025-09-16 18:06:08,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 24 seconds)
2025-09-16 18:08:25,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:08:27,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 431.43164 ± 230.556
2025-09-16 18:08:27,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [113.37597, 579.7748, 451.77167, 577.178, 790.5223, 109.04958, 119.75234, 386.53152, 558.6828, 627.67737]
2025-09-16 18:08:27,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 123.0, 93.0, 108.0, 157.0, 21.0, 23.0, 73.0, 101.0, 134.0]
2025-09-16 18:08:27,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes)
2025-09-16 18:10:47,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:10:48,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 415.42612 ± 293.729
2025-09-16 18:10:48,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [727.12427, 131.05022, 134.65001, 557.1412, 412.6505, 802.60205, 114.52054, 900.7344, 134.99391, 238.79405]
2025-09-16 18:10:48,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 25.0, 26.0, 103.0, 76.0, 146.0, 22.0, 172.0, 26.0, 46.0]
2025-09-16 18:10:48,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 43 seconds)
2025-09-16 18:13:05,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:13:07,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 421.46167 ± 316.113
2025-09-16 18:13:07,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [565.21423, 447.91574, 118.824135, 581.3138, 140.26054, 501.55814, 1194.7434, 417.2253, 129.68459, 117.877014]
2025-09-16 18:13:07,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 81.0, 23.0, 107.0, 27.0, 97.0, 218.0, 74.0, 25.0, 23.0]
2025-09-16 18:13:07,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 20 seconds)
2025-09-16 18:15:28,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:15:29,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 339.73358 ± 172.766
2025-09-16 18:15:29,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [527.4411, 141.37874, 430.05493, 125.1231, 140.55058, 493.72772, 120.14343, 469.7267, 425.57462, 523.6151]
2025-09-16 18:15:29,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 27.0, 79.0, 24.0, 27.0, 91.0, 23.0, 86.0, 90.0, 98.0]
2025-09-16 18:15:29,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 1 second)
2025-09-16 18:17:47,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:17:49,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 393.41168 ± 258.437
2025-09-16 18:17:49,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [145.3353, 118.74937, 635.51044, 739.9505, 169.12396, 134.94589, 642.9063, 610.53217, 119.05209, 618.0108]
2025-09-16 18:17:49,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 139.0, 147.0, 33.0, 26.0, 128.0, 113.0, 23.0, 119.0]
2025-09-16 18:17:49,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 40 seconds)
2025-09-16 18:20:08,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:20:09,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 382.61234 ± 261.815
2025-09-16 18:20:09,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [760.2737, 167.03162, 373.33072, 113.73974, 148.23433, 581.46954, 160.45555, 124.51897, 615.0645, 782.0046]
2025-09-16 18:20:09,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 32.0, 69.0, 22.0, 29.0, 118.0, 31.0, 24.0, 113.0, 136.0]
2025-09-16 18:20:09,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 20 seconds)
2025-09-16 18:22:26,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:22:27,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 480.29510 ± 268.445
2025-09-16 18:22:27,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [96.58142, 720.2538, 592.5737, 921.71533, 458.04666, 656.5358, 124.26095, 644.2491, 130.02156, 458.71298]
2025-09-16 18:22:27,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 144.0, 108.0, 171.0, 89.0, 123.0, 24.0, 133.0, 25.0, 102.0]
2025-09-16 18:22:27,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
