2025-09-16 14:47:08,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_21
2025-09-16 14:47:08,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_21
2025-09-16 14:47:08,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x1522a10d49d0>}
2025-09-16 14:47:08,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:47:08,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:47:08,768 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=733, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:47:08,768 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:47:10,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:47:10,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:49:01,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:49:01,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 222.46829 ± 141.754
2025-09-16 14:49:01,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [308.08914, 129.45169, 343.8748, 161.65375, 107.80333, 118.41537, 567.6771, 254.23384, 119.25615, 114.227875]
2025-09-16 14:49:01,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 25.0, 66.0, 31.0, 21.0, 23.0, 121.0, 50.0, 23.0, 22.0]
2025-09-16 14:49:01,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (222.47) for latency 21
2025-09-16 14:49:01,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 3 minutes, 53 seconds)
2025-09-16 14:51:00,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:51:01,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 234.20190 ± 119.292
2025-09-16 14:51:01,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [174.04099, 134.743, 130.8443, 225.019, 404.71005, 102.52453, 106.243675, 365.21228, 273.29892, 425.3821]
2025-09-16 14:51:01,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 26.0, 25.0, 46.0, 80.0, 20.0, 21.0, 73.0, 55.0, 80.0]
2025-09-16 14:51:01,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (234.20) for latency 21
2025-09-16 14:51:01,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 8 minutes, 23 seconds)
2025-09-16 14:53:00,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:53:00,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 227.76237 ± 110.762
2025-09-16 14:53:00,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [368.96677, 124.45209, 340.35226, 348.05014, 218.92673, 102.71571, 369.19226, 182.36618, 108.04594, 114.55555]
2025-09-16 14:53:00,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 24.0, 64.0, 68.0, 42.0, 20.0, 73.0, 35.0, 21.0, 22.0]
2025-09-16 14:53:00,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 8 minutes, 47 seconds)
2025-09-16 14:54:58,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:54:58,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 220.74832 ± 102.738
2025-09-16 14:54:58,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [340.19354, 108.01112, 321.3294, 113.38363, 108.75118, 321.35406, 149.80435, 130.12144, 252.27148, 362.26285]
2025-09-16 14:54:58,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 21.0, 64.0, 22.0, 21.0, 64.0, 29.0, 25.0, 50.0, 68.0]
2025-09-16 14:54:58,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 7 minutes, 17 seconds)
2025-09-16 14:56:56,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:56:56,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 199.60844 ± 112.254
2025-09-16 14:56:56,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [425.9077, 123.41884, 117.56451, 333.4798, 145.4885, 101.75927, 145.02682, 133.8514, 129.7729, 339.81473]
2025-09-16 14:56:56,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 24.0, 23.0, 67.0, 28.0, 20.0, 28.0, 26.0, 25.0, 67.0]
2025-09-16 14:56:56,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 5 minutes, 41 seconds)
2025-09-16 14:58:54,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:58:54,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 224.47757 ± 115.172
2025-09-16 14:58:54,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [129.68045, 145.405, 161.63922, 165.9654, 455.3389, 133.54929, 411.35712, 257.33508, 264.32315, 120.18204]
2025-09-16 14:58:54,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 28.0, 31.0, 32.0, 100.0, 26.0, 94.0, 52.0, 54.0, 23.0]
2025-09-16 14:58:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes, 45 seconds)
2025-09-16 15:00:52,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:00:52,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 167.63028 ± 114.712
2025-09-16 15:00:52,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.51695, 180.72392, 187.10374, 497.84625, 96.3569, 156.50928, 130.12825, 127.61914, 106.56089, 96.93736]
2025-09-16 15:00:52,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 36.0, 37.0, 97.0, 19.0, 30.0, 25.0, 25.0, 21.0, 19.0]
2025-09-16 15:00:52,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 25 seconds)
2025-09-16 15:02:50,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:02:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 149.98544 ± 45.744
2025-09-16 15:02:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [107.734955, 252.19576, 177.05983, 119.89974, 168.45752, 100.87606, 113.57886, 139.84302, 196.63116, 123.577484]
2025-09-16 15:02:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 50.0, 34.0, 23.0, 33.0, 20.0, 22.0, 27.0, 38.0, 24.0]
2025-09-16 15:02:50,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 53 seconds)
2025-09-16 15:04:46,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:04:47,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 125.12775 ± 19.832
2025-09-16 15:04:47,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [130.64767, 158.3592, 102.512245, 97.13388, 134.3898, 107.84944, 123.45962, 157.62697, 119.54372, 119.75493]
2025-09-16 15:04:47,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 31.0, 20.0, 19.0, 26.0, 21.0, 24.0, 31.0, 23.0, 23.0]
2025-09-16 15:04:47,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 26 seconds)
2025-09-16 15:06:43,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:06:44,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 169.87421 ± 54.678
2025-09-16 15:06:44,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [140.86076, 141.29381, 152.64621, 124.57226, 215.3323, 273.37387, 253.22461, 96.726685, 155.39096, 145.32056]
2025-09-16 15:06:44,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 27.0, 31.0, 24.0, 43.0, 56.0, 51.0, 19.0, 32.0, 29.0]
2025-09-16 15:06:44,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 8 seconds)
2025-09-16 15:08:40,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:08:40,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 130.38881 ± 22.240
2025-09-16 15:08:40,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.08526, 113.90688, 141.46294, 141.9788, 167.71677, 158.51935, 118.79002, 138.01428, 126.13392, 96.27976]
2025-09-16 15:08:40,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 28.0, 28.0, 33.0, 31.0, 23.0, 27.0, 24.0, 19.0]
2025-09-16 15:08:40,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 53 minutes, 50 seconds)
2025-09-16 15:10:37,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:10:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 140.71449 ± 16.231
2025-09-16 15:10:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [144.22287, 123.17336, 149.04495, 124.49866, 154.01666, 107.99284, 141.95209, 142.85197, 161.6724, 157.71907]
2025-09-16 15:10:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 24.0, 29.0, 24.0, 30.0, 21.0, 28.0, 28.0, 32.0, 31.0]
2025-09-16 15:10:37,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 30 seconds)
2025-09-16 15:12:33,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:12:34,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 128.24330 ± 17.601
2025-09-16 15:12:34,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [112.89738, 103.14066, 123.87887, 137.2729, 153.80281, 97.086784, 133.90881, 144.1666, 141.76471, 134.51357]
2025-09-16 15:12:34,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 24.0, 27.0, 31.0, 19.0, 26.0, 28.0, 28.0, 26.0]
2025-09-16 15:12:34,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 49 minutes, 15 seconds)
2025-09-16 15:14:30,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:14:31,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 133.67413 ± 32.627
2025-09-16 15:14:31,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [95.54426, 101.27532, 152.41809, 127.269424, 190.47139, 160.8671, 134.87032, 90.01848, 171.48035, 112.52657]
2025-09-16 15:14:31,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 29.0, 25.0, 37.0, 31.0, 26.0, 18.0, 34.0, 22.0]
2025-09-16 15:14:31,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 47 minutes, 28 seconds)
2025-09-16 15:16:28,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:16:28,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 181.65036 ± 66.204
2025-09-16 15:16:28,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [120.334175, 176.38763, 120.11825, 102.434364, 312.90567, 255.24957, 125.19928, 237.80623, 160.76659, 205.30183]
2025-09-16 15:16:28,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 34.0, 23.0, 20.0, 63.0, 52.0, 24.0, 46.0, 31.0, 42.0]
2025-09-16 15:16:28,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 45 minutes, 41 seconds)
2025-09-16 15:18:25,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:18:25,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 173.57516 ± 41.403
2025-09-16 15:18:25,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [157.975, 255.484, 203.14523, 204.62257, 166.03096, 189.12111, 184.1887, 119.67908, 148.19463, 107.31034]
2025-09-16 15:18:25,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 50.0, 43.0, 43.0, 35.0, 37.0, 36.0, 23.0, 29.0, 21.0]
2025-09-16 15:18:26,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 43 minutes, 52 seconds)
2025-09-16 15:20:22,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:20:22,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 153.40085 ± 12.333
2025-09-16 15:20:22,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [129.95087, 171.64838, 145.52473, 157.44376, 158.40103, 160.23532, 158.82396, 133.6183, 159.6361, 158.726]
2025-09-16 15:20:22,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 35.0, 28.0, 30.0, 31.0, 32.0, 31.0, 26.0, 31.0, 31.0]
2025-09-16 15:20:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 41 minutes, 56 seconds)
2025-09-16 15:22:19,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:22:19,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 139.30705 ± 34.291
2025-09-16 15:22:19,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [123.56706, 101.2993, 173.4878, 112.85152, 106.55841, 143.39429, 220.85564, 123.89248, 133.98186, 153.18228]
2025-09-16 15:22:19,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 34.0, 22.0, 21.0, 28.0, 46.0, 24.0, 26.0, 31.0]
2025-09-16 15:22:19,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 39 minutes, 57 seconds)
2025-09-16 15:24:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:24:16,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 157.86269 ± 53.379
2025-09-16 15:24:16,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [192.12668, 102.6394, 117.93744, 133.96443, 177.89806, 295.1083, 158.80392, 158.01337, 112.19215, 129.94312]
2025-09-16 15:24:16,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 20.0, 23.0, 26.0, 35.0, 60.0, 31.0, 32.0, 22.0, 25.0]
2025-09-16 15:24:16,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 5 seconds)
2025-09-16 15:26:13,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:26:13,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 150.56856 ± 45.047
2025-09-16 15:26:13,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [256.09723, 125.03527, 123.3774, 198.82523, 169.81157, 114.26859, 102.513565, 119.13458, 163.14687, 133.47522]
2025-09-16 15:26:13,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 24.0, 24.0, 40.0, 33.0, 22.0, 20.0, 23.0, 33.0, 26.0]
2025-09-16 15:26:13,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 59 seconds)
2025-09-16 15:28:10,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:28:10,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 134.20663 ± 22.816
2025-09-16 15:28:10,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [157.96109, 122.8238, 129.85266, 96.04896, 173.42575, 163.17476, 118.53231, 114.693, 136.57294, 128.98122]
2025-09-16 15:28:10,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 24.0, 25.0, 19.0, 34.0, 32.0, 23.0, 22.0, 26.0, 25.0]
2025-09-16 15:28:10,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes)
2025-09-16 15:30:07,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:30:07,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 131.19832 ± 19.675
2025-09-16 15:30:07,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [107.35348, 142.23485, 161.83817, 118.84723, 157.1379, 112.328636, 139.22311, 117.93566, 147.45668, 107.6275]
2025-09-16 15:30:07,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 28.0, 33.0, 23.0, 31.0, 22.0, 27.0, 23.0, 29.0, 21.0]
2025-09-16 15:30:07,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 1 second)
2025-09-16 15:32:03,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:32:04,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 137.70407 ± 30.029
2025-09-16 15:32:04,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [158.2223, 102.28862, 119.13414, 139.33627, 108.70128, 159.32156, 114.604004, 144.92792, 207.57706, 122.927666]
2025-09-16 15:32:04,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 20.0, 23.0, 27.0, 21.0, 31.0, 22.0, 28.0, 42.0, 24.0]
2025-09-16 15:32:04,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes)
2025-09-16 15:34:00,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:34:00,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 141.32326 ± 24.732
2025-09-16 15:34:00,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [133.28215, 129.14574, 138.39243, 138.70776, 156.45425, 97.725655, 128.37862, 189.651, 173.96259, 127.53232]
2025-09-16 15:34:00,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 27.0, 27.0, 30.0, 19.0, 25.0, 39.0, 34.0, 25.0]
2025-09-16 15:34:00,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 27 minutes, 57 seconds)
2025-09-16 15:35:57,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:35:58,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 176.25430 ± 79.643
2025-09-16 15:35:58,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [123.383026, 100.58193, 317.8787, 117.36438, 135.44571, 143.53033, 246.72362, 312.94006, 156.24051, 108.454704]
2025-09-16 15:35:58,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 68.0, 23.0, 26.0, 28.0, 50.0, 64.0, 30.0, 21.0]
2025-09-16 15:35:58,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes, 6 seconds)
2025-09-16 15:37:55,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:37:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 171.37263 ± 63.065
2025-09-16 15:37:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [207.28325, 192.92378, 144.73026, 178.48895, 334.62286, 107.43865, 114.18671, 171.87111, 129.81456, 132.36613]
2025-09-16 15:37:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 39.0, 29.0, 35.0, 75.0, 21.0, 22.0, 34.0, 25.0, 26.0]
2025-09-16 15:37:55,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 13 seconds)
2025-09-16 15:39:51,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:39:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 169.84402 ± 69.086
2025-09-16 15:39:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [238.37976, 111.24996, 144.7377, 184.36395, 331.6114, 203.88956, 97.3139, 102.6787, 134.4099, 149.8054]
2025-09-16 15:39:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 22.0, 28.0, 37.0, 72.0, 42.0, 19.0, 20.0, 26.0, 29.0]
2025-09-16 15:39:52,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 22 minutes, 18 seconds)
2025-09-16 15:41:50,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:41:50,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 177.86200 ± 67.930
2025-09-16 15:41:50,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [197.82008, 117.90736, 238.23952, 160.22052, 128.69437, 346.47864, 160.7607, 174.72943, 96.49597, 157.27345]
2025-09-16 15:41:50,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 23.0, 48.0, 31.0, 25.0, 72.0, 31.0, 34.0, 19.0, 30.0]
2025-09-16 15:41:50,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 20 minutes, 45 seconds)
2025-09-16 15:43:47,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:43:47,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 157.66087 ± 54.808
2025-09-16 15:43:47,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [118.05854, 197.75099, 118.55739, 145.64517, 128.68301, 144.30522, 102.38612, 301.47882, 174.42255, 145.3209]
2025-09-16 15:43:47,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 39.0, 23.0, 28.0, 25.0, 28.0, 20.0, 60.0, 35.0, 28.0]
2025-09-16 15:43:47,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 18 minutes, 54 seconds)
2025-09-16 15:45:44,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:45:44,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 125.57644 ± 34.307
2025-09-16 15:45:44,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [144.72603, 112.71383, 204.23438, 95.3461, 152.16841, 103.146736, 96.73358, 108.05618, 90.08379, 148.55542]
2025-09-16 15:45:44,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 22.0, 42.0, 19.0, 30.0, 20.0, 19.0, 21.0, 18.0, 29.0]
2025-09-16 15:45:44,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 48 seconds)
2025-09-16 15:47:40,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:47:41,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 147.15720 ± 25.746
2025-09-16 15:47:41,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.520294, 160.74518, 160.92046, 192.16335, 150.87163, 150.94388, 103.47368, 140.42271, 151.18576, 159.32495]
2025-09-16 15:47:41,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 31.0, 31.0, 38.0, 29.0, 29.0, 20.0, 27.0, 30.0, 32.0]
2025-09-16 15:47:41,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 14 minutes, 40 seconds)
2025-09-16 15:49:36,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:49:36,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 145.60876 ± 38.986
2025-09-16 15:49:36,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [102.80121, 233.83943, 123.81004, 119.17841, 143.97272, 130.28888, 176.3164, 123.50544, 187.77267, 114.60238]
2025-09-16 15:49:36,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 46.0, 24.0, 23.0, 28.0, 25.0, 35.0, 24.0, 36.0, 22.0]
2025-09-16 15:49:36,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 28 seconds)
2025-09-16 15:51:30,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:51:31,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 170.67073 ± 50.698
2025-09-16 15:51:31,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [97.58195, 195.46819, 217.49084, 265.45544, 175.57417, 107.77499, 201.24449, 176.78835, 157.30296, 112.0259]
2025-09-16 15:51:31,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 38.0, 42.0, 53.0, 34.0, 21.0, 39.0, 36.0, 30.0, 22.0]
2025-09-16 15:51:31,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 40 seconds)
2025-09-16 15:53:25,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:53:26,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 167.36591 ± 40.202
2025-09-16 15:53:26,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [209.94887, 150.28906, 144.73596, 135.65475, 183.49461, 195.4708, 183.26648, 96.18399, 239.14415, 135.47029]
2025-09-16 15:53:26,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 29.0, 28.0, 26.0, 37.0, 41.0, 36.0, 19.0, 47.0, 26.0]
2025-09-16 15:53:26,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 13 seconds)
2025-09-16 15:55:20,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:55:20,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 138.58302 ± 57.387
2025-09-16 15:55:20,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [113.11499, 154.28535, 123.40195, 107.99955, 96.5578, 136.40099, 112.790306, 134.57626, 303.42596, 103.27695]
2025-09-16 15:55:20,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 32.0, 24.0, 21.0, 19.0, 26.0, 22.0, 26.0, 66.0, 20.0]
2025-09-16 15:55:20,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 4 minutes, 48 seconds)
2025-09-16 15:57:12,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:57:13,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 138.51497 ± 24.748
2025-09-16 15:57:13,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [153.34361, 137.29884, 103.20497, 150.6561, 148.80508, 133.96632, 194.90767, 113.59947, 134.65941, 114.70812]
2025-09-16 15:57:13,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 27.0, 20.0, 29.0, 29.0, 26.0, 41.0, 22.0, 26.0, 22.0]
2025-09-16 15:57:13,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 5 seconds)
2025-09-16 15:59:04,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:59:05,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 156.98270 ± 14.379
2025-09-16 15:59:05,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [156.99817, 191.35522, 150.91005, 128.83948, 159.48763, 152.4374, 157.255, 156.4512, 154.32227, 161.77057]
2025-09-16 15:59:05,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 40.0, 30.0, 25.0, 32.0, 30.0, 33.0, 31.0, 30.0, 32.0]
2025-09-16 15:59:05,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 59 minutes, 20 seconds)
2025-09-16 16:00:55,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:00:56,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 130.07463 ± 26.281
2025-09-16 16:00:56,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.55106, 135.0132, 156.41412, 153.03151, 114.31311, 102.53895, 175.31268, 102.12552, 107.70644, 152.73972]
2025-09-16 16:00:56,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 27.0, 31.0, 31.0, 22.0, 20.0, 36.0, 20.0, 21.0, 30.0]
2025-09-16 16:00:56,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 56 minutes, 46 seconds)
2025-09-16 16:02:48,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:02:49,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 148.10104 ± 20.769
2025-09-16 16:02:49,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [138.15761, 148.75151, 128.60881, 113.873245, 161.58147, 197.63884, 151.25401, 148.26108, 148.84729, 144.03671]
2025-09-16 16:02:49,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 29.0, 25.0, 22.0, 34.0, 41.0, 31.0, 30.0, 30.0, 29.0]
2025-09-16 16:02:49,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 54 minutes, 31 seconds)
2025-09-16 16:05:00,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:05:01,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 131.58041 ± 20.742
2025-09-16 16:05:01,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [136.48785, 156.68878, 119.11056, 102.80522, 153.48608, 158.52032, 138.57147, 138.69765, 108.99072, 102.44541]
2025-09-16 16:05:01,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 31.0, 23.0, 20.0, 31.0, 31.0, 27.0, 28.0, 21.0, 20.0]
2025-09-16 16:05:01,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 56 minutes, 7 seconds)
2025-09-16 16:07:21,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:07:21,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 124.55692 ± 19.756
2025-09-16 16:07:21,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [143.85536, 101.59309, 107.2616, 127.5849, 139.02704, 139.65474, 108.30641, 144.83722, 143.74004, 89.708885]
2025-09-16 16:07:21,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 20.0, 21.0, 25.0, 27.0, 27.0, 21.0, 29.0, 28.0, 18.0]
2025-09-16 16:07:21,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 35 seconds)
2025-09-16 16:09:41,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:09:42,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 137.33322 ± 17.739
2025-09-16 16:09:42,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [143.65005, 145.9158, 153.50543, 114.38837, 161.06845, 145.6147, 151.6067, 102.43419, 126.83645, 128.312]
2025-09-16 16:09:42,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 28.0, 31.0, 22.0, 32.0, 29.0, 31.0, 20.0, 25.0, 25.0]
2025-09-16 16:09:42,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 3 minutes, 8 seconds)
2025-09-16 16:12:03,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:12:03,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 135.49622 ± 15.399
2025-09-16 16:12:03,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [147.43224, 152.36275, 117.668625, 143.91957, 101.57896, 137.94196, 144.34882, 137.45059, 123.55375, 148.70496]
2025-09-16 16:12:03,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 23.0, 29.0, 20.0, 28.0, 29.0, 27.0, 24.0, 29.0]
2025-09-16 16:12:03,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 6 minutes, 47 seconds)
2025-09-16 16:14:21,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:14:22,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 116.78572 ± 16.535
2025-09-16 16:14:22,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [119.11139, 108.98161, 96.18466, 156.57286, 106.70185, 113.17933, 101.122856, 118.564896, 133.74727, 113.690414]
2025-09-16 16:14:22,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 21.0, 19.0, 32.0, 21.0, 22.0, 20.0, 23.0, 26.0, 22.0]
2025-09-16 16:14:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 9 minutes, 19 seconds)
2025-09-16 16:16:53,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:16:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 128.24667 ± 16.625
2025-09-16 16:16:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.651054, 102.62623, 142.06781, 134.0314, 131.47185, 119.64872, 152.40422, 153.55046, 122.72887, 109.28613]
2025-09-16 16:16:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 28.0, 26.0, 26.0, 23.0, 31.0, 30.0, 25.0, 21.0]
2025-09-16 16:16:54,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 10 minutes, 42 seconds)
2025-09-16 16:19:21,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:19:21,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 123.26540 ± 20.084
2025-09-16 16:19:21,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [89.82, 140.97261, 150.89702, 145.00542, 108.52321, 108.05526, 102.00319, 114.60393, 141.14314, 131.63017]
2025-09-16 16:19:21,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 27.0, 29.0, 28.0, 21.0, 21.0, 20.0, 22.0, 28.0, 26.0]
2025-09-16 16:19:21,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 9 minutes, 40 seconds)
2025-09-16 16:21:42,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:21:42,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 129.77348 ± 14.599
2025-09-16 16:21:42,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [147.65372, 142.05496, 123.012566, 132.86366, 137.1241, 132.71194, 95.32504, 118.72639, 124.22676, 144.03563]
2025-09-16 16:21:42,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 24.0, 26.0, 28.0, 26.0, 19.0, 23.0, 24.0, 29.0]
2025-09-16 16:21:42,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 7 minutes, 16 seconds)
2025-09-16 16:24:07,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:24:08,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 130.29977 ± 9.486
2025-09-16 16:24:08,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [145.61661, 139.07193, 127.764015, 117.17578, 136.97523, 132.31114, 135.97514, 130.45518, 124.5452, 113.1076]
2025-09-16 16:24:08,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 27.0, 25.0, 23.0, 28.0, 26.0, 28.0, 25.0, 24.0, 22.0]
2025-09-16 16:24:08,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 5 minutes, 39 seconds)
2025-09-16 16:26:33,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:26:33,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 124.48515 ± 15.983
2025-09-16 16:26:33,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [126.625916, 111.2325, 137.54323, 140.20326, 100.071, 107.58413, 107.10022, 137.95581, 148.54904, 127.98632]
2025-09-16 16:26:33,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 27.0, 28.0, 20.0, 21.0, 21.0, 27.0, 29.0, 25.0]
2025-09-16 16:26:33,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 4 minutes, 21 seconds)
2025-09-16 16:28:59,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:29:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 132.33719 ± 15.605
2025-09-16 16:29:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [148.85355, 105.92075, 142.45378, 144.28555, 141.25333, 122.72566, 146.11925, 120.15082, 108.14599, 143.46326]
2025-09-16 16:29:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 28.0, 29.0, 28.0, 24.0, 29.0, 23.0, 21.0, 28.0]
2025-09-16 16:29:00,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 1 minute, 1 second)
2025-09-16 16:31:27,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:31:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 160.53746 ± 50.944
2025-09-16 16:31:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [230.31435, 186.13797, 97.069496, 136.28116, 108.65158, 116.43718, 169.09381, 252.77232, 189.78206, 118.834694]
2025-09-16 16:31:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 38.0, 19.0, 27.0, 21.0, 23.0, 35.0, 49.0, 39.0, 23.0]
2025-09-16 16:31:27,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 58 minutes, 33 seconds)
2025-09-16 16:33:56,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:33:57,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 278.84058 ± 131.691
2025-09-16 16:33:57,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [483.92374, 327.45676, 463.5967, 304.2823, 119.66029, 101.73193, 301.69336, 343.96426, 240.01064, 102.08573]
2025-09-16 16:33:57,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 63.0, 89.0, 57.0, 23.0, 20.0, 64.0, 68.0, 47.0, 20.0]
2025-09-16 16:33:57,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (278.84) for latency 21
2025-09-16 16:33:57,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 57 minutes, 34 seconds)
2025-09-16 16:36:30,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:36:30,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 198.20450 ± 102.905
2025-09-16 16:36:30,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [123.76844, 211.41637, 127.70414, 96.312386, 384.07068, 363.29672, 137.76164, 130.94394, 289.39035, 117.38018]
2025-09-16 16:36:30,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 43.0, 25.0, 19.0, 73.0, 67.0, 27.0, 25.0, 61.0, 23.0]
2025-09-16 16:36:30,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 56 minutes, 20 seconds)
2025-09-16 16:39:02,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:39:03,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 234.72510 ± 128.830
2025-09-16 16:39:03,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [350.7501, 463.6787, 348.8892, 341.741, 274.31088, 117.87363, 116.97298, 107.1386, 95.91425, 129.9814]
2025-09-16 16:39:03,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 86.0, 68.0, 71.0, 52.0, 23.0, 23.0, 21.0, 19.0, 25.0]
2025-09-16 16:39:03,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 54 minutes, 57 seconds)
2025-09-16 16:41:35,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:41:36,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 281.42734 ± 169.855
2025-09-16 16:41:36,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [503.9224, 117.67157, 96.60313, 113.10941, 557.35565, 114.20111, 171.51617, 420.95297, 390.0147, 328.92615]
2025-09-16 16:41:36,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 23.0, 19.0, 22.0, 117.0, 22.0, 33.0, 80.0, 73.0, 61.0]
2025-09-16 16:41:36,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (281.43) for latency 21
2025-09-16 16:41:36,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 53 minutes, 26 seconds)
2025-09-16 16:44:08,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:44:10,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 290.04251 ± 125.180
2025-09-16 16:44:10,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [112.62463, 289.18372, 370.65427, 299.9442, 119.60346, 140.38152, 404.78806, 282.70996, 373.85208, 506.68314]
2025-09-16 16:44:10,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 56.0, 83.0, 57.0, 24.0, 27.0, 79.0, 55.0, 72.0, 107.0]
2025-09-16 16:44:10,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (290.04) for latency 21
2025-09-16 16:44:10,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 51 minutes, 47 seconds)
2025-09-16 16:46:41,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:46:42,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 212.40381 ± 112.578
2025-09-16 16:46:42,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [353.52203, 349.1305, 123.831345, 135.36949, 96.67043, 140.83571, 101.95609, 371.88342, 319.9245, 130.91454]
2025-09-16 16:46:42,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 66.0, 24.0, 26.0, 19.0, 27.0, 20.0, 69.0, 62.0, 25.0]
2025-09-16 16:46:42,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 49 minutes, 36 seconds)
2025-09-16 16:49:14,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:49:15,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 220.49600 ± 115.967
2025-09-16 16:49:15,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [334.799, 352.3273, 124.001175, 155.7027, 108.39573, 393.9325, 120.00839, 357.99753, 96.518814, 161.27692]
2025-09-16 16:49:15,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 69.0, 24.0, 30.0, 21.0, 73.0, 24.0, 70.0, 19.0, 31.0]
2025-09-16 16:49:15,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 47 minutes, 1 second)
2025-09-16 16:51:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:51:45,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 194.36261 ± 107.955
2025-09-16 16:51:45,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [119.02949, 101.10386, 107.246475, 96.20888, 349.3882, 260.046, 382.9665, 128.84076, 290.7359, 108.06022]
2025-09-16 16:51:45,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 21.0, 19.0, 67.0, 52.0, 82.0, 25.0, 59.0, 21.0]
2025-09-16 16:51:45,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 44 minutes, 11 seconds)
2025-09-16 16:54:13,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:54:14,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 285.86169 ± 136.680
2025-09-16 16:54:14,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [363.4654, 107.694885, 286.06815, 124.011345, 314.12872, 269.5023, 108.54638, 343.83475, 567.2935, 374.0715]
2025-09-16 16:54:14,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 21.0, 55.0, 24.0, 61.0, 54.0, 21.0, 68.0, 108.0, 73.0]
2025-09-16 16:54:14,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 41 minutes, 2 seconds)
2025-09-16 16:56:42,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:56:43,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 301.09573 ± 205.719
2025-09-16 16:56:43,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.37624, 479.54465, 138.00143, 693.0514, 423.94092, 112.643875, 96.40851, 337.36758, 123.280594, 510.3423]
2025-09-16 16:56:43,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 94.0, 27.0, 133.0, 88.0, 22.0, 19.0, 71.0, 24.0, 99.0]
2025-09-16 16:56:43,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (301.10) for latency 21
2025-09-16 16:56:43,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 37 minutes, 59 seconds)
2025-09-16 16:59:14,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:59:15,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 272.32913 ± 168.074
2025-09-16 16:59:15,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [101.8092, 467.1145, 526.5819, 140.44626, 108.484116, 364.875, 374.49518, 439.82318, 96.71691, 102.94513]
2025-09-16 16:59:15,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 86.0, 98.0, 27.0, 21.0, 70.0, 80.0, 83.0, 19.0, 20.0]
2025-09-16 16:59:15,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 35 minutes, 28 seconds)
2025-09-16 17:01:43,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:01:44,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 159.48990 ± 75.201
2025-09-16 17:01:44,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [351.2018, 102.35449, 250.44318, 108.10576, 145.69717, 119.12263, 124.770164, 137.24788, 120.411385, 135.54457]
2025-09-16 17:01:44,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 20.0, 50.0, 21.0, 28.0, 23.0, 24.0, 27.0, 23.0, 26.0]
2025-09-16 17:01:44,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 32 minutes, 20 seconds)
2025-09-16 17:04:14,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:04:15,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 355.31332 ± 276.841
2025-09-16 17:04:15,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [329.70367, 328.1118, 124.8469, 196.73445, 743.60614, 304.00214, 301.53677, 113.71694, 998.1487, 112.725945]
2025-09-16 17:04:15,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 62.0, 24.0, 42.0, 144.0, 58.0, 57.0, 22.0, 197.0, 22.0]
2025-09-16 17:04:15,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (355.31) for latency 21
2025-09-16 17:04:15,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2025-09-16 17:06:41,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:06:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 212.61832 ± 110.252
2025-09-16 17:06:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [317.1697, 351.7984, 97.38584, 115.55472, 101.850624, 118.81918, 102.638824, 322.71207, 363.55087, 234.70287]
2025-09-16 17:06:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 76.0, 19.0, 23.0, 20.0, 23.0, 20.0, 60.0, 69.0, 49.0]
2025-09-16 17:06:42,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 27 minutes, 13 seconds)
2025-09-16 17:09:12,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:09:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 190.48630 ± 98.763
2025-09-16 17:09:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [129.82944, 346.58243, 108.79333, 302.23294, 101.58954, 114.505714, 305.83588, 107.99697, 284.65408, 102.8426]
2025-09-16 17:09:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 64.0, 21.0, 60.0, 20.0, 22.0, 60.0, 21.0, 55.0, 20.0]
2025-09-16 17:09:12,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 24 minutes, 53 seconds)
2025-09-16 17:11:40,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:11:41,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 207.18393 ± 115.476
2025-09-16 17:11:41,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [436.0974, 345.35938, 131.80257, 298.4619, 108.42678, 117.21424, 113.95748, 282.74045, 118.44281, 119.33635]
2025-09-16 17:11:41,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 70.0, 26.0, 55.0, 21.0, 23.0, 22.0, 58.0, 23.0, 23.0]
2025-09-16 17:11:41,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 21 minutes, 58 seconds)
2025-09-16 17:14:07,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:14:08,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 326.24481 ± 158.210
2025-09-16 17:14:08,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [444.42316, 632.9239, 108.36668, 143.01753, 119.24485, 353.4381, 365.3535, 447.05133, 351.9178, 296.7114]
2025-09-16 17:14:08,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 132.0, 21.0, 28.0, 23.0, 69.0, 67.0, 100.0, 66.0, 59.0]
2025-09-16 17:14:08,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 19 minutes, 23 seconds)
2025-09-16 17:16:40,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:16:41,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 290.15814 ± 144.091
2025-09-16 17:16:41,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [140.74196, 468.3393, 443.63455, 133.40385, 334.56708, 107.85587, 350.17548, 470.30807, 340.29144, 112.26393]
2025-09-16 17:16:41,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 97.0, 82.0, 26.0, 65.0, 21.0, 68.0, 93.0, 64.0, 22.0]
2025-09-16 17:16:41,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 16 minutes, 59 seconds)
2025-09-16 17:19:12,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:19:13,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 222.87463 ± 131.549
2025-09-16 17:19:13,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [136.08595, 140.89922, 480.6273, 256.0228, 108.601395, 293.37003, 137.50272, 434.4195, 108.79425, 132.4231]
2025-09-16 17:19:13,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 91.0, 51.0, 21.0, 59.0, 27.0, 84.0, 21.0, 26.0]
2025-09-16 17:19:13,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 15 minutes, 7 seconds)
2025-09-16 17:21:41,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:21:42,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 169.26199 ± 78.286
2025-09-16 17:21:42,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [96.40199, 95.63947, 139.55379, 318.65622, 166.33958, 262.04257, 96.274254, 266.09088, 114.370926, 137.25006]
2025-09-16 17:21:42,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 27.0, 61.0, 32.0, 53.0, 19.0, 53.0, 22.0, 26.0]
2025-09-16 17:21:42,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 12 minutes, 27 seconds)
2025-09-16 17:24:11,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:24:12,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 235.38676 ± 105.360
2025-09-16 17:24:12,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [246.55196, 164.6193, 101.15811, 116.43967, 352.0635, 397.46512, 102.870186, 222.49083, 325.59695, 324.61215]
2025-09-16 17:24:12,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 32.0, 20.0, 23.0, 69.0, 78.0, 20.0, 45.0, 63.0, 64.0]
2025-09-16 17:24:12,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 10 minutes, 6 seconds)
2025-09-16 17:26:42,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:26:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 281.12262 ± 158.657
2025-09-16 17:26:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [140.36049, 221.27644, 568.28125, 479.85712, 328.3671, 112.690994, 123.049194, 410.27325, 101.80949, 325.26074]
2025-09-16 17:26:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 44.0, 110.0, 92.0, 72.0, 22.0, 24.0, 78.0, 20.0, 63.0]
2025-09-16 17:26:43,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 7 minutes, 55 seconds)
2025-09-16 17:29:10,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:29:11,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 272.03387 ± 114.813
2025-09-16 17:29:11,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [430.14557, 133.54178, 312.82858, 155.73683, 362.71207, 320.54004, 441.0866, 152.32309, 134.40738, 277.01654]
2025-09-16 17:29:11,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 26.0, 61.0, 30.0, 67.0, 58.0, 82.0, 29.0, 26.0, 53.0]
2025-09-16 17:29:11,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 4 minutes, 59 seconds)
2025-09-16 17:31:42,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:31:43,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 276.22342 ± 170.779
2025-09-16 17:31:43,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [322.6795, 535.6994, 543.1122, 388.83408, 133.69308, 384.19705, 101.7285, 113.9261, 109.88729, 128.47716]
2025-09-16 17:31:43,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 102.0, 105.0, 75.0, 26.0, 73.0, 20.0, 22.0, 22.0, 25.0]
2025-09-16 17:31:43,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 2 minutes, 30 seconds)
2025-09-16 17:34:10,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:34:10,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 221.07608 ± 126.459
2025-09-16 17:34:10,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [298.02808, 399.6558, 124.08083, 483.38422, 147.76329, 151.34482, 107.67445, 248.49205, 96.46762, 153.86961]
2025-09-16 17:34:10,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 77.0, 24.0, 90.0, 28.0, 29.0, 21.0, 48.0, 19.0, 30.0]
2025-09-16 17:34:10,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 59 minutes, 52 seconds)
2025-09-16 17:36:37,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:36:38,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 277.63013 ± 179.551
2025-09-16 17:36:38,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [137.56274, 431.4355, 128.30565, 384.4904, 668.5625, 133.85884, 318.41608, 101.46679, 369.8005, 102.40226]
2025-09-16 17:36:38,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 83.0, 25.0, 79.0, 128.0, 26.0, 62.0, 20.0, 69.0, 20.0]
2025-09-16 17:36:38,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 57 minutes, 13 seconds)
2025-09-16 17:39:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:39:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 378.91754 ± 243.384
2025-09-16 17:39:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [520.45135, 146.86151, 987.27075, 112.6332, 382.20795, 113.83226, 340.3044, 343.90976, 442.7621, 398.9421]
2025-09-16 17:39:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 28.0, 202.0, 22.0, 71.0, 22.0, 64.0, 66.0, 98.0, 73.0]
2025-09-16 17:39:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (378.92) for latency 21
2025-09-16 17:39:06,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 54 minutes, 30 seconds)
2025-09-16 17:41:35,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:41:36,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 248.70500 ± 151.301
2025-09-16 17:41:36,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [114.59912, 523.6015, 285.6085, 125.38621, 351.70215, 96.74233, 295.24878, 101.61291, 127.79647, 464.75204]
2025-09-16 17:41:36,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 101.0, 57.0, 24.0, 66.0, 19.0, 57.0, 20.0, 25.0, 98.0]
2025-09-16 17:41:36,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 52 minutes, 10 seconds)
2025-09-16 17:44:02,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:44:03,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 303.82870 ± 155.460
2025-09-16 17:44:03,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [320.69583, 118.71363, 144.84256, 597.4093, 506.44226, 311.19943, 247.12262, 326.50922, 375.32602, 90.02637]
2025-09-16 17:44:03,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 23.0, 28.0, 118.0, 94.0, 59.0, 49.0, 62.0, 84.0, 18.0]
2025-09-16 17:44:03,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 49 minutes, 20 seconds)
2025-09-16 17:46:31,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:46:32,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 329.36679 ± 124.464
2025-09-16 17:46:32,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [402.97025, 412.10428, 292.0039, 401.25964, 496.50516, 96.02585, 421.05438, 343.5646, 117.8685, 310.31122]
2025-09-16 17:46:32,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 76.0, 53.0, 76.0, 94.0, 19.0, 78.0, 65.0, 23.0, 58.0]
2025-09-16 17:46:32,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 46 minutes, 56 seconds)
2025-09-16 17:48:58,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:48:59,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 208.26570 ± 158.939
2025-09-16 17:48:59,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [291.39502, 111.54833, 125.19525, 111.72737, 118.5986, 95.92579, 386.2651, 598.3009, 102.80728, 140.89326]
2025-09-16 17:48:59,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 22.0, 24.0, 22.0, 23.0, 19.0, 84.0, 116.0, 20.0, 27.0]
2025-09-16 17:48:59,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 44 minutes, 26 seconds)
2025-09-16 17:51:29,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:51:29,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 225.88120 ± 110.011
2025-09-16 17:51:29,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [140.37383, 315.96716, 96.70401, 340.8922, 305.0943, 357.76355, 122.31063, 115.15747, 111.966805, 352.58212]
2025-09-16 17:51:29,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 60.0, 19.0, 66.0, 57.0, 67.0, 24.0, 22.0, 22.0, 65.0]
2025-09-16 17:51:29,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 42 minutes, 7 seconds)
2025-09-16 17:53:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:54:00,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 250.33899 ± 140.540
2025-09-16 17:54:00,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [135.1863, 475.76532, 310.3199, 113.16474, 313.98248, 315.73297, 478.01096, 128.99242, 123.44069, 108.79413]
2025-09-16 17:54:00,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 90.0, 60.0, 22.0, 59.0, 60.0, 93.0, 25.0, 24.0, 21.0]
2025-09-16 17:54:00,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 39 minutes, 40 seconds)
2025-09-16 17:56:30,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:56:31,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 291.92023 ± 158.389
2025-09-16 17:56:31,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [102.584076, 266.56104, 357.80566, 450.03238, 102.56021, 126.927025, 401.19788, 513.96655, 124.68613, 472.88162]
2025-09-16 17:56:31,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 54.0, 68.0, 84.0, 20.0, 25.0, 79.0, 103.0, 24.0, 93.0]
2025-09-16 17:56:31,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 37 minutes, 22 seconds)
2025-09-16 17:59:03,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:59:04,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 291.24472 ± 158.659
2025-09-16 17:59:04,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [107.977905, 158.22432, 130.25444, 480.347, 113.17332, 583.3448, 402.11453, 337.8489, 239.86244, 359.29968]
2025-09-16 17:59:04,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 32.0, 25.0, 94.0, 22.0, 117.0, 85.0, 64.0, 47.0, 66.0]
2025-09-16 17:59:04,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 35 minutes, 7 seconds)
2025-09-16 18:01:35,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:01:35,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 189.22476 ± 116.744
2025-09-16 18:01:35,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [107.59267, 107.26278, 115.174675, 107.78223, 327.65878, 108.14702, 139.59428, 114.02996, 415.86133, 349.1438]
2025-09-16 18:01:35,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 22.0, 21.0, 61.0, 21.0, 27.0, 22.0, 78.0, 65.0]
2025-09-16 18:01:35,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 32 minutes, 46 seconds)
2025-09-16 18:04:06,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:04:06,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 207.76688 ± 122.876
2025-09-16 18:04:06,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [105.69172, 134.43364, 128.67107, 370.3392, 388.1748, 133.6994, 421.76978, 107.934555, 158.27896, 128.6756]
2025-09-16 18:04:06,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 26.0, 25.0, 70.0, 70.0, 26.0, 77.0, 21.0, 31.0, 25.0]
2025-09-16 18:04:06,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 30 minutes, 16 seconds)
2025-09-16 18:06:37,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:06:37,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 210.55522 ± 131.846
2025-09-16 18:06:37,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [152.7686, 124.998436, 128.32391, 313.34265, 507.46597, 95.817276, 128.63557, 376.33264, 132.89415, 144.97307]
2025-09-16 18:06:37,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 24.0, 25.0, 59.0, 98.0, 19.0, 25.0, 71.0, 26.0, 28.0]
2025-09-16 18:06:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 27 minutes, 46 seconds)
2025-09-16 18:09:08,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:09:09,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 243.88589 ± 130.313
2025-09-16 18:09:09,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [141.4231, 337.61908, 96.84649, 117.5122, 129.68697, 418.46646, 334.24765, 366.0099, 400.6693, 96.37761]
2025-09-16 18:09:09,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 65.0, 19.0, 23.0, 25.0, 78.0, 63.0, 68.0, 74.0, 19.0]
2025-09-16 18:09:09,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 25 minutes, 17 seconds)
2025-09-16 18:11:38,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:11:39,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 218.44499 ± 119.462
2025-09-16 18:11:39,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [387.7764, 140.66042, 317.2088, 113.48891, 420.13028, 102.67453, 307.83206, 113.33195, 172.31067, 109.03593]
2025-09-16 18:11:39,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 27.0, 63.0, 22.0, 79.0, 20.0, 56.0, 22.0, 33.0, 21.0]
2025-09-16 18:11:39,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 22 minutes, 38 seconds)
2025-09-16 18:14:14,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:14:15,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 301.63947 ± 188.990
2025-09-16 18:14:15,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [317.9048, 392.73978, 417.4197, 108.562225, 308.52997, 108.49963, 731.3593, 114.155174, 394.74326, 122.48091]
2025-09-16 18:14:15,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 75.0, 79.0, 21.0, 60.0, 21.0, 157.0, 22.0, 76.0, 24.0]
2025-09-16 18:14:15,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 15 seconds)
2025-09-16 18:16:45,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:16:46,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 254.32712 ± 110.335
2025-09-16 18:16:46,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [111.90232, 101.52504, 344.16312, 438.56552, 288.62775, 257.14047, 307.9161, 362.56546, 117.925385, 212.94008]
2025-09-16 18:16:46,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 66.0, 87.0, 53.0, 49.0, 58.0, 68.0, 23.0, 41.0]
2025-09-16 18:16:46,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 17 minutes, 43 seconds)
2025-09-16 18:19:13,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:19:15,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 292.88092 ± 166.117
2025-09-16 18:19:15,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [144.49046, 534.58167, 124.55409, 511.12518, 253.60226, 89.65773, 137.61235, 310.89026, 300.9266, 521.3686]
2025-09-16 18:19:15,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 100.0, 24.0, 95.0, 50.0, 18.0, 27.0, 57.0, 56.0, 109.0]
2025-09-16 18:19:15,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 8 seconds)
2025-09-16 18:21:46,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:21:47,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 274.04987 ± 147.880
2025-09-16 18:21:47,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [485.9847, 95.79866, 146.58072, 124.349525, 268.26562, 337.04648, 336.52206, 294.36218, 537.52856, 114.05973]
2025-09-16 18:21:47,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 19.0, 28.0, 24.0, 56.0, 66.0, 66.0, 58.0, 103.0, 22.0]
2025-09-16 18:21:47,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 37 seconds)
2025-09-16 18:24:17,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:24:18,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 266.51962 ± 106.843
2025-09-16 18:24:18,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [150.16838, 336.58954, 373.23358, 324.2259, 328.56903, 417.16168, 324.17715, 113.89144, 146.26286, 150.91681]
2025-09-16 18:24:18,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 63.0, 68.0, 59.0, 60.0, 78.0, 59.0, 22.0, 28.0, 29.0]
2025-09-16 18:24:18,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 7 seconds)
2025-09-16 18:26:52,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:26:53,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 238.65372 ± 105.308
2025-09-16 18:26:53,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [300.7437, 233.14235, 372.22876, 112.54847, 119.47355, 383.297, 109.2293, 309.7156, 134.0376, 312.12082]
2025-09-16 18:26:53,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 45.0, 68.0, 22.0, 23.0, 71.0, 21.0, 69.0, 26.0, 64.0]
2025-09-16 18:26:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 34 seconds)
2025-09-16 18:29:21,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:29:22,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 252.29594 ± 102.126
2025-09-16 18:29:22,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [359.46625, 270.91275, 139.44472, 102.01513, 346.50433, 161.47385, 122.75118, 337.72995, 334.76764, 347.89343]
2025-09-16 18:29:22,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 53.0, 27.0, 20.0, 68.0, 31.0, 24.0, 65.0, 65.0, 66.0]
2025-09-16 18:29:22,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 2 seconds)
2025-09-16 18:31:55,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:31:56,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 251.53658 ± 129.824
2025-09-16 18:31:56,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [409.44617, 119.92594, 129.44203, 141.30817, 334.63986, 385.98126, 101.46799, 362.4391, 125.635056, 405.08002]
2025-09-16 18:31:56,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 23.0, 25.0, 27.0, 65.0, 72.0, 20.0, 70.0, 24.0, 87.0]
2025-09-16 18:31:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 32 seconds)
2025-09-16 18:34:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:34:26,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 328.33362 ± 177.175
2025-09-16 18:34:26,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [369.9516, 226.54932, 414.29816, 112.84493, 383.2639, 113.35007, 682.9383, 113.51966, 482.43286, 384.18744]
2025-09-16 18:34:26,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 44.0, 76.0, 22.0, 72.0, 22.0, 129.0, 22.0, 92.0, 74.0]
2025-09-16 18:34:26,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
