2025-09-16 13:33:43,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_15
2025-09-16 13:33:43,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_15
2025-09-16 13:33:43,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x153af0a40810>}
2025-09-16 13:33:43,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 13:33:43,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 13:33:43,363 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=631, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 13:33:43,363 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 13:33:45,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 13:33:45,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 13:35:33,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:35:34,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 190.00720 ± 87.171
2025-09-16 13:35:34,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.98965, 239.3979, 119.81439, 307.41266, 108.15525, 95.51825, 237.10315, 319.53156, 113.17349, 263.97565]
2025-09-16 13:35:34,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 49.0, 23.0, 59.0, 21.0, 19.0, 47.0, 61.0, 22.0, 54.0]
2025-09-16 13:35:34,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (190.01) for latency 15
2025-09-16 13:35:34,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 44 seconds)
2025-09-16 13:37:31,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:37:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 277.20703 ± 98.375
2025-09-16 13:37:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [330.58234, 147.45901, 329.49884, 289.84732, 142.43088, 107.71505, 367.05832, 346.56873, 322.45938, 388.45007]
2025-09-16 13:37:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 28.0, 65.0, 54.0, 27.0, 21.0, 75.0, 70.0, 72.0, 70.0]
2025-09-16 13:37:32,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (277.21) for latency 15
2025-09-16 13:37:32,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 50 seconds)
2025-09-16 13:39:30,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:39:31,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 246.54648 ± 127.366
2025-09-16 13:39:31,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [354.15472, 336.21707, 292.3197, 101.70442, 89.08882, 418.3536, 396.67606, 107.84643, 273.30127, 95.80261]
2025-09-16 13:39:31,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 65.0, 56.0, 20.0, 18.0, 77.0, 83.0, 21.0, 51.0, 19.0]
2025-09-16 13:39:31,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 6 minutes, 26 seconds)
2025-09-16 13:41:28,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:41:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 203.80699 ± 118.484
2025-09-16 13:41:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.19871, 107.28482, 398.8212, 285.35956, 118.634476, 107.673874, 353.89542, 106.845795, 102.75871, 343.59738]
2025-09-16 13:41:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 80.0, 56.0, 23.0, 21.0, 72.0, 21.0, 20.0, 66.0]
2025-09-16 13:41:28,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 5 minutes, 25 seconds)
2025-09-16 13:43:26,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:43:27,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 290.09210 ± 108.074
2025-09-16 13:43:27,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.438965, 424.19714, 330.481, 377.26837, 330.221, 295.73483, 90.642136, 284.1838, 300.59985, 378.15396]
2025-09-16 13:43:27,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 83.0, 63.0, 70.0, 63.0, 58.0, 18.0, 52.0, 60.0, 79.0]
2025-09-16 13:43:27,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (290.09) for latency 15
2025-09-16 13:43:27,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 4 minutes, 29 seconds)
2025-09-16 13:45:25,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:45:25,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 274.12921 ± 93.708
2025-09-16 13:45:25,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [296.92822, 118.74772, 352.525, 89.14398, 263.93182, 296.51178, 279.0811, 308.2072, 408.90192, 327.31345]
2025-09-16 13:45:25,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 23.0, 77.0, 18.0, 52.0, 58.0, 54.0, 59.0, 81.0, 69.0]
2025-09-16 13:45:25,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes, 28 seconds)
2025-09-16 13:47:24,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:47:25,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 263.71417 ± 119.086
2025-09-16 13:47:25,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [311.86084, 100.66716, 232.2045, 90.10028, 411.46097, 327.7717, 320.29883, 328.23254, 101.71171, 412.83325]
2025-09-16 13:47:25,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 20.0, 45.0, 18.0, 85.0, 63.0, 61.0, 63.0, 20.0, 83.0]
2025-09-16 13:47:25,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 43 seconds)
2025-09-16 13:49:23,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:49:23,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 251.44543 ± 125.437
2025-09-16 13:49:23,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [375.1302, 475.54095, 325.46805, 272.8218, 128.12656, 96.11327, 101.968155, 260.62292, 130.16347, 348.49893]
2025-09-16 13:49:23,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 96.0, 60.0, 53.0, 25.0, 19.0, 20.0, 51.0, 25.0, 65.0]
2025-09-16 13:49:23,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 44 seconds)
2025-09-16 13:51:23,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:51:23,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 287.24872 ± 153.874
2025-09-16 13:51:23,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.34757, 403.23334, 375.31232, 95.102615, 101.53064, 95.019295, 405.3431, 358.1446, 501.4303, 413.02344]
2025-09-16 13:51:23,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 80.0, 72.0, 19.0, 20.0, 19.0, 78.0, 67.0, 102.0, 79.0]
2025-09-16 13:51:23,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 34 seconds)
2025-09-16 13:53:21,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:53:22,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 284.29456 ± 106.089
2025-09-16 13:53:22,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.58367, 262.9718, 336.05212, 382.7699, 368.89786, 88.93346, 297.159, 241.97284, 392.08484, 370.52032]
2025-09-16 13:53:22,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 55.0, 62.0, 75.0, 68.0, 18.0, 59.0, 47.0, 72.0, 67.0]
2025-09-16 13:53:22,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 28 seconds)
2025-09-16 13:55:20,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:55:21,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 229.28093 ± 127.421
2025-09-16 13:55:21,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [84.167564, 88.86551, 123.79401, 148.71104, 289.696, 101.58298, 427.50708, 400.0088, 338.04343, 290.4328]
2025-09-16 13:55:21,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 18.0, 24.0, 29.0, 54.0, 20.0, 81.0, 75.0, 62.0, 54.0]
2025-09-16 13:55:21,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 39 seconds)
2025-09-16 13:57:19,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:57:20,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 281.62170 ± 115.835
2025-09-16 13:57:20,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [100.86706, 239.40477, 345.30774, 254.78745, 135.712, 355.71606, 150.52061, 398.9881, 405.1592, 429.75394]
2025-09-16 13:57:20,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 46.0, 63.0, 52.0, 26.0, 65.0, 29.0, 73.0, 77.0, 78.0]
2025-09-16 13:57:20,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 28 seconds)
2025-09-16 13:59:17,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:59:18,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 301.43518 ± 112.948
2025-09-16 13:59:18,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [375.46863, 282.82925, 366.79095, 89.35353, 94.86399, 409.65466, 325.72876, 360.85748, 419.12894, 289.6755]
2025-09-16 13:59:18,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 51.0, 69.0, 18.0, 19.0, 77.0, 61.0, 66.0, 78.0, 53.0]
2025-09-16 13:59:18,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (301.44) for latency 15
2025-09-16 13:59:18,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 52 minutes, 32 seconds)
2025-09-16 14:01:17,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:01:17,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 246.94577 ± 119.931
2025-09-16 14:01:17,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [305.43356, 378.8518, 111.73269, 101.62752, 330.6646, 101.711845, 302.4908, 143.86711, 241.74225, 451.33545]
2025-09-16 14:01:17,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 82.0, 22.0, 20.0, 61.0, 20.0, 57.0, 28.0, 44.0, 82.0]
2025-09-16 14:01:17,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 50 minutes, 16 seconds)
2025-09-16 14:03:16,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:03:17,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 306.48880 ± 157.520
2025-09-16 14:03:17,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [122.21377, 304.8666, 113.48358, 308.43542, 302.01, 562.86646, 123.88882, 550.82104, 248.38538, 427.91666]
2025-09-16 14:03:17,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 58.0, 22.0, 57.0, 57.0, 111.0, 24.0, 102.0, 46.0, 86.0]
2025-09-16 14:03:17,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (306.49) for latency 15
2025-09-16 14:03:17,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 29 seconds)
2025-09-16 14:05:15,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:05:16,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 213.16272 ± 110.271
2025-09-16 14:05:16,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [259.7072, 113.53441, 389.13293, 348.22278, 101.66261, 315.80054, 277.66232, 123.156746, 107.57521, 95.17247]
2025-09-16 14:05:16,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 22.0, 77.0, 64.0, 20.0, 59.0, 57.0, 24.0, 21.0, 19.0]
2025-09-16 14:05:16,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 38 seconds)
2025-09-16 14:07:14,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:07:15,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 366.45370 ± 163.482
2025-09-16 14:07:15,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [239.53235, 276.36105, 303.0002, 511.60748, 352.05557, 446.15488, 101.58104, 354.08218, 741.0534, 339.1088]
2025-09-16 14:07:15,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 51.0, 68.0, 97.0, 65.0, 83.0, 20.0, 64.0, 147.0, 63.0]
2025-09-16 14:07:15,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (366.45) for latency 15
2025-09-16 14:07:15,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 49 seconds)
2025-09-16 14:09:14,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:09:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 333.39725 ± 191.781
2025-09-16 14:09:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [237.6924, 661.5413, 631.56665, 456.8546, 129.84196, 363.8336, 370.33786, 251.5825, 140.99402, 89.727325]
2025-09-16 14:09:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 136.0, 133.0, 87.0, 25.0, 70.0, 71.0, 47.0, 27.0, 18.0]
2025-09-16 14:09:15,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 9 seconds)
2025-09-16 14:11:14,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:11:15,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 356.34344 ± 184.633
2025-09-16 14:11:15,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [646.04816, 351.40308, 577.72064, 107.28003, 600.8399, 356.7544, 237.51103, 301.01575, 105.23586, 279.62558]
2025-09-16 14:11:15,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 67.0, 106.0, 21.0, 119.0, 74.0, 45.0, 55.0, 21.0, 52.0]
2025-09-16 14:11:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 26 seconds)
2025-09-16 14:13:13,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:13:14,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 271.48328 ± 150.727
2025-09-16 14:13:14,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [313.98523, 333.66028, 367.13214, 371.55237, 101.88125, 117.884224, 376.4872, 545.9668, 95.89017, 90.39317]
2025-09-16 14:13:14,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 60.0, 67.0, 79.0, 20.0, 23.0, 71.0, 110.0, 19.0, 18.0]
2025-09-16 14:13:14,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 13 seconds)
2025-09-16 14:15:12,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:15:12,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 235.27097 ± 122.925
2025-09-16 14:15:12,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.215385, 263.45987, 411.90033, 298.7842, 373.6156, 355.07895, 90.13904, 89.30741, 273.95615, 89.25258]
2025-09-16 14:15:12,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 49.0, 80.0, 57.0, 74.0, 65.0, 18.0, 18.0, 51.0, 18.0]
2025-09-16 14:15:12,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 2 seconds)
2025-09-16 14:17:12,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:17:13,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 393.67273 ± 157.958
2025-09-16 14:17:13,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [436.8955, 453.1853, 697.0055, 552.1704, 107.90627, 317.09006, 372.20337, 198.79585, 379.5374, 421.93756]
2025-09-16 14:17:13,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 84.0, 133.0, 103.0, 21.0, 57.0, 67.0, 42.0, 71.0, 77.0]
2025-09-16 14:17:13,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (393.67) for latency 15
2025-09-16 14:17:13,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 21 seconds)
2025-09-16 14:19:11,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:19:12,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 293.72900 ± 141.967
2025-09-16 14:19:12,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [320.97858, 275.2828, 89.61664, 385.9405, 353.1165, 539.33844, 351.97394, 110.82447, 102.43567, 407.78262]
2025-09-16 14:19:12,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 53.0, 18.0, 72.0, 66.0, 101.0, 69.0, 22.0, 20.0, 74.0]
2025-09-16 14:19:12,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 6 seconds)
2025-09-16 14:21:10,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:21:11,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 302.72940 ± 89.503
2025-09-16 14:21:11,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [192.55617, 427.80612, 359.87827, 133.99579, 311.69135, 364.80096, 349.73627, 392.3238, 235.7729, 258.73233]
2025-09-16 14:21:11,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 79.0, 70.0, 26.0, 56.0, 68.0, 66.0, 73.0, 47.0, 49.0]
2025-09-16 14:21:11,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 51 seconds)
2025-09-16 14:23:10,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:23:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 170.27138 ± 107.292
2025-09-16 14:23:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [118.63617, 117.915634, 106.457375, 119.49991, 146.52374, 101.01142, 400.16397, 106.964226, 365.9107, 119.63078]
2025-09-16 14:23:10,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 21.0, 23.0, 28.0, 20.0, 77.0, 21.0, 69.0, 23.0]
2025-09-16 14:23:10,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 5 seconds)
2025-09-16 14:25:08,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:25:09,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 273.93951 ± 111.879
2025-09-16 14:25:09,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [123.884995, 108.35826, 304.21213, 261.60245, 432.32837, 402.97546, 349.0391, 129.25922, 277.34256, 350.39264]
2025-09-16 14:25:09,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 21.0, 56.0, 49.0, 80.0, 75.0, 78.0, 25.0, 54.0, 65.0]
2025-09-16 14:25:09,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 7 seconds)
2025-09-16 14:27:08,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:27:08,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 213.91794 ± 133.460
2025-09-16 14:27:08,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [332.94598, 107.42783, 457.70734, 84.137, 89.16652, 144.03716, 118.946396, 378.30292, 312.85953, 113.64886]
2025-09-16 14:27:08,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 21.0, 84.0, 17.0, 18.0, 28.0, 23.0, 81.0, 65.0, 22.0]
2025-09-16 14:27:08,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 56 seconds)
2025-09-16 14:29:07,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:29:08,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 304.05182 ± 141.467
2025-09-16 14:29:08,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [364.14835, 96.88479, 381.39435, 405.76828, 489.98117, 440.6213, 101.63748, 125.04941, 390.57037, 244.46297]
2025-09-16 14:29:08,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 19.0, 70.0, 80.0, 99.0, 85.0, 20.0, 24.0, 73.0, 46.0]
2025-09-16 14:29:08,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 2 seconds)
2025-09-16 14:31:06,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:31:06,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 253.86763 ± 141.164
2025-09-16 14:31:06,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [422.8961, 393.7018, 404.12695, 336.7823, 95.91482, 107.29481, 101.45691, 179.12643, 96.38382, 400.9925]
2025-09-16 14:31:06,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 72.0, 76.0, 63.0, 19.0, 21.0, 20.0, 34.0, 19.0, 74.0]
2025-09-16 14:31:06,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 57 seconds)
2025-09-16 14:33:06,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:33:07,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 280.66220 ± 91.586
2025-09-16 14:33:07,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [271.03784, 354.5772, 310.01526, 360.02704, 324.76236, 341.71014, 352.45593, 276.2393, 125.54407, 90.25286]
2025-09-16 14:33:07,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 79.0, 66.0, 66.0, 61.0, 63.0, 69.0, 53.0, 24.0, 18.0]
2025-09-16 14:33:07,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 9 seconds)
2025-09-16 14:35:06,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:35:07,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 273.99899 ± 152.545
2025-09-16 14:35:07,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [385.6827, 278.0239, 96.92172, 359.06668, 532.60126, 376.62302, 95.868546, 401.70175, 96.05691, 117.443306]
2025-09-16 14:35:07,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 54.0, 19.0, 65.0, 99.0, 73.0, 19.0, 73.0, 19.0, 23.0]
2025-09-16 14:35:07,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 36 seconds)
2025-09-16 14:37:05,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:37:06,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 328.13300 ± 181.586
2025-09-16 14:37:06,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [331.21274, 484.59372, 609.95984, 84.24708, 112.72002, 310.91208, 342.407, 338.97467, 88.949425, 577.35315]
2025-09-16 14:37:06,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 93.0, 112.0, 17.0, 22.0, 59.0, 63.0, 69.0, 18.0, 113.0]
2025-09-16 14:37:06,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 27 seconds)
2025-09-16 14:39:05,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:39:06,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 329.09680 ± 134.739
2025-09-16 14:39:06,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [296.88303, 164.90938, 391.37708, 633.54205, 283.1224, 284.21713, 418.54773, 346.39777, 352.6421, 119.32938]
2025-09-16 14:39:06,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 32.0, 71.0, 129.0, 55.0, 51.0, 77.0, 63.0, 64.0, 23.0]
2025-09-16 14:39:06,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 36 seconds)
2025-09-16 14:41:07,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:41:08,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 256.09851 ± 149.636
2025-09-16 14:41:08,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [355.3086, 314.02832, 113.73693, 95.03849, 139.69272, 543.10516, 122.90357, 298.21774, 136.44438, 442.5092]
2025-09-16 14:41:08,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 59.0, 22.0, 19.0, 27.0, 100.0, 24.0, 57.0, 27.0, 87.0]
2025-09-16 14:41:08,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 18 seconds)
2025-09-16 14:43:10,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:43:11,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 326.82220 ± 99.266
2025-09-16 14:43:11,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [320.07986, 398.45206, 264.27505, 102.32227, 430.03476, 305.76157, 476.12088, 285.03622, 301.9109, 384.22876]
2025-09-16 14:43:11,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 73.0, 51.0, 20.0, 80.0, 55.0, 90.0, 53.0, 57.0, 71.0]
2025-09-16 14:43:11,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 51 seconds)
2025-09-16 14:45:12,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:45:13,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 371.18427 ± 193.346
2025-09-16 14:45:13,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [287.44415, 447.15933, 454.4695, 409.88412, 274.33636, 832.5268, 126.473885, 339.15396, 426.86163, 113.5327]
2025-09-16 14:45:13,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 87.0, 83.0, 78.0, 51.0, 171.0, 25.0, 66.0, 82.0, 22.0]
2025-09-16 14:45:13,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 13 seconds)
2025-09-16 14:47:14,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:47:15,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 333.62628 ± 143.619
2025-09-16 14:47:15,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.49479, 554.673, 494.63348, 463.47855, 361.39975, 312.9401, 246.47096, 325.62646, 367.20706, 102.33855]
2025-09-16 14:47:15,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 104.0, 100.0, 87.0, 69.0, 57.0, 46.0, 62.0, 68.0, 20.0]
2025-09-16 14:47:15,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 52 seconds)
2025-09-16 14:49:16,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:49:18,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 393.46762 ± 144.550
2025-09-16 14:49:18,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [327.87076, 493.37253, 376.3124, 453.97104, 118.893105, 327.57257, 511.3335, 206.65013, 499.57013, 619.1302]
2025-09-16 14:49:18,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 92.0, 68.0, 85.0, 23.0, 61.0, 93.0, 40.0, 96.0, 118.0]
2025-09-16 14:49:18,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 6 minutes, 24 seconds)
2025-09-16 14:51:16,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:51:17,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 340.04449 ± 154.208
2025-09-16 14:51:17,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [384.31247, 498.46756, 128.32785, 95.62111, 455.10312, 529.8868, 385.2708, 373.25128, 124.57689, 425.62708]
2025-09-16 14:51:17,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 93.0, 25.0, 19.0, 96.0, 99.0, 71.0, 70.0, 24.0, 83.0]
2025-09-16 14:51:17,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 46 seconds)
2025-09-16 14:53:15,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:53:16,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 332.36850 ± 126.873
2025-09-16 14:53:16,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [468.42172, 413.1805, 495.7913, 95.37007, 362.2473, 333.04013, 119.558426, 292.65817, 404.86163, 338.55576]
2025-09-16 14:53:16,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 77.0, 93.0, 19.0, 69.0, 61.0, 23.0, 55.0, 74.0, 61.0]
2025-09-16 14:53:16,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 59 seconds)
2025-09-16 14:55:14,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:55:15,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 332.97162 ± 160.108
2025-09-16 14:55:15,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [119.138054, 354.40262, 139.1686, 94.95177, 321.31705, 435.0722, 571.9054, 496.92896, 318.2544, 478.5768]
2025-09-16 14:55:15,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 66.0, 27.0, 19.0, 58.0, 79.0, 118.0, 92.0, 60.0, 106.0]
2025-09-16 14:55:15,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 27 seconds)
2025-09-16 14:57:14,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:57:15,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 294.96039 ± 201.084
2025-09-16 14:57:15,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.42222, 453.91132, 94.99666, 258.2521, 745.0542, 418.54355, 89.858604, 294.27512, 380.27118, 102.018745]
2025-09-16 14:57:15,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 86.0, 19.0, 49.0, 138.0, 77.0, 18.0, 58.0, 69.0, 20.0]
2025-09-16 14:57:15,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2025-09-16 14:59:12,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:59:13,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 259.12668 ± 178.990
2025-09-16 14:59:13,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [454.4561, 667.58826, 96.4827, 105.78339, 317.96454, 300.89816, 113.45822, 112.74979, 130.57947, 291.30634]
2025-09-16 14:59:13,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 136.0, 19.0, 21.0, 66.0, 55.0, 22.0, 22.0, 25.0, 53.0]
2025-09-16 14:59:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 2 seconds)
2025-09-16 15:01:11,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:01:12,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 270.03122 ± 114.139
2025-09-16 15:01:12,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [338.4598, 397.30936, 291.82895, 376.26007, 111.63373, 96.3347, 352.8409, 371.29758, 112.08289, 252.26411]
2025-09-16 15:01:12,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 73.0, 58.0, 71.0, 22.0, 19.0, 66.0, 66.0, 22.0, 49.0]
2025-09-16 15:01:12,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 6 seconds)
2025-09-16 15:03:10,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:03:11,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 357.07941 ± 140.957
2025-09-16 15:03:11,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.42859, 420.6836, 291.7928, 100.96507, 338.39597, 349.31955, 530.5362, 515.8807, 394.54974, 493.242]
2025-09-16 15:03:11,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 78.0, 54.0, 20.0, 69.0, 78.0, 98.0, 95.0, 74.0, 90.0]
2025-09-16 15:03:11,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 12 seconds)
2025-09-16 15:05:09,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:05:10,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 228.58188 ± 137.612
2025-09-16 15:05:10,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [365.70544, 107.88813, 148.59071, 128.74014, 137.01665, 445.8529, 432.10324, 101.72263, 322.6088, 95.59008]
2025-09-16 15:05:10,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 21.0, 29.0, 25.0, 26.0, 82.0, 81.0, 20.0, 63.0, 19.0]
2025-09-16 15:05:10,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 1 second)
2025-09-16 15:07:08,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:07:09,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 302.59869 ± 175.979
2025-09-16 15:07:09,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [380.73218, 106.82562, 370.3014, 628.501, 444.8559, 424.09702, 354.2165, 117.56294, 96.81834, 102.07612]
2025-09-16 15:07:09,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 21.0, 67.0, 115.0, 94.0, 78.0, 67.0, 23.0, 19.0, 20.0]
2025-09-16 15:07:09,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 57 seconds)
2025-09-16 15:09:07,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:09:08,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 364.41666 ± 93.155
2025-09-16 15:09:08,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [421.63412, 352.84558, 501.79593, 460.2355, 260.3227, 408.87878, 364.93024, 385.48544, 163.18217, 324.8559]
2025-09-16 15:09:08,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 67.0, 92.0, 88.0, 48.0, 77.0, 68.0, 71.0, 31.0, 58.0]
2025-09-16 15:09:08,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 15 seconds)
2025-09-16 15:11:07,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:11:08,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 396.53589 ± 114.233
2025-09-16 15:11:08,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [386.65375, 489.3051, 356.55573, 377.29697, 523.9236, 396.76566, 387.72495, 101.8105, 522.9683, 422.3546]
2025-09-16 15:11:08,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 105.0, 68.0, 71.0, 99.0, 73.0, 70.0, 20.0, 96.0, 76.0]
2025-09-16 15:11:08,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (396.54) for latency 15
2025-09-16 15:11:08,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 21 seconds)
2025-09-16 15:13:07,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:13:08,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 261.19650 ± 130.167
2025-09-16 15:13:08,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.03165, 311.6134, 352.61075, 318.20612, 340.81186, 144.73204, 118.75299, 517.4125, 294.4322, 89.36165]
2025-09-16 15:13:08,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 62.0, 66.0, 58.0, 75.0, 28.0, 23.0, 97.0, 55.0, 18.0]
2025-09-16 15:13:08,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 25 seconds)
2025-09-16 15:15:06,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:15:07,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 350.11252 ± 169.640
2025-09-16 15:15:07,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [495.4088, 419.5818, 84.1808, 336.08264, 564.7277, 409.4961, 101.89053, 361.10217, 165.3299, 563.32477]
2025-09-16 15:15:07,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 77.0, 17.0, 61.0, 116.0, 74.0, 20.0, 68.0, 31.0, 108.0]
2025-09-16 15:15:07,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 27 seconds)
2025-09-16 15:17:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:17:05,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 309.29779 ± 167.878
2025-09-16 15:17:05,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [465.41272, 512.10254, 356.58157, 370.04382, 113.749535, 422.4945, 121.68355, 515.8771, 112.29343, 102.73914]
2025-09-16 15:17:05,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 95.0, 66.0, 66.0, 22.0, 80.0, 24.0, 100.0, 22.0, 20.0]
2025-09-16 15:17:05,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 26 seconds)
2025-09-16 15:19:04,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:19:05,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 272.50677 ± 188.341
2025-09-16 15:19:05,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [346.68665, 89.59617, 392.33948, 136.10822, 582.59784, 593.9692, 108.4639, 95.672615, 265.47723, 114.15619]
2025-09-16 15:19:05,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 18.0, 71.0, 26.0, 107.0, 121.0, 21.0, 19.0, 51.0, 22.0]
2025-09-16 15:19:05,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 25 seconds)
2025-09-16 15:21:04,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:21:05,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 495.45566 ± 153.789
2025-09-16 15:21:05,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [365.6185, 692.6992, 560.5789, 400.85367, 563.26385, 438.9953, 367.7168, 340.43884, 397.5242, 826.8674]
2025-09-16 15:21:05,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 141.0, 115.0, 77.0, 108.0, 81.0, 69.0, 71.0, 75.0, 157.0]
2025-09-16 15:21:05,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (495.46) for latency 15
2025-09-16 15:21:05,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 34 seconds)
2025-09-16 15:23:06,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:23:07,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 375.80475 ± 166.340
2025-09-16 15:23:07,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [315.44977, 369.12878, 95.791885, 443.1566, 102.9455, 524.7383, 490.26315, 627.6413, 484.48395, 304.4485]
2025-09-16 15:23:07,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 66.0, 19.0, 81.0, 20.0, 102.0, 102.0, 119.0, 89.0, 58.0]
2025-09-16 15:23:07,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 52 seconds)
2025-09-16 15:25:09,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:25:10,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 270.54858 ± 190.140
2025-09-16 15:25:10,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [489.98538, 368.44855, 95.83445, 118.55101, 681.2186, 250.06563, 128.789, 125.55121, 356.8556, 90.18644]
2025-09-16 15:25:10,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 82.0, 19.0, 23.0, 126.0, 48.0, 25.0, 24.0, 64.0, 18.0]
2025-09-16 15:25:10,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 29 seconds)
2025-09-16 15:27:10,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:27:11,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 391.11758 ± 171.409
2025-09-16 15:27:11,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [548.0545, 89.12405, 118.57549, 425.58252, 317.5805, 503.88364, 391.92892, 633.49915, 340.05618, 542.8908]
2025-09-16 15:27:11,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 18.0, 23.0, 85.0, 60.0, 94.0, 70.0, 135.0, 65.0, 102.0]
2025-09-16 15:27:11,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 45 seconds)
2025-09-16 15:29:12,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:29:13,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 411.22202 ± 188.091
2025-09-16 15:29:13,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [535.79535, 363.83395, 404.9533, 702.714, 89.9026, 101.09435, 496.51913, 620.72125, 405.0552, 391.63123]
2025-09-16 15:29:13,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 66.0, 76.0, 132.0, 18.0, 20.0, 89.0, 113.0, 77.0, 76.0]
2025-09-16 15:29:13,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes, 8 seconds)
2025-09-16 15:31:15,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:31:16,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 317.40225 ± 186.645
2025-09-16 15:31:16,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [585.62836, 315.05554, 101.471725, 376.52527, 157.40665, 413.93967, 400.71902, 120.21746, 83.92813, 619.1306]
2025-09-16 15:31:16,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 58.0, 20.0, 69.0, 30.0, 95.0, 72.0, 23.0, 17.0, 121.0]
2025-09-16 15:31:16,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 23 minutes, 23 seconds)
2025-09-16 15:33:13,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:33:14,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 371.16537 ± 140.343
2025-09-16 15:33:14,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [333.03802, 560.29205, 169.57195, 408.11932, 503.21472, 396.51956, 307.1895, 530.93665, 107.617676, 395.15427]
2025-09-16 15:33:14,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 104.0, 33.0, 72.0, 100.0, 72.0, 57.0, 100.0, 21.0, 73.0]
2025-09-16 15:33:14,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 56 seconds)
2025-09-16 15:35:12,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:35:13,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 320.41568 ± 185.638
2025-09-16 15:35:13,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.30677, 88.8025, 436.4901, 426.04086, 502.4638, 581.08, 96.16115, 438.34378, 119.32978, 420.13794]
2025-09-16 15:35:13,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 83.0, 79.0, 92.0, 127.0, 19.0, 80.0, 23.0, 91.0]
2025-09-16 15:35:13,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 23 seconds)
2025-09-16 15:37:11,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:37:12,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 379.78043 ± 160.121
2025-09-16 15:37:12,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [577.02606, 610.67395, 441.23273, 316.05154, 392.54297, 108.15966, 466.09512, 406.04962, 107.3829, 372.58997]
2025-09-16 15:37:12,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 113.0, 82.0, 58.0, 73.0, 21.0, 94.0, 84.0, 21.0, 68.0]
2025-09-16 15:37:12,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 9 seconds)
2025-09-16 15:39:06,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:39:08,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 404.75781 ± 140.467
2025-09-16 15:39:08,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [441.6489, 644.32135, 482.44614, 467.34424, 521.6505, 119.76978, 298.09433, 416.62103, 399.8546, 255.82718]
2025-09-16 15:39:08,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 123.0, 86.0, 91.0, 99.0, 23.0, 54.0, 75.0, 74.0, 50.0]
2025-09-16 15:39:08,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 20 seconds)
2025-09-16 15:41:02,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:41:04,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 386.89618 ± 189.091
2025-09-16 15:41:04,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [442.0444, 583.14105, 114.79936, 546.98517, 474.62762, 463.91968, 422.9966, 107.36854, 110.14194, 602.9378]
2025-09-16 15:41:04,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 110.0, 22.0, 116.0, 87.0, 85.0, 80.0, 21.0, 22.0, 118.0]
2025-09-16 15:41:04,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 33 seconds)
2025-09-16 15:42:57,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:42:58,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 312.83902 ± 161.501
2025-09-16 15:42:58,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [129.55266, 111.84693, 96.02165, 448.10422, 432.87994, 515.68097, 362.55206, 492.39645, 152.3463, 387.009]
2025-09-16 15:42:58,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 19.0, 81.0, 82.0, 97.0, 66.0, 90.0, 29.0, 71.0]
2025-09-16 15:42:58,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 5 seconds)
2025-09-16 15:44:51,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:44:52,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 397.85541 ± 150.869
2025-09-16 15:44:52,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.57977, 396.37598, 521.0414, 455.41226, 469.7962, 458.88828, 503.77283, 541.1336, 107.2407, 416.31293]
2025-09-16 15:44:52,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 74.0, 113.0, 82.0, 102.0, 81.0, 101.0, 97.0, 21.0, 75.0]
2025-09-16 15:44:52,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 39 seconds)
2025-09-16 15:46:46,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:46:47,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 275.47275 ± 228.931
2025-09-16 15:46:47,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [677.4468, 101.5663, 95.724045, 108.15834, 96.61853, 102.96554, 549.5912, 595.6583, 325.02676, 101.971596]
2025-09-16 15:46:47,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 20.0, 19.0, 21.0, 19.0, 20.0, 120.0, 107.0, 60.0, 20.0]
2025-09-16 15:46:47,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 14 seconds)
2025-09-16 15:48:42,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:48:42,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 265.18304 ± 134.473
2025-09-16 15:48:42,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [401.54282, 389.16165, 339.19623, 367.542, 129.38354, 129.45934, 473.0059, 150.39235, 182.67097, 89.4755]
2025-09-16 15:48:42,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 88.0, 63.0, 68.0, 25.0, 25.0, 83.0, 29.0, 35.0, 18.0]
2025-09-16 15:48:42,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 18 seconds)
2025-09-16 15:50:38,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:50:38,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 275.35681 ± 289.065
2025-09-16 15:50:38,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.19246, 83.68626, 533.5192, 303.5131, 1051.9829, 143.27574, 140.56601, 152.51201, 118.49981, 123.82065]
2025-09-16 15:50:38,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 17.0, 97.0, 56.0, 210.0, 28.0, 27.0, 29.0, 23.0, 24.0]
2025-09-16 15:50:38,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 23 seconds)
2025-09-16 15:52:31,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:52:32,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 305.30792 ± 181.701
2025-09-16 15:52:32,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.413376, 449.02094, 101.16823, 94.7223, 290.91205, 539.6155, 454.18433, 389.94485, 536.3779, 107.719666]
2025-09-16 15:52:32,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 80.0, 20.0, 19.0, 54.0, 104.0, 84.0, 71.0, 121.0, 21.0]
2025-09-16 15:52:32,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 26 seconds)
2025-09-16 15:54:33,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:54:34,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 335.84644 ± 166.585
2025-09-16 15:54:34,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [548.82416, 119.27268, 264.11075, 500.9937, 364.44507, 375.27557, 518.7641, 90.24819, 126.798584, 449.73135]
2025-09-16 15:54:34,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 23.0, 53.0, 94.0, 67.0, 67.0, 95.0, 18.0, 25.0, 94.0]
2025-09-16 15:54:34,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 12 seconds)
2025-09-16 15:56:35,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:56:37,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 475.71857 ± 205.405
2025-09-16 15:56:37,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [402.23962, 701.3626, 385.02, 450.8128, 308.73138, 236.37173, 469.1409, 612.1703, 260.89926, 930.4372]
2025-09-16 15:56:37,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 127.0, 70.0, 87.0, 57.0, 46.0, 91.0, 120.0, 49.0, 173.0]
2025-09-16 15:56:37,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 1 second)
2025-09-16 15:58:33,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:58:34,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 403.71628 ± 262.761
2025-09-16 15:58:34,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [936.2994, 604.26587, 119.49386, 609.3806, 170.5614, 482.42096, 101.62742, 111.54602, 420.09943, 481.46805]
2025-09-16 15:58:34,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 123.0, 23.0, 110.0, 32.0, 90.0, 20.0, 22.0, 84.0, 87.0]
2025-09-16 15:58:34,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 16 seconds)
2025-09-16 16:00:27,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:00:28,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 230.51575 ± 134.247
2025-09-16 16:00:28,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [136.69598, 101.23812, 151.72652, 124.24501, 119.27469, 345.6742, 413.74704, 130.81094, 299.69687, 482.04813]
2025-09-16 16:00:28,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 20.0, 30.0, 24.0, 23.0, 73.0, 77.0, 25.0, 61.0, 89.0]
2025-09-16 16:00:28,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 4 seconds)
2025-09-16 16:02:21,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:02:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 336.66974 ± 214.568
2025-09-16 16:02:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [339.15555, 418.33588, 95.93837, 119.30728, 330.34348, 606.2734, 553.8311, 687.76855, 126.13217, 89.6119]
2025-09-16 16:02:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 82.0, 19.0, 23.0, 61.0, 106.0, 97.0, 127.0, 24.0, 18.0]
2025-09-16 16:02:22,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 7 seconds)
2025-09-16 16:04:16,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:04:16,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 307.54855 ± 207.159
2025-09-16 16:04:16,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [435.73596, 89.493744, 654.3238, 136.0161, 499.55612, 129.89824, 458.58896, 89.81005, 486.34808, 95.71469]
2025-09-16 16:04:16,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 18.0, 117.0, 26.0, 91.0, 26.0, 83.0, 18.0, 94.0, 19.0]
2025-09-16 16:04:16,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 35 seconds)
2025-09-16 16:06:09,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:06:10,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 448.65985 ± 234.945
2025-09-16 16:06:10,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [129.43071, 487.20874, 533.8386, 764.79956, 424.23434, 448.00488, 369.50006, 894.2598, 96.71576, 338.6064]
2025-09-16 16:06:10,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 88.0, 112.0, 148.0, 75.0, 80.0, 69.0, 173.0, 19.0, 62.0]
2025-09-16 16:06:10,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 56 seconds)
2025-09-16 16:08:02,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:08:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 397.52725 ± 319.757
2025-09-16 16:08:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [118.475624, 588.8498, 306.53195, 319.9696, 135.5025, 283.98572, 396.7146, 497.58112, 1238.4081, 89.2534]
2025-09-16 16:08:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 126.0, 58.0, 61.0, 26.0, 56.0, 74.0, 90.0, 249.0, 18.0]
2025-09-16 16:08:03,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 42 seconds)
2025-09-16 16:09:56,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:09:57,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 374.30710 ± 215.070
2025-09-16 16:09:57,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [537.19666, 424.90857, 470.44345, 290.33957, 108.379745, 402.61688, 463.33835, 821.66534, 105.96094, 118.22162]
2025-09-16 16:09:57,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 85.0, 91.0, 59.0, 21.0, 77.0, 88.0, 157.0, 21.0, 23.0]
2025-09-16 16:09:57,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 49 seconds)
2025-09-16 16:11:50,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:11:50,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 328.33871 ± 163.209
2025-09-16 16:11:50,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [145.9305, 330.34598, 654.4245, 320.3719, 434.9643, 394.98395, 120.271866, 341.58224, 96.08069, 444.4312]
2025-09-16 16:11:50,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 60.0, 124.0, 60.0, 83.0, 70.0, 23.0, 63.0, 19.0, 78.0]
2025-09-16 16:11:51,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 55 seconds)
2025-09-16 16:13:45,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:13:46,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 351.68530 ± 184.575
2025-09-16 16:13:46,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [467.24103, 96.86901, 383.15543, 89.704216, 569.9202, 381.75714, 379.20413, 90.718094, 607.8737, 450.4098]
2025-09-16 16:13:46,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 19.0, 71.0, 18.0, 108.0, 69.0, 67.0, 18.0, 114.0, 80.0]
2025-09-16 16:13:46,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 3 seconds)
2025-09-16 16:15:40,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:15:41,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 289.40216 ± 177.316
2025-09-16 16:15:41,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [467.8277, 108.11768, 635.65674, 318.60757, 333.7213, 84.06337, 110.958176, 406.96567, 96.11492, 331.98828]
2025-09-16 16:15:41,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 21.0, 124.0, 60.0, 61.0, 17.0, 22.0, 72.0, 19.0, 60.0]
2025-09-16 16:15:41,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 15 seconds)
2025-09-16 16:17:35,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:17:36,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 310.30322 ± 171.511
2025-09-16 16:17:36,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [575.92645, 106.998535, 318.6642, 446.3572, 114.04443, 125.228874, 473.22644, 125.318665, 482.185, 335.08234]
2025-09-16 16:17:36,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 21.0, 65.0, 81.0, 22.0, 24.0, 83.0, 24.0, 86.0, 61.0]
2025-09-16 16:17:36,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 28 seconds)
2025-09-16 16:19:31,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:19:32,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 385.63678 ± 265.757
2025-09-16 16:19:32,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [116.942726, 125.76555, 89.7567, 797.0341, 295.87888, 107.438644, 794.7016, 424.64676, 549.2743, 554.9281]
2025-09-16 16:19:32,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 18.0, 147.0, 55.0, 21.0, 143.0, 90.0, 109.0, 103.0]
2025-09-16 16:19:32,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 40 seconds)
2025-09-16 16:21:24,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:21:25,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 367.46674 ± 288.226
2025-09-16 16:21:25,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.72336, 89.129234, 114.195366, 634.7958, 272.52045, 571.7474, 389.80643, 89.98926, 1010.97925, 405.78107]
2025-09-16 16:21:25,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 22.0, 136.0, 51.0, 103.0, 72.0, 18.0, 205.0, 89.0]
2025-09-16 16:21:25,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 44 seconds)
2025-09-16 16:23:20,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:23:21,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 397.97552 ± 249.971
2025-09-16 16:23:21,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [604.295, 107.9036, 335.93387, 358.66714, 107.8307, 630.5798, 388.58844, 431.81247, 101.22438, 912.91986]
2025-09-16 16:23:21,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 21.0, 61.0, 72.0, 21.0, 117.0, 75.0, 75.0, 20.0, 173.0]
2025-09-16 16:23:21,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 51 seconds)
2025-09-16 16:25:14,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:25:15,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 419.44693 ± 218.113
2025-09-16 16:25:15,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [353.34476, 321.64917, 348.0413, 131.12671, 637.13226, 722.41724, 790.9493, 343.14285, 428.02533, 118.640594]
2025-09-16 16:25:15,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 58.0, 65.0, 25.0, 120.0, 145.0, 158.0, 65.0, 82.0, 23.0]
2025-09-16 16:25:15,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 54 seconds)
2025-09-16 16:27:10,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:27:11,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 410.69220 ± 178.515
2025-09-16 16:27:11,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [118.51332, 593.871, 526.61774, 453.46115, 538.8463, 316.20493, 477.13162, 642.8791, 337.706, 101.69079]
2025-09-16 16:27:11,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 104.0, 112.0, 84.0, 99.0, 60.0, 104.0, 120.0, 73.0, 20.0]
2025-09-16 16:27:11,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 59 seconds)
2025-09-16 16:29:05,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:29:06,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 393.37836 ± 241.199
2025-09-16 16:29:06,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [845.8602, 90.18843, 445.5585, 301.76297, 492.9882, 454.47836, 126.71821, 96.37356, 369.5469, 710.30804]
2025-09-16 16:29:06,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 18.0, 82.0, 58.0, 111.0, 82.0, 25.0, 19.0, 69.0, 127.0]
2025-09-16 16:29:06,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 3 seconds)
2025-09-16 16:31:00,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:31:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 415.47607 ± 231.298
2025-09-16 16:31:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.711784, 540.44525, 122.08242, 511.12268, 487.73444, 328.81982, 613.29596, 809.11865, 544.6311, 101.798676]
2025-09-16 16:31:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 101.0, 24.0, 92.0, 96.0, 61.0, 116.0, 149.0, 115.0, 20.0]
2025-09-16 16:31:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 12 seconds)
2025-09-16 16:32:55,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:32:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 525.42151 ± 236.125
2025-09-16 16:32:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [594.9697, 668.6311, 125.10828, 101.51876, 409.31635, 571.39905, 840.2676, 619.8902, 535.95953, 787.15485]
2025-09-16 16:32:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 144.0, 24.0, 20.0, 74.0, 128.0, 173.0, 111.0, 120.0, 145.0]
2025-09-16 16:32:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (525.42) for latency 15
2025-09-16 16:32:57,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 16 seconds)
2025-09-16 16:34:51,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:34:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 370.71426 ± 225.212
2025-09-16 16:34:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [172.96794, 612.6842, 428.90088, 113.64124, 487.64493, 428.70538, 432.7897, 802.73676, 102.04601, 125.02531]
2025-09-16 16:34:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 108.0, 83.0, 22.0, 109.0, 78.0, 81.0, 160.0, 20.0, 24.0]
2025-09-16 16:34:52,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 22 seconds)
2025-09-16 16:36:47,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:36:48,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 420.99698 ± 267.728
2025-09-16 16:36:48,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.07797, 692.70215, 329.4316, 423.8899, 367.24298, 586.42896, 102.19864, 936.8423, 579.2358, 83.9192]
2025-09-16 16:36:48,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 137.0, 60.0, 87.0, 68.0, 105.0, 20.0, 170.0, 108.0, 17.0]
2025-09-16 16:36:48,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 27 seconds)
2025-09-16 16:38:41,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:38:42,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 472.07730 ± 295.660
2025-09-16 16:38:42,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [562.6561, 1041.6306, 96.68846, 421.53027, 740.3952, 674.9005, 464.37552, 90.13296, 103.070206, 525.39325]
2025-09-16 16:38:42,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 197.0, 19.0, 81.0, 140.0, 135.0, 99.0, 18.0, 20.0, 93.0]
2025-09-16 16:38:42,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 31 seconds)
2025-09-16 16:40:36,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:40:38,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 394.14145 ± 367.363
2025-09-16 16:40:38,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [120.29251, 1391.2717, 431.0712, 96.33471, 406.67755, 399.34378, 430.9379, 90.23788, 95.72451, 479.5227]
2025-09-16 16:40:38,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 269.0, 78.0, 19.0, 87.0, 77.0, 81.0, 18.0, 19.0, 103.0]
2025-09-16 16:40:38,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 36 seconds)
2025-09-16 16:42:33,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:42:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 452.27237 ± 145.791
2025-09-16 16:42:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [84.232216, 544.6139, 432.13654, 479.99338, 392.578, 525.00616, 507.0392, 672.54877, 500.6896, 383.88562]
2025-09-16 16:42:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 100.0, 77.0, 93.0, 70.0, 95.0, 90.0, 119.0, 99.0, 71.0]
2025-09-16 16:42:34,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 41 seconds)
2025-09-16 16:44:27,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:44:28,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 425.45050 ± 173.822
2025-09-16 16:44:28,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [777.2076, 394.44577, 514.8638, 332.9631, 527.6713, 380.12872, 346.5136, 564.3547, 88.89955, 327.45682]
2025-09-16 16:44:28,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 68.0, 97.0, 64.0, 102.0, 68.0, 62.0, 107.0, 18.0, 62.0]
2025-09-16 16:44:28,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 46 seconds)
2025-09-16 16:46:23,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:46:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 407.93262 ± 242.830
2025-09-16 16:46:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [401.28378, 119.04545, 89.42305, 693.3152, 656.7247, 344.64337, 468.21146, 397.79092, 101.38227, 807.50616]
2025-09-16 16:46:24,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 23.0, 18.0, 125.0, 121.0, 64.0, 85.0, 74.0, 20.0, 145.0]
2025-09-16 16:46:24,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 50 seconds)
2025-09-16 16:48:19,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:48:20,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 385.18417 ± 139.270
2025-09-16 16:48:20,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [438.7004, 409.4801, 292.73425, 469.66458, 679.039, 384.63687, 414.2351, 375.3436, 278.9477, 109.06001]
2025-09-16 16:48:20,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 73.0, 56.0, 88.0, 125.0, 68.0, 74.0, 72.0, 56.0, 21.0]
2025-09-16 16:48:20,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 55 seconds)
2025-09-16 16:50:15,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 16:50:16,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 396.69244 ± 184.545
2025-09-16 16:50:16,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [484.84, 130.0194, 102.3757, 644.9202, 461.47638, 534.5679, 279.92758, 421.26425, 267.37308, 640.1601]
2025-09-16 16:50:16,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 25.0, 20.0, 122.0, 85.0, 92.0, 59.0, 76.0, 51.0, 116.0]
2025-09-16 16:50:16,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
