2025-09-16 12:15:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_12
2025-09-16 12:15:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_12
2025-09-16 12:15:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x149df8c849d0>}
2025-09-16 12:15:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:15:42,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:15:42,224 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=580, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:15:42,224 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:15:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:15:43,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:17:30,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:17:31,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 279.35864 ± 86.424
2025-09-16 12:17:31,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [307.5478, 343.31152, 325.7373, 344.95532, 290.3288, 113.28459, 337.04306, 112.828995, 346.7056, 271.84332]
2025-09-16 12:17:31,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 62.0, 66.0, 65.0, 56.0, 22.0, 62.0, 22.0, 64.0, 51.0]
2025-09-16 12:17:31,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (279.36) for latency 12
2025-09-16 12:17:31,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 57 minutes, 40 seconds)
2025-09-16 12:19:27,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:19:28,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 266.55389 ± 64.984
2025-09-16 12:19:28,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [165.17719, 308.0023, 278.3464, 324.34903, 295.54443, 117.388725, 281.11542, 284.1512, 299.72763, 311.73694]
2025-09-16 12:19:28,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 57.0, 51.0, 58.0, 55.0, 23.0, 53.0, 55.0, 56.0, 63.0]
2025-09-16 12:19:28,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 14 seconds)
2025-09-16 12:21:24,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:21:24,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 254.10066 ± 104.510
2025-09-16 12:21:24,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [416.5599, 95.36196, 123.126144, 356.0686, 107.814285, 256.6372, 287.19675, 295.79095, 274.77036, 327.68045]
2025-09-16 12:21:24,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 19.0, 24.0, 65.0, 21.0, 47.0, 52.0, 54.0, 53.0, 60.0]
2025-09-16 12:21:24,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 3 minutes, 50 seconds)
2025-09-16 12:23:21,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:23:22,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 293.83374 ± 69.289
2025-09-16 12:23:22,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [327.32602, 253.47021, 300.49423, 350.29858, 321.00928, 352.22803, 290.73822, 254.03941, 369.64838, 119.084885]
2025-09-16 12:23:22,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 46.0, 67.0, 75.0, 59.0, 69.0, 54.0, 54.0, 68.0, 23.0]
2025-09-16 12:23:22,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (293.83) for latency 12
2025-09-16 12:23:22,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 25 seconds)
2025-09-16 12:25:18,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:25:19,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 338.83316 ± 63.546
2025-09-16 12:25:19,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [461.91458, 267.90222, 261.39325, 292.04608, 341.15057, 423.27573, 358.91867, 334.39816, 368.36118, 278.9713]
2025-09-16 12:25:19,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 50.0, 48.0, 54.0, 62.0, 80.0, 66.0, 74.0, 68.0, 52.0]
2025-09-16 12:25:19,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (338.83) for latency 12
2025-09-16 12:25:19,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 9 seconds)
2025-09-16 12:27:16,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:27:17,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 350.63788 ± 145.510
2025-09-16 12:27:17,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [529.9262, 236.01743, 310.65814, 292.05142, 378.80978, 674.62164, 304.93613, 361.4411, 290.5349, 127.381836]
2025-09-16 12:27:17,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 44.0, 62.0, 55.0, 69.0, 133.0, 56.0, 67.0, 53.0, 25.0]
2025-09-16 12:27:17,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (350.64) for latency 12
2025-09-16 12:27:17,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 35 seconds)
2025-09-16 12:29:13,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:29:14,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 374.30670 ± 170.134
2025-09-16 12:29:14,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.18774, 347.7424, 276.59592, 415.5032, 350.0382, 414.92438, 380.39606, 306.01654, 342.91498, 812.7475]
2025-09-16 12:29:14,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 67.0, 50.0, 92.0, 65.0, 75.0, 71.0, 59.0, 65.0, 156.0]
2025-09-16 12:29:14,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (374.31) for latency 12
2025-09-16 12:29:14,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 38 seconds)
2025-09-16 12:31:10,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:31:10,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 328.02173 ± 46.314
2025-09-16 12:31:10,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [332.41046, 374.49612, 249.47127, 396.56754, 295.11563, 341.59387, 329.88196, 388.08212, 281.65594, 290.94226]
2025-09-16 12:31:10,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 69.0, 47.0, 73.0, 56.0, 64.0, 61.0, 73.0, 52.0, 53.0]
2025-09-16 12:31:10,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 42 seconds)
2025-09-16 12:33:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:33:08,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 365.56088 ± 97.318
2025-09-16 12:33:08,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [249.3523, 290.9237, 471.35672, 380.1676, 268.90668, 337.02203, 583.6208, 407.35208, 370.27884, 296.62805]
2025-09-16 12:33:08,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 53.0, 87.0, 70.0, 52.0, 61.0, 116.0, 74.0, 69.0, 55.0]
2025-09-16 12:33:08,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 55 seconds)
2025-09-16 12:35:04,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:35:05,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 341.62317 ± 44.359
2025-09-16 12:35:05,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [348.4255, 287.48965, 369.77377, 382.6679, 414.39026, 354.73795, 304.5345, 257.33054, 338.87744, 358.0041]
2025-09-16 12:35:05,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 53.0, 71.0, 69.0, 74.0, 76.0, 56.0, 48.0, 72.0, 71.0]
2025-09-16 12:35:05,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 2 seconds)
2025-09-16 12:37:02,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:37:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 333.31842 ± 149.171
2025-09-16 12:37:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [273.84854, 691.12866, 259.6565, 485.11136, 411.8589, 272.29663, 148.58826, 241.30473, 235.12392, 314.2665]
2025-09-16 12:37:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 136.0, 50.0, 89.0, 77.0, 51.0, 28.0, 51.0, 44.0, 59.0]
2025-09-16 12:37:03,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 53 minutes, 52 seconds)
2025-09-16 12:38:59,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:39:00,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 393.69928 ± 100.626
2025-09-16 12:39:00,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [477.68005, 356.42163, 288.5661, 421.67563, 386.98215, 531.577, 380.25577, 282.7343, 562.3104, 248.78978]
2025-09-16 12:39:00,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 66.0, 52.0, 77.0, 73.0, 102.0, 72.0, 58.0, 106.0, 48.0]
2025-09-16 12:39:00,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (393.70) for latency 12
2025-09-16 12:39:00,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 52 minutes, 6 seconds)
2025-09-16 12:40:57,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:40:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 300.75717 ± 114.406
2025-09-16 12:40:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [339.51617, 354.47778, 347.72894, 335.81262, 260.11996, 106.7803, 346.29266, 515.8169, 288.16113, 112.86523]
2025-09-16 12:40:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 79.0, 64.0, 62.0, 48.0, 21.0, 65.0, 101.0, 54.0, 22.0]
2025-09-16 12:40:58,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 50 minutes, 15 seconds)
2025-09-16 12:42:54,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:42:55,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 291.95792 ± 122.907
2025-09-16 12:42:55,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.61491, 409.46854, 344.66074, 113.02306, 125.11426, 350.621, 304.63837, 376.47998, 368.46747, 431.49084]
2025-09-16 12:42:55,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 73.0, 64.0, 22.0, 24.0, 64.0, 56.0, 82.0, 69.0, 80.0]
2025-09-16 12:42:55,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 10 seconds)
2025-09-16 12:44:52,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:44:53,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 329.65729 ± 95.828
2025-09-16 12:44:53,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [361.53864, 243.5613, 450.1487, 369.59253, 328.52457, 278.34924, 320.6429, 125.202896, 338.72717, 480.28503]
2025-09-16 12:44:53,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 45.0, 99.0, 67.0, 70.0, 51.0, 66.0, 24.0, 64.0, 88.0]
2025-09-16 12:44:53,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 34 seconds)
2025-09-16 12:46:50,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:46:51,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 321.53754 ± 91.207
2025-09-16 12:46:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [333.36185, 415.18475, 238.71521, 422.4149, 280.36777, 389.0541, 142.30225, 391.2277, 382.59253, 220.1546]
2025-09-16 12:46:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 79.0, 43.0, 78.0, 52.0, 71.0, 27.0, 71.0, 69.0, 43.0]
2025-09-16 12:46:51,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 38 seconds)
2025-09-16 12:48:47,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:48:48,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 376.60370 ± 103.755
2025-09-16 12:48:48,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [527.17035, 316.19672, 336.28525, 363.87885, 304.29196, 286.4573, 336.39957, 569.2332, 479.46585, 246.65796]
2025-09-16 12:48:48,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 64.0, 64.0, 67.0, 57.0, 52.0, 63.0, 106.0, 93.0, 45.0]
2025-09-16 12:48:48,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 42 minutes, 38 seconds)
2025-09-16 12:50:45,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:50:46,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 293.54471 ± 117.076
2025-09-16 12:50:46,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [119.58082, 318.0779, 101.352844, 315.05072, 309.5299, 497.38483, 317.8493, 425.97668, 205.01193, 325.63205]
2025-09-16 12:50:46,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 67.0, 20.0, 56.0, 57.0, 100.0, 57.0, 80.0, 38.0, 59.0]
2025-09-16 12:50:46,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 44 seconds)
2025-09-16 12:52:42,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:52:43,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 347.10101 ± 88.191
2025-09-16 12:52:43,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [360.6097, 126.085144, 390.7114, 465.30322, 333.82993, 444.1107, 340.5054, 335.73474, 298.9106, 375.20917]
2025-09-16 12:52:43,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 24.0, 73.0, 87.0, 60.0, 82.0, 63.0, 63.0, 56.0, 70.0]
2025-09-16 12:52:43,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 45 seconds)
2025-09-16 12:54:40,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:54:40,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 316.70239 ± 119.002
2025-09-16 12:54:40,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [220.43962, 124.57785, 266.55167, 347.4426, 293.96945, 401.97278, 235.28862, 364.8839, 320.9329, 590.96466]
2025-09-16 12:54:40,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 24.0, 49.0, 66.0, 56.0, 74.0, 44.0, 66.0, 60.0, 116.0]
2025-09-16 12:54:40,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 35 seconds)
2025-09-16 12:56:38,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:56:39,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 401.08975 ± 141.387
2025-09-16 12:56:39,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [334.32944, 323.77695, 397.44513, 367.33643, 595.76514, 355.70648, 347.5803, 272.45584, 738.44666, 278.0551]
2025-09-16 12:56:39,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 64.0, 74.0, 69.0, 112.0, 66.0, 75.0, 49.0, 141.0, 56.0]
2025-09-16 12:56:39,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (401.09) for latency 12
2025-09-16 12:56:39,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 43 seconds)
2025-09-16 12:58:36,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:58:37,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 383.12369 ± 237.474
2025-09-16 12:58:37,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [607.8649, 292.50555, 95.00776, 308.13794, 636.39276, 837.6233, 429.48575, 128.37254, 102.55723, 393.289]
2025-09-16 12:58:37,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 55.0, 19.0, 56.0, 128.0, 158.0, 77.0, 25.0, 20.0, 75.0]
2025-09-16 12:58:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 33 minutes, 4 seconds)
2025-09-16 13:00:33,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:00:34,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 344.80835 ± 123.693
2025-09-16 13:00:34,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [366.8258, 411.85815, 101.26658, 454.53207, 338.33038, 391.47305, 111.10438, 406.80478, 417.94437, 447.94397]
2025-09-16 13:00:34,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 77.0, 20.0, 85.0, 62.0, 72.0, 22.0, 76.0, 75.0, 87.0]
2025-09-16 13:00:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 58 seconds)
2025-09-16 13:02:31,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:02:32,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 434.29517 ± 128.084
2025-09-16 13:02:32,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [467.6272, 273.0759, 524.66187, 552.3797, 494.25824, 291.94098, 365.66678, 319.7128, 359.447, 694.1813]
2025-09-16 13:02:32,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 53.0, 109.0, 103.0, 100.0, 54.0, 64.0, 58.0, 64.0, 130.0]
2025-09-16 13:02:32,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (434.30) for latency 12
2025-09-16 13:02:32,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 29 minutes, 7 seconds)
2025-09-16 13:04:28,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:04:29,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 403.98300 ± 142.815
2025-09-16 13:04:29,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [593.29425, 135.87863, 372.8015, 492.52444, 326.67413, 466.6447, 340.12253, 645.24585, 282.05328, 384.59082]
2025-09-16 13:04:29,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 26.0, 69.0, 91.0, 61.0, 84.0, 62.0, 131.0, 51.0, 80.0]
2025-09-16 13:04:29,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 27 minutes, 14 seconds)
2025-09-16 13:06:27,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:06:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 368.14645 ± 73.312
2025-09-16 13:06:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [280.291, 371.491, 322.68393, 308.1619, 271.5228, 367.67557, 522.00446, 390.7969, 442.391, 404.44598]
2025-09-16 13:06:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 67.0, 62.0, 58.0, 51.0, 72.0, 99.0, 70.0, 82.0, 76.0]
2025-09-16 13:06:28,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes, 20 seconds)
2025-09-16 13:08:24,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:08:26,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 479.19669 ± 159.441
2025-09-16 13:08:26,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [473.07187, 572.2439, 685.0952, 293.8186, 496.95435, 736.8819, 259.26172, 354.50995, 323.33054, 596.79865]
2025-09-16 13:08:26,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 122.0, 127.0, 61.0, 92.0, 139.0, 58.0, 67.0, 59.0, 108.0]
2025-09-16 13:08:26,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (479.20) for latency 12
2025-09-16 13:08:26,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 13 seconds)
2025-09-16 13:10:23,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:10:24,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 358.94482 ± 227.386
2025-09-16 13:10:24,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [276.69873, 361.4142, 822.57416, 83.605385, 472.51086, 395.6701, 94.46455, 637.6772, 323.26428, 121.56875]
2025-09-16 13:10:24,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 66.0, 160.0, 17.0, 88.0, 72.0, 19.0, 121.0, 60.0, 24.0]
2025-09-16 13:10:24,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 42 seconds)
2025-09-16 13:12:21,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:12:22,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 424.89893 ± 137.533
2025-09-16 13:12:22,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.6211, 289.613, 559.3644, 484.5713, 547.3796, 299.5104, 388.86154, 436.30606, 546.7044, 561.05725]
2025-09-16 13:12:22,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 52.0, 119.0, 88.0, 99.0, 55.0, 73.0, 83.0, 101.0, 119.0]
2025-09-16 13:12:22,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 39 seconds)
2025-09-16 13:14:19,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:14:20,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 442.22162 ± 107.249
2025-09-16 13:14:20,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [455.05246, 543.26056, 621.50024, 262.55905, 372.55692, 525.10785, 543.9022, 368.1703, 345.38275, 384.7237]
2025-09-16 13:14:20,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 97.0, 132.0, 48.0, 70.0, 94.0, 114.0, 65.0, 60.0, 81.0]
2025-09-16 13:14:20,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 52 seconds)
2025-09-16 13:16:18,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:16:19,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 370.91989 ± 128.574
2025-09-16 13:16:19,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [394.94437, 293.06183, 443.03912, 281.95285, 403.43542, 258.64456, 285.38376, 370.12372, 269.5964, 709.0171]
2025-09-16 13:16:19,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 54.0, 97.0, 60.0, 75.0, 48.0, 52.0, 66.0, 54.0, 137.0]
2025-09-16 13:16:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 15 minutes, 56 seconds)
2025-09-16 13:18:17,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:18:18,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 427.36981 ± 136.750
2025-09-16 13:18:18,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [650.76715, 361.13434, 95.174034, 423.55875, 401.30634, 530.247, 380.76196, 487.7886, 497.6052, 445.35455]
2025-09-16 13:18:18,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 67.0, 19.0, 78.0, 73.0, 97.0, 71.0, 92.0, 91.0, 81.0]
2025-09-16 13:18:18,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes, 13 seconds)
2025-09-16 13:20:14,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:20:15,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 472.86093 ± 235.736
2025-09-16 13:20:15,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [558.68524, 83.3143, 570.927, 481.90686, 934.111, 549.175, 649.8697, 122.19025, 393.12283, 385.30704]
2025-09-16 13:20:15,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 17.0, 114.0, 105.0, 178.0, 98.0, 117.0, 24.0, 70.0, 70.0]
2025-09-16 13:20:15,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 12 minutes, 1 second)
2025-09-16 13:22:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:22:13,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 417.71417 ± 289.496
2025-09-16 13:22:13,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [838.67444, 397.91238, 312.24023, 948.4365, 672.82043, 120.35057, 100.995926, 107.32102, 288.1953, 390.19467]
2025-09-16 13:22:13,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 73.0, 59.0, 171.0, 125.0, 23.0, 20.0, 21.0, 54.0, 82.0]
2025-09-16 13:22:13,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 10 minutes, 9 seconds)
2025-09-16 13:24:11,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:24:12,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 368.93906 ± 260.231
2025-09-16 13:24:12,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [378.64362, 342.04678, 330.13205, 95.64195, 990.7281, 290.46185, 593.61975, 100.989555, 95.74944, 471.3775]
2025-09-16 13:24:12,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 65.0, 58.0, 19.0, 209.0, 54.0, 112.0, 20.0, 19.0, 83.0]
2025-09-16 13:24:12,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 8 minutes, 11 seconds)
2025-09-16 13:26:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:26:10,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 395.02182 ± 109.421
2025-09-16 13:26:10,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [391.38867, 302.40494, 285.55722, 353.03464, 315.64783, 430.98358, 327.30402, 675.03046, 477.45688, 391.41022]
2025-09-16 13:26:10,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 54.0, 51.0, 66.0, 56.0, 78.0, 59.0, 129.0, 87.0, 70.0]
2025-09-16 13:26:10,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 6 minutes, 10 seconds)
2025-09-16 13:28:07,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:28:09,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 461.30576 ± 230.224
2025-09-16 13:28:09,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [421.6183, 154.97026, 426.80582, 272.92984, 519.99603, 932.4036, 687.28357, 537.93256, 128.42363, 530.6942]
2025-09-16 13:28:09,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 29.0, 79.0, 62.0, 112.0, 169.0, 126.0, 99.0, 25.0, 111.0]
2025-09-16 13:28:09,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 4 minutes, 5 seconds)
2025-09-16 13:30:07,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:30:08,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 424.28595 ± 161.804
2025-09-16 13:30:08,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.82145, 718.6027, 541.8887, 586.463, 306.4776, 378.52884, 309.4528, 457.17236, 466.11108, 370.3408]
2025-09-16 13:30:08,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 131.0, 101.0, 108.0, 57.0, 66.0, 56.0, 84.0, 101.0, 67.0]
2025-09-16 13:30:08,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 23 seconds)
2025-09-16 13:32:05,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:32:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 405.02588 ± 159.326
2025-09-16 13:32:06,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [695.49786, 112.731415, 339.52438, 484.19543, 325.3001, 549.46564, 386.38016, 283.2402, 557.46185, 316.4617]
2025-09-16 13:32:06,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 22.0, 71.0, 89.0, 61.0, 103.0, 83.0, 61.0, 102.0, 56.0]
2025-09-16 13:32:06,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 33 seconds)
2025-09-16 13:34:03,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:34:04,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 500.33481 ± 208.180
2025-09-16 13:34:04,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [345.2673, 123.85051, 671.69604, 493.25314, 605.74054, 931.7419, 378.70877, 459.23343, 607.5191, 386.33746]
2025-09-16 13:34:04,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 24.0, 142.0, 99.0, 110.0, 175.0, 67.0, 99.0, 106.0, 70.0]
2025-09-16 13:34:04,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (500.33) for latency 12
2025-09-16 13:34:04,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 29 seconds)
2025-09-16 13:36:02,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:36:03,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 560.72400 ± 281.100
2025-09-16 13:36:03,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [144.13927, 472.5636, 726.09125, 680.2316, 592.49725, 499.56866, 350.37573, 445.6167, 435.8668, 1260.2898]
2025-09-16 13:36:03,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 87.0, 130.0, 127.0, 109.0, 96.0, 67.0, 81.0, 78.0, 232.0]
2025-09-16 13:36:03,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (560.72) for latency 12
2025-09-16 13:36:03,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 56 minutes, 37 seconds)
2025-09-16 13:38:01,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:38:02,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 456.22641 ± 145.725
2025-09-16 13:38:02,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [598.7348, 771.25604, 479.01398, 525.91376, 324.69998, 384.81665, 273.65488, 310.7197, 518.6549, 374.7999]
2025-09-16 13:38:02,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 147.0, 86.0, 110.0, 58.0, 68.0, 49.0, 60.0, 93.0, 70.0]
2025-09-16 13:38:02,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 43 seconds)
2025-09-16 13:39:59,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:40:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 500.06390 ± 298.907
2025-09-16 13:40:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [505.53995, 125.68697, 96.1372, 897.5362, 657.9778, 434.36798, 107.27849, 825.7897, 473.699, 876.6258]
2025-09-16 13:40:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 24.0, 19.0, 189.0, 144.0, 79.0, 21.0, 167.0, 82.0, 162.0]
2025-09-16 13:40:01,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 38 seconds)
2025-09-16 13:41:58,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:42:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 501.66455 ± 243.431
2025-09-16 13:42:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [675.3434, 300.8466, 554.54474, 443.20197, 1057.0072, 354.35925, 564.7324, 135.80484, 615.9281, 314.87674]
2025-09-16 13:42:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 55.0, 100.0, 82.0, 196.0, 62.0, 103.0, 26.0, 110.0, 56.0]
2025-09-16 13:42:00,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 44 seconds)
2025-09-16 13:43:56,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:43:58,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 494.25870 ± 259.099
2025-09-16 13:43:58,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [383.98022, 501.87213, 289.9428, 135.13237, 480.97977, 507.91428, 425.52325, 411.17654, 1169.6833, 636.3821]
2025-09-16 13:43:58,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 94.0, 54.0, 26.0, 83.0, 91.0, 77.0, 73.0, 228.0, 123.0]
2025-09-16 13:43:58,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 48 minutes, 46 seconds)
2025-09-16 13:45:56,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:45:57,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 351.18170 ± 167.342
2025-09-16 13:45:57,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [352.70782, 296.07413, 434.2845, 301.07773, 124.203964, 324.85135, 133.81831, 274.0248, 652.9943, 617.7802]
2025-09-16 13:45:57,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 53.0, 79.0, 54.0, 24.0, 70.0, 26.0, 50.0, 137.0, 117.0]
2025-09-16 13:45:57,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 50 seconds)
2025-09-16 13:47:54,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:47:55,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 630.53650 ± 281.784
2025-09-16 13:47:55,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [410.22638, 1087.3944, 348.4701, 530.1737, 690.0976, 145.99164, 692.22217, 644.8607, 675.048, 1080.8806]
2025-09-16 13:47:55,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 200.0, 63.0, 93.0, 132.0, 28.0, 130.0, 114.0, 131.0, 229.0]
2025-09-16 13:47:55,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (630.54) for latency 12
2025-09-16 13:47:55,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 49 seconds)
2025-09-16 13:49:53,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:49:55,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 571.84076 ± 212.262
2025-09-16 13:49:55,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [536.1821, 417.1006, 146.44867, 572.5444, 369.54425, 645.523, 852.65283, 871.0381, 563.5426, 743.83093]
2025-09-16 13:49:55,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 73.0, 28.0, 106.0, 67.0, 125.0, 152.0, 156.0, 118.0, 135.0]
2025-09-16 13:49:55,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes)
2025-09-16 13:51:53,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:51:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 562.15375 ± 384.898
2025-09-16 13:51:54,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [689.97314, 737.2826, 159.17291, 736.698, 1515.6228, 390.91586, 243.77444, 459.71017, 124.763916, 563.6235]
2025-09-16 13:51:54,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 133.0, 30.0, 153.0, 280.0, 70.0, 44.0, 85.0, 24.0, 114.0]
2025-09-16 13:51:54,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 5 seconds)
2025-09-16 13:53:51,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:53:52,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 489.03998 ± 155.911
2025-09-16 13:53:52,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [473.48288, 562.6556, 317.4622, 403.66312, 287.4984, 301.3943, 691.27026, 488.37335, 750.3695, 614.23047]
2025-09-16 13:53:52,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 105.0, 69.0, 70.0, 53.0, 55.0, 125.0, 90.0, 131.0, 128.0]
2025-09-16 13:53:52,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 7 seconds)
2025-09-16 13:55:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:55:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 595.54114 ± 244.646
2025-09-16 13:55:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [301.55838, 947.18024, 394.9431, 529.5145, 342.98798, 877.5477, 894.8465, 703.7528, 672.9075, 290.17276]
2025-09-16 13:55:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 169.0, 68.0, 99.0, 63.0, 158.0, 159.0, 127.0, 131.0, 51.0]
2025-09-16 13:55:51,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 4 seconds)
2025-09-16 13:57:49,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:57:51,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 673.63947 ± 360.655
2025-09-16 13:57:51,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [771.39844, 257.53488, 612.5508, 1299.2526, 113.086235, 474.3452, 1040.5848, 660.18365, 1086.369, 421.08896]
2025-09-16 13:57:51,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 46.0, 121.0, 240.0, 22.0, 89.0, 191.0, 116.0, 195.0, 77.0]
2025-09-16 13:57:51,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (673.64) for latency 12
2025-09-16 13:57:51,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 15 seconds)
2025-09-16 13:59:49,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:59:50,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 526.98517 ± 116.983
2025-09-16 13:59:50,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [473.70578, 343.0684, 449.7942, 631.3195, 473.78357, 719.2453, 585.8616, 478.06165, 423.6678, 691.344]
2025-09-16 13:59:50,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 66.0, 94.0, 111.0, 86.0, 135.0, 106.0, 92.0, 79.0, 131.0]
2025-09-16 13:59:50,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 17 seconds)
2025-09-16 14:01:48,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:01:49,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 449.15375 ± 189.126
2025-09-16 14:01:49,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [458.063, 669.9517, 378.575, 478.7542, 129.98152, 536.10425, 466.8547, 102.349754, 677.67535, 593.22815]
2025-09-16 14:01:49,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 128.0, 67.0, 84.0, 25.0, 98.0, 86.0, 20.0, 118.0, 108.0]
2025-09-16 14:01:49,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 12 seconds)
2025-09-16 14:03:46,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:03:47,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 563.29773 ± 378.624
2025-09-16 14:03:47,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [572.21564, 664.4613, 374.38876, 101.35588, 1559.6146, 445.24146, 323.04834, 328.1274, 782.3499, 482.17435]
2025-09-16 14:03:47,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 131.0, 69.0, 20.0, 290.0, 76.0, 61.0, 57.0, 144.0, 84.0]
2025-09-16 14:03:47,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 11 seconds)
2025-09-16 14:05:45,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:05:46,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 456.42383 ± 178.225
2025-09-16 14:05:46,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.0949, 269.49646, 506.8803, 648.5784, 399.2663, 711.3412, 556.9192, 358.85153, 393.56955, 617.2404]
2025-09-16 14:05:46,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 49.0, 92.0, 117.0, 72.0, 126.0, 103.0, 63.0, 73.0, 114.0]
2025-09-16 14:05:46,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 16 seconds)
2025-09-16 14:07:44,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:07:45,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 586.14453 ± 170.393
2025-09-16 14:07:45,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [464.62332, 709.4042, 290.44214, 820.2774, 531.8605, 434.3503, 418.7059, 728.6468, 704.25836, 758.8763]
2025-09-16 14:07:45,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 129.0, 51.0, 147.0, 111.0, 81.0, 77.0, 138.0, 143.0, 145.0]
2025-09-16 14:07:45,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 10 seconds)
2025-09-16 14:09:42,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:09:43,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 446.10101 ± 322.141
2025-09-16 14:09:43,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1302.8793, 113.079346, 572.53076, 312.42493, 318.94272, 457.4189, 315.1918, 101.0838, 441.1905, 526.26843]
2025-09-16 14:09:43,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 22.0, 111.0, 56.0, 59.0, 99.0, 60.0, 20.0, 86.0, 102.0]
2025-09-16 14:09:43,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 59 seconds)
2025-09-16 14:11:40,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:11:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 433.33612 ± 198.876
2025-09-16 14:11:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [609.10767, 549.2243, 586.1402, 304.69714, 474.46643, 428.05902, 749.6508, 411.6947, 119.226456, 101.0945]
2025-09-16 14:11:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 102.0, 101.0, 54.0, 90.0, 79.0, 138.0, 93.0, 23.0, 20.0]
2025-09-16 14:11:42,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 57 seconds)
2025-09-16 14:13:41,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:13:43,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 563.73254 ± 269.069
2025-09-16 14:13:43,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [463.7981, 516.9667, 369.2236, 101.57575, 488.26736, 425.332, 635.5099, 739.66846, 721.1714, 1175.8121]
2025-09-16 14:13:43,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 104.0, 65.0, 20.0, 87.0, 75.0, 115.0, 136.0, 147.0, 217.0]
2025-09-16 14:13:43,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 25 seconds)
2025-09-16 14:15:38,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:15:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 611.90833 ± 340.035
2025-09-16 14:15:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [390.6064, 101.22784, 871.6097, 889.7748, 871.666, 309.20844, 976.77185, 997.01086, 100.75025, 610.45703]
2025-09-16 14:15:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 20.0, 153.0, 165.0, 179.0, 54.0, 201.0, 178.0, 20.0, 111.0]
2025-09-16 14:15:40,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 10 seconds)
2025-09-16 14:17:39,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:17:40,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 547.01746 ± 115.391
2025-09-16 14:17:40,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [568.4208, 645.64124, 749.26385, 540.7835, 649.04663, 489.25073, 342.47864, 493.61636, 591.61957, 400.05252]
2025-09-16 14:17:40,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 116.0, 146.0, 95.0, 113.0, 87.0, 75.0, 107.0, 105.0, 72.0]
2025-09-16 14:17:40,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 22 seconds)
2025-09-16 14:19:37,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:19:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 792.74475 ± 917.011
2025-09-16 14:19:39,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [585.9466, 1060.2886, 308.8764, 319.0797, 1030.6295, 3394.5642, 345.4354, 373.83987, 88.56259, 420.22458]
2025-09-16 14:19:39,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 198.0, 58.0, 62.0, 192.0, 632.0, 78.0, 68.0, 18.0, 83.0]
2025-09-16 14:19:39,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (792.74) for latency 12
2025-09-16 14:19:39,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 29 seconds)
2025-09-16 14:21:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:21:38,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 522.58380 ± 215.863
2025-09-16 14:21:38,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [782.8295, 285.44882, 467.20337, 534.74335, 661.064, 901.1871, 496.55713, 467.57373, 107.16686, 522.06396]
2025-09-16 14:21:38,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 52.0, 87.0, 99.0, 124.0, 159.0, 88.0, 82.0, 21.0, 92.0]
2025-09-16 14:21:38,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 32 seconds)
2025-09-16 14:23:36,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:23:37,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 479.15851 ± 195.480
2025-09-16 14:23:37,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [534.59735, 112.61593, 489.92865, 534.10956, 112.96751, 485.40973, 642.34564, 727.3134, 561.9652, 590.33185]
2025-09-16 14:23:37,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 22.0, 92.0, 117.0, 22.0, 87.0, 133.0, 136.0, 106.0, 105.0]
2025-09-16 14:23:37,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 19 seconds)
2025-09-16 14:25:35,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:25:36,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 727.46759 ± 247.520
2025-09-16 14:25:36,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [977.9809, 648.28326, 826.2878, 874.4908, 1280.2214, 649.0547, 474.91266, 506.10413, 535.8049, 501.5349]
2025-09-16 14:25:36,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 133.0, 164.0, 154.0, 245.0, 116.0, 83.0, 97.0, 94.0, 90.0]
2025-09-16 14:25:36,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 36 seconds)
2025-09-16 14:27:35,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:27:37,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 663.56311 ± 305.430
2025-09-16 14:27:37,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1436.7684, 416.5535, 555.82684, 702.8476, 475.33954, 568.65356, 340.24866, 479.0031, 724.605, 935.7851]
2025-09-16 14:27:37,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [270.0, 74.0, 103.0, 139.0, 89.0, 124.0, 72.0, 83.0, 126.0, 175.0]
2025-09-16 14:27:37,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 39 seconds)
2025-09-16 14:29:34,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:29:35,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 698.00134 ± 303.486
2025-09-16 14:29:35,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [445.28522, 430.75436, 754.9539, 587.36414, 872.21136, 969.84534, 668.6473, 136.49164, 1271.4424, 843.01807]
2025-09-16 14:29:35,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 86.0, 136.0, 109.0, 162.0, 173.0, 124.0, 26.0, 230.0, 148.0]
2025-09-16 14:29:35,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 37 seconds)
2025-09-16 14:31:35,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:31:37,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 703.02258 ± 350.582
2025-09-16 14:31:37,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [810.33594, 323.68195, 1380.2537, 524.67633, 1121.1445, 124.7817, 534.6922, 577.02496, 772.5541, 861.07983]
2025-09-16 14:31:37,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 62.0, 258.0, 90.0, 201.0, 24.0, 100.0, 102.0, 158.0, 149.0]
2025-09-16 14:31:37,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 55 seconds)
2025-09-16 14:33:33,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:33:35,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 570.38354 ± 242.744
2025-09-16 14:33:35,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [406.00418, 455.31622, 354.32358, 980.6057, 775.6148, 156.25722, 614.48755, 436.5463, 646.47546, 878.2044]
2025-09-16 14:33:35,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 101.0, 63.0, 186.0, 140.0, 30.0, 114.0, 80.0, 115.0, 182.0]
2025-09-16 14:33:35,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 46 seconds)
2025-09-16 14:35:34,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:35:35,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 635.31854 ± 644.479
2025-09-16 14:35:35,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.97966, 645.541, 99.83913, 667.5175, 95.435104, 1257.2592, 107.990074, 2260.2864, 572.4829, 510.85455]
2025-09-16 14:35:35,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 137.0, 20.0, 120.0, 19.0, 237.0, 21.0, 416.0, 104.0, 109.0]
2025-09-16 14:35:35,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 53 seconds)
2025-09-16 14:37:33,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:37:34,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 524.47937 ± 312.644
2025-09-16 14:37:34,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [517.4097, 394.71286, 102.07747, 289.5465, 301.7115, 1327.0591, 581.5212, 685.7627, 478.22037, 566.7729]
2025-09-16 14:37:34,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 77.0, 20.0, 53.0, 53.0, 232.0, 100.0, 143.0, 87.0, 105.0]
2025-09-16 14:37:34,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 43 seconds)
2025-09-16 14:39:31,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:39:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 738.58887 ± 424.578
2025-09-16 14:39:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [503.9405, 1326.6152, 517.6095, 928.2153, 1052.1981, 1489.5962, 402.6726, 679.07544, 396.353, 89.61291]
2025-09-16 14:39:33,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 244.0, 98.0, 190.0, 191.0, 294.0, 75.0, 123.0, 69.0, 18.0]
2025-09-16 14:39:33,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 47 seconds)
2025-09-16 14:41:32,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:41:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1074.99170 ± 665.841
2025-09-16 14:41:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2368.6372, 897.7754, 1040.483, 361.559, 1337.4144, 313.4922, 703.6412, 377.0878, 2010.1829, 1339.6437]
2025-09-16 14:41:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [452.0, 159.0, 190.0, 65.0, 232.0, 58.0, 126.0, 69.0, 378.0, 265.0]
2025-09-16 14:41:34,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1074.99) for latency 12
2025-09-16 14:41:34,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 46 seconds)
2025-09-16 14:43:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:43:35,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 921.12939 ± 529.540
2025-09-16 14:43:35,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [905.99774, 815.3371, 1113.7415, 650.2954, 454.8705, 1132.5883, 1238.7118, 2172.8994, 618.6651, 108.187935]
2025-09-16 14:43:35,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 150.0, 221.0, 117.0, 83.0, 205.0, 235.0, 411.0, 119.0, 21.0]
2025-09-16 14:43:35,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes)
2025-09-16 14:45:33,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:45:35,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 751.53918 ± 335.033
2025-09-16 14:45:35,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [147.05092, 631.3376, 783.6466, 605.7745, 1333.7527, 643.2149, 559.45, 586.59296, 961.8677, 1262.7035]
2025-09-16 14:45:35,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 139.0, 140.0, 109.0, 255.0, 123.0, 101.0, 114.0, 179.0, 231.0]
2025-09-16 14:45:35,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 59 seconds)
2025-09-16 14:47:33,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:47:35,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 730.62653 ± 555.904
2025-09-16 14:47:35,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [545.7533, 325.71616, 625.0383, 2154.678, 350.2854, 371.96664, 677.3253, 1034.8024, 1087.579, 133.12038]
2025-09-16 14:47:35,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 60.0, 111.0, 391.0, 61.0, 67.0, 131.0, 203.0, 202.0, 26.0]
2025-09-16 14:47:35,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 5 seconds)
2025-09-16 14:49:33,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:49:35,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 757.99072 ± 241.024
2025-09-16 14:49:35,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [759.1377, 612.66473, 783.815, 753.7518, 940.5298, 696.1948, 396.4725, 413.83826, 1005.56494, 1217.9379]
2025-09-16 14:49:35,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 108.0, 149.0, 136.0, 170.0, 128.0, 72.0, 77.0, 185.0, 211.0]
2025-09-16 14:49:35,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 7 seconds)
2025-09-16 14:51:34,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:51:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 697.36737 ± 355.697
2025-09-16 14:51:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [816.2005, 1397.6821, 332.55157, 1292.3136, 533.9585, 605.96405, 696.26733, 544.67285, 306.3822, 447.68127]
2025-09-16 14:51:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 284.0, 62.0, 231.0, 98.0, 106.0, 141.0, 96.0, 56.0, 79.0]
2025-09-16 14:51:35,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 3 seconds)
2025-09-16 14:53:34,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:53:36,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 747.24524 ± 491.109
2025-09-16 14:53:36,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1519.5767, 834.1868, 118.59922, 425.06137, 1683.0585, 1033.3822, 357.61823, 439.08282, 457.24762, 604.63965]
2025-09-16 14:53:36,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [274.0, 150.0, 23.0, 90.0, 308.0, 181.0, 64.0, 76.0, 101.0, 111.0]
2025-09-16 14:53:36,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 4 seconds)
2025-09-16 14:55:35,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:55:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 934.89856 ± 617.318
2025-09-16 14:55:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [458.61865, 555.39215, 507.16617, 746.5524, 688.53796, 2510.7551, 907.2402, 797.2668, 1641.2072, 536.2487]
2025-09-16 14:55:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 107.0, 84.0, 131.0, 145.0, 470.0, 162.0, 146.0, 296.0, 110.0]
2025-09-16 14:55:38,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 9 seconds)
2025-09-16 14:57:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:57:39,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 895.88068 ± 741.409
2025-09-16 14:57:39,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [898.29834, 370.55814, 910.8401, 394.921, 497.52332, 550.82367, 2693.087, 521.4349, 1856.7429, 264.57733]
2025-09-16 14:57:39,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 66.0, 166.0, 73.0, 91.0, 97.0, 509.0, 97.0, 352.0, 53.0]
2025-09-16 14:57:39,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 14 seconds)
2025-09-16 14:59:34,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:59:36,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 734.78156 ± 639.971
2025-09-16 14:59:36,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2045.432, 1961.2089, 393.28134, 354.2954, 368.25012, 494.83176, 342.74423, 545.76715, 551.4903, 290.514]
2025-09-16 14:59:36,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [369.0, 376.0, 76.0, 74.0, 64.0, 90.0, 67.0, 99.0, 103.0, 52.0]
2025-09-16 14:59:36,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 4 seconds)
2025-09-16 15:01:35,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:01:38,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1095.33264 ± 1130.936
2025-09-16 15:01:38,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [793.40186, 1319.9375, 102.56235, 1363.4813, 144.84549, 516.4236, 3823.0974, 102.13611, 459.02975, 2328.4116]
2025-09-16 15:01:38,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 261.0, 20.0, 258.0, 28.0, 97.0, 716.0, 20.0, 78.0, 429.0]
2025-09-16 15:01:38,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1095.33) for latency 12
2025-09-16 15:01:38,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 9 seconds)
2025-09-16 15:03:38,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:03:41,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1069.62622 ± 399.274
2025-09-16 15:03:41,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [446.89386, 899.0052, 1469.9026, 1068.265, 649.87787, 1755.4623, 1579.9008, 882.245, 1139.8271, 804.88135]
2025-09-16 15:03:41,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 160.0, 265.0, 197.0, 112.0, 323.0, 310.0, 155.0, 221.0, 168.0]
2025-09-16 15:03:41,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 15 seconds)
2025-09-16 15:05:37,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:05:38,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 507.02255 ± 334.329
2025-09-16 15:05:38,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1262.7272, 401.87085, 124.35363, 552.3814, 633.72394, 107.72816, 118.99938, 513.0197, 640.52325, 714.89777]
2025-09-16 15:05:38,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 70.0, 24.0, 96.0, 113.0, 21.0, 23.0, 103.0, 116.0, 126.0]
2025-09-16 15:05:38,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 1 second)
2025-09-16 15:07:37,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:07:39,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 721.36121 ± 350.217
2025-09-16 15:07:39,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1101.4681, 580.9844, 408.9941, 481.3171, 401.94284, 1040.3352, 1166.3328, 311.28714, 1246.8027, 474.1479]
2025-09-16 15:07:39,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 114.0, 88.0, 105.0, 70.0, 196.0, 229.0, 57.0, 238.0, 83.0]
2025-09-16 15:07:39,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 59 seconds)
2025-09-16 15:09:37,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:09:40,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1009.99023 ± 639.658
2025-09-16 15:09:40,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [678.3908, 107.16027, 2089.4749, 1144.438, 457.8016, 765.1606, 582.00165, 975.0047, 2204.3633, 1096.1064]
2025-09-16 15:09:40,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 21.0, 384.0, 214.0, 95.0, 138.0, 114.0, 181.0, 412.0, 207.0]
2025-09-16 15:09:40,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 8 seconds)
2025-09-16 15:11:39,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:11:42,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1035.33984 ± 1287.282
2025-09-16 15:11:42,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [346.79272, 400.3429, 1958.1393, 718.5959, 617.43085, 262.52234, 709.8435, 4616.0464, 621.8978, 101.78771]
2025-09-16 15:11:42,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 69.0, 368.0, 125.0, 109.0, 46.0, 148.0, 850.0, 118.0, 20.0]
2025-09-16 15:11:42,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 7 seconds)
2025-09-16 15:13:40,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:13:43,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1027.56079 ± 590.164
2025-09-16 15:13:43,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1258.7649, 1890.8541, 942.0319, 705.7197, 601.4216, 520.51447, 2300.2773, 811.1488, 358.1201, 886.7542]
2025-09-16 15:13:43,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [231.0, 342.0, 186.0, 126.0, 107.0, 93.0, 424.0, 144.0, 62.0, 161.0]
2025-09-16 15:13:43,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 3 seconds)
2025-09-16 15:15:43,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:15:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 817.18732 ± 692.337
2025-09-16 15:15:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [395.2406, 714.8832, 2130.5837, 314.51224, 2087.7212, 1093.6359, 567.1747, 360.25754, 130.18266, 377.6818]
2025-09-16 15:15:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 134.0, 399.0, 56.0, 393.0, 199.0, 98.0, 71.0, 25.0, 66.0]
2025-09-16 15:15:45,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 11 seconds)
2025-09-16 15:17:45,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:17:47,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1025.35486 ± 481.271
2025-09-16 15:17:47,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [712.53827, 811.57434, 1055.2069, 1458.1088, 1945.387, 1529.5565, 329.59537, 541.3939, 661.94336, 1208.244]
2025-09-16 15:17:47,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 152.0, 201.0, 284.0, 355.0, 285.0, 61.0, 98.0, 116.0, 239.0]
2025-09-16 15:17:47,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 12 seconds)
2025-09-16 15:19:44,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:19:47,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1384.52771 ± 782.343
2025-09-16 15:19:47,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [643.5266, 1693.4738, 784.91986, 2715.711, 1511.3435, 1095.3241, 107.436714, 1492.7706, 1168.9902, 2631.7817]
2025-09-16 15:19:47,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 302.0, 145.0, 518.0, 306.0, 199.0, 21.0, 282.0, 198.0, 511.0]
2025-09-16 15:19:47,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1384.53) for latency 12
2025-09-16 15:19:47,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 10 seconds)
2025-09-16 15:21:46,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:21:48,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 869.98254 ± 541.299
2025-09-16 15:21:48,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [940.43945, 485.27585, 331.36887, 1502.7924, 2064.242, 853.77295, 598.04114, 873.06793, 107.9256, 942.8986]
2025-09-16 15:21:48,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 86.0, 61.0, 275.0, 389.0, 158.0, 111.0, 166.0, 21.0, 167.0]
2025-09-16 15:21:48,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 7 seconds)
2025-09-16 15:23:47,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:23:52,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1561.87988 ± 916.612
2025-09-16 15:23:52,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1209.2822, 1458.3677, 1212.721, 3416.3188, 1755.4833, 2331.101, 1633.2042, 113.73837, 2185.7715, 302.81018]
2025-09-16 15:23:52,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 286.0, 241.0, 638.0, 347.0, 420.0, 305.0, 22.0, 430.0, 56.0]
2025-09-16 15:23:52,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1561.88) for latency 12
2025-09-16 15:23:52,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 8 seconds)
2025-09-16 15:25:50,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:25:52,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 672.98846 ± 504.872
2025-09-16 15:25:52,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [925.97, 1731.2042, 1036.783, 504.51398, 113.45029, 125.398155, 285.5318, 516.0705, 294.80734, 1196.1558]
2025-09-16 15:25:52,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 323.0, 197.0, 89.0, 22.0, 24.0, 53.0, 95.0, 51.0, 222.0]
2025-09-16 15:25:52,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 5 seconds)
2025-09-16 15:27:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:27:54,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 821.33087 ± 681.707
2025-09-16 15:27:54,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [760.6361, 1094.0173, 2556.271, 95.35156, 400.6806, 363.7065, 1005.85187, 693.68286, 113.809235, 1129.3015]
2025-09-16 15:27:54,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 206.0, 470.0, 19.0, 71.0, 82.0, 183.0, 126.0, 22.0, 205.0]
2025-09-16 15:27:54,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 3 seconds)
2025-09-16 15:29:53,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:29:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 859.87390 ± 530.545
2025-09-16 15:29:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1218.3689, 947.1777, 1538.137, 343.66696, 587.6308, 119.03651, 682.2742, 685.3844, 538.3808, 1938.6815]
2025-09-16 15:29:56,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 172.0, 294.0, 64.0, 104.0, 23.0, 144.0, 128.0, 113.0, 362.0]
2025-09-16 15:29:56,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 3 seconds)
2025-09-16 15:31:53,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:31:55,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 830.89294 ± 682.669
2025-09-16 15:31:55,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [686.239, 363.3502, 418.06815, 386.88232, 1263.7, 2704.9211, 683.1401, 292.93167, 630.00824, 879.68915]
2025-09-16 15:31:55,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 70.0, 76.0, 69.0, 233.0, 514.0, 117.0, 51.0, 131.0, 186.0]
2025-09-16 15:31:55,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 1 second)
2025-09-16 15:33:55,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:33:58,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1048.99622 ± 701.856
2025-09-16 15:33:58,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [369.4486, 1164.0779, 546.4045, 1460.0005, 2159.9504, 411.44528, 453.30847, 688.20514, 2410.6956, 826.4254]
2025-09-16 15:33:58,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 213.0, 116.0, 270.0, 397.0, 73.0, 86.0, 127.0, 427.0, 149.0]
2025-09-16 15:33:58,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
