2025-09-16 11:12:12,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_3
2025-09-16 11:12:12,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_3
2025-09-16 11:12:12,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x14f4457306d0>}
2025-09-16 11:12:12,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:12:12,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:12:12,367 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=427, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:12:12,367 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:12:13,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:12:13,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:14:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:14:01,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 362.43018 ± 88.674
2025-09-16 11:14:01,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [473.7681, 343.06876, 484.15878, 511.95303, 252.6432, 293.43494, 354.83792, 303.5585, 277.09863, 329.77972]
2025-09-16 11:14:01,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 66.0, 97.0, 97.0, 49.0, 59.0, 70.0, 59.0, 51.0, 69.0]
2025-09-16 11:14:01,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (362.43) for latency 3
2025-09-16 11:14:01,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 57 minutes, 32 seconds)
2025-09-16 11:15:58,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:15:59,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 416.90469 ± 85.271
2025-09-16 11:15:59,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [451.42084, 419.92557, 292.3994, 329.58597, 450.09918, 450.9958, 445.25485, 609.9834, 386.98993, 332.39172]
2025-09-16 11:15:59,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 79.0, 54.0, 62.0, 85.0, 84.0, 83.0, 118.0, 72.0, 68.0]
2025-09-16 11:15:59,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (416.90) for latency 3
2025-09-16 11:15:59,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 54 seconds)
2025-09-16 11:17:55,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:17:56,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 421.25153 ± 69.553
2025-09-16 11:17:56,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [364.44543, 455.66235, 386.66217, 509.39648, 345.3057, 434.46274, 315.7896, 378.82693, 506.95868, 515.00507]
2025-09-16 11:17:56,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 91.0, 71.0, 95.0, 70.0, 83.0, 63.0, 80.0, 93.0, 95.0]
2025-09-16 11:17:56,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (421.25) for latency 3
2025-09-16 11:17:56,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 37 seconds)
2025-09-16 11:19:52,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:19:53,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 362.33044 ± 59.148
2025-09-16 11:19:53,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [365.7312, 354.30463, 319.21945, 276.46292, 333.11572, 408.17535, 482.20523, 421.48712, 369.55338, 293.04935]
2025-09-16 11:19:53,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 80.0, 69.0, 62.0, 71.0, 78.0, 108.0, 89.0, 82.0, 64.0]
2025-09-16 11:19:53,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 49 seconds)
2025-09-16 11:21:51,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:21:52,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 489.59521 ± 125.134
2025-09-16 11:21:52,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [449.96533, 388.87683, 529.6427, 406.05466, 368.21558, 382.58603, 536.2894, 651.53064, 765.9986, 416.79254]
2025-09-16 11:21:52,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 71.0, 99.0, 78.0, 70.0, 84.0, 107.0, 122.0, 145.0, 76.0]
2025-09-16 11:21:52,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (489.60) for latency 3
2025-09-16 11:21:52,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 3 minutes, 9 seconds)
2025-09-16 11:23:48,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:23:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 430.91519 ± 112.322
2025-09-16 11:23:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [343.9501, 401.73706, 651.8789, 300.91946, 348.5001, 517.0096, 516.8113, 540.62665, 384.1531, 303.5659]
2025-09-16 11:23:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 73.0, 131.0, 63.0, 75.0, 107.0, 105.0, 115.0, 71.0, 66.0]
2025-09-16 11:23:49,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 4 minutes, 12 seconds)
2025-09-16 11:25:45,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:25:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 595.41144 ± 112.137
2025-09-16 11:25:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [572.11426, 630.21643, 525.30774, 730.21857, 808.9798, 696.797, 519.75867, 553.4466, 466.26242, 451.01297]
2025-09-16 11:25:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 127.0, 111.0, 144.0, 166.0, 133.0, 105.0, 103.0, 86.0, 86.0]
2025-09-16 11:25:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (595.41) for latency 3
2025-09-16 11:25:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 2 minutes, 9 seconds)
2025-09-16 11:27:43,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:27:44,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 610.94171 ± 126.230
2025-09-16 11:27:44,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [860.5555, 610.0771, 518.5073, 594.0313, 656.60254, 614.3429, 421.9721, 449.51166, 766.8224, 616.99426]
2025-09-16 11:27:44,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 119.0, 113.0, 123.0, 135.0, 133.0, 87.0, 84.0, 156.0, 116.0]
2025-09-16 11:27:44,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (610.94) for latency 3
2025-09-16 11:27:44,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 24 seconds)
2025-09-16 11:29:42,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:29:44,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 721.06177 ± 214.868
2025-09-16 11:29:44,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [817.4255, 520.5813, 645.79736, 704.7619, 366.13187, 1003.2261, 1144.797, 761.121, 638.2805, 608.49567]
2025-09-16 11:29:44,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 111.0, 137.0, 140.0, 79.0, 192.0, 235.0, 139.0, 122.0, 131.0]
2025-09-16 11:29:44,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (721.06) for latency 3
2025-09-16 11:29:44,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 9 seconds)
2025-09-16 11:31:40,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:31:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 627.15350 ± 135.241
2025-09-16 11:31:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [642.83356, 567.92017, 506.99628, 620.5396, 602.1634, 898.32526, 569.8158, 488.83118, 861.5689, 512.5409]
2025-09-16 11:31:42,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 109.0, 103.0, 131.0, 116.0, 173.0, 123.0, 102.0, 178.0, 113.0]
2025-09-16 11:31:42,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 57 seconds)
2025-09-16 11:33:39,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:33:41,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 566.51617 ± 124.917
2025-09-16 11:33:41,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [623.2745, 742.13794, 550.7754, 494.1272, 473.7519, 626.22504, 515.3725, 805.2506, 411.0439, 423.20242]
2025-09-16 11:33:41,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 144.0, 103.0, 107.0, 102.0, 136.0, 113.0, 169.0, 88.0, 93.0]
2025-09-16 11:33:41,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 33 seconds)
2025-09-16 11:35:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:35:39,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 595.05634 ± 271.860
2025-09-16 11:35:39,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [264.80597, 610.45624, 583.564, 514.44867, 541.69336, 473.01535, 779.30273, 1311.4987, 374.64404, 497.13403]
2025-09-16 11:35:39,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 130.0, 125.0, 110.0, 117.0, 105.0, 149.0, 249.0, 83.0, 106.0]
2025-09-16 11:35:39,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 47 seconds)
2025-09-16 11:37:35,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:37:37,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 744.72430 ± 336.950
2025-09-16 11:37:37,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [763.7695, 937.8891, 499.27924, 496.37488, 537.0904, 534.17773, 834.9478, 941.08307, 1566.2386, 336.393]
2025-09-16 11:37:37,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 180.0, 92.0, 107.0, 114.0, 101.0, 168.0, 184.0, 312.0, 71.0]
2025-09-16 11:37:37,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (744.72) for latency 3
2025-09-16 11:37:37,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 53 seconds)
2025-09-16 11:39:33,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:39:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 647.97125 ± 122.300
2025-09-16 11:39:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [560.0162, 917.97534, 760.11127, 492.9379, 668.0865, 605.14655, 529.79407, 610.87146, 751.0187, 583.7548]
2025-09-16 11:39:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 169.0, 156.0, 102.0, 130.0, 113.0, 101.0, 112.0, 158.0, 119.0]
2025-09-16 11:39:35,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 26 seconds)
2025-09-16 11:41:31,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:41:33,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 640.92401 ± 170.254
2025-09-16 11:41:33,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [706.8157, 542.74445, 561.52545, 810.3541, 625.89703, 841.5256, 519.6157, 377.61072, 480.954, 942.1974]
2025-09-16 11:41:33,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 111.0, 122.0, 154.0, 133.0, 161.0, 110.0, 85.0, 101.0, 180.0]
2025-09-16 11:41:33,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 32 seconds)
2025-09-16 11:43:28,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:43:30,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 739.32086 ± 200.687
2025-09-16 11:43:30,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [775.75116, 1262.598, 689.7818, 513.4627, 825.29803, 733.50226, 589.74927, 586.46985, 610.6911, 805.9043]
2025-09-16 11:43:30,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 257.0, 139.0, 103.0, 157.0, 141.0, 119.0, 121.0, 117.0, 154.0]
2025-09-16 11:43:30,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes)
2025-09-16 11:45:27,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:45:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 667.63348 ± 166.889
2025-09-16 11:45:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [516.3563, 854.97266, 499.12814, 451.91855, 686.07086, 468.94467, 681.4582, 938.4588, 753.8277, 825.1985]
2025-09-16 11:45:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 170.0, 92.0, 85.0, 136.0, 86.0, 145.0, 198.0, 141.0, 158.0]
2025-09-16 11:45:29,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 19 seconds)
2025-09-16 11:47:23,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:47:25,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 834.98303 ± 253.500
2025-09-16 11:47:25,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [597.4989, 830.70215, 936.274, 655.74695, 1094.6815, 824.03217, 1433.2935, 572.1814, 787.0377, 618.3814]
2025-09-16 11:47:25,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 160.0, 196.0, 133.0, 214.0, 165.0, 292.0, 116.0, 145.0, 118.0]
2025-09-16 11:47:25,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (834.98) for latency 3
2025-09-16 11:47:25,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 48 seconds)
2025-09-16 11:49:23,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:49:25,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 731.26233 ± 147.959
2025-09-16 11:49:25,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [894.5917, 695.9449, 635.60944, 969.5508, 529.5477, 694.4346, 498.73993, 899.05035, 759.1194, 736.0341]
2025-09-16 11:49:25,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 140.0, 118.0, 195.0, 97.0, 128.0, 107.0, 168.0, 161.0, 154.0]
2025-09-16 11:49:25,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 14 seconds)
2025-09-16 11:51:20,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:51:23,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1038.46863 ± 375.961
2025-09-16 11:51:23,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [700.362, 1743.2281, 1106.8003, 1092.832, 901.44885, 388.44095, 949.4006, 1479.7168, 737.90125, 1284.5549]
2025-09-16 11:51:23,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 334.0, 230.0, 206.0, 168.0, 83.0, 179.0, 285.0, 144.0, 261.0]
2025-09-16 11:51:23,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1038.47) for latency 3
2025-09-16 11:51:23,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 37 minutes, 19 seconds)
2025-09-16 11:53:16,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:53:18,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 839.50195 ± 302.761
2025-09-16 11:53:18,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [586.5491, 1400.9387, 870.39795, 1450.6837, 621.9013, 761.9909, 649.52545, 671.74896, 667.8411, 713.44257]
2025-09-16 11:53:18,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 292.0, 179.0, 282.0, 128.0, 148.0, 130.0, 122.0, 123.0, 141.0]
2025-09-16 11:53:18,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 52 seconds)
2025-09-16 11:55:14,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:55:17,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1115.66309 ± 321.071
2025-09-16 11:55:17,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1084.7769, 732.7654, 894.06396, 733.4529, 1538.5226, 846.2138, 1656.6649, 954.73846, 1379.382, 1336.0502]
2025-09-16 11:55:17,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 137.0, 176.0, 134.0, 300.0, 159.0, 335.0, 192.0, 260.0, 266.0]
2025-09-16 11:55:17,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1115.66) for latency 3
2025-09-16 11:55:17,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 53 seconds)
2025-09-16 11:57:12,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:57:13,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 785.78729 ± 161.662
2025-09-16 11:57:13,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [759.97485, 877.1018, 717.561, 986.4151, 833.6564, 1049.4419, 862.42236, 578.0949, 683.3173, 509.88733]
2025-09-16 11:57:13,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 164.0, 147.0, 192.0, 151.0, 199.0, 157.0, 105.0, 131.0, 94.0]
2025-09-16 11:57:13,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 56 seconds)
2025-09-16 11:59:09,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:59:12,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1090.21106 ± 297.109
2025-09-16 11:59:12,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1003.4082, 744.16925, 775.38086, 750.9594, 1079.7928, 1738.3563, 1392.3347, 1259.5752, 1099.5023, 1058.632]
2025-09-16 11:59:12,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 149.0, 139.0, 139.0, 211.0, 346.0, 266.0, 244.0, 224.0, 205.0]
2025-09-16 11:59:12,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 46 seconds)
2025-09-16 12:01:07,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:01:10,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1233.28296 ± 586.939
2025-09-16 12:01:10,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [939.6686, 665.1036, 1245.5636, 1550.0029, 2735.188, 893.11475, 1295.9644, 597.7554, 927.166, 1483.3019]
2025-09-16 12:01:10,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 128.0, 238.0, 308.0, 535.0, 171.0, 254.0, 113.0, 184.0, 288.0]
2025-09-16 12:01:10,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1233.28) for latency 3
2025-09-16 12:01:10,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes, 47 seconds)
2025-09-16 12:03:07,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:03:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 952.59424 ± 210.114
2025-09-16 12:03:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [932.3358, 794.8904, 861.86786, 860.2109, 635.2899, 839.96747, 863.9331, 1297.6438, 1310.0092, 1129.7941]
2025-09-16 12:03:09,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 148.0, 171.0, 156.0, 112.0, 151.0, 158.0, 243.0, 254.0, 219.0]
2025-09-16 12:03:09,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes, 44 seconds)
2025-09-16 12:05:06,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:05:09,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1317.83057 ± 301.828
2025-09-16 12:05:09,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1417.7615, 1022.9779, 1525.4855, 1933.6299, 1427.3778, 890.1716, 1558.6853, 987.92084, 1259.5979, 1154.6985]
2025-09-16 12:05:09,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [288.0, 206.0, 300.0, 375.0, 265.0, 174.0, 298.0, 200.0, 244.0, 217.0]
2025-09-16 12:05:09,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1317.83) for latency 3
2025-09-16 12:05:09,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 1 second)
2025-09-16 12:07:05,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:07:09,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1536.95093 ± 547.448
2025-09-16 12:07:09,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2134.064, 2327.162, 1539.3938, 842.88043, 1630.4685, 1433.7356, 850.58264, 1777.9448, 724.9954, 2108.2834]
2025-09-16 12:07:09,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [421.0, 473.0, 317.0, 164.0, 319.0, 286.0, 179.0, 357.0, 155.0, 410.0]
2025-09-16 12:07:09,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1536.95) for latency 3
2025-09-16 12:07:09,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 54 seconds)
2025-09-16 12:09:05,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:09:07,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1007.39014 ± 624.160
2025-09-16 12:09:07,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [853.4452, 518.58234, 780.8565, 454.26492, 1188.192, 717.9236, 1349.0088, 2704.4983, 883.25946, 623.8702]
2025-09-16 12:09:07,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 110.0, 151.0, 87.0, 228.0, 135.0, 254.0, 526.0, 178.0, 123.0]
2025-09-16 12:09:07,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 53 seconds)
2025-09-16 12:11:03,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:11:07,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1501.18042 ± 746.069
2025-09-16 12:11:07,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3615.7417, 1560.6705, 1232.8662, 1426.5234, 1104.3452, 944.3371, 1265.0746, 922.9368, 1747.2017, 1192.1067]
2025-09-16 12:11:07,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [701.0, 321.0, 249.0, 284.0, 221.0, 193.0, 252.0, 176.0, 334.0, 226.0]
2025-09-16 12:11:07,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 11 seconds)
2025-09-16 12:13:04,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:13:08,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1299.27234 ± 495.059
2025-09-16 12:13:08,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1533.1372, 605.2054, 1901.5228, 1027.1364, 696.7848, 1136.2439, 1225.1118, 2037.5555, 1920.3455, 909.68005]
2025-09-16 12:13:08,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [297.0, 117.0, 367.0, 202.0, 154.0, 221.0, 237.0, 398.0, 382.0, 182.0]
2025-09-16 12:13:08,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 41 seconds)
2025-09-16 12:15:02,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:15:05,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1386.79614 ± 949.858
2025-09-16 12:15:05,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1292.1201, 2289.3171, 716.5059, 3861.8508, 1171.116, 480.14957, 851.33777, 970.2159, 1391.5048, 843.84314]
2025-09-16 12:15:05,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [248.0, 437.0, 139.0, 751.0, 225.0, 97.0, 159.0, 186.0, 276.0, 156.0]
2025-09-16 12:15:06,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 12 seconds)
2025-09-16 12:17:10,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:17:14,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1376.39490 ± 1146.137
2025-09-16 12:17:14,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [918.85693, 809.14545, 778.34015, 541.80524, 969.3206, 1483.516, 1738.9796, 777.73, 4662.4717, 1083.783]
2025-09-16 12:17:14,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 167.0, 146.0, 108.0, 203.0, 301.0, 334.0, 154.0, 939.0, 215.0]
2025-09-16 12:17:14,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 3 seconds)
2025-09-16 12:19:04,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:19:07,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1107.81140 ± 376.354
2025-09-16 12:19:07,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1061.8884, 959.5123, 943.3256, 948.17145, 1393.4796, 1001.33484, 710.05536, 642.21686, 1981.8667, 1436.2628]
2025-09-16 12:19:07,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 184.0, 180.0, 185.0, 267.0, 197.0, 135.0, 120.0, 388.0, 280.0]
2025-09-16 12:19:07,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 58 seconds)
2025-09-16 12:21:02,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:21:08,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2130.83423 ± 857.911
2025-09-16 12:21:08,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [955.0791, 1455.5337, 849.7968, 2966.3687, 1966.098, 3393.4897, 2815.035, 1694.1294, 2132.1182, 3080.6934]
2025-09-16 12:21:08,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 272.0, 168.0, 592.0, 405.0, 669.0, 550.0, 349.0, 439.0, 607.0]
2025-09-16 12:21:08,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2130.83) for latency 3
2025-09-16 12:21:08,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 12 seconds)
2025-09-16 12:23:06,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:23:12,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2141.39258 ± 1253.633
2025-09-16 12:23:12,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3208.1204, 1339.1453, 4941.9067, 3211.1047, 1804.4927, 2336.074, 1834.1342, 1252.882, 718.5574, 767.5092]
2025-09-16 12:23:12,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [626.0, 256.0, 974.0, 619.0, 331.0, 453.0, 361.0, 245.0, 133.0, 140.0]
2025-09-16 12:23:12,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2141.39) for latency 3
2025-09-16 12:23:12,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 55 seconds)
2025-09-16 12:25:08,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:25:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2071.79785 ± 668.484
2025-09-16 12:25:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1985.6588, 1871.8798, 1019.4958, 2698.2903, 3316.0127, 959.2237, 2409.5835, 2186.2974, 2125.8027, 2145.734]
2025-09-16 12:25:13,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [392.0, 364.0, 195.0, 516.0, 642.0, 189.0, 477.0, 428.0, 403.0, 418.0]
2025-09-16 12:25:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 34 seconds)
2025-09-16 12:27:11,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:27:19,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2781.73511 ± 1544.491
2025-09-16 12:27:19,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2308.9749, 5019.352, 5112.2925, 1969.5701, 1184.0764, 716.09454, 1663.1124, 3852.8137, 4265.108, 1725.9572]
2025-09-16 12:27:19,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [469.0, 1000.0, 1000.0, 386.0, 223.0, 132.0, 347.0, 748.0, 825.0, 353.0]
2025-09-16 12:27:19,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2781.74) for latency 3
2025-09-16 12:27:19,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 8 seconds)
2025-09-16 12:29:20,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:29:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1821.60291 ± 779.201
2025-09-16 12:29:25,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2720.2634, 572.04034, 1429.5698, 1585.5714, 2084.1106, 2048.758, 1096.8296, 3396.3494, 2030.1584, 1252.3792]
2025-09-16 12:29:25,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [551.0, 126.0, 280.0, 305.0, 400.0, 397.0, 226.0, 673.0, 407.0, 242.0]
2025-09-16 12:29:25,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 5 minutes, 44 seconds)
2025-09-16 12:31:24,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:31:33,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3194.61084 ± 1504.066
2025-09-16 12:31:33,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4411.136, 2072.1833, 1743.4863, 2237.67, 4849.187, 3583.4478, 1120.6986, 5138.6196, 5099.477, 1690.2034]
2025-09-16 12:31:33,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [867.0, 407.0, 338.0, 427.0, 958.0, 707.0, 228.0, 1000.0, 1000.0, 312.0]
2025-09-16 12:31:33,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3194.61) for latency 3
2025-09-16 12:31:33,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 5 minutes, 2 seconds)
2025-09-16 12:33:27,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:33:35,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2991.10229 ± 1191.525
2025-09-16 12:33:35,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2421.3362, 3401.5115, 5065.857, 2709.6494, 1419.836, 5021.4634, 3119.563, 3088.2776, 1744.836, 1918.6929]
2025-09-16 12:33:35,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [465.0, 686.0, 1000.0, 542.0, 279.0, 1000.0, 622.0, 600.0, 365.0, 394.0]
2025-09-16 12:33:35,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2025-09-16 12:35:32,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:35:45,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4495.03613 ± 1408.468
2025-09-16 12:35:45,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5137.0093, 5073.702, 5119.621, 5040.0835, 5065.685, 409.12213, 3882.1543, 5052.0576, 5095.849, 5075.073]
2025-09-16 12:35:45,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 92.0, 764.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:35:45,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (4495.04) for latency 3
2025-09-16 12:35:45,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 2 minutes, 6 seconds)
2025-09-16 12:37:43,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:37:55,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4057.63135 ± 872.731
2025-09-16 12:37:55,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4890.465, 4991.6685, 4146.6113, 4979.3926, 4975.8926, 2646.6213, 4117.2417, 3802.01, 3245.9084, 2780.5015]
2025-09-16 12:37:55,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 839.0, 1000.0, 1000.0, 541.0, 834.0, 763.0, 654.0, 568.0]
2025-09-16 12:37:55,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 51 seconds)
2025-09-16 12:39:48,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:40:01,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4361.82080 ± 990.965
2025-09-16 12:40:01,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3533.171, 4991.2314, 4968.261, 4586.2524, 4365.4517, 5026.8945, 4447.968, 1704.6196, 5022.8174, 4971.5405]
2025-09-16 12:40:01,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [717.0, 1000.0, 1000.0, 925.0, 841.0, 1000.0, 897.0, 338.0, 1000.0, 1000.0]
2025-09-16 12:40:01,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 58 minutes, 41 seconds)
2025-09-16 12:42:02,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:42:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2804.12817 ± 1542.241
2025-09-16 12:42:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5093.0825, 914.8558, 1541.6448, 4519.0645, 4481.1826, 2071.7805, 1553.3884, 2678.5383, 947.03064, 4240.7134]
2025-09-16 12:42:10,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 178.0, 310.0, 893.0, 883.0, 419.0, 309.0, 549.0, 190.0, 809.0]
2025-09-16 12:42:10,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 56 minutes, 51 seconds)
2025-09-16 12:44:10,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:44:23,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4814.62744 ± 999.842
2025-09-16 12:44:23,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5163.662, 5102.5615, 1826.1567, 5183.7593, 5182.248, 5142.5903, 5170.4565, 5241.927, 5218.368, 4914.549]
2025-09-16 12:44:23,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 357.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 986.0]
2025-09-16 12:44:23,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (4814.63) for latency 3
2025-09-16 12:44:23,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 56 minutes, 38 seconds)
2025-09-16 12:46:19,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:46:30,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3983.81714 ± 1345.257
2025-09-16 12:46:30,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3748.1426, 5035.7407, 1258.2596, 3868.391, 5230.7065, 5203.4883, 5075.6763, 4747.713, 3843.8997, 1826.158]
2025-09-16 12:46:30,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [707.0, 1000.0, 238.0, 732.0, 1000.0, 1000.0, 1000.0, 915.0, 749.0, 349.0]
2025-09-16 12:46:30,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-09-16 12:48:31,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:48:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5057.05957 ± 307.418
2025-09-16 12:48:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5158.3667, 5192.9883, 5155.265, 5189.711, 5131.9805, 5133.67, 5151.517, 5152.4146, 4136.597, 5168.086]
2025-09-16 12:48:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 790.0, 1000.0]
2025-09-16 12:48:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5057.06) for latency 3
2025-09-16 12:48:46,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 52 minutes, 43 seconds)
2025-09-16 12:50:46,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:50:59,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4660.31396 ± 1174.740
2025-09-16 12:50:59,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5165.3955, 5192.506, 1424.5348, 5201.9795, 5232.1675, 5212.258, 5199.9067, 3633.3813, 5178.643, 5162.364]
2025-09-16 12:50:59,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 276.0, 1000.0, 1000.0, 1000.0, 1000.0, 700.0, 1000.0, 1000.0]
2025-09-16 12:50:59,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 51 minutes, 46 seconds)
2025-09-16 12:52:55,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:53:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5118.41504 ± 66.345
2025-09-16 12:53:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4954.7974, 5113.82, 5161.8735, 5117.5757, 5109.529, 5167.0967, 5203.205, 5136.018, 5058.8867, 5161.3477]
2025-09-16 12:53:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:53:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5118.42) for latency 3
2025-09-16 12:53:10,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 49 minutes, 58 seconds)
2025-09-16 12:55:06,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:55:18,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4397.45361 ± 1052.140
2025-09-16 12:55:18,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5217.8784, 5216.1133, 5159.0557, 5168.7207, 5231.9375, 3449.2544, 2166.6292, 5193.5215, 3704.4768, 3466.9492]
2025-09-16 12:55:18,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 673.0, 406.0, 1000.0, 721.0, 662.0]
2025-09-16 12:55:18,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 46 minutes, 59 seconds)
2025-09-16 12:57:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:57:31,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4904.17676 ± 420.886
2025-09-16 12:57:31,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5041.3965, 4992.3374, 5099.5938, 5121.5244, 3652.4023, 5069.37, 5060.8457, 5083.658, 4927.709, 4992.9336]
2025-09-16 12:57:31,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 759.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:57:31,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 45 minutes, 46 seconds)
2025-09-16 12:59:34,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:59:48,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5179.60645 ± 174.467
2025-09-16 12:59:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5273.966, 5236.4346, 5238.127, 4661.021, 5215.727, 5186.151, 5257.8423, 5228.667, 5262.7817, 5235.342]
2025-09-16 12:59:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 883.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:59:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5179.61) for latency 3
2025-09-16 12:59:48,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 43 minutes, 46 seconds)
2025-09-16 13:01:42,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:01:54,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4243.54980 ± 1733.219
2025-09-16 13:01:54,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5009.428, 5180.751, 5106.004, 777.7101, 5161.6484, 5134.4565, 5112.4185, 5085.2764, 5089.0454, 778.7579]
2025-09-16 13:01:54,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 155.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 161.0]
2025-09-16 13:01:54,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 40 minutes, 25 seconds)
2025-09-16 13:03:56,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:04:11,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5114.99072 ± 98.958
2025-09-16 13:04:11,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5182.488, 5192.565, 5012.4434, 4863.776, 5154.661, 5194.6177, 5189.6387, 5111.5767, 5129.817, 5118.326]
2025-09-16 13:04:11,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:04:11,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 39 minutes, 3 seconds)
2025-09-16 13:06:11,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:06:25,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5250.56689 ± 51.576
2025-09-16 13:06:25,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5161.8525, 5244.67, 5285.82, 5198.8086, 5321.47, 5178.146, 5271.8228, 5295.196, 5251.963, 5295.918]
2025-09-16 13:06:25,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:06:25,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5250.57) for latency 3
2025-09-16 13:06:25,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 37 minutes, 53 seconds)
2025-09-16 13:08:24,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:08:38,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5142.04736 ± 97.324
2025-09-16 13:08:38,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5140.4614, 5136.7603, 5174.343, 5182.586, 4864.9644, 5189.586, 5217.07, 5199.3765, 5115.4365, 5199.8867]
2025-09-16 13:08:38,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:08:38,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 35 minutes, 42 seconds)
2025-09-16 13:10:30,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:10:44,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5276.50488 ± 112.291
2025-09-16 13:10:44,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5336.432, 5321.604, 5335.9077, 5287.7856, 5253.397, 5317.69, 5331.4585, 5324.9673, 5308.248, 4947.5586]
2025-09-16 13:10:44,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:10:44,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5276.50) for latency 3
2025-09-16 13:10:44,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 31 minutes, 51 seconds)
2025-09-16 13:12:44,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:12:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4855.89746 ± 1037.248
2025-09-16 13:12:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5169.2173, 5249.6504, 5208.9204, 5140.69, 5192.438, 1748.3362, 5182.335, 5303.1665, 5110.5083, 5253.7114]
2025-09-16 13:12:58,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 365.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:12:58,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 30 minutes, 43 seconds)
2025-09-16 13:14:58,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:15:13,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5065.03809 ± 106.840
2025-09-16 13:15:13,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5060.193, 5113.2134, 5098.606, 5190.102, 5074.5527, 5155.081, 5097.3555, 4883.82, 4845.015, 5132.4463]
2025-09-16 13:15:13,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 986.0, 1000.0]
2025-09-16 13:15:13,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 28 minutes, 18 seconds)
2025-09-16 13:17:11,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:17:26,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5298.39697 ± 28.632
2025-09-16 13:17:26,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5277.3022, 5300.4517, 5307.8687, 5353.2124, 5262.3037, 5304.2046, 5256.0347, 5281.3765, 5331.7144, 5309.498]
2025-09-16 13:17:26,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:17:26,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5298.40) for latency 3
2025-09-16 13:17:26,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 25 minutes, 51 seconds)
2025-09-16 13:19:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:19:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5223.39697 ± 75.146
2025-09-16 13:19:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5240.0874, 5282.8154, 5214.8867, 5014.297, 5241.71, 5239.9863, 5215.9824, 5267.6504, 5214.6465, 5301.908]
2025-09-16 13:19:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:19:37,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 23 minutes, 22 seconds)
2025-09-16 13:21:35,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:21:49,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5080.55615 ± 385.181
2025-09-16 13:21:49,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5204.0015, 3944.13, 5151.5728, 5231.578, 5253.823, 5301.7075, 5107.21, 5172.1216, 5114.0146, 5325.4023]
2025-09-16 13:21:49,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 773.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 993.0, 1000.0, 1000.0]
2025-09-16 13:21:49,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 21 minutes, 59 seconds)
2025-09-16 13:23:47,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:23:59,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4316.84326 ± 1660.952
2025-09-16 13:23:59,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1105.4271, 5137.2666, 5177.821, 5106.524, 5111.356, 5167.3516, 888.28925, 5172.0454, 5191.611, 5110.745]
2025-09-16 13:23:59,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 186.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:23:59,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 19 minutes, 23 seconds)
2025-09-16 13:26:02,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:26:17,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5185.87207 ± 77.196
2025-09-16 13:26:17,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5270.054, 5167.5913, 5287.247, 5122.243, 5255.094, 5025.3784, 5189.852, 5214.2793, 5215.7896, 5111.187]
2025-09-16 13:26:17,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:26:17,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 17 minutes, 31 seconds)
2025-09-16 13:28:10,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:28:25,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4999.68457 ± 254.964
2025-09-16 13:28:25,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5138.404, 5150.622, 5046.3716, 5150.0776, 4278.859, 5144.7896, 5098.5435, 5125.869, 4993.491, 4869.8193]
2025-09-16 13:28:25,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 867.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:28:25,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 14 minutes, 43 seconds)
2025-09-16 13:30:20,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:30:32,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4356.37598 ± 1527.403
2025-09-16 13:30:32,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [937.14923, 5264.7646, 4332.2324, 1901.4086, 5397.591, 5367.942, 5298.5205, 5358.0444, 4469.562, 5236.547]
2025-09-16 13:30:32,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 1000.0, 803.0, 355.0, 1000.0, 1000.0, 1000.0, 1000.0, 847.0, 1000.0]
2025-09-16 13:30:32,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 12 minutes, 4 seconds)
2025-09-16 13:32:32,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:32:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5083.12793 ± 678.667
2025-09-16 13:32:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5283.568, 5319.5195, 3048.043, 5278.972, 5311.8604, 5339.286, 5284.671, 5308.0796, 5338.355, 5318.9253]
2025-09-16 13:32:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 580.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:32:46,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 10 minutes, 4 seconds)
2025-09-16 13:34:39,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:34:49,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4028.56787 ± 1158.699
2025-09-16 13:34:49,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3408.7153, 3772.3242, 2168.7705, 5322.164, 5468.96, 2785.525, 5250.8477, 3183.568, 5367.7783, 3557.026]
2025-09-16 13:34:49,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [639.0, 690.0, 409.0, 1000.0, 1000.0, 522.0, 1000.0, 611.0, 1000.0, 683.0]
2025-09-16 13:34:49,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2025-09-16 13:36:53,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:37:07,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5312.00928 ± 20.147
2025-09-16 13:37:07,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5320.8945, 5341.5703, 5298.8564, 5311.4385, 5302.855, 5309.361, 5332.7725, 5316.3022, 5264.113, 5321.9316]
2025-09-16 13:37:07,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:37:07,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5312.01) for latency 3
2025-09-16 13:37:07,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 4 minutes, 58 seconds)
2025-09-16 13:39:05,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:39:20,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4900.47119 ± 660.939
2025-09-16 13:39:20,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5194.8164, 5110.2446, 5162.302, 2925.0344, 5157.172, 5016.69, 5176.286, 5151.248, 5040.6313, 5070.288]
2025-09-16 13:39:20,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 545.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:39:20,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 3 minutes, 15 seconds)
2025-09-16 13:41:20,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:41:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5043.33643 ± 746.087
2025-09-16 13:41:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5276.5737, 5344.838, 5269.16, 5282.095, 2807.9265, 5320.372, 5325.715, 5282.8047, 5204.096, 5319.7803]
2025-09-16 13:41:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 530.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:41:34,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 47 seconds)
2025-09-16 13:43:34,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:43:49,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5073.09668 ± 27.109
2025-09-16 13:43:49,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5108.2227, 5050.22, 5055.907, 5053.089, 5053.2817, 5119.1455, 5088.7573, 5060.831, 5039.322, 5102.1924]
2025-09-16 13:43:49,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:43:49,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 41 seconds)
2025-09-16 13:45:57,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:46:12,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5222.22607 ± 29.036
2025-09-16 13:46:12,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5235.719, 5186.6387, 5264.55, 5220.695, 5232.301, 5254.6577, 5166.2397, 5233.402, 5196.1196, 5231.937]
2025-09-16 13:46:12,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:46:12,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 59 minutes, 10 seconds)
2025-09-16 13:48:13,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:48:27,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5297.49316 ± 23.866
2025-09-16 13:48:27,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5283.071, 5287.105, 5298.8867, 5361.16, 5305.364, 5299.877, 5267.3496, 5280.4897, 5300.9805, 5290.6494]
2025-09-16 13:48:27,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:48:27,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 56 minutes, 40 seconds)
2025-09-16 13:50:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:50:36,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5358.34375 ± 19.187
2025-09-16 13:50:36,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5353.139, 5360.116, 5346.3906, 5394.4126, 5327.795, 5357.5635, 5369.512, 5385.5444, 5340.972, 5347.9917]
2025-09-16 13:50:36,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:50:36,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5358.34) for latency 3
2025-09-16 13:50:36,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 54 minutes, 5 seconds)
2025-09-16 13:52:36,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:52:49,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5005.57568 ± 696.546
2025-09-16 13:52:49,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5236.5283, 5291.1133, 5238.853, 5239.6353, 5160.726, 5263.024, 5014.1025, 5238.1323, 2937.3284, 5436.3145]
2025-09-16 13:52:49,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 972.0, 1000.0, 942.0, 1000.0, 548.0, 1000.0]
2025-09-16 13:52:49,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 51 minutes, 47 seconds)
2025-09-16 13:54:47,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:55:02,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5318.05566 ± 16.123
2025-09-16 13:55:02,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5324.799, 5334.5146, 5285.519, 5320.633, 5304.987, 5316.445, 5321.3135, 5341.129, 5331.2524, 5299.965]
2025-09-16 13:55:02,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:55:02,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 49 minutes, 18 seconds)
2025-09-16 13:57:02,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:57:16,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5222.12793 ± 12.408
2025-09-16 13:57:16,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5233.437, 5232.4478, 5215.3105, 5220.484, 5196.9316, 5233.756, 5213.813, 5232.141, 5234.044, 5208.914]
2025-09-16 13:57:16,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:57:16,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 28 seconds)
2025-09-16 13:59:13,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:59:27,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5233.83594 ± 33.918
2025-09-16 13:59:27,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5277.4165, 5253.396, 5247.2827, 5185.182, 5226.7754, 5273.984, 5172.5264, 5257.364, 5207.547, 5236.881]
2025-09-16 13:59:27,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:59:27,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes)
2025-09-16 14:01:21,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:01:24,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1013.57263 ± 504.613
2025-09-16 14:01:24,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [812.44324, 805.4756, 754.9999, 612.2181, 478.37784, 1612.6664, 1810.6995, 534.979, 862.4486, 1851.4191]
2025-09-16 14:01:24,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 157.0, 136.0, 114.0, 102.0, 305.0, 345.0, 111.0, 181.0, 364.0]
2025-09-16 14:01:24,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 2 seconds)
2025-09-16 14:03:18,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:03:25,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2593.90796 ± 1861.812
2025-09-16 14:03:25,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5278.2764, 2671.0046, 987.7801, 1078.0255, 4899.0684, 772.8862, 888.74426, 790.8937, 5300.622, 3271.7793]
2025-09-16 14:03:25,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 519.0, 213.0, 221.0, 936.0, 173.0, 186.0, 153.0, 1000.0, 673.0]
2025-09-16 14:03:25,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 8 seconds)
2025-09-16 14:05:27,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:05:39,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4611.02539 ± 1449.954
2025-09-16 14:05:39,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5373.658, 5338.359, 2394.049, 5331.2227, 5346.7383, 5245.483, 5334.3125, 1139.3345, 5286.093, 5321.0024]
2025-09-16 14:05:39,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 453.0, 1000.0, 1000.0, 1000.0, 1000.0, 218.0, 1000.0, 1000.0]
2025-09-16 14:05:39,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 7 seconds)
2025-09-16 14:07:31,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:07:45,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5289.67480 ± 34.702
2025-09-16 14:07:45,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5313.9062, 5329.626, 5285.2954, 5284.813, 5230.403, 5328.277, 5314.7944, 5237.815, 5258.8516, 5312.969]
2025-09-16 14:07:45,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:07:45,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 34 seconds)
2025-09-16 14:09:42,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:09:57,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5317.45020 ± 11.634
2025-09-16 14:09:57,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5293.0747, 5328.6533, 5323.5454, 5301.605, 5321.669, 5331.752, 5323.222, 5309.293, 5320.435, 5321.248]
2025-09-16 14:09:57,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:09:57,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 27 seconds)
2025-09-16 14:11:53,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:12:08,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5336.58447 ± 25.926
2025-09-16 14:12:08,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5328.065, 5353.2876, 5348.827, 5351.437, 5339.524, 5363.503, 5334.0317, 5282.4585, 5366.2236, 5298.487]
2025-09-16 14:12:08,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:12:08,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 2 seconds)
2025-09-16 14:14:04,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:14:19,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5296.60498 ± 25.713
2025-09-16 14:14:19,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5335.939, 5289.498, 5264.98, 5257.9175, 5275.857, 5308.9785, 5336.3267, 5312.4404, 5297.6885, 5286.425]
2025-09-16 14:14:19,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:14:19,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 18 seconds)
2025-09-16 14:16:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:16:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5270.51465 ± 220.292
2025-09-16 14:16:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5336.4307, 5286.203, 5383.343, 5352.751, 5359.3716, 4614.5786, 5353.9985, 5304.79, 5358.639, 5355.04]
2025-09-16 14:16:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 849.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:16:30,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 2 seconds)
2025-09-16 14:18:30,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:18:44,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5336.72510 ± 33.042
2025-09-16 14:18:44,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5324.193, 5324.691, 5330.9233, 5386.359, 5358.691, 5325.3247, 5272.6846, 5312.671, 5342.0923, 5389.6167]
2025-09-16 14:18:44,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:18:44,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 9 seconds)
2025-09-16 14:20:45,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:20:59,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5386.11230 ± 18.148
2025-09-16 14:20:59,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5352.616, 5412.115, 5386.849, 5373.2207, 5372.3755, 5393.6304, 5394.1875, 5368.052, 5408.9927, 5399.088]
2025-09-16 14:20:59,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:20:59,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5386.11) for latency 3
2025-09-16 14:20:59,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 4 seconds)
2025-09-16 14:22:59,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:23:14,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5128.14355 ± 21.779
2025-09-16 14:23:14,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5156.065, 5119.219, 5141.286, 5077.903, 5159.9604, 5121.363, 5121.319, 5124.3774, 5124.4375, 5135.508]
2025-09-16 14:23:14,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:23:14,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 59 seconds)
2025-09-16 14:25:15,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:25:29,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5346.56348 ± 18.989
2025-09-16 14:25:29,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5349.408, 5349.0205, 5365.0767, 5331.0405, 5364.911, 5309.9795, 5344.735, 5377.7495, 5346.0293, 5327.6777]
2025-09-16 14:25:29,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:29,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 52 seconds)
2025-09-16 14:27:32,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:27:46,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5316.92676 ± 23.284
2025-09-16 14:27:46,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5318.994, 5293.0444, 5349.0005, 5345.915, 5325.9224, 5301.8535, 5289.5005, 5341.0977, 5323.0137, 5280.9272]
2025-09-16 14:27:46,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:27:46,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 46 seconds)
2025-09-16 14:29:50,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:30:04,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5283.22168 ± 24.249
2025-09-16 14:30:04,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5298.547, 5296.1494, 5274.242, 5334.365, 5288.4883, 5280.1167, 5286.484, 5234.748, 5267.343, 5271.732]
2025-09-16 14:30:04,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:30:04,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 35 seconds)
2025-09-16 14:32:00,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:32:15,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5288.97217 ± 35.301
2025-09-16 14:32:15,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5317.567, 5269.589, 5227.4053, 5237.191, 5275.9033, 5340.153, 5313.359, 5326.253, 5288.703, 5293.5977]
2025-09-16 14:32:15,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:32:15,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 15 seconds)
2025-09-16 14:34:25,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:34:39,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5250.55176 ± 43.709
2025-09-16 14:34:39,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5291.993, 5263.372, 5234.9233, 5270.8174, 5291.7627, 5302.4536, 5238.996, 5228.897, 5238.6733, 5143.6294]
2025-09-16 14:34:39,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:34:39,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 8 seconds)
2025-09-16 14:36:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:36:51,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5335.95020 ± 22.034
2025-09-16 14:36:51,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5362.741, 5325.12, 5348.696, 5323.155, 5311.9443, 5300.66, 5378.692, 5337.9253, 5330.778, 5339.787]
2025-09-16 14:36:51,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 49 seconds)
2025-09-16 14:38:48,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:39:03,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5323.79004 ± 17.529
2025-09-16 14:39:03,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5295.145, 5326.557, 5333.9272, 5331.336, 5346.96, 5320.9395, 5339.1367, 5317.943, 5335.7407, 5290.2163]
2025-09-16 14:39:03,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:39:03,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 30 seconds)
2025-09-16 14:41:02,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:41:17,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5274.17432 ± 15.283
2025-09-16 14:41:17,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5277.7466, 5251.966, 5282.1455, 5290.3057, 5307.3237, 5266.789, 5263.75, 5272.756, 5259.2915, 5269.6694]
2025-09-16 14:41:17,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:17,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 14 seconds)
2025-09-16 14:43:16,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:43:31,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5309.88721 ± 26.281
2025-09-16 14:43:31,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5294.035, 5364.0728, 5318.5664, 5322.6777, 5276.107, 5303.3496, 5304.466, 5291.035, 5343.107, 5281.458]
2025-09-16 14:43:31,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:43:31,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
