2025-09-16 11:50:25,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_6
2025-09-16 11:50:25,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_6
2025-09-16 11:50:25,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x14eb633807d0>}
2025-09-16 11:50:25,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:50:25,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:50:25,672 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=478, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:50:25,672 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:50:27,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:50:27,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:52:08,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:52:09,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 312.69388 ± 38.342
2025-09-16 11:52:09,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [290.2597, 367.83365, 334.9302, 358.96082, 270.59, 287.575, 344.46057, 334.62396, 288.65466, 249.05011]
2025-09-16 11:52:09,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 68.0, 64.0, 66.0, 51.0, 55.0, 64.0, 62.0, 55.0, 47.0]
2025-09-16 11:52:09,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (312.69) for latency 6
2025-09-16 11:52:09,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 47 minutes, 59 seconds)
2025-09-16 11:53:59,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:54:00,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 333.14932 ± 37.074
2025-09-16 11:54:00,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [381.0122, 308.37988, 292.33167, 302.30096, 330.53073, 374.78253, 389.25507, 277.05634, 329.47504, 346.36856]
2025-09-16 11:54:00,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 57.0, 55.0, 56.0, 63.0, 71.0, 77.0, 53.0, 62.0, 65.0]
2025-09-16 11:54:00,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (333.15) for latency 6
2025-09-16 11:54:00,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 53 minutes, 47 seconds)
2025-09-16 11:55:50,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:55:51,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 429.97470 ± 47.659
2025-09-16 11:55:51,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [365.65244, 461.4719, 491.07922, 392.44897, 478.99185, 435.06076, 361.2889, 405.62973, 496.1509, 411.97238]
2025-09-16 11:55:51,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 90.0, 105.0, 74.0, 89.0, 82.0, 69.0, 79.0, 103.0, 78.0]
2025-09-16 11:55:51,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (429.97) for latency 6
2025-09-16 11:55:51,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 54 minutes, 39 seconds)
2025-09-16 11:57:41,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:57:42,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 447.27988 ± 90.981
2025-09-16 11:57:42,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [346.11154, 330.7737, 421.2793, 367.39594, 612.24133, 491.4388, 492.71155, 500.63425, 360.99615, 549.21655]
2025-09-16 11:57:42,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 62.0, 81.0, 73.0, 128.0, 98.0, 93.0, 95.0, 69.0, 108.0]
2025-09-16 11:57:42,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (447.28) for latency 6
2025-09-16 11:57:42,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 53 minutes, 56 seconds)
2025-09-16 11:59:33,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:59:34,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 352.19507 ± 36.848
2025-09-16 11:59:34,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [340.4977, 407.12805, 410.97284, 327.20258, 388.19763, 351.97418, 294.46408, 313.64386, 337.1518, 350.7177]
2025-09-16 11:59:34,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 77.0, 94.0, 67.0, 85.0, 74.0, 61.0, 64.0, 67.0, 78.0]
2025-09-16 11:59:34,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 53 minutes, 9 seconds)
2025-09-16 12:01:24,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:01:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 475.09137 ± 74.541
2025-09-16 12:01:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [339.59406, 489.22223, 582.3318, 479.70413, 484.7722, 520.98236, 579.11005, 480.34564, 383.79218, 411.05902]
2025-09-16 12:01:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 94.0, 110.0, 91.0, 100.0, 98.0, 113.0, 91.0, 72.0, 88.0]
2025-09-16 12:01:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (475.09) for latency 6
2025-09-16 12:01:25,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 54 minutes, 23 seconds)
2025-09-16 12:03:16,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:03:17,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 492.59430 ± 98.079
2025-09-16 12:03:17,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [393.12332, 453.46814, 737.25714, 464.69846, 531.5404, 413.60403, 579.50287, 416.63364, 498.4525, 437.66272]
2025-09-16 12:03:17,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 100.0, 141.0, 103.0, 100.0, 91.0, 109.0, 77.0, 102.0, 88.0]
2025-09-16 12:03:17,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (492.59) for latency 6
2025-09-16 12:03:17,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 52 minutes, 51 seconds)
2025-09-16 12:05:08,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:05:09,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 481.98022 ± 107.359
2025-09-16 12:05:09,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [577.1264, 537.3485, 439.0821, 447.7202, 432.34952, 718.6765, 421.25488, 479.63037, 479.41492, 287.1991]
2025-09-16 12:05:09,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 98.0, 90.0, 84.0, 80.0, 133.0, 88.0, 89.0, 94.0, 60.0]
2025-09-16 12:05:09,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 51 minutes, 3 seconds)
2025-09-16 12:07:00,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:07:02,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 534.72711 ± 96.711
2025-09-16 12:07:02,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [540.7789, 512.98944, 445.98434, 594.2406, 404.9341, 734.8886, 647.052, 456.96835, 550.6949, 458.73987]
2025-09-16 12:07:02,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 100.0, 87.0, 129.0, 76.0, 139.0, 121.0, 101.0, 104.0, 101.0]
2025-09-16 12:07:02,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (534.73) for latency 6
2025-09-16 12:07:02,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 49 minutes, 50 seconds)
2025-09-16 12:08:53,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:08:54,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 532.34399 ± 117.861
2025-09-16 12:08:54,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [453.98584, 469.86584, 666.54584, 412.9308, 614.5154, 460.88403, 445.86786, 433.6448, 576.8472, 788.35254]
2025-09-16 12:08:54,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 87.0, 126.0, 75.0, 115.0, 84.0, 84.0, 79.0, 124.0, 167.0]
2025-09-16 12:08:54,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 48 minutes, 3 seconds)
2025-09-16 12:10:44,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:10:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 556.05072 ± 154.900
2025-09-16 12:10:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [476.5316, 488.38712, 518.52356, 463.76556, 675.8751, 488.1203, 453.54852, 608.8748, 966.471, 420.40976]
2025-09-16 12:10:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 93.0, 104.0, 85.0, 131.0, 99.0, 86.0, 113.0, 188.0, 93.0]
2025-09-16 12:10:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (556.05) for latency 6
2025-09-16 12:10:46,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 46 minutes, 19 seconds)
2025-09-16 12:12:37,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:12:39,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 530.94745 ± 139.565
2025-09-16 12:12:39,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [431.8681, 427.28897, 508.36484, 351.99664, 522.23816, 530.01306, 899.2677, 489.71164, 589.88696, 558.839]
2025-09-16 12:12:39,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 90.0, 110.0, 70.0, 101.0, 112.0, 179.0, 107.0, 126.0, 116.0]
2025-09-16 12:12:39,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 44 minutes, 40 seconds)
2025-09-16 12:14:30,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:14:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 628.71729 ± 178.924
2025-09-16 12:14:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [682.33124, 451.38315, 396.1512, 618.1405, 747.7805, 1026.5677, 769.24603, 472.56824, 616.2286, 506.77524]
2025-09-16 12:14:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 92.0, 71.0, 121.0, 161.0, 198.0, 148.0, 93.0, 136.0, 98.0]
2025-09-16 12:14:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (628.72) for latency 6
2025-09-16 12:14:32,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 43 minutes, 12 seconds)
2025-09-16 12:16:22,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:16:24,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 613.32239 ± 122.114
2025-09-16 12:16:24,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [700.14905, 474.57385, 677.2075, 669.8121, 574.11487, 492.50244, 555.9681, 733.09045, 426.82742, 828.9784]
2025-09-16 12:16:24,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 91.0, 129.0, 128.0, 108.0, 100.0, 112.0, 141.0, 92.0, 158.0]
2025-09-16 12:16:24,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 41 minutes, 12 seconds)
2025-09-16 12:18:17,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:18:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 679.02930 ± 134.605
2025-09-16 12:18:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [772.79266, 604.7395, 700.63245, 752.35583, 552.187, 575.0215, 800.8979, 958.97943, 526.81836, 545.86816]
2025-09-16 12:18:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 114.0, 132.0, 149.0, 107.0, 125.0, 154.0, 187.0, 100.0, 103.0]
2025-09-16 12:18:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (679.03) for latency 6
2025-09-16 12:18:18,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 39 minutes, 56 seconds)
2025-09-16 12:20:09,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:20:11,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 600.63031 ± 103.155
2025-09-16 12:20:11,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [643.10693, 672.96906, 795.07196, 484.30884, 588.88416, 476.7403, 739.1018, 546.1086, 544.5666, 515.44476]
2025-09-16 12:20:11,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 130.0, 153.0, 100.0, 111.0, 88.0, 156.0, 118.0, 104.0, 98.0]
2025-09-16 12:20:11,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 38 minutes, 12 seconds)
2025-09-16 12:22:05,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:22:07,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 678.62085 ± 178.229
2025-09-16 12:22:07,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [834.78973, 519.2464, 678.89404, 847.71344, 662.61646, 586.41785, 564.55804, 1077.1097, 500.3076, 514.55524]
2025-09-16 12:22:07,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 96.0, 135.0, 162.0, 123.0, 111.0, 108.0, 207.0, 112.0, 115.0]
2025-09-16 12:22:07,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 37 minutes, 11 seconds)
2025-09-16 12:23:59,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:24:01,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 674.19104 ± 150.638
2025-09-16 12:24:01,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [707.847, 928.9301, 765.8272, 738.9932, 336.60117, 688.89343, 783.15845, 601.80493, 547.8406, 642.01434]
2025-09-16 12:24:01,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 201.0, 167.0, 139.0, 76.0, 130.0, 149.0, 128.0, 116.0, 127.0]
2025-09-16 12:24:01,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 35 minutes, 35 seconds)
2025-09-16 12:25:54,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:25:55,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 669.99683 ± 157.349
2025-09-16 12:25:55,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [627.00415, 667.53326, 498.54623, 757.6105, 724.912, 559.83344, 965.7861, 386.3241, 683.12823, 829.2899]
2025-09-16 12:25:55,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 130.0, 111.0, 142.0, 136.0, 102.0, 184.0, 72.0, 142.0, 160.0]
2025-09-16 12:25:55,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 34 minutes, 14 seconds)
2025-09-16 12:27:50,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:27:52,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 751.90417 ± 200.786
2025-09-16 12:27:52,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [798.5506, 746.4479, 704.0646, 548.4517, 1220.4972, 825.05225, 541.1224, 894.3353, 491.89432, 748.62445]
2025-09-16 12:27:52,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 145.0, 160.0, 110.0, 239.0, 172.0, 100.0, 175.0, 103.0, 147.0]
2025-09-16 12:27:52,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (751.90) for latency 6
2025-09-16 12:27:52,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 32 minutes, 53 seconds)
2025-09-16 12:29:45,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:29:47,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 713.82281 ± 147.917
2025-09-16 12:29:47,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [602.3597, 1015.49084, 814.2406, 921.951, 598.91254, 656.9308, 660.9503, 545.7488, 594.2523, 727.39056]
2025-09-16 12:29:47,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 189.0, 159.0, 203.0, 128.0, 142.0, 124.0, 107.0, 126.0, 145.0]
2025-09-16 12:29:47,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 31 minutes, 37 seconds)
2025-09-16 12:31:38,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:31:40,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 735.55060 ± 212.171
2025-09-16 12:31:40,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [995.829, 463.589, 1141.1517, 503.88538, 739.1627, 756.80273, 520.207, 892.374, 598.52704, 743.97766]
2025-09-16 12:31:40,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 85.0, 219.0, 96.0, 144.0, 141.0, 99.0, 180.0, 114.0, 142.0]
2025-09-16 12:31:40,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 29 minutes, 5 seconds)
2025-09-16 12:33:33,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:33:35,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 790.05524 ± 169.077
2025-09-16 12:33:35,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [865.04724, 789.62976, 632.3699, 898.8544, 713.184, 705.92004, 670.1261, 1220.4186, 614.096, 790.90607]
2025-09-16 12:33:35,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 146.0, 123.0, 191.0, 130.0, 132.0, 125.0, 239.0, 115.0, 168.0]
2025-09-16 12:33:35,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (790.06) for latency 6
2025-09-16 12:33:35,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 27 minutes, 29 seconds)
2025-09-16 12:35:29,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:35:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 898.08136 ± 264.148
2025-09-16 12:35:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [741.47894, 789.77924, 1074.8248, 648.1723, 906.7567, 836.60345, 767.8688, 762.12494, 1621.9802, 831.2233]
2025-09-16 12:35:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 149.0, 230.0, 137.0, 175.0, 165.0, 147.0, 147.0, 325.0, 161.0]
2025-09-16 12:35:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (898.08) for latency 6
2025-09-16 12:35:31,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 56 seconds)
2025-09-16 12:37:24,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:37:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 715.29663 ± 268.778
2025-09-16 12:37:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [683.04626, 380.94058, 935.4571, 660.65643, 716.1511, 1026.6752, 1228.0558, 718.1364, 371.6335, 432.2139]
2025-09-16 12:37:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 88.0, 178.0, 143.0, 151.0, 196.0, 242.0, 149.0, 72.0, 96.0]
2025-09-16 12:37:26,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 23 minutes, 36 seconds)
2025-09-16 12:39:19,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:39:21,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 781.67371 ± 164.441
2025-09-16 12:39:21,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [769.89685, 1032.3757, 779.99457, 831.6563, 461.28952, 1030.0948, 831.5814, 767.3422, 712.828, 599.6772]
2025-09-16 12:39:21,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 204.0, 161.0, 174.0, 86.0, 204.0, 172.0, 148.0, 136.0, 111.0]
2025-09-16 12:39:21,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 21 minutes, 39 seconds)
2025-09-16 12:41:13,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:41:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 704.27557 ± 94.650
2025-09-16 12:41:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [629.6189, 682.64764, 848.43726, 685.74994, 655.13684, 738.19995, 626.1926, 630.4272, 910.66595, 635.6798]
2025-09-16 12:41:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 134.0, 166.0, 131.0, 130.0, 142.0, 118.0, 120.0, 177.0, 127.0]
2025-09-16 12:41:15,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 48 seconds)
2025-09-16 12:43:11,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:43:13,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 845.45593 ± 290.505
2025-09-16 12:43:13,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1007.84454, 603.5602, 698.04913, 1266.0981, 653.5721, 489.3054, 716.3133, 847.9636, 722.3452, 1449.5079]
2025-09-16 12:43:13,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 132.0, 128.0, 246.0, 125.0, 91.0, 140.0, 156.0, 135.0, 278.0]
2025-09-16 12:43:13,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 18 minutes, 34 seconds)
2025-09-16 12:45:04,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:45:06,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 932.33557 ± 420.867
2025-09-16 12:45:06,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [736.3355, 601.7395, 1399.6576, 640.6388, 1161.7611, 1913.3304, 647.57745, 526.56, 697.6954, 998.06006]
2025-09-16 12:45:06,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 113.0, 269.0, 125.0, 227.0, 370.0, 123.0, 100.0, 135.0, 190.0]
2025-09-16 12:45:06,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (932.34) for latency 6
2025-09-16 12:45:06,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 16 minutes, 4 seconds)
2025-09-16 12:47:00,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:47:03,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 844.72040 ± 228.315
2025-09-16 12:47:03,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1028.8214, 1141.1697, 1045.895, 1168.6241, 851.3678, 814.10876, 689.9517, 546.0078, 565.3558, 595.90155]
2025-09-16 12:47:03,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 219.0, 198.0, 226.0, 160.0, 177.0, 145.0, 103.0, 106.0, 128.0]
2025-09-16 12:47:03,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 30 seconds)
2025-09-16 12:48:56,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:48:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1112.19861 ± 375.307
2025-09-16 12:48:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [879.77765, 1080.6906, 1087.6178, 1800.6497, 513.3169, 1349.3204, 1580.864, 784.2358, 776.9096, 1268.6044]
2025-09-16 12:48:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 203.0, 224.0, 352.0, 111.0, 260.0, 309.0, 144.0, 148.0, 252.0]
2025-09-16 12:48:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1112.20) for latency 6
2025-09-16 12:48:59,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 52 seconds)
2025-09-16 12:50:55,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:50:58,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1024.29688 ± 298.973
2025-09-16 12:50:58,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1120.3416, 684.4194, 912.53534, 1007.4294, 855.24457, 1422.5923, 949.56464, 854.5306, 1697.5034, 738.8073]
2025-09-16 12:50:58,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 133.0, 172.0, 197.0, 160.0, 270.0, 199.0, 161.0, 318.0, 139.0]
2025-09-16 12:50:58,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 8 seconds)
2025-09-16 12:52:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:52:54,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1000.26288 ± 314.125
2025-09-16 12:52:54,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [700.1849, 1206.8372, 1085.6927, 1086.4078, 1472.3142, 554.4291, 780.7534, 908.6416, 691.5352, 1515.833]
2025-09-16 12:52:54,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 224.0, 200.0, 204.0, 294.0, 104.0, 174.0, 170.0, 132.0, 290.0]
2025-09-16 12:52:54,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 50 seconds)
2025-09-16 12:54:47,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:54:49,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1091.63269 ± 412.350
2025-09-16 12:54:49,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [989.7617, 1472.6868, 1963.0516, 609.1937, 1144.4481, 678.5363, 653.74493, 809.51404, 1404.7683, 1190.6213]
2025-09-16 12:54:49,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 277.0, 372.0, 118.0, 220.0, 131.0, 121.0, 153.0, 259.0, 224.0]
2025-09-16 12:54:49,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 16 seconds)
2025-09-16 12:56:45,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:56:48,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1113.87585 ± 318.554
2025-09-16 12:56:48,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [991.297, 1096.0172, 1092.1238, 641.00244, 1741.7959, 842.84686, 1329.0919, 1107.8784, 788.5807, 1508.1238]
2025-09-16 12:56:48,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 224.0, 212.0, 128.0, 342.0, 159.0, 253.0, 227.0, 166.0, 322.0]
2025-09-16 12:56:48,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1113.88) for latency 6
2025-09-16 12:56:48,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 54 seconds)
2025-09-16 12:58:43,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:58:45,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 779.86310 ± 202.801
2025-09-16 12:58:45,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1030.6007, 971.89734, 443.12576, 913.6474, 853.48694, 1068.5239, 618.3798, 653.8857, 627.64636, 617.43665]
2025-09-16 12:58:45,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 193.0, 89.0, 188.0, 179.0, 221.0, 117.0, 145.0, 117.0, 119.0]
2025-09-16 12:58:45,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 4 seconds)
2025-09-16 13:00:39,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:00:42,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1009.01740 ± 230.795
2025-09-16 13:00:42,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1340.8652, 1113.5593, 1340.0593, 1054.5645, 1021.6097, 919.319, 565.7027, 1063.6448, 709.0069, 961.8424]
2025-09-16 13:00:42,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [259.0, 210.0, 258.0, 210.0, 194.0, 174.0, 105.0, 201.0, 150.0, 184.0]
2025-09-16 13:00:42,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 40 seconds)
2025-09-16 13:02:34,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:02:37,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1345.86731 ± 565.232
2025-09-16 13:02:37,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [901.6293, 2882.9104, 1290.3429, 1439.1783, 1168.1206, 991.1473, 1122.6643, 1064.4274, 1706.8118, 891.4411]
2025-09-16 13:02:37,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 583.0, 255.0, 273.0, 217.0, 184.0, 210.0, 194.0, 331.0, 165.0]
2025-09-16 13:02:37,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1345.87) for latency 6
2025-09-16 13:02:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 30 seconds)
2025-09-16 13:04:33,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:04:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1109.62781 ± 455.593
2025-09-16 13:04:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [646.48615, 628.19714, 1677.4983, 914.4997, 2092.161, 909.81726, 676.11597, 1304.3656, 1260.1086, 987.0296]
2025-09-16 13:04:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 118.0, 315.0, 176.0, 395.0, 172.0, 129.0, 245.0, 237.0, 185.0]
2025-09-16 13:04:36,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 15 seconds)
2025-09-16 13:06:34,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:06:38,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1526.30432 ± 492.931
2025-09-16 13:06:38,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2392.5605, 1793.7007, 597.9956, 1305.5457, 1227.3075, 1899.7423, 1270.2505, 1086.0881, 1789.3843, 1900.4672]
2025-09-16 13:06:38,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [465.0, 330.0, 109.0, 244.0, 234.0, 386.0, 238.0, 200.0, 337.0, 355.0]
2025-09-16 13:06:38,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1526.30) for latency 6
2025-09-16 13:06:38,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 56 seconds)
2025-09-16 13:08:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:08:34,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1729.22913 ± 580.102
2025-09-16 13:08:34,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1515.4813, 1341.8972, 2281.4553, 1639.2078, 1707.7794, 2536.3018, 466.11023, 2457.8564, 1483.4708, 1862.7307]
2025-09-16 13:08:34,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [291.0, 245.0, 439.0, 322.0, 325.0, 485.0, 107.0, 468.0, 294.0, 355.0]
2025-09-16 13:08:34,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1729.23) for latency 6
2025-09-16 13:08:34,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 55 seconds)
2025-09-16 13:10:29,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:10:36,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2620.13599 ± 1186.267
2025-09-16 13:10:36,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2766.8604, 1720.8226, 5125.408, 2575.1375, 1479.4806, 4394.855, 1276.0193, 2678.6584, 2361.373, 1822.744]
2025-09-16 13:10:36,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [528.0, 323.0, 1000.0, 502.0, 277.0, 867.0, 249.0, 523.0, 456.0, 349.0]
2025-09-16 13:10:36,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2620.14) for latency 6
2025-09-16 13:10:36,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 54 seconds)
2025-09-16 13:12:30,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:12:34,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1497.19897 ± 579.874
2025-09-16 13:12:34,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1649.8695, 725.9671, 1808.104, 415.6423, 1624.3129, 2031.3531, 2084.605, 803.58936, 1879.0503, 1949.4961]
2025-09-16 13:12:34,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [311.0, 135.0, 337.0, 76.0, 316.0, 389.0, 387.0, 172.0, 351.0, 372.0]
2025-09-16 13:12:34,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 18 seconds)
2025-09-16 13:14:35,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:14:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2670.55054 ± 1401.606
2025-09-16 13:14:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3406.4185, 2130.8296, 1790.3739, 2172.7146, 1058.4309, 2812.795, 1562.8812, 5159.193, 5169.1016, 1442.7672]
2025-09-16 13:14:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [652.0, 437.0, 351.0, 437.0, 224.0, 559.0, 296.0, 1000.0, 1000.0, 315.0]
2025-09-16 13:14:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2670.55) for latency 6
2025-09-16 13:14:43,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 19 seconds)
2025-09-16 13:16:32,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:16:38,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2004.02319 ± 1046.620
2025-09-16 13:16:38,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [858.7673, 1430.8934, 702.5082, 3126.732, 1972.7844, 1604.3363, 2340.897, 2059.7148, 4422.513, 1521.0868]
2025-09-16 13:16:38,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 274.0, 145.0, 600.0, 372.0, 303.0, 448.0, 413.0, 857.0, 285.0]
2025-09-16 13:16:38,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 57 seconds)
2025-09-16 13:18:33,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:18:37,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1653.08337 ± 612.185
2025-09-16 13:18:37,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1048.0305, 1539.1028, 1630.8207, 2422.1711, 1399.083, 922.0748, 2085.3909, 1613.382, 978.86273, 2891.9158]
2025-09-16 13:18:37,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 291.0, 306.0, 450.0, 263.0, 171.0, 390.0, 300.0, 193.0, 541.0]
2025-09-16 13:18:37,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 25 seconds)
2025-09-16 13:20:31,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:20:40,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3350.94800 ± 1623.526
2025-09-16 13:20:40,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2989.2668, 3174.1636, 3900.7617, 5242.1177, 4402.2334, 5272.996, 1546.4738, 1475.5262, 513.6057, 4992.3345]
2025-09-16 13:20:40,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [573.0, 598.0, 740.0, 1000.0, 834.0, 1000.0, 301.0, 302.0, 100.0, 955.0]
2025-09-16 13:20:40,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3350.95) for latency 6
2025-09-16 13:20:40,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 44 seconds)
2025-09-16 13:22:36,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:22:42,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2266.90552 ± 1298.010
2025-09-16 13:22:42,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [963.32367, 2610.8093, 769.8265, 1994.1611, 5120.9375, 675.6456, 1830.6575, 2977.7773, 3424.7617, 2301.155]
2025-09-16 13:22:42,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 511.0, 154.0, 398.0, 1000.0, 124.0, 366.0, 560.0, 658.0, 470.0]
2025-09-16 13:22:42,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 45 minutes, 30 seconds)
2025-09-16 13:24:41,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:24:52,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3918.77856 ± 1463.590
2025-09-16 13:24:52,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4220.0005, 5327.197, 601.3648, 2411.8843, 4851.3276, 2994.163, 5162.202, 4867.273, 5236.958, 3515.417]
2025-09-16 13:24:52,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [799.0, 1000.0, 124.0, 491.0, 942.0, 575.0, 1000.0, 913.0, 1000.0, 666.0]
2025-09-16 13:24:52,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3918.78) for latency 6
2025-09-16 13:24:52,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 30 seconds)
2025-09-16 13:26:49,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:26:59,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3405.82666 ± 1969.609
2025-09-16 13:26:59,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1862.2006, 3091.5544, 822.53973, 701.1824, 5226.9546, 5207.275, 5338.8877, 5317.8438, 5284.217, 1205.609]
2025-09-16 13:26:59,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [342.0, 575.0, 158.0, 154.0, 972.0, 1000.0, 1000.0, 1000.0, 1000.0, 228.0]
2025-09-16 13:26:59,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2025-09-16 13:29:00,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:29:11,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3677.16650 ± 1404.808
2025-09-16 13:29:11,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5283.0977, 5120.6978, 1582.9054, 5196.5225, 2358.7214, 5178.1274, 1680.2552, 3050.6196, 3713.834, 3606.8818]
2025-09-16 13:29:11,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 301.0, 1000.0, 470.0, 1000.0, 322.0, 580.0, 707.0, 690.0]
2025-09-16 13:29:11,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 43 minutes, 31 seconds)
2025-09-16 13:31:08,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:31:20,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4224.84180 ± 1246.372
2025-09-16 13:31:20,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3379.5344, 3343.9233, 5289.59, 5254.5806, 4600.5, 5291.806, 5161.8926, 3222.2615, 5244.7676, 1459.5571]
2025-09-16 13:31:20,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [655.0, 651.0, 1000.0, 1000.0, 876.0, 1000.0, 1000.0, 620.0, 1000.0, 275.0]
2025-09-16 13:31:20,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (4224.84) for latency 6
2025-09-16 13:31:20,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 42 minutes, 23 seconds)
2025-09-16 13:33:13,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:33:20,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2771.14673 ± 1575.701
2025-09-16 13:33:20,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2284.746, 867.4118, 2819.0508, 2023.7306, 5278.379, 1143.6046, 3959.7134, 5349.5796, 944.22095, 3041.031]
2025-09-16 13:33:20,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [433.0, 185.0, 523.0, 369.0, 1000.0, 211.0, 742.0, 1000.0, 181.0, 557.0]
2025-09-16 13:33:20,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 39 minutes, 54 seconds)
2025-09-16 13:35:21,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:35:36,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5245.50488 ± 29.380
2025-09-16 13:35:36,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5199.0933, 5254.945, 5231.372, 5207.3955, 5302.4927, 5242.0913, 5239.3403, 5237.9365, 5279.219, 5261.167]
2025-09-16 13:35:36,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:35:36,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5245.50) for latency 6
2025-09-16 13:35:36,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 38 minutes, 46 seconds)
2025-09-16 13:37:27,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:37:39,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4110.92041 ± 1363.946
2025-09-16 13:37:39,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5176.1055, 5200.934, 5167.6196, 3305.1611, 5179.5493, 2336.3599, 2894.1326, 1546.5052, 5181.1646, 5121.672]
2025-09-16 13:37:39,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 628.0, 1000.0, 446.0, 616.0, 300.0, 1000.0, 1000.0]
2025-09-16 13:37:39,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 36 minutes, 2 seconds)
2025-09-16 13:39:34,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:39:44,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3701.75122 ± 1274.012
2025-09-16 13:39:44,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5271.4946, 2674.9307, 2585.0527, 3069.1096, 5233.59, 3222.347, 1876.8816, 5254.691, 2762.2017, 5067.212]
2025-09-16 13:39:44,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 511.0, 521.0, 599.0, 1000.0, 598.0, 350.0, 1000.0, 521.0, 991.0]
2025-09-16 13:39:44,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 32 minutes, 51 seconds)
2025-09-16 13:41:48,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:42:01,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4683.43652 ± 1305.687
2025-09-16 13:42:01,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5241.6484, 3825.4238, 5288.2026, 5175.2773, 5245.993, 981.70795, 5301.206, 5242.06, 5270.7656, 5262.0815]
2025-09-16 13:42:01,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 723.0, 1000.0, 972.0, 1000.0, 196.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:42:01,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 31 minutes, 52 seconds)
2025-09-16 13:43:50,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:44:03,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4510.56396 ± 1072.076
2025-09-16 13:44:03,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2533.222, 4519.024, 5290.6875, 5284.6313, 5310.375, 3560.2444, 5307.46, 2765.572, 5329.8843, 5204.541]
2025-09-16 13:44:03,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [475.0, 855.0, 1000.0, 1000.0, 1000.0, 658.0, 1000.0, 522.0, 1000.0, 1000.0]
2025-09-16 13:44:03,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 29 minutes, 56 seconds)
2025-09-16 13:45:59,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:46:12,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4493.12207 ± 1553.895
2025-09-16 13:46:12,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4670.507, 5372.532, 5341.6606, 5300.333, 1601.1777, 5367.7227, 5349.8506, 1230.6553, 5331.8555, 5364.9297]
2025-09-16 13:46:12,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [869.0, 1000.0, 1000.0, 1000.0, 325.0, 1000.0, 1000.0, 259.0, 1000.0, 1000.0]
2025-09-16 13:46:12,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 26 minutes, 52 seconds)
2025-09-16 13:48:16,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:48:30,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4927.28027 ± 736.264
2025-09-16 13:48:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5248.9023, 2768.2053, 5238.0615, 5180.6753, 5212.425, 5234.1216, 4707.4688, 5214.5767, 5221.7734, 5246.592]
2025-09-16 13:48:30,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 542.0, 1000.0, 1000.0, 1000.0, 1000.0, 893.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:48:30,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 26 minutes, 47 seconds)
2025-09-16 13:50:18,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:50:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4813.78369 ± 938.825
2025-09-16 13:50:32,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3035.3914, 5248.476, 5249.0522, 5277.679, 5314.2554, 5269.244, 5318.374, 2841.9268, 5287.144, 5296.293]
2025-09-16 13:50:32,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [570.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 542.0, 1000.0, 1000.0]
2025-09-16 13:50:32,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 17 seconds)
2025-09-16 13:52:31,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:52:46,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5041.43457 ± 614.833
2025-09-16 13:52:46,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5255.1533, 3199.711, 5264.788, 5234.0586, 5183.1147, 5290.775, 5279.727, 5246.519, 5270.2476, 5190.2495]
2025-09-16 13:52:46,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 612.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:52:46,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 21 minutes, 36 seconds)
2025-09-16 13:54:43,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:54:57,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4835.22754 ± 1142.218
2025-09-16 13:54:57,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5221.7905, 1427.5085, 5271.117, 5251.1304, 5252.5605, 5275.1333, 5226.39, 5269.2383, 5295.5537, 4861.854]
2025-09-16 13:54:57,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 274.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 939.0]
2025-09-16 13:54:57,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 20 minutes, 40 seconds)
2025-09-16 13:56:55,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:57:07,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4454.52539 ± 1470.399
2025-09-16 13:57:07,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5294.2827, 4105.7637, 5277.897, 2476.7048, 5330.717, 5241.909, 5346.15, 899.5441, 5271.4077, 5300.879]
2025-09-16 13:57:07,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 771.0, 1000.0, 493.0, 1000.0, 1000.0, 1000.0, 170.0, 1000.0, 1000.0]
2025-09-16 13:57:07,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 18 minutes, 41 seconds)
2025-09-16 13:59:04,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:59:19,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5204.50879 ± 38.219
2025-09-16 13:59:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5227.528, 5205.6, 5237.0845, 5231.1167, 5101.8774, 5227.9976, 5203.8496, 5223.616, 5175.871, 5210.545]
2025-09-16 13:59:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:59:19,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 15 minutes, 45 seconds)
2025-09-16 14:01:11,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:01:26,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5245.40527 ± 23.908
2025-09-16 14:01:26,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5232.575, 5232.325, 5277.2954, 5278.688, 5233.737, 5256.349, 5259.3677, 5197.488, 5258.889, 5227.335]
2025-09-16 14:01:26,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:01:26,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 14 minutes, 8 seconds)
2025-09-16 14:03:26,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:03:37,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4032.16357 ± 1982.995
2025-09-16 14:03:37,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5296.7544, 5341.9463, 810.671, 1580.0823, 5324.6157, 5315.3047, 5351.107, 5315.9824, 674.0588, 5311.1123]
2025-09-16 14:03:37,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 153.0, 294.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0, 1000.0]
2025-09-16 14:03:37,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 11 minutes, 39 seconds)
2025-09-16 14:05:36,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:05:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5270.58643 ± 26.575
2025-09-16 14:05:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5295.2314, 5272.565, 5269.3066, 5285.622, 5275.565, 5272.8965, 5274.089, 5275.606, 5290.3267, 5194.651]
2025-09-16 14:05:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:05:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5270.59) for latency 6
2025-09-16 14:05:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 53 seconds)
2025-09-16 14:07:49,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:08:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5301.93066 ± 11.639
2025-09-16 14:08:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5300.677, 5293.5547, 5303.5615, 5284.9937, 5287.257, 5314.056, 5308.2188, 5303.158, 5325.8896, 5297.9355]
2025-09-16 14:08:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:08:05,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5301.93) for latency 6
2025-09-16 14:08:05,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 7 minutes, 54 seconds)
2025-09-16 14:09:59,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:10:14,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5292.05127 ± 29.018
2025-09-16 14:10:14,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5280.195, 5318.082, 5310.6987, 5296.5483, 5320.521, 5214.3066, 5298.593, 5292.558, 5281.7715, 5307.238]
2025-09-16 14:10:14,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:10:14,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 5 minutes, 30 seconds)
2025-09-16 14:12:13,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:12:29,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5208.55811 ± 23.851
2025-09-16 14:12:29,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5232.349, 5238.337, 5176.9424, 5164.283, 5195.106, 5214.651, 5210.4365, 5215.6978, 5199.0176, 5238.763]
2025-09-16 14:12:29,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:12:29,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 4 minutes, 1 second)
2025-09-16 14:14:22,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:14:37,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5214.50879 ± 145.142
2025-09-16 14:14:37,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5280.2183, 5233.4155, 5254.723, 5262.319, 5259.1455, 5270.956, 5283.4795, 5224.0195, 5293.3877, 4783.422]
2025-09-16 14:14:37,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 911.0]
2025-09-16 14:14:37,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 35 seconds)
2025-09-16 14:16:40,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:16:54,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4823.46533 ± 1349.544
2025-09-16 14:16:54,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5271.465, 5295.7666, 5296.327, 5260.0693, 5209.7607, 5284.876, 775.4957, 5263.405, 5287.1685, 5290.32]
2025-09-16 14:16:54,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 144.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:16:54,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 33 seconds)
2025-09-16 14:18:49,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:19:04,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5074.04492 ± 504.185
2025-09-16 14:19:04,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5278.614, 5295.062, 3617.423, 5289.88, 4835.539, 5267.2637, 5319.341, 5261.683, 5319.4077, 5256.238]
2025-09-16 14:19:04,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 685.0, 1000.0, 911.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:19:04,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 57 minutes, 10 seconds)
2025-09-16 14:21:07,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:21:23,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5288.47266 ± 12.575
2025-09-16 14:21:23,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5301.0396, 5291.637, 5263.5405, 5291.002, 5293.4575, 5281.3516, 5281.107, 5308.483, 5297.685, 5275.426]
2025-09-16 14:21:23,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:21:23,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 41 seconds)
2025-09-16 14:23:17,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:23:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5321.45410 ± 12.070
2025-09-16 14:23:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5311.095, 5319.8, 5300.037, 5320.1313, 5330.496, 5329.1875, 5320.223, 5321.155, 5314.576, 5347.834]
2025-09-16 14:23:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:23:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5321.45) for latency 6
2025-09-16 14:23:33,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 6 seconds)
2025-09-16 14:25:27,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:25:43,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5288.14941 ± 18.771
2025-09-16 14:25:43,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5256.89, 5283.6777, 5266.7524, 5293.3936, 5308.61, 5300.3647, 5285.763, 5266.786, 5302.973, 5316.2817]
2025-09-16 14:25:43,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:43,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 51 minutes, 2 seconds)
2025-09-16 14:27:37,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:27:53,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5282.21924 ± 20.147
2025-09-16 14:27:53,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5306.673, 5299.496, 5286.2427, 5274.376, 5232.113, 5285.692, 5272.6807, 5297.41, 5272.899, 5294.6104]
2025-09-16 14:27:53,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:27:53,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 48 minutes, 20 seconds)
2025-09-16 14:29:51,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:30:07,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5308.16406 ± 13.963
2025-09-16 14:30:07,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5302.529, 5299.1816, 5297.1484, 5325.7886, 5307.361, 5336.617, 5300.802, 5312.3755, 5313.832, 5286.004]
2025-09-16 14:30:07,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:30:07,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 22 seconds)
2025-09-16 14:32:03,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:32:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5393.32080 ± 25.403
2025-09-16 14:32:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5369.794, 5399.458, 5399.7666, 5422.718, 5409.6016, 5342.372, 5429.7905, 5398.7036, 5366.2197, 5394.786]
2025-09-16 14:32:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:32:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5393.32) for latency 6
2025-09-16 14:32:18,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 40 seconds)
2025-09-16 14:34:14,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:34:29,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5309.33887 ± 12.761
2025-09-16 14:34:29,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5296.8926, 5308.8926, 5307.4604, 5312.0625, 5326.006, 5307.7715, 5304.744, 5337.037, 5301.35, 5291.1714]
2025-09-16 14:34:29,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:34:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 33 seconds)
2025-09-16 14:36:24,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:36:40,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5291.81543 ± 9.412
2025-09-16 14:36:40,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5287.0513, 5303.629, 5285.924, 5277.194, 5301.6006, 5289.0693, 5293.3613, 5307.4263, 5280.609, 5292.292]
2025-09-16 14:36:40,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:40,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 25 seconds)
2025-09-16 14:38:35,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:38:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5273.31104 ± 14.582
2025-09-16 14:38:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5249.604, 5276.217, 5252.5957, 5280.55, 5285.268, 5258.6294, 5277.2827, 5270.9854, 5285.0884, 5296.889]
2025-09-16 14:38:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:38:50,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 13 seconds)
2025-09-16 14:40:46,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:41:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5289.92432 ± 9.483
2025-09-16 14:41:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5288.6553, 5279.005, 5303.604, 5275.083, 5292.947, 5293.6377, 5300.6357, 5291.618, 5276.691, 5297.3623]
2025-09-16 14:41:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:01,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 54 seconds)
2025-09-16 14:42:56,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:43:12,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5224.39697 ± 64.817
2025-09-16 14:43:12,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5240.747, 5036.319, 5258.278, 5240.3506, 5232.9277, 5282.7896, 5249.95, 5219.0674, 5251.8037, 5231.7324]
2025-09-16 14:43:12,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:43:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 40 seconds)
2025-09-16 14:45:03,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:45:16,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4897.27344 ± 1462.626
2025-09-16 14:45:16,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5389.2275, 5347.742, 5377.09, 5376.6084, 5415.507, 5407.745, 5395.8984, 509.862, 5350.9434, 5402.1104]
2025-09-16 14:45:16,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 94.0, 1000.0, 1000.0]
2025-09-16 14:45:16,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 13 seconds)
2025-09-16 14:47:12,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:47:26,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5383.26074 ± 16.721
2025-09-16 14:47:26,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5404.674, 5389.2227, 5398.65, 5373.379, 5396.016, 5358.937, 5404.153, 5373.2124, 5376.081, 5358.281]
2025-09-16 14:47:26,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:47:26,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 1 second)
2025-09-16 14:49:21,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:49:35,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5022.17432 ± 1274.067
2025-09-16 14:49:35,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5434.5396, 1200.3503, 5435.4697, 5431.11, 5446.468, 5455.439, 5435.4844, 5485.8, 5470.2573, 5426.823]
2025-09-16 14:49:35,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 219.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:49:35,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 47 seconds)
2025-09-16 14:51:36,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:51:52,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5370.95898 ± 13.059
2025-09-16 14:51:52,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5392.1426, 5359.444, 5386.5786, 5367.3345, 5364.5347, 5376.027, 5351.154, 5377.7544, 5380.248, 5354.371]
2025-09-16 14:51:52,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:51:52,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 51 seconds)
2025-09-16 14:53:45,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:54:00,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5250.96484 ± 13.524
2025-09-16 14:54:00,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5268.7124, 5243.6772, 5267.7466, 5239.994, 5226.0664, 5248.253, 5246.0327, 5256.9863, 5243.471, 5268.7075]
2025-09-16 14:54:00,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:54:00,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 36 seconds)
2025-09-16 14:55:56,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:56:10,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5063.66699 ± 865.945
2025-09-16 14:56:10,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5342.0947, 5337.1426, 5371.2446, 5341.6826, 5348.1616, 5363.299, 5327.677, 5362.3438, 2466.2222, 5376.8047]
2025-09-16 14:56:10,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 467.0, 1000.0]
2025-09-16 14:56:10,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 37 seconds)
2025-09-16 14:58:05,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:58:20,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5355.82129 ± 23.048
2025-09-16 14:58:20,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5369.722, 5311.3184, 5345.5, 5371.615, 5347.5723, 5390.394, 5387.0845, 5333.8105, 5349.494, 5351.7036]
2025-09-16 14:58:20,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:58:20,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 25 seconds)
2025-09-16 15:00:09,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:00:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4557.33301 ± 1366.481
2025-09-16 15:00:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5324.494, 5267.9663, 5167.5723, 5403.9785, 5393.8164, 1502.0267, 2251.4875, 4651.6196, 5379.2905, 5231.08]
2025-09-16 15:00:22,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 284.0, 445.0, 874.0, 1000.0, 1000.0]
2025-09-16 15:00:22,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 5 seconds)
2025-09-16 15:02:16,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:02:30,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4849.38574 ± 1341.463
2025-09-16 15:02:30,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5313.199, 5327.3267, 5292.4736, 5277.4927, 5280.208, 5292.7026, 5320.611, 825.43726, 5259.277, 5305.132]
2025-09-16 15:02:30,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 156.0, 1000.0, 1000.0]
2025-09-16 15:02:30,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 46 seconds)
2025-09-16 15:04:28,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:04:43,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5252.03467 ± 18.110
2025-09-16 15:04:43,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5255.0435, 5212.7354, 5252.126, 5234.2334, 5245.9014, 5269.176, 5283.8027, 5257.4873, 5251.585, 5258.2534]
2025-09-16 15:04:43,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:04:43,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 43 seconds)
2025-09-16 15:06:38,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:06:54,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5355.44775 ± 16.025
2025-09-16 15:06:54,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5362.922, 5351.162, 5331.149, 5356.1836, 5371.651, 5337.2, 5371.368, 5376.732, 5332.3804, 5363.7324]
2025-09-16 15:06:54,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:06:54,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 34 seconds)
2025-09-16 15:08:48,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:09:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5399.76855 ± 9.760
2025-09-16 15:09:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5406.4897, 5388.049, 5400.1943, 5398.2563, 5391.0386, 5417.861, 5397.038, 5413.3833, 5398.6455, 5386.732]
2025-09-16 15:09:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:09:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5399.77) for latency 6
2025-09-16 15:09:04,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 26 seconds)
2025-09-16 15:10:52,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:11:07,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5334.57324 ± 14.555
2025-09-16 15:11:07,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5323.8516, 5330.999, 5309.9775, 5337.4775, 5337.0786, 5312.519, 5345.8467, 5349.5864, 5341.9634, 5356.4287]
2025-09-16 15:11:07,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:11:07,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 18 seconds)
2025-09-16 15:13:02,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:13:17,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5403.12451 ± 8.273
2025-09-16 15:13:17,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5400.352, 5393.3477, 5413.2, 5394.1914, 5419.75, 5405.0586, 5409.943, 5398.0483, 5396.684, 5400.673]
2025-09-16 15:13:17,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:13:17,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5403.12) for latency 6
2025-09-16 15:13:17,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 9 seconds)
2025-09-16 15:15:12,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 15:15:27,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5326.93848 ± 20.486
2025-09-16 15:15:27,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5277.1475, 5342.462, 5338.576, 5335.209, 5317.7837, 5345.7554, 5328.205, 5321.4863, 5311.8345, 5350.9233]
2025-09-16 15:15:27,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:15:27,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
