2025-09-16 11:26:02,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.025-delay_6
2025-09-16 11:26:02,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.025-delay_6
2025-09-16 11:26:02,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x150deb09c550>}
2025-09-16 11:26:02,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:26:02,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:26:02,122 baseline-bpql-noisepromille25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=478, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:26:02,122 baseline-bpql-noisepromille25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:26:03,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:26:03,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:27:51,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:27:52,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 293.41202 ± 17.594
2025-09-16 11:27:52,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [304.87756, 290.28107, 286.11096, 329.0595, 302.381, 302.3679, 287.2858, 287.22375, 288.06815, 256.46457]
2025-09-16 11:27:52,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 64.0, 61.0, 68.0, 64.0, 63.0, 63.0, 62.0, 61.0, 56.0]
2025-09-16 11:27:52,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (293.41) for latency 6
2025-09-16 11:27:52,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 58 minutes, 53 seconds)
2025-09-16 11:29:47,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:29:48,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 383.33310 ± 59.658
2025-09-16 11:29:48,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [290.80453, 397.2674, 361.11, 361.1603, 366.7413, 417.9912, 347.04565, 526.97455, 347.33078, 416.9054]
2025-09-16 11:29:48,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 73.0, 69.0, 77.0, 67.0, 85.0, 65.0, 103.0, 64.0, 80.0]
2025-09-16 11:29:48,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (383.33) for latency 6
2025-09-16 11:29:48,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 51 seconds)
2025-09-16 11:31:44,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:31:45,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 397.24667 ± 65.147
2025-09-16 11:31:45,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [351.3403, 443.67334, 434.89767, 402.52527, 405.02292, 532.05676, 302.5479, 347.9093, 431.98938, 320.5038]
2025-09-16 11:31:45,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 83.0, 79.0, 79.0, 77.0, 101.0, 61.0, 71.0, 81.0, 64.0]
2025-09-16 11:31:45,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (397.25) for latency 6
2025-09-16 11:31:45,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 3 minutes, 59 seconds)
2025-09-16 11:33:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:33:42,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 354.38058 ± 106.166
2025-09-16 11:33:42,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [313.82828, 650.9584, 329.82748, 362.63254, 276.3967, 302.6427, 286.26328, 357.11206, 268.67746, 395.467]
2025-09-16 11:33:42,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 125.0, 73.0, 74.0, 61.0, 67.0, 63.0, 75.0, 58.0, 85.0]
2025-09-16 11:33:42,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 33 seconds)
2025-09-16 11:35:37,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:35:39,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 486.44980 ± 89.764
2025-09-16 11:35:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [383.4969, 401.3035, 541.02515, 614.7448, 496.574, 404.2455, 485.2689, 512.4927, 381.57407, 643.77234]
2025-09-16 11:35:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 76.0, 102.0, 133.0, 92.0, 76.0, 99.0, 115.0, 72.0, 124.0]
2025-09-16 11:35:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (486.45) for latency 6
2025-09-16 11:35:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 15 seconds)
2025-09-16 11:37:34,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:37:36,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 452.56885 ± 83.405
2025-09-16 11:37:36,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [394.0419, 532.7607, 502.1184, 493.85834, 439.57455, 284.57993, 476.94226, 418.65967, 385.81238, 597.3402]
2025-09-16 11:37:36,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 114.0, 107.0, 101.0, 86.0, 63.0, 99.0, 82.0, 72.0, 115.0]
2025-09-16 11:37:36,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 2 minutes, 58 seconds)
2025-09-16 11:39:32,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:39:33,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 442.07489 ± 66.208
2025-09-16 11:39:33,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [450.94476, 545.7009, 431.7796, 460.70825, 487.8079, 516.17944, 311.71133, 355.86264, 418.53033, 441.52393]
2025-09-16 11:39:33,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 101.0, 85.0, 85.0, 88.0, 107.0, 70.0, 78.0, 76.0, 81.0]
2025-09-16 11:39:33,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 14 seconds)
2025-09-16 11:41:29,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:41:30,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 350.39282 ± 54.465
2025-09-16 11:41:30,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [298.7435, 303.3995, 329.95798, 415.90778, 367.8986, 308.51285, 479.87515, 331.31848, 320.70557, 347.60876]
2025-09-16 11:41:30,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 66.0, 72.0, 89.0, 81.0, 67.0, 106.0, 72.0, 69.0, 77.0]
2025-09-16 11:41:30,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 33 seconds)
2025-09-16 11:43:26,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:43:28,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 508.26074 ± 110.127
2025-09-16 11:43:28,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [348.29297, 536.07196, 561.74054, 589.07275, 414.5942, 762.92706, 445.23447, 493.79227, 510.42822, 420.45337]
2025-09-16 11:43:28,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 114.0, 106.0, 110.0, 79.0, 147.0, 84.0, 96.0, 96.0, 79.0]
2025-09-16 11:43:28,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (508.26) for latency 6
2025-09-16 11:43:28,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 36 seconds)
2025-09-16 11:45:24,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:45:25,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 502.02588 ± 72.524
2025-09-16 11:45:25,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [613.27124, 483.28647, 442.56497, 375.77524, 457.88403, 568.93677, 441.49863, 595.0448, 542.6124, 499.38406]
2025-09-16 11:45:25,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 91.0, 83.0, 73.0, 87.0, 121.0, 82.0, 128.0, 104.0, 99.0]
2025-09-16 11:45:25,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 1 second)
2025-09-16 11:47:21,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:47:23,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 602.80457 ± 165.324
2025-09-16 11:47:23,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [504.289, 322.34024, 521.4423, 692.9133, 736.34314, 466.1506, 913.5958, 574.02057, 775.20026, 521.75]
2025-09-16 11:47:23,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 72.0, 99.0, 130.0, 140.0, 88.0, 177.0, 113.0, 147.0, 101.0]
2025-09-16 11:47:23,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (602.80) for latency 6
2025-09-16 11:47:23,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 54 minutes, 7 seconds)
2025-09-16 11:49:19,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:49:21,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 580.61578 ± 63.282
2025-09-16 11:49:21,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [539.8622, 560.95465, 487.33594, 674.42, 648.5669, 491.91107, 653.4745, 624.381, 554.2362, 571.01514]
2025-09-16 11:49:21,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 105.0, 104.0, 144.0, 123.0, 102.0, 128.0, 133.0, 108.0, 109.0]
2025-09-16 11:49:21,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 52 minutes, 23 seconds)
2025-09-16 11:51:17,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:51:19,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 618.48273 ± 145.349
2025-09-16 11:51:19,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [945.6667, 548.0065, 530.51245, 451.77182, 648.9975, 509.14908, 793.9982, 480.87625, 650.86005, 624.9888]
2025-09-16 11:51:19,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 104.0, 107.0, 97.0, 125.0, 102.0, 159.0, 97.0, 126.0, 121.0]
2025-09-16 11:51:19,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (618.48) for latency 6
2025-09-16 11:51:19,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 50 minutes, 47 seconds)
2025-09-16 11:53:15,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:53:16,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 668.70203 ± 104.558
2025-09-16 11:53:16,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [634.73114, 649.7122, 765.60114, 792.6641, 662.8685, 748.0274, 462.62085, 590.67, 806.7994, 573.32587]
2025-09-16 11:53:16,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 122.0, 144.0, 148.0, 127.0, 142.0, 90.0, 113.0, 150.0, 127.0]
2025-09-16 11:53:16,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (668.70) for latency 6
2025-09-16 11:53:16,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 45 seconds)
2025-09-16 11:55:14,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:55:16,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 624.50671 ± 200.252
2025-09-16 11:55:16,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [753.02, 564.019, 1125.428, 500.01642, 515.24567, 721.7519, 601.8198, 409.76047, 643.9957, 410.01025]
2025-09-16 11:55:16,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 116.0, 227.0, 103.0, 106.0, 127.0, 118.0, 84.0, 128.0, 86.0]
2025-09-16 11:55:16,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 18 seconds)
2025-09-16 11:57:12,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:57:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 765.81628 ± 149.283
2025-09-16 11:57:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1072.1436, 715.50806, 705.15216, 549.24817, 878.55536, 730.5661, 625.8467, 721.7866, 956.6941, 702.66174]
2025-09-16 11:57:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [213.0, 139.0, 136.0, 102.0, 159.0, 151.0, 118.0, 139.0, 183.0, 132.0]
2025-09-16 11:57:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (765.82) for latency 6
2025-09-16 11:57:14,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes, 34 seconds)
2025-09-16 11:59:10,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:59:12,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 668.42651 ± 122.306
2025-09-16 11:59:12,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [677.7697, 511.95258, 688.25885, 551.1249, 474.65845, 820.8354, 666.4827, 721.4957, 686.61084, 885.0763]
2025-09-16 11:59:12,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 94.0, 129.0, 102.0, 88.0, 171.0, 124.0, 139.0, 124.0, 170.0]
2025-09-16 11:59:12,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 35 seconds)
2025-09-16 12:01:08,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:01:10,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 550.15808 ± 132.953
2025-09-16 12:01:10,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [613.4791, 386.1119, 549.51166, 672.2029, 829.2851, 421.3681, 536.08044, 627.0481, 471.99173, 394.5018]
2025-09-16 12:01:10,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 85.0, 110.0, 138.0, 168.0, 89.0, 104.0, 127.0, 98.0, 85.0]
2025-09-16 12:01:10,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 41 minutes, 24 seconds)
2025-09-16 12:03:08,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:03:09,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 613.63745 ± 112.650
2025-09-16 12:03:09,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [538.8742, 689.02814, 550.9864, 870.4013, 725.01086, 498.64435, 654.4126, 520.25085, 535.1229, 553.64246]
2025-09-16 12:03:09,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 147.0, 102.0, 164.0, 153.0, 91.0, 123.0, 96.0, 99.0, 102.0]
2025-09-16 12:03:09,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 3 seconds)
2025-09-16 12:05:05,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:05:07,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 735.31287 ± 127.968
2025-09-16 12:05:07,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [724.89764, 666.1531, 552.1682, 630.5786, 723.8225, 693.6951, 716.992, 1049.3224, 744.0987, 851.40015]
2025-09-16 12:05:07,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 139.0, 108.0, 122.0, 132.0, 128.0, 137.0, 187.0, 138.0, 158.0]
2025-09-16 12:05:07,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 37 minutes, 41 seconds)
2025-09-16 12:07:06,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:07:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 855.48376 ± 276.283
2025-09-16 12:07:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1326.3169, 682.7935, 715.812, 725.9808, 1444.6859, 707.67065, 735.282, 655.46967, 637.57306, 923.2531]
2025-09-16 12:07:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 131.0, 135.0, 136.0, 287.0, 129.0, 161.0, 122.0, 124.0, 185.0]
2025-09-16 12:07:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (855.48) for latency 6
2025-09-16 12:07:08,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-09-16 12:09:04,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:09:06,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 626.49420 ± 106.563
2025-09-16 12:09:06,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [614.71484, 729.9603, 539.44275, 618.49994, 809.66736, 505.12265, 518.75616, 514.73, 776.7312, 637.31683]
2025-09-16 12:09:06,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 141.0, 101.0, 119.0, 168.0, 106.0, 96.0, 95.0, 153.0, 118.0]
2025-09-16 12:09:06,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 26 seconds)
2025-09-16 12:11:03,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:11:05,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 699.47852 ± 209.815
2025-09-16 12:11:05,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [771.097, 662.44275, 756.55133, 572.2117, 1250.5023, 566.7312, 806.79517, 512.3922, 530.2858, 565.776]
2025-09-16 12:11:05,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 130.0, 142.0, 105.0, 258.0, 104.0, 150.0, 108.0, 112.0, 107.0]
2025-09-16 12:11:05,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 44 seconds)
2025-09-16 12:13:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:13:04,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 791.23602 ± 215.030
2025-09-16 12:13:04,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [477.02026, 771.22736, 982.70624, 1142.1073, 719.9254, 925.4993, 589.917, 534.7754, 1052.8638, 716.3181]
2025-09-16 12:13:04,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 147.0, 191.0, 221.0, 138.0, 180.0, 110.0, 103.0, 211.0, 133.0]
2025-09-16 12:13:04,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 46 seconds)
2025-09-16 12:15:03,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:15:05,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 733.46582 ± 176.836
2025-09-16 12:15:05,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [536.9799, 926.99854, 917.80676, 437.86014, 807.35406, 1006.85315, 759.0352, 639.29443, 573.6508, 728.8248]
2025-09-16 12:15:05,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 185.0, 183.0, 84.0, 168.0, 189.0, 158.0, 136.0, 112.0, 136.0]
2025-09-16 12:15:05,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 20 seconds)
2025-09-16 12:17:03,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:17:05,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 735.01038 ± 206.812
2025-09-16 12:17:05,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [764.2096, 853.1064, 1001.3399, 576.0956, 389.68033, 723.98096, 486.21335, 774.3809, 683.76575, 1097.3315]
2025-09-16 12:17:05,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 172.0, 209.0, 108.0, 85.0, 135.0, 96.0, 143.0, 143.0, 211.0]
2025-09-16 12:17:05,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 10 seconds)
2025-09-16 12:19:02,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:19:04,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 863.97723 ± 311.743
2025-09-16 12:19:04,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [719.61755, 1252.5237, 1013.7501, 808.847, 786.03937, 684.77185, 585.5237, 1579.8179, 680.89124, 527.9905]
2025-09-16 12:19:04,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 262.0, 196.0, 158.0, 160.0, 125.0, 108.0, 305.0, 125.0, 97.0]
2025-09-16 12:19:04,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (863.98) for latency 6
2025-09-16 12:19:04,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 29 seconds)
2025-09-16 12:21:01,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:21:04,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 845.24817 ± 327.383
2025-09-16 12:21:04,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [662.07275, 1559.2698, 636.24335, 558.7954, 580.1347, 737.8428, 1224.902, 614.6538, 1169.3512, 709.2161]
2025-09-16 12:21:04,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 323.0, 118.0, 109.0, 104.0, 136.0, 229.0, 111.0, 222.0, 149.0]
2025-09-16 12:21:04,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 43 seconds)
2025-09-16 12:23:02,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:23:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1152.32788 ± 265.971
2025-09-16 12:23:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [899.29694, 1693.9808, 1212.377, 746.89905, 966.64154, 1164.1287, 961.5734, 1112.0339, 1420.611, 1345.7366]
2025-09-16 12:23:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 325.0, 242.0, 153.0, 187.0, 235.0, 191.0, 233.0, 275.0, 266.0]
2025-09-16 12:23:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1152.33) for latency 6
2025-09-16 12:23:06,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 17 seconds)
2025-09-16 12:25:02,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:25:04,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 765.23846 ± 142.262
2025-09-16 12:25:04,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [622.4982, 864.4598, 496.7638, 805.9514, 822.1002, 633.6563, 787.1087, 706.4391, 974.4131, 938.9941]
2025-09-16 12:25:04,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 159.0, 105.0, 154.0, 157.0, 136.0, 151.0, 150.0, 220.0, 178.0]
2025-09-16 12:25:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 52 seconds)
2025-09-16 12:27:02,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:27:06,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1342.87708 ± 241.877
2025-09-16 12:27:06,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1406.9807, 911.778, 1299.8629, 1808.9327, 1571.2302, 1132.9362, 1171.576, 1529.8097, 1373.5359, 1222.1271]
2025-09-16 12:27:06,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 172.0, 246.0, 341.0, 297.0, 214.0, 237.0, 297.0, 256.0, 234.0]
2025-09-16 12:27:06,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (1342.88) for latency 6
2025-09-16 12:27:06,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 15 seconds)
2025-09-16 12:29:03,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:29:06,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1023.43573 ± 249.638
2025-09-16 12:29:06,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1093.6123, 681.04205, 1159.8529, 651.2589, 1177.342, 1475.7877, 1147.8262, 1044.3134, 714.53564, 1088.7875]
2025-09-16 12:29:06,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [205.0, 124.0, 217.0, 131.0, 237.0, 281.0, 219.0, 204.0, 138.0, 208.0]
2025-09-16 12:29:06,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 27 seconds)
2025-09-16 12:31:05,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:31:08,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1293.55786 ± 559.754
2025-09-16 12:31:08,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1376.7574, 903.9716, 1821.0079, 1612.2548, 880.6994, 1168.1716, 2559.958, 1292.7836, 628.28174, 691.6918]
2025-09-16 12:31:08,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [264.0, 173.0, 358.0, 321.0, 187.0, 220.0, 494.0, 268.0, 120.0, 137.0]
2025-09-16 12:31:08,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 1 second)
2025-09-16 12:33:08,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:33:11,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1253.21460 ± 346.763
2025-09-16 12:33:11,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [940.5127, 1234.7988, 1750.5616, 1305.8545, 1079.7715, 1309.6255, 969.2977, 856.0036, 1998.2485, 1087.4731]
2025-09-16 12:33:11,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 231.0, 326.0, 238.0, 205.0, 246.0, 186.0, 165.0, 374.0, 204.0]
2025-09-16 12:33:11,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 15 seconds)
2025-09-16 12:35:10,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:35:16,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2533.19458 ± 1179.673
2025-09-16 12:35:16,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5066.8984, 3973.9785, 2147.8286, 1130.3135, 1524.5105, 2857.0193, 2601.3286, 1236.7944, 2875.8474, 1917.4268]
2025-09-16 12:35:16,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [976.0, 764.0, 407.0, 223.0, 289.0, 568.0, 507.0, 242.0, 552.0, 382.0]
2025-09-16 12:35:16,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2533.19) for latency 6
2025-09-16 12:35:16,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 12 minutes, 40 seconds)
2025-09-16 12:37:15,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:37:20,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1909.99878 ± 1247.328
2025-09-16 12:37:20,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2366.783, 1460.0671, 1277.1285, 628.91046, 5038.7466, 845.4652, 785.8912, 2573.035, 2527.6685, 1596.2926]
2025-09-16 12:37:20,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [470.0, 279.0, 257.0, 134.0, 965.0, 183.0, 168.0, 499.0, 498.0, 289.0]
2025-09-16 12:37:20,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 11 minutes)
2025-09-16 12:39:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:39:30,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 2816.82275 ± 1389.208
2025-09-16 12:39:30,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [591.3532, 1637.9204, 2917.0554, 5054.5054, 3327.3567, 5152.5684, 1904.1014, 3306.883, 1800.1106, 2476.3752]
2025-09-16 12:39:30,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 332.0, 559.0, 991.0, 638.0, 1000.0, 385.0, 644.0, 351.0, 480.0]
2025-09-16 12:39:30,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (2816.82) for latency 6
2025-09-16 12:39:30,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 11 minutes, 7 seconds)
2025-09-16 12:41:30,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:41:35,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 1961.00623 ± 670.743
2025-09-16 12:41:35,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2379.7302, 1539.5413, 1869.6412, 1411.096, 1207.4215, 1895.4631, 3742.9102, 1889.8984, 1998.679, 1675.6808]
2025-09-16 12:41:35,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [463.0, 303.0, 360.0, 294.0, 238.0, 371.0, 722.0, 362.0, 388.0, 328.0]
2025-09-16 12:41:35,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 9 minutes, 38 seconds)
2025-09-16 12:43:37,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:43:48,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3810.94727 ± 1336.072
2025-09-16 12:43:48,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [4278.4243, 4208.334, 1075.7234, 5032.3003, 2758.0334, 5086.9995, 5085.3223, 3006.2356, 5110.3755, 2467.7273]
2025-09-16 12:43:48,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [840.0, 825.0, 221.0, 993.0, 557.0, 1000.0, 1000.0, 589.0, 1000.0, 484.0]
2025-09-16 12:43:48,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (3810.95) for latency 6
2025-09-16 12:43:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 9 minutes, 26 seconds)
2025-09-16 12:45:50,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:46:01,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3800.33911 ± 1563.497
2025-09-16 12:46:01,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [2645.4749, 1138.5839, 4015.148, 5219.1357, 1744.6732, 5190.0317, 5213.735, 2430.5654, 5176.362, 5229.6816]
2025-09-16 12:46:01,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [509.0, 232.0, 776.0, 1000.0, 348.0, 1000.0, 1000.0, 466.0, 1000.0, 1000.0]
2025-09-16 12:46:01,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 8 minutes, 54 seconds)
2025-09-16 12:48:03,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:48:16,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4492.20068 ± 1369.092
2025-09-16 12:48:16,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5180.537, 5131.149, 5132.6587, 929.1456, 5162.2856, 5160.7373, 5142.4863, 5103.3657, 2858.583, 5121.06]
2025-09-16 12:48:16,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 180.0, 1000.0, 1000.0, 1000.0, 1000.0, 577.0, 1000.0]
2025-09-16 12:48:16,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4492.20) for latency 6
2025-09-16 12:48:16,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 9 minutes, 1 second)
2025-09-16 12:50:11,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:50:23,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4294.62402 ± 1508.756
2025-09-16 12:50:23,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5262.6885, 5288.0146, 5285.441, 5286.825, 5241.1396, 5226.6626, 1744.3077, 5235.662, 2836.2932, 1539.2073]
2025-09-16 12:50:23,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 329.0, 1000.0, 537.0, 288.0]
2025-09-16 12:50:23,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 6 minutes, 5 seconds)
2025-09-16 12:52:22,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:52:36,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4675.90234 ± 959.831
2025-09-16 12:52:36,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5137.7153, 5149.12, 5125.7017, 5166.625, 5130.907, 3164.955, 2408.3323, 5133.4785, 5151.5625, 5190.623]
2025-09-16 12:52:36,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 623.0, 483.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:52:36,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (4675.90) for latency 6
2025-09-16 12:52:36,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 5 minutes, 31 seconds)
2025-09-16 12:54:41,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:54:55,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5006.39600 ± 789.975
2025-09-16 12:54:55,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5266.5107, 5307.223, 5253.096, 2638.084, 5254.142, 5203.5747, 5308.6875, 5289.163, 5283.536, 5259.944]
2025-09-16 12:54:55,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 505.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:54:55,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5006.40) for latency 6
2025-09-16 12:54:55,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 4 minutes, 30 seconds)
2025-09-16 12:56:45,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:56:55,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3565.96362 ± 1317.028
2025-09-16 12:56:55,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5364.3823, 5327.615, 1906.1561, 4134.4434, 2692.862, 3334.077, 5329.062, 3062.8042, 2647.895, 1860.3396]
2025-09-16 12:56:55,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 367.0, 778.0, 509.0, 640.0, 1000.0, 573.0, 485.0, 343.0]
2025-09-16 12:56:55,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 59 minutes, 48 seconds)
2025-09-16 12:58:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:59:08,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3948.82080 ± 1460.540
2025-09-16 12:59:08,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5209.0737, 5149.9487, 1817.4926, 5168.2974, 4320.564, 3800.703, 2077.6658, 5197.9375, 5160.2935, 1586.2294]
2025-09-16 12:59:08,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 348.0, 1000.0, 834.0, 725.0, 404.0, 1000.0, 1000.0, 313.0]
2025-09-16 12:59:08,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 57 minutes, 27 seconds)
2025-09-16 13:01:14,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:01:25,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4283.73096 ± 1376.569
2025-09-16 13:01:25,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [4428.8066, 3020.708, 5402.409, 4353.0215, 5460.474, 5412.955, 2151.334, 5400.9946, 5433.0107, 1773.5928]
2025-09-16 13:01:25,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [822.0, 571.0, 1000.0, 808.0, 1000.0, 1000.0, 415.0, 1000.0, 1000.0, 336.0]
2025-09-16 13:01:25,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 57 minutes, 4 seconds)
2025-09-16 13:03:25,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:03:38,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4771.93994 ± 1118.438
2025-09-16 13:03:38,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5150.743, 5141.4985, 5167.0054, 5118.505, 5135.539, 5139.032, 5140.942, 1416.9292, 5173.3247, 5135.88]
2025-09-16 13:03:38,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 291.0, 1000.0, 1000.0]
2025-09-16 13:03:38,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 54 minutes, 48 seconds)
2025-09-16 13:05:29,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:05:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4951.01318 ± 856.248
2025-09-16 13:05:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5221.012, 5269.329, 5228.274, 5324.1123, 5224.9546, 5216.226, 5240.573, 5268.5, 2386.083, 5131.0645]
2025-09-16 13:05:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 453.0, 1000.0]
2025-09-16 13:05:43,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 50 minutes, 11 seconds)
2025-09-16 13:07:54,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:08:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5254.36621 ± 48.319
2025-09-16 13:08:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5262.949, 5206.2847, 5166.862, 5308.2803, 5278.0337, 5213.348, 5251.6343, 5278.766, 5237.823, 5339.6816]
2025-09-16 13:08:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:08:08,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5254.37) for latency 6
2025-09-16 13:08:08,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 52 minutes, 16 seconds)
2025-09-16 13:10:02,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:10:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5399.29590 ± 74.756
2025-09-16 13:10:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5495.517, 5445.5522, 5376.652, 5385.4717, 5406.487, 5367.399, 5415.8076, 5418.3604, 5208.1035, 5473.607]
2025-09-16 13:10:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:10:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5399.30) for latency 6
2025-09-16 13:10:17,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 49 minutes, 10 seconds)
2025-09-16 13:12:21,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:12:36,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5334.00195 ± 40.443
2025-09-16 13:12:36,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5302.0913, 5397.749, 5362.357, 5359.191, 5332.5625, 5334.177, 5366.7827, 5264.9966, 5347.8086, 5272.304]
2025-09-16 13:12:36,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:12:36,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 47 minutes, 19 seconds)
2025-09-16 13:14:33,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:14:45,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4557.10449 ± 973.053
2025-09-16 13:14:45,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5421.479, 5442.31, 3564.8691, 5444.5527, 4345.3496, 5393.205, 2982.573, 4383.1035, 3118.5273, 5475.0728]
2025-09-16 13:14:45,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 653.0, 1000.0, 802.0, 1000.0, 552.0, 813.0, 575.0, 1000.0]
2025-09-16 13:14:45,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 44 minutes, 26 seconds)
2025-09-16 13:16:43,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:16:56,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5127.76172 ± 646.616
2025-09-16 13:16:56,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5377.6484, 5445.7954, 5406.691, 5466.2866, 5472.936, 5492.6797, 3646.598, 4051.3396, 5435.398, 5482.2437]
2025-09-16 13:16:56,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 665.0, 754.0, 1000.0, 1000.0]
2025-09-16 13:16:56,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 43 minutes, 15 seconds)
2025-09-16 13:19:00,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:19:15,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5228.11035 ± 16.932
2025-09-16 13:19:15,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5190.767, 5229.2, 5221.507, 5242.9404, 5242.884, 5253.924, 5216.8306, 5216.5923, 5230.921, 5235.54]
2025-09-16 13:19:15,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:19:15,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 40 minutes, 1 second)
2025-09-16 13:21:12,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:21:25,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4776.82568 ± 1488.107
2025-09-16 13:21:25,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [312.77563, 5245.37, 5260.869, 5266.1704, 5296.421, 5255.7837, 5291.2344, 5284.993, 5291.184, 5263.4575]
2025-09-16 13:21:25,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:21:25,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 38 minutes, 2 seconds)
2025-09-16 13:23:25,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:23:40,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5296.62158 ± 163.144
2025-09-16 13:23:40,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5311.2256, 5327.1245, 5421.298, 5304.5063, 5299.669, 5428.924, 4826.2827, 5375.403, 5309.543, 5362.2407]
2025-09-16 13:23:40,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 939.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:23:40,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 35 minutes, 6 seconds)
2025-09-16 13:25:44,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:25:52,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 3258.65723 ± 1506.250
2025-09-16 13:25:52,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [1742.6923, 4517.244, 4043.3953, 5489.2134, 5434.403, 2212.293, 2291.1235, 1770.2906, 1276.0806, 3809.8372]
2025-09-16 13:25:52,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [321.0, 827.0, 733.0, 1000.0, 1000.0, 394.0, 412.0, 322.0, 235.0, 693.0]
2025-09-16 13:25:52,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 33 minutes, 19 seconds)
2025-09-16 13:27:47,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:28:00,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4934.87744 ± 1423.355
2025-09-16 13:28:00,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5445.563, 5525.662, 5327.0103, 668.0348, 5337.9995, 5370.197, 5432.5034, 5443.5864, 5413.4995, 5384.7173]
2025-09-16 13:28:00,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 122.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:28:00,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 30 minutes, 42 seconds)
2025-09-16 13:30:01,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:30:15,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5321.67383 ± 231.905
2025-09-16 13:30:15,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5444.331, 5426.475, 5424.624, 5415.242, 5334.0776, 4637.501, 5397.9404, 5417.625, 5415.55, 5303.3696]
2025-09-16 13:30:15,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 854.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:30:15,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 28 minutes, 1 second)
2025-09-16 13:32:12,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:32:24,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4432.01611 ± 1443.461
2025-09-16 13:32:24,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5192.644, 1174.4606, 5084.674, 5185.736, 5140.267, 1959.6997, 5161.2144, 5143.0205, 5141.6255, 5136.816]
2025-09-16 13:32:24,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 236.0, 1000.0, 1000.0, 1000.0, 409.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:32:24,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 25 minutes, 41 seconds)
2025-09-16 13:34:19,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:34:33,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5396.09717 ± 40.379
2025-09-16 13:34:33,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5392.914, 5396.1685, 5324.2637, 5399.0996, 5373.8047, 5374.6826, 5489.983, 5383.193, 5396.2817, 5430.582]
2025-09-16 13:34:33,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:34:33,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 22 minutes, 47 seconds)
2025-09-16 13:36:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:36:45,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4518.22705 ± 1362.064
2025-09-16 13:36:45,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5385.369, 2315.452, 5477.7817, 5412.9624, 2369.4678, 2639.4407, 5378.6255, 5380.9043, 5392.7524, 5429.5107]
2025-09-16 13:36:45,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 413.0, 1000.0, 1000.0, 443.0, 471.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:36:45,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 20 minutes, 33 seconds)
2025-09-16 13:38:43,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:38:58,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5227.17285 ± 60.349
2025-09-16 13:38:58,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5173.638, 5191.9355, 5216.4507, 5240.783, 5194.571, 5287.695, 5376.4023, 5160.296, 5226.0845, 5203.867]
2025-09-16 13:38:58,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:38:58,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 19 minutes)
2025-09-16 13:40:57,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:41:11,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5109.01807 ± 479.356
2025-09-16 13:41:11,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5477.1562, 4096.757, 5458.671, 5296.7236, 5264.0103, 5299.9463, 5368.388, 5302.2856, 4226.586, 5299.6543]
2025-09-16 13:41:11,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 750.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 765.0, 1000.0]
2025-09-16 13:41:11,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 16 minutes, 29 seconds)
2025-09-16 13:43:11,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:43:25,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4827.99805 ± 1352.155
2025-09-16 13:43:25,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5323.975, 5255.9487, 5243.993, 5390.7544, 5255.6265, 5200.8716, 778.27094, 5231.8755, 5167.522, 5431.14]
2025-09-16 13:43:25,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 145.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:43:25,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 14 minutes, 49 seconds)
2025-09-16 13:45:27,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:45:38,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4266.51660 ± 1607.435
2025-09-16 13:45:38,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5447.437, 563.0123, 5431.9277, 3057.1929, 5397.199, 5455.602, 2939.725, 5460.389, 3467.1038, 5445.578]
2025-09-16 13:45:38,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 107.0, 1000.0, 550.0, 1000.0, 1000.0, 541.0, 1000.0, 637.0, 1000.0]
2025-09-16 13:45:38,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 13 minutes, 7 seconds)
2025-09-16 13:47:30,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:47:42,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4689.00879 ± 1289.728
2025-09-16 13:47:42,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3343.2095, 5534.993, 1896.5504, 3141.2207, 5467.6255, 5466.0024, 5516.46, 5443.105, 5554.6533, 5526.274]
2025-09-16 13:47:42,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [615.0, 1000.0, 339.0, 580.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:47:42,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 10 minutes, 4 seconds)
2025-09-16 13:49:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:49:52,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5462.32324 ± 19.788
2025-09-16 13:49:52,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5491.437, 5482.069, 5470.4316, 5451.107, 5426.6953, 5438.5884, 5483.9795, 5469.98, 5455.716, 5453.2275]
2025-09-16 13:49:52,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:49:52,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1226 [INFO]: New best (5462.32) for latency 6
2025-09-16 13:49:52,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 7 minutes, 32 seconds)
2025-09-16 13:52:00,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:52:13,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4873.30566 ± 802.267
2025-09-16 13:52:13,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5297.775, 4028.7808, 5321.8984, 3534.6206, 5328.6035, 5459.797, 5522.54, 5453.195, 5336.0654, 3449.7795]
2025-09-16 13:52:13,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 747.0, 1000.0, 643.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 658.0]
2025-09-16 13:52:13,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 6 minutes, 11 seconds)
2025-09-16 13:54:11,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:54:25,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4888.39307 ± 1337.130
2025-09-16 13:54:25,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5328.244, 5334.357, 5338.7056, 5287.124, 5349.089, 5340.736, 5403.54, 5313.8633, 5310.3354, 877.9367]
2025-09-16 13:54:25,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 168.0]
2025-09-16 13:54:25,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 3 minutes, 47 seconds)
2025-09-16 13:56:23,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:56:36,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4969.78564 ± 1137.257
2025-09-16 13:56:36,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5363.174, 5298.4873, 5361.174, 1558.9528, 5308.1216, 5391.637, 5338.591, 5365.342, 5345.279, 5367.1016]
2025-09-16 13:56:36,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 289.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:56:36,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 25 seconds)
2025-09-16 13:58:28,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:58:43,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5163.67578 ± 24.069
2025-09-16 13:58:43,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5208.425, 5197.2134, 5164.763, 5152.0264, 5141.8657, 5158.8447, 5153.4526, 5185.0645, 5144.731, 5130.3716]
2025-09-16 13:58:43,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:58:43,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 29 seconds)
2025-09-16 14:00:44,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:00:58,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5111.63965 ± 15.068
2025-09-16 14:00:58,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5095.6514, 5099.9854, 5110.4966, 5113.117, 5086.943, 5124.564, 5115.453, 5109.767, 5144.19, 5116.2256]
2025-09-16 14:00:58,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:58,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 57 minutes, 42 seconds)
2025-09-16 14:02:56,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:03:10,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5122.58398 ± 960.797
2025-09-16 14:03:10,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5515.6646, 5406.927, 5450.6274, 5193.5723, 5480.0483, 5439.1934, 5521.151, 5456.0293, 2252.8584, 5509.7695]
2025-09-16 14:03:10,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 942.0, 1000.0, 1000.0, 1000.0, 1000.0, 415.0, 1000.0]
2025-09-16 14:03:10,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 44 seconds)
2025-09-16 14:05:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:05:26,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5310.83496 ± 184.309
2025-09-16 14:05:26,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5365.8022, 4766.7627, 5400.1953, 5348.357, 5324.8975, 5401.8843, 5440.9214, 5335.5117, 5359.7593, 5364.2607]
2025-09-16 14:05:26,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 870.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:05:26,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 54 seconds)
2025-09-16 14:07:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:07:32,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5346.35010 ± 462.458
2025-09-16 14:07:32,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [3963.1055, 5526.721, 5513.1924, 5500.749, 5465.8296, 5509.817, 5420.5703, 5565.1436, 5497.4375, 5500.937]
2025-09-16 14:07:32,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [725.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:07:32,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 19 seconds)
2025-09-16 14:09:26,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:09:39,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4815.06934 ± 1065.478
2025-09-16 14:09:39,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5183.2437, 5141.141, 1619.4698, 5160.2964, 5151.9185, 5170.608, 5222.0083, 5183.0737, 5185.9346, 5133.005]
2025-09-16 14:09:39,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 323.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:09:39,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 48 minutes, 7 seconds)
2025-09-16 14:11:40,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:11:54,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5456.09521 ± 9.730
2025-09-16 14:11:54,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5463.2803, 5464.9985, 5439.256, 5445.7017, 5470.078, 5448.9297, 5465.198, 5449.8496, 5451.6177, 5462.0483]
2025-09-16 14:11:54,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:11:54,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes, 56 seconds)
2025-09-16 14:13:53,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:14:07,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5393.33105 ± 21.504
2025-09-16 14:14:07,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5419.4775, 5372.5366, 5369.576, 5412.377, 5367.3735, 5387.5254, 5423.781, 5414.2803, 5395.482, 5370.9062]
2025-09-16 14:14:07,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:14:07,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 51 seconds)
2025-09-16 14:16:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:16:19,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5051.00977 ± 557.806
2025-09-16 14:16:19,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5202.257, 5291.193, 5259.8403, 5236.6763, 5257.0273, 5235.0903, 5265.947, 5251.038, 5128.632, 3382.3992]
2025-09-16 14:16:19,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 663.0]
2025-09-16 14:16:19,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 20 seconds)
2025-09-16 14:18:17,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:18:30,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4833.47559 ± 1389.866
2025-09-16 14:18:30,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5330.037, 5226.869, 5279.0664, 5307.288, 664.79346, 5284.687, 5310.8765, 5320.184, 5328.097, 5282.8525]
2025-09-16 14:18:30,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 122.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:18:30,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 28 seconds)
2025-09-16 14:20:31,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:20:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5411.85059 ± 35.647
2025-09-16 14:20:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5420.8247, 5430.2153, 5384.91, 5419.0977, 5399.7334, 5459.7783, 5360.079, 5439.113, 5350.11, 5454.6465]
2025-09-16 14:20:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:20:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 46 seconds)
2025-09-16 14:22:41,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:22:56,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5383.21973 ± 12.198
2025-09-16 14:22:56,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5362.6274, 5380.081, 5411.3374, 5378.7393, 5374.7974, 5387.2334, 5377.075, 5383.726, 5382.501, 5394.076]
2025-09-16 14:22:56,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:22:56,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 16 seconds)
2025-09-16 14:24:51,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:25:05,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5417.88086 ± 18.350
2025-09-16 14:25:05,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5408.335, 5425.9126, 5424.1143, 5402.897, 5442.1914, 5408.4927, 5384.395, 5422.5605, 5450.381, 5409.529]
2025-09-16 14:25:05,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:05,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 52 seconds)
2025-09-16 14:27:13,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:27:28,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5406.15430 ± 9.340
2025-09-16 14:27:28,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5409.241, 5414.4243, 5399.944, 5426.6035, 5412.92, 5400.6875, 5396.3247, 5396.3535, 5397.887, 5407.1553]
2025-09-16 14:27:28,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:27:28,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 14 seconds)
2025-09-16 14:29:26,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:29:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5113.15186 ± 8.825
2025-09-16 14:29:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5108.4087, 5099.9956, 5116.173, 5112.945, 5106.771, 5112.773, 5119.355, 5102.909, 5120.864, 5131.328]
2025-09-16 14:29:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:29:41,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 4 seconds)
2025-09-16 14:31:30,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:31:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5365.05859 ± 15.272
2025-09-16 14:31:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5379.7305, 5384.841, 5343.4185, 5354.0034, 5378.9927, 5361.9614, 5371.3354, 5367.0337, 5336.319, 5372.9473]
2025-09-16 14:31:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:31:44,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 20 seconds)
2025-09-16 14:33:52,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:34:07,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5174.66113 ± 19.353
2025-09-16 14:34:07,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5155.4907, 5170.967, 5176.4834, 5167.974, 5182.2856, 5151.238, 5225.322, 5173.0137, 5164.0103, 5179.8223]
2025-09-16 14:34:07,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:34:07,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 37 seconds)
2025-09-16 14:36:06,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:36:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5195.48535 ± 13.919
2025-09-16 14:36:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5205.3706, 5189.3916, 5207.323, 5192.1934, 5182.361, 5189.1455, 5186.9224, 5224.671, 5174.1963, 5203.2856]
2025-09-16 14:36:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:21,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 31 seconds)
2025-09-16 14:38:19,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:38:34,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5444.74463 ± 30.878
2025-09-16 14:38:34,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5445.8574, 5453.426, 5464.8833, 5437.643, 5441.621, 5357.842, 5476.054, 5452.3994, 5460.0874, 5457.6313]
2025-09-16 14:38:34,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:38:34,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 58 seconds)
2025-09-16 14:40:24,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:40:38,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5429.12842 ± 32.447
2025-09-16 14:40:38,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5459.7217, 5354.656, 5394.0176, 5462.9844, 5450.9385, 5453.042, 5429.3364, 5428.0117, 5411.7095, 5446.8687]
2025-09-16 14:40:38,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:40:38,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 31 seconds)
2025-09-16 14:42:40,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:42:54,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5400.02832 ± 12.556
2025-09-16 14:42:54,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5405.524, 5409.6064, 5389.6836, 5382.7017, 5399.191, 5406.5083, 5424.602, 5407.6104, 5383.85, 5391.0005]
2025-09-16 14:42:54,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:42:55,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 38 seconds)
2025-09-16 14:44:53,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:45:05,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4671.90527 ± 1271.686
2025-09-16 14:45:05,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5436.2603, 5384.9053, 5329.837, 1464.3608, 5387.423, 3109.3064, 5324.7324, 5374.9707, 4550.432, 5356.8193]
2025-09-16 14:45:05,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 264.0, 1000.0, 576.0, 1000.0, 1000.0, 840.0, 1000.0]
2025-09-16 14:45:06,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 9 seconds)
2025-09-16 14:47:04,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:47:18,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5376.41357 ± 20.537
2025-09-16 14:47:18,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5397.8066, 5379.254, 5370.1577, 5336.448, 5369.3584, 5413.7856, 5377.754, 5384.2793, 5352.6157, 5382.675]
2025-09-16 14:47:18,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:47:18,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 57 seconds)
2025-09-16 14:49:18,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:49:33,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5285.08301 ± 16.002
2025-09-16 14:49:33,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5284.889, 5277.622, 5268.295, 5254.862, 5291.984, 5275.318, 5287.6577, 5300.477, 5296.4775, 5313.249]
2025-09-16 14:49:33,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:49:33,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 47 seconds)
2025-09-16 14:51:32,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:51:46,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5276.50537 ± 50.629
2025-09-16 14:51:46,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5204.776, 5244.003, 5247.536, 5210.3096, 5365.4526, 5322.6855, 5254.6035, 5277.025, 5316.3984, 5322.2676]
2025-09-16 14:51:46,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:51:46,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 40 seconds)
2025-09-16 14:53:45,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:54:00,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5202.87158 ± 7.561
2025-09-16 14:54:00,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5201.61, 5205.651, 5202.9307, 5202.338, 5218.2754, 5192.0137, 5195.0864, 5213.0044, 5201.366, 5196.4414]
2025-09-16 14:54:00,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:54:00,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 26 seconds)
2025-09-16 14:55:58,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:56:13,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 5342.18311 ± 21.948
2025-09-16 14:56:13,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5322.693, 5298.3643, 5343.429, 5339.6333, 5322.6753, 5347.195, 5350.6553, 5375.073, 5371.959, 5350.154]
2025-09-16 14:56:13,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:56:13,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 13 seconds)
2025-09-16 14:58:10,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:58:20,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1221 [DEBUG]: Total Reward: 4165.87109 ± 1471.291
2025-09-16 14:58:20,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1222 [DEBUG]: All rewards: [5524.4736, 5257.8296, 4481.705, 1601.9081, 5506.468, 3229.759, 2287.178, 2729.3433, 5543.2007, 5496.848]
2025-09-16 14:58:20,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 953.0, 822.0, 293.0, 1000.0, 586.0, 415.0, 538.0, 1000.0, 1000.0]
2025-09-16 14:58:20,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-humanoid):1251 [DEBUG]: Training session finished
