2025-09-16 12:15:12,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_12
2025-09-16 12:15:12,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_12
2025-09-16 12:15:12,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x15468d7d0890>}
2025-09-16 12:15:12,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:15:12,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:15:12,047 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=580, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:15:12,047 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:15:13,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:15:13,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:16:58,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:16:59,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 311.11682 ± 23.436
2025-09-16 12:16:59,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [359.57025, 295.27936, 303.38498, 315.18295, 266.7733, 309.8374, 337.71378, 303.9879, 316.32507, 303.1131]
2025-09-16 12:16:59,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 55.0, 58.0, 59.0, 50.0, 59.0, 66.0, 59.0, 60.0, 57.0]
2025-09-16 12:16:59,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (311.12) for latency 12
2025-09-16 12:16:59,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 54 minutes, 16 seconds)
2025-09-16 12:18:51,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:18:52,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 302.22687 ± 58.318
2025-09-16 12:18:52,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [298.1913, 332.11346, 378.0012, 233.62355, 268.40887, 388.32492, 340.3515, 304.67606, 287.8371, 190.74037]
2025-09-16 12:18:52,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 61.0, 75.0, 49.0, 53.0, 73.0, 64.0, 59.0, 58.0, 40.0]
2025-09-16 12:18:52,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 58 minutes, 30 seconds)
2025-09-16 12:20:48,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:20:48,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 310.99713 ± 120.849
2025-09-16 12:20:48,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [285.46317, 472.25406, 332.3794, 141.08783, 157.31543, 315.5928, 143.0207, 434.0362, 443.39252, 385.42947]
2025-09-16 12:20:48,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 91.0, 64.0, 28.0, 31.0, 62.0, 28.0, 82.0, 83.0, 72.0]
2025-09-16 12:20:48,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 39 seconds)
2025-09-16 12:22:44,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:22:45,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 365.14584 ± 77.564
2025-09-16 12:22:45,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [434.85535, 326.71783, 305.2842, 431.90222, 461.81583, 286.49228, 395.33163, 202.73682, 409.32428, 396.99802]
2025-09-16 12:22:45,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 61.0, 56.0, 80.0, 93.0, 53.0, 73.0, 42.0, 74.0, 74.0]
2025-09-16 12:22:45,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (365.15) for latency 12
2025-09-16 12:22:45,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 43 seconds)
2025-09-16 12:24:40,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:24:41,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 427.42352 ± 157.068
2025-09-16 12:24:41,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [368.0865, 326.00766, 328.76492, 303.64578, 371.17172, 382.28815, 871.3018, 417.8985, 498.69458, 406.3759]
2025-09-16 12:24:41,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 62.0, 69.0, 55.0, 68.0, 73.0, 167.0, 76.0, 104.0, 76.0]
2025-09-16 12:24:41,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (427.42) for latency 12
2025-09-16 12:24:41,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 59 minutes, 49 seconds)
2025-09-16 12:26:37,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:26:38,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 373.23254 ± 75.877
2025-09-16 12:26:38,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [250.0015, 458.42676, 384.34207, 350.01202, 297.86127, 321.92206, 316.76138, 505.3558, 437.77206, 409.87067]
2025-09-16 12:26:38,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 93.0, 84.0, 65.0, 56.0, 69.0, 58.0, 99.0, 81.0, 77.0]
2025-09-16 12:26:38,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 1 minute, 25 seconds)
2025-09-16 12:28:33,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:28:34,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 463.16376 ± 87.584
2025-09-16 12:28:34,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [636.89087, 392.32605, 417.464, 497.2333, 419.00732, 475.63135, 354.1956, 467.40005, 593.54913, 377.94012]
2025-09-16 12:28:34,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 73.0, 75.0, 93.0, 77.0, 106.0, 65.0, 88.0, 112.0, 69.0]
2025-09-16 12:28:34,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (463.16) for latency 12
2025-09-16 12:28:34,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 25 seconds)
2025-09-16 12:30:27,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:30:28,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 402.23737 ± 80.510
2025-09-16 12:30:28,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [399.443, 470.6531, 343.86838, 554.54004, 397.35327, 404.24258, 484.03122, 394.2603, 296.50964, 277.4725]
2025-09-16 12:30:28,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 90.0, 69.0, 105.0, 74.0, 75.0, 104.0, 74.0, 56.0, 56.0]
2025-09-16 12:30:28,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 57 minutes, 44 seconds)
2025-09-16 12:32:21,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:32:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 343.15021 ± 129.008
2025-09-16 12:32:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [320.5263, 484.79825, 348.3926, 348.78384, 346.98386, 258.57007, 143.16492, 155.39273, 567.47876, 457.4106]
2025-09-16 12:32:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 90.0, 62.0, 67.0, 63.0, 51.0, 28.0, 30.0, 107.0, 89.0]
2025-09-16 12:32:22,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 55 minutes, 8 seconds)
2025-09-16 12:34:15,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:34:17,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 399.92963 ± 118.516
2025-09-16 12:34:17,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [386.1373, 352.0062, 389.47308, 434.929, 654.5516, 356.60602, 463.59216, 467.58334, 332.80173, 161.61539]
2025-09-16 12:34:17,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 65.0, 84.0, 82.0, 127.0, 69.0, 87.0, 89.0, 65.0, 31.0]
2025-09-16 12:34:17,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 39 seconds)
2025-09-16 12:36:09,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:36:10,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 353.60797 ± 100.940
2025-09-16 12:36:10,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [146.53882, 424.16208, 410.53873, 193.05956, 330.9226, 368.13266, 415.42004, 470.34924, 435.42905, 341.52698]
2025-09-16 12:36:10,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 91.0, 87.0, 37.0, 63.0, 70.0, 83.0, 88.0, 82.0, 65.0]
2025-09-16 12:36:10,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 49 minutes, 52 seconds)
2025-09-16 12:38:04,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:38:05,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 407.28152 ± 126.983
2025-09-16 12:38:05,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [351.4348, 522.20013, 442.33865, 352.47974, 386.78314, 585.81506, 529.94244, 288.26944, 135.04385, 478.5078]
2025-09-16 12:38:05,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 104.0, 82.0, 75.0, 83.0, 131.0, 107.0, 56.0, 26.0, 91.0]
2025-09-16 12:38:05,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 47 minutes, 41 seconds)
2025-09-16 12:40:00,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:40:01,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 393.42529 ± 104.058
2025-09-16 12:40:01,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [401.98636, 371.89154, 138.68135, 558.3805, 444.4161, 324.41342, 454.90622, 369.74573, 419.2352, 450.59674]
2025-09-16 12:40:01,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 82.0, 27.0, 114.0, 83.0, 61.0, 95.0, 72.0, 80.0, 89.0]
2025-09-16 12:40:01,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 6 seconds)
2025-09-16 12:41:54,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:41:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 497.90015 ± 92.429
2025-09-16 12:41:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [373.28973, 497.59302, 683.3628, 429.24, 421.70444, 605.1136, 524.38617, 570.7991, 436.62186, 436.89032]
2025-09-16 12:41:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 90.0, 136.0, 79.0, 79.0, 117.0, 95.0, 104.0, 85.0, 86.0]
2025-09-16 12:41:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (497.90) for latency 12
2025-09-16 12:41:55,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 44 minutes, 18 seconds)
2025-09-16 12:43:49,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:43:50,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 476.45923 ± 111.002
2025-09-16 12:43:50,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [445.94693, 627.3645, 369.02817, 547.3315, 435.7838, 335.9759, 697.2888, 386.62704, 414.04773, 505.19778]
2025-09-16 12:43:50,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 131.0, 68.0, 112.0, 79.0, 61.0, 128.0, 87.0, 84.0, 93.0]
2025-09-16 12:43:50,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 42 minutes, 34 seconds)
2025-09-16 12:45:45,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:45:46,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 405.32629 ± 161.014
2025-09-16 12:45:46,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [551.0011, 514.323, 135.01389, 574.50604, 353.6869, 573.33624, 432.36926, 343.60693, 462.8699, 112.54992]
2025-09-16 12:45:46,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 108.0, 26.0, 116.0, 72.0, 105.0, 83.0, 67.0, 98.0, 22.0]
2025-09-16 12:45:46,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 41 minutes, 9 seconds)
2025-09-16 12:47:40,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:47:41,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 442.35724 ± 123.041
2025-09-16 12:47:41,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [365.7406, 384.36633, 755.58636, 500.19135, 508.83044, 330.5239, 361.13562, 337.242, 495.2891, 384.6663]
2025-09-16 12:47:41,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 72.0, 144.0, 92.0, 98.0, 63.0, 68.0, 74.0, 94.0, 75.0]
2025-09-16 12:47:41,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 39 minutes, 22 seconds)
2025-09-16 12:49:36,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:49:37,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 455.34082 ± 73.253
2025-09-16 12:49:37,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [301.80658, 581.4835, 509.7361, 523.26715, 417.41226, 476.65115, 388.41574, 447.46457, 462.5874, 444.58328]
2025-09-16 12:49:37,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 107.0, 99.0, 98.0, 78.0, 88.0, 73.0, 83.0, 90.0, 87.0]
2025-09-16 12:49:37,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 37 minutes, 33 seconds)
2025-09-16 12:51:31,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:51:32,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 449.75372 ± 167.357
2025-09-16 12:51:32,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [715.0988, 425.69058, 129.98978, 588.4658, 629.8383, 400.89246, 377.30417, 578.7046, 316.53165, 335.02115]
2025-09-16 12:51:32,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 79.0, 25.0, 124.0, 120.0, 74.0, 69.0, 110.0, 66.0, 63.0]
2025-09-16 12:51:32,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 35 minutes, 47 seconds)
2025-09-16 12:53:27,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:53:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 440.78204 ± 106.295
2025-09-16 12:53:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [711.1115, 478.98328, 291.42752, 340.39233, 425.25253, 488.24817, 434.58313, 422.13855, 430.36566, 385.3177]
2025-09-16 12:53:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 88.0, 64.0, 63.0, 93.0, 90.0, 83.0, 87.0, 78.0, 72.0]
2025-09-16 12:53:28,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 34 minutes, 4 seconds)
2025-09-16 12:55:22,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:55:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 511.94263 ± 65.619
2025-09-16 12:55:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [443.63492, 484.47104, 588.8284, 579.19434, 404.26712, 525.1763, 588.9463, 529.2056, 551.2812, 424.4215]
2025-09-16 12:55:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 88.0, 111.0, 109.0, 73.0, 95.0, 109.0, 94.0, 102.0, 78.0]
2025-09-16 12:55:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (511.94) for latency 12
2025-09-16 12:55:23,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 32 minutes, 4 seconds)
2025-09-16 12:57:18,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:57:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 583.24396 ± 324.786
2025-09-16 12:57:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1467.665, 398.92148, 473.6455, 471.62082, 163.66187, 609.7661, 440.56458, 632.94977, 672.2733, 501.37036]
2025-09-16 12:57:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 73.0, 99.0, 89.0, 31.0, 113.0, 80.0, 119.0, 130.0, 98.0]
2025-09-16 12:57:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (583.24) for latency 12
2025-09-16 12:57:19,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 11 seconds)
2025-09-16 12:59:14,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:59:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 515.67078 ± 83.528
2025-09-16 12:59:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [490.95987, 373.93497, 559.4884, 493.50812, 406.45572, 611.3206, 640.14056, 596.854, 457.52438, 526.52167]
2025-09-16 12:59:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 70.0, 121.0, 93.0, 76.0, 116.0, 121.0, 113.0, 85.0, 99.0]
2025-09-16 12:59:16,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 31 seconds)
2025-09-16 13:01:09,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:01:11,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 554.57922 ± 147.219
2025-09-16 13:01:11,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [505.50342, 850.12354, 456.0287, 352.37054, 412.59656, 491.3628, 679.5583, 530.378, 742.1523, 525.7181]
2025-09-16 13:01:11,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 160.0, 87.0, 67.0, 77.0, 94.0, 129.0, 100.0, 159.0, 108.0]
2025-09-16 13:01:11,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 33 seconds)
2025-09-16 13:03:05,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:03:06,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 525.73645 ± 106.930
2025-09-16 13:03:06,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [466.85965, 503.0234, 466.33304, 428.28995, 478.33777, 398.9623, 720.1407, 591.1841, 714.52216, 489.7116]
2025-09-16 13:03:06,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 94.0, 86.0, 78.0, 87.0, 73.0, 136.0, 129.0, 136.0, 103.0]
2025-09-16 13:03:06,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 29 seconds)
2025-09-16 13:04:58,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:05:00,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 560.93665 ± 190.427
2025-09-16 13:05:00,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [470.3835, 611.9169, 519.8433, 499.3847, 1050.1567, 732.588, 371.00385, 448.87158, 400.43582, 504.78168]
2025-09-16 13:05:00,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 122.0, 96.0, 101.0, 222.0, 155.0, 72.0, 83.0, 74.0, 93.0]
2025-09-16 13:05:00,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 8 seconds)
2025-09-16 13:06:53,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:06:54,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 441.99332 ± 53.156
2025-09-16 13:06:54,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [476.2279, 495.65207, 494.73026, 500.41888, 415.84363, 454.9891, 365.3934, 358.65314, 473.77197, 384.25278]
2025-09-16 13:06:54,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 88.0, 92.0, 94.0, 76.0, 82.0, 67.0, 69.0, 86.0, 71.0]
2025-09-16 13:06:54,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 52 seconds)
2025-09-16 13:08:47,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:08:49,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 572.01306 ± 149.221
2025-09-16 13:08:49,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [416.07434, 518.61597, 511.2639, 500.60806, 563.7452, 653.47345, 589.64105, 588.7747, 413.23193, 964.7024]
2025-09-16 13:08:49,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 94.0, 94.0, 92.0, 106.0, 122.0, 116.0, 124.0, 74.0, 182.0]
2025-09-16 13:08:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 17 minutes, 27 seconds)
2025-09-16 13:10:42,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:10:44,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 482.71280 ± 102.855
2025-09-16 13:10:44,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [308.15942, 485.90024, 345.36856, 498.36758, 475.9484, 470.77798, 450.9672, 536.5784, 556.24866, 698.81177]
2025-09-16 13:10:44,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 90.0, 62.0, 107.0, 89.0, 88.0, 82.0, 99.0, 105.0, 140.0]
2025-09-16 13:10:44,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 15 minutes, 32 seconds)
2025-09-16 13:12:37,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:12:39,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 661.75500 ± 166.071
2025-09-16 13:12:39,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [728.26373, 623.11035, 710.5915, 1093.5264, 590.31665, 442.8677, 651.3311, 645.36475, 634.5411, 497.63672]
2025-09-16 13:12:39,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 117.0, 135.0, 223.0, 128.0, 83.0, 119.0, 117.0, 120.0, 92.0]
2025-09-16 13:12:39,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (661.76) for latency 12
2025-09-16 13:12:39,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 13 minutes, 37 seconds)
2025-09-16 13:14:32,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:14:33,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 414.31281 ± 172.389
2025-09-16 13:14:33,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [527.8179, 568.62463, 532.46405, 141.77805, 345.03052, 487.921, 673.5293, 322.96994, 422.65955, 120.33324]
2025-09-16 13:14:33,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 121.0, 99.0, 27.0, 63.0, 99.0, 143.0, 64.0, 81.0, 23.0]
2025-09-16 13:14:33,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 11 minutes, 51 seconds)
2025-09-16 13:16:26,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:16:27,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 535.48438 ± 125.527
2025-09-16 13:16:27,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [489.67972, 745.384, 660.4939, 723.152, 432.21216, 466.8832, 552.7395, 404.66815, 508.66846, 370.96304]
2025-09-16 13:16:27,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 166.0, 133.0, 137.0, 79.0, 85.0, 102.0, 77.0, 109.0, 70.0]
2025-09-16 13:16:27,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 9 minutes, 54 seconds)
2025-09-16 13:18:22,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:18:23,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 534.44476 ± 85.843
2025-09-16 13:18:23,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [619.37274, 541.38464, 596.7316, 558.68256, 369.14972, 573.7649, 543.6742, 623.5145, 376.2691, 541.9045]
2025-09-16 13:18:23,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 100.0, 108.0, 102.0, 72.0, 119.0, 98.0, 132.0, 77.0, 102.0]
2025-09-16 13:18:23,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 8 minutes, 17 seconds)
2025-09-16 13:20:16,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:20:18,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 601.03534 ± 114.065
2025-09-16 13:20:18,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [791.1543, 451.60568, 660.08704, 583.1817, 742.3297, 643.5961, 423.41623, 482.6831, 620.66, 611.6394]
2025-09-16 13:20:18,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 84.0, 124.0, 108.0, 140.0, 119.0, 79.0, 91.0, 117.0, 111.0]
2025-09-16 13:20:18,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 6 minutes, 17 seconds)
2025-09-16 13:22:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:22:12,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 600.49658 ± 113.353
2025-09-16 13:22:12,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [628.3188, 567.95435, 600.7206, 621.92694, 744.5448, 730.52423, 371.34143, 693.6791, 610.51495, 435.44067]
2025-09-16 13:22:12,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 104.0, 111.0, 130.0, 155.0, 142.0, 68.0, 130.0, 113.0, 84.0]
2025-09-16 13:22:12,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 4 minutes, 12 seconds)
2025-09-16 13:24:06,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:24:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 577.84247 ± 212.031
2025-09-16 13:24:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [869.279, 125.26918, 746.85315, 427.23947, 618.3671, 478.96902, 619.34955, 661.3517, 409.04114, 822.7057]
2025-09-16 13:24:08,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 24.0, 136.0, 77.0, 130.0, 89.0, 111.0, 120.0, 76.0, 173.0]
2025-09-16 13:24:08,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 40 seconds)
2025-09-16 13:26:00,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:26:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 612.80188 ± 100.575
2025-09-16 13:26:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [582.4238, 619.6551, 836.72186, 597.08026, 692.1725, 571.9472, 496.3971, 562.7871, 694.3348, 474.49878]
2025-09-16 13:26:02,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 114.0, 159.0, 108.0, 132.0, 110.0, 90.0, 103.0, 129.0, 89.0]
2025-09-16 13:26:02,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 45 seconds)
2025-09-16 13:27:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:27:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 485.56436 ± 155.519
2025-09-16 13:27:57,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [586.04694, 372.7476, 504.99683, 708.4964, 129.56345, 480.8221, 521.99274, 588.8349, 358.24933, 603.89355]
2025-09-16 13:27:57,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 73.0, 91.0, 138.0, 25.0, 88.0, 104.0, 122.0, 67.0, 123.0]
2025-09-16 13:27:57,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 58 minutes, 42 seconds)
2025-09-16 13:29:52,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:29:54,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 767.36853 ± 257.159
2025-09-16 13:29:54,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1178.217, 537.72534, 864.67065, 477.66983, 793.137, 505.5843, 977.38135, 1108.2856, 803.56396, 427.45016]
2025-09-16 13:29:54,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 100.0, 167.0, 87.0, 155.0, 93.0, 192.0, 220.0, 150.0, 77.0]
2025-09-16 13:29:54,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (767.37) for latency 12
2025-09-16 13:29:54,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 57 minutes, 7 seconds)
2025-09-16 13:31:47,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:31:49,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 637.48743 ± 121.731
2025-09-16 13:31:49,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [680.8962, 575.5809, 627.25665, 602.17377, 469.40735, 884.10297, 699.17694, 459.26865, 763.57874, 613.432]
2025-09-16 13:31:49,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 104.0, 113.0, 109.0, 92.0, 164.0, 132.0, 84.0, 140.0, 113.0]
2025-09-16 13:31:49,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 55 minutes, 24 seconds)
2025-09-16 13:33:43,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:33:45,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 603.75153 ± 131.642
2025-09-16 13:33:45,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [442.85806, 712.1554, 594.86096, 493.8268, 753.68854, 469.6622, 508.66153, 632.87744, 558.5281, 870.3962]
2025-09-16 13:33:45,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 141.0, 112.0, 88.0, 141.0, 86.0, 102.0, 116.0, 102.0, 164.0]
2025-09-16 13:33:45,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 53 minutes, 25 seconds)
2025-09-16 13:35:38,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:35:40,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 606.20544 ± 163.308
2025-09-16 13:35:40,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [562.08575, 819.1986, 447.5667, 616.96454, 238.47327, 808.56433, 612.71924, 577.88416, 729.73083, 648.8663]
2025-09-16 13:35:40,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 160.0, 84.0, 130.0, 46.0, 149.0, 123.0, 122.0, 137.0, 122.0]
2025-09-16 13:35:40,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 51 minutes, 41 seconds)
2025-09-16 13:37:33,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:37:34,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 595.10291 ± 130.422
2025-09-16 13:37:34,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [589.7177, 461.5527, 819.47327, 602.2195, 727.8586, 588.4875, 509.858, 333.2085, 665.231, 653.422]
2025-09-16 13:37:34,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 87.0, 154.0, 114.0, 133.0, 111.0, 92.0, 62.0, 137.0, 128.0]
2025-09-16 13:37:34,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 49 minutes, 37 seconds)
2025-09-16 13:39:27,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:39:29,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 587.88464 ± 97.705
2025-09-16 13:39:29,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [508.64636, 499.30518, 657.4353, 514.546, 546.97723, 628.6581, 481.1532, 817.59485, 647.1862, 577.3445]
2025-09-16 13:39:29,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 92.0, 119.0, 92.0, 101.0, 116.0, 88.0, 161.0, 126.0, 112.0]
2025-09-16 13:39:29,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 47 minutes, 20 seconds)
2025-09-16 13:41:23,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:41:24,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 617.63916 ± 134.838
2025-09-16 13:41:24,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [872.1224, 716.08246, 540.28973, 403.22247, 596.36163, 617.87756, 565.6343, 783.3894, 621.12805, 460.2836]
2025-09-16 13:41:24,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 142.0, 123.0, 73.0, 107.0, 113.0, 104.0, 145.0, 123.0, 99.0]
2025-09-16 13:41:24,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 45 minutes, 29 seconds)
2025-09-16 13:43:18,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:43:19,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 611.27307 ± 267.327
2025-09-16 13:43:19,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [522.88196, 566.83356, 573.45807, 136.72896, 818.94855, 447.20233, 1208.3163, 781.0273, 596.6952, 460.638]
2025-09-16 13:43:19,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 103.0, 107.0, 26.0, 154.0, 88.0, 225.0, 141.0, 108.0, 102.0]
2025-09-16 13:43:19,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 43 minutes, 26 seconds)
2025-09-16 13:45:14,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:45:15,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 641.58881 ± 195.172
2025-09-16 13:45:15,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [542.8998, 524.0367, 625.20886, 670.05994, 478.95578, 531.5452, 694.0225, 625.8056, 1191.7134, 531.64056]
2025-09-16 13:45:15,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 96.0, 114.0, 141.0, 88.0, 96.0, 127.0, 112.0, 223.0, 96.0]
2025-09-16 13:45:15,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 41 minutes, 42 seconds)
2025-09-16 13:47:09,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:47:11,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 756.29535 ± 208.070
2025-09-16 13:47:11,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [793.67236, 745.9151, 525.9648, 718.7277, 926.68756, 572.9775, 1270.1932, 720.89453, 752.9123, 535.00867]
2025-09-16 13:47:11,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 142.0, 103.0, 159.0, 172.0, 104.0, 251.0, 131.0, 147.0, 116.0]
2025-09-16 13:47:11,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes)
2025-09-16 13:49:04,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:49:06,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 636.62665 ± 257.835
2025-09-16 13:49:06,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [923.04565, 839.13745, 981.0619, 417.53278, 748.88025, 124.503334, 520.8368, 681.7863, 757.6796, 371.8026]
2025-09-16 13:49:06,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 155.0, 177.0, 79.0, 138.0, 24.0, 102.0, 145.0, 158.0, 69.0]
2025-09-16 13:49:06,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 6 seconds)
2025-09-16 13:50:59,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:51:01,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 808.31329 ± 107.333
2025-09-16 13:51:01,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [734.52985, 737.9511, 740.2735, 866.0759, 919.30225, 986.11505, 726.11694, 730.0301, 680.50507, 962.23334]
2025-09-16 13:51:01,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 135.0, 135.0, 175.0, 161.0, 182.0, 145.0, 134.0, 122.0, 188.0]
2025-09-16 13:51:01,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (808.31) for latency 12
2025-09-16 13:51:01,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 36 minutes, 7 seconds)
2025-09-16 13:52:54,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:52:56,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 567.76666 ± 147.004
2025-09-16 13:52:56,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [427.00354, 483.58887, 479.2166, 799.8858, 658.7355, 720.0275, 410.89148, 549.70526, 762.72076, 385.89124]
2025-09-16 13:52:56,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 105.0, 86.0, 159.0, 122.0, 156.0, 72.0, 103.0, 140.0, 71.0]
2025-09-16 13:52:56,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 10 seconds)
2025-09-16 13:54:51,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:54:53,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 656.54291 ± 186.818
2025-09-16 13:54:53,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [801.485, 616.4145, 732.2727, 621.7757, 1131.6597, 613.5976, 540.59753, 441.379, 561.95154, 504.2959]
2025-09-16 13:54:53,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 112.0, 150.0, 113.0, 211.0, 132.0, 117.0, 99.0, 106.0, 92.0]
2025-09-16 13:54:53,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2025-09-16 13:56:47,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:56:48,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 639.86694 ± 153.866
2025-09-16 13:56:48,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [513.82416, 635.086, 907.48425, 843.19415, 584.35614, 612.5181, 335.2804, 702.28265, 692.59717, 572.0463]
2025-09-16 13:56:48,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 120.0, 168.0, 154.0, 111.0, 117.0, 62.0, 129.0, 129.0, 107.0]
2025-09-16 13:56:48,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 22 seconds)
2025-09-16 13:58:41,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:58:43,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 717.68292 ± 228.057
2025-09-16 13:58:43,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1317.1375, 796.2481, 429.23813, 687.6828, 682.6494, 630.4309, 538.77765, 796.4022, 565.3524, 732.9099]
2025-09-16 13:58:43,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [261.0, 151.0, 84.0, 146.0, 131.0, 130.0, 102.0, 174.0, 103.0, 147.0]
2025-09-16 13:58:43,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 32 seconds)
2025-09-16 14:00:37,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:00:39,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 787.02527 ± 302.662
2025-09-16 14:00:39,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [623.9442, 1191.2241, 505.51315, 1015.37933, 130.55249, 902.3087, 1027.3411, 1067.1575, 714.0059, 692.8259]
2025-09-16 14:00:39,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 220.0, 109.0, 191.0, 25.0, 173.0, 187.0, 202.0, 128.0, 141.0]
2025-09-16 14:00:39,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 42 seconds)
2025-09-16 14:02:35,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:02:36,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 702.24554 ± 213.281
2025-09-16 14:02:36,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [533.6685, 796.31635, 567.2543, 1218.7701, 857.03033, 611.27386, 794.8079, 634.8114, 437.72058, 570.80206]
2025-09-16 14:02:36,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 149.0, 105.0, 225.0, 166.0, 131.0, 145.0, 116.0, 80.0, 107.0]
2025-09-16 14:02:36,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 8 seconds)
2025-09-16 14:04:28,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:04:30,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 904.95087 ± 364.377
2025-09-16 14:04:30,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1343.5913, 472.50302, 1236.6453, 547.4017, 1496.7043, 811.24347, 1239.5876, 618.57367, 738.5929, 544.66547]
2025-09-16 14:04:30,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 87.0, 227.0, 118.0, 288.0, 152.0, 228.0, 127.0, 134.0, 99.0]
2025-09-16 14:04:30,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (904.95) for latency 12
2025-09-16 14:04:30,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes, 46 seconds)
2025-09-16 14:06:25,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:06:27,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 841.20526 ± 541.045
2025-09-16 14:06:27,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [569.8435, 600.03674, 2310.6135, 786.26746, 628.93024, 733.07477, 970.50397, 101.989876, 774.7595, 936.033]
2025-09-16 14:06:27,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 113.0, 438.0, 146.0, 117.0, 139.0, 193.0, 20.0, 136.0, 170.0]
2025-09-16 14:06:27,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes)
2025-09-16 14:08:20,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:08:23,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 981.51562 ± 314.176
2025-09-16 14:08:23,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1043.749, 1106.0619, 1771.1013, 561.1223, 970.1423, 990.3662, 648.45935, 1069.7701, 849.01654, 805.3675]
2025-09-16 14:08:23,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 203.0, 328.0, 121.0, 177.0, 177.0, 121.0, 198.0, 154.0, 157.0]
2025-09-16 14:08:23,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (981.52) for latency 12
2025-09-16 14:08:23,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 11 seconds)
2025-09-16 14:10:17,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:10:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 873.27716 ± 480.788
2025-09-16 14:10:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1565.3077, 587.84503, 405.8914, 539.3003, 1855.2417, 1019.4915, 633.73914, 738.76636, 309.1942, 1077.994]
2025-09-16 14:10:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [304.0, 124.0, 73.0, 97.0, 356.0, 192.0, 133.0, 141.0, 56.0, 207.0]
2025-09-16 14:10:19,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 17 minutes, 18 seconds)
2025-09-16 14:12:14,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:12:16,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 901.95349 ± 368.802
2025-09-16 14:12:16,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [791.0816, 1381.631, 363.07947, 1256.4412, 662.0171, 667.85675, 764.13556, 584.2514, 1584.2036, 964.8371]
2025-09-16 14:12:16,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 257.0, 66.0, 255.0, 121.0, 120.0, 141.0, 108.0, 308.0, 172.0]
2025-09-16 14:12:16,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 15 minutes, 19 seconds)
2025-09-16 14:14:11,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:14:13,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 780.79303 ± 220.859
2025-09-16 14:14:13,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [704.02106, 677.8889, 456.73306, 1028.624, 827.4931, 549.82184, 789.43567, 1175.2599, 1014.9329, 583.7196]
2025-09-16 14:14:13,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 135.0, 91.0, 191.0, 147.0, 101.0, 143.0, 210.0, 184.0, 105.0]
2025-09-16 14:14:13,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 13 minutes, 45 seconds)
2025-09-16 14:16:06,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:16:08,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 958.14954 ± 284.381
2025-09-16 14:16:08,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [770.2093, 1185.1381, 1363.698, 1008.41547, 617.3472, 644.6892, 711.061, 1281.7947, 706.9545, 1292.1874]
2025-09-16 14:16:08,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 221.0, 260.0, 204.0, 113.0, 121.0, 130.0, 238.0, 133.0, 240.0]
2025-09-16 14:16:08,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes, 44 seconds)
2025-09-16 14:18:02,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:18:04,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 892.11426 ± 211.382
2025-09-16 14:18:04,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [611.45056, 647.0939, 1131.6116, 936.1383, 860.1181, 835.913, 1178.2065, 781.09863, 706.783, 1232.7286]
2025-09-16 14:18:04,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 121.0, 210.0, 181.0, 154.0, 158.0, 223.0, 143.0, 133.0, 225.0]
2025-09-16 14:18:04,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes, 48 seconds)
2025-09-16 14:20:01,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:20:04,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1033.27710 ± 355.407
2025-09-16 14:20:04,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1858.2297, 749.6509, 1087.2698, 857.51514, 1429.1431, 672.1536, 997.08826, 800.17255, 707.89594, 1173.6537]
2025-09-16 14:20:04,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [348.0, 154.0, 205.0, 176.0, 278.0, 148.0, 212.0, 171.0, 145.0, 224.0]
2025-09-16 14:20:04,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1033.28) for latency 12
2025-09-16 14:20:04,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 16 seconds)
2025-09-16 14:21:57,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:21:59,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 874.55566 ± 319.679
2025-09-16 14:21:59,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1323.8191, 1356.2697, 1340.9159, 653.599, 513.6595, 749.8736, 678.4078, 883.023, 549.3668, 696.6221]
2025-09-16 14:21:59,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [247.0, 254.0, 251.0, 121.0, 96.0, 137.0, 124.0, 167.0, 117.0, 131.0]
2025-09-16 14:21:59,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 6 seconds)
2025-09-16 14:23:53,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:23:56,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 971.73029 ± 324.450
2025-09-16 14:23:56,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [995.6935, 1192.3312, 861.16077, 1548.8896, 1020.6302, 323.95782, 808.0775, 1312.4813, 987.25684, 666.82434]
2025-09-16 14:23:56,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 222.0, 169.0, 284.0, 186.0, 64.0, 143.0, 248.0, 182.0, 122.0]
2025-09-16 14:23:56,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 8 seconds)
2025-09-16 14:25:52,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:25:54,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1035.72534 ± 318.071
2025-09-16 14:25:54,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [850.683, 1223.6803, 859.06055, 576.57745, 839.66364, 1294.0181, 596.76324, 1575.7872, 1283.521, 1257.4998]
2025-09-16 14:25:54,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 224.0, 164.0, 106.0, 156.0, 229.0, 120.0, 295.0, 238.0, 262.0]
2025-09-16 14:25:54,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1035.73) for latency 12
2025-09-16 14:25:54,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 30 seconds)
2025-09-16 14:27:48,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:27:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 934.51184 ± 232.411
2025-09-16 14:27:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [995.34546, 1140.5842, 1187.6855, 879.4192, 606.78723, 545.7211, 770.0462, 891.81323, 1294.6614, 1033.0544]
2025-09-16 14:27:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 236.0, 229.0, 155.0, 105.0, 97.0, 142.0, 175.0, 247.0, 187.0]
2025-09-16 14:27:51,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 34 seconds)
2025-09-16 14:29:44,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:29:48,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1432.87683 ± 319.134
2025-09-16 14:29:48,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1925.1028, 1705.2529, 1402.9141, 1128.2798, 1637.0876, 804.6733, 1608.0673, 1339.643, 1135.1852, 1642.5616]
2025-09-16 14:29:48,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 324.0, 263.0, 209.0, 289.0, 150.0, 309.0, 262.0, 218.0, 333.0]
2025-09-16 14:29:48,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1432.88) for latency 12
2025-09-16 14:29:48,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 24 seconds)
2025-09-16 14:31:42,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:31:45,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1100.40088 ± 395.011
2025-09-16 14:31:45,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1215.6196, 1022.7293, 1022.84174, 557.94366, 1259.7213, 1594.4447, 731.0573, 470.12332, 1508.3787, 1621.149]
2025-09-16 14:31:45,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 190.0, 199.0, 98.0, 257.0, 333.0, 140.0, 83.0, 273.0, 311.0]
2025-09-16 14:31:45,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 38 seconds)
2025-09-16 14:33:39,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:33:42,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1128.87537 ± 426.098
2025-09-16 14:33:42,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1595.1786, 1193.2577, 1117.7649, 679.635, 1424.8861, 1497.6838, 886.54535, 1784.3302, 452.73, 656.74255]
2025-09-16 14:33:42,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [291.0, 222.0, 206.0, 121.0, 258.0, 305.0, 181.0, 336.0, 99.0, 128.0]
2025-09-16 14:33:42,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 45 seconds)
2025-09-16 14:35:37,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:35:40,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1172.11682 ± 596.112
2025-09-16 14:35:40,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1081.28, 1058.758, 2706.388, 1332.5776, 1416.48, 662.9478, 971.18286, 1339.4514, 741.46216, 410.63986]
2025-09-16 14:35:40,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 196.0, 532.0, 254.0, 278.0, 130.0, 182.0, 256.0, 137.0, 77.0]
2025-09-16 14:35:40,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 44 seconds)
2025-09-16 14:37:36,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:37:41,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1656.94458 ± 1301.224
2025-09-16 14:37:41,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4475.691, 524.70233, 1096.2489, 916.18976, 724.8828, 2316.9985, 1308.1627, 1051.8976, 3609.6282, 545.04346]
2025-09-16 14:37:41,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [888.0, 98.0, 212.0, 173.0, 131.0, 453.0, 254.0, 193.0, 662.0, 101.0]
2025-09-16 14:37:41,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1656.94) for latency 12
2025-09-16 14:37:41,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 8 seconds)
2025-09-16 14:39:33,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:39:37,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1265.02954 ± 525.765
2025-09-16 14:39:37,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1090.8201, 2217.4138, 1483.9058, 946.5206, 1816.0966, 794.986, 803.6167, 410.2764, 1378.2101, 1708.4504]
2025-09-16 14:39:37,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 416.0, 268.0, 174.0, 339.0, 142.0, 146.0, 75.0, 259.0, 321.0]
2025-09-16 14:39:37,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 1 second)
2025-09-16 14:41:33,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:41:35,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 909.47571 ± 193.498
2025-09-16 14:41:35,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [975.11523, 824.7451, 1126.9073, 917.3178, 986.2963, 866.05133, 1048.1198, 1015.5038, 384.4293, 950.2707]
2025-09-16 14:41:35,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 149.0, 208.0, 165.0, 178.0, 159.0, 195.0, 188.0, 72.0, 170.0]
2025-09-16 14:41:35,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 11 seconds)
2025-09-16 14:43:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:43:34,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1166.42493 ± 642.771
2025-09-16 14:43:34,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1229.5043, 835.0151, 614.2478, 444.33932, 2915.6226, 922.98047, 1346.2775, 1009.2118, 1260.9886, 1086.0614]
2025-09-16 14:43:34,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [257.0, 158.0, 114.0, 94.0, 569.0, 176.0, 246.0, 188.0, 235.0, 199.0]
2025-09-16 14:43:34,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 22 seconds)
2025-09-16 14:45:28,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:45:30,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 906.15851 ± 304.250
2025-09-16 14:45:30,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [542.9446, 587.2628, 1342.3684, 1035.3634, 861.73584, 908.3277, 447.06366, 1257.1777, 801.47437, 1277.866]
2025-09-16 14:45:30,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 116.0, 237.0, 191.0, 150.0, 176.0, 80.0, 232.0, 145.0, 237.0]
2025-09-16 14:45:30,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 13 seconds)
2025-09-16 14:47:26,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:47:29,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1402.24109 ± 406.107
2025-09-16 14:47:29,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1941.0194, 1347.7622, 738.36664, 1360.1758, 1308.2396, 2248.5793, 1400.6835, 1204.8458, 1453.5825, 1019.1572]
2025-09-16 14:47:29,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [367.0, 248.0, 130.0, 254.0, 236.0, 407.0, 262.0, 221.0, 281.0, 193.0]
2025-09-16 14:47:29,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 12 seconds)
2025-09-16 14:49:25,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:49:30,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1668.56995 ± 917.788
2025-09-16 14:49:30,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1234.3893, 3525.3828, 2784.784, 1223.8424, 1469.5562, 1639.8846, 730.92346, 1077.339, 495.49826, 2504.0996]
2025-09-16 14:49:30,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 651.0, 515.0, 227.0, 283.0, 323.0, 130.0, 232.0, 88.0, 487.0]
2025-09-16 14:49:30,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1668.57) for latency 12
2025-09-16 14:49:30,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 33 seconds)
2025-09-16 14:51:24,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:51:29,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1587.93213 ± 655.100
2025-09-16 14:51:29,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1957.1283, 520.41125, 1352.3656, 867.53955, 1187.5812, 2775.0837, 2418.0034, 1486.5974, 1942.5056, 1372.1041]
2025-09-16 14:51:29,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [371.0, 104.0, 255.0, 159.0, 225.0, 520.0, 448.0, 281.0, 356.0, 269.0]
2025-09-16 14:51:29,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 36 seconds)
2025-09-16 14:53:29,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:53:34,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1758.58325 ± 1202.162
2025-09-16 14:53:34,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1046.364, 1262.7461, 1978.9718, 1450.6365, 836.53894, 855.62244, 2134.072, 5118.342, 1082.446, 1820.0911]
2025-09-16 14:53:34,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 222.0, 365.0, 281.0, 171.0, 157.0, 404.0, 1000.0, 218.0, 359.0]
2025-09-16 14:53:34,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1758.58) for latency 12
2025-09-16 14:53:34,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes)
2025-09-16 14:55:27,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:55:31,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1582.94031 ± 814.629
2025-09-16 14:55:31,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [856.6211, 2761.316, 1980.9813, 738.27234, 2827.2493, 1844.9701, 1130.9158, 2129.9636, 1246.9052, 312.20905]
2025-09-16 14:55:31,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 523.0, 370.0, 146.0, 516.0, 336.0, 225.0, 404.0, 234.0, 54.0]
2025-09-16 14:55:31,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 4 seconds)
2025-09-16 14:57:26,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:57:30,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1408.19165 ± 625.806
2025-09-16 14:57:30,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [928.0913, 2869.0366, 1117.5164, 617.57806, 1656.6456, 1977.9257, 1475.982, 1550.6617, 959.85486, 928.625]
2025-09-16 14:57:30,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 535.0, 209.0, 130.0, 302.0, 361.0, 284.0, 284.0, 182.0, 176.0]
2025-09-16 14:57:30,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 2 seconds)
2025-09-16 14:59:26,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:59:33,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2418.63208 ± 1258.553
2025-09-16 14:59:33,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3681.857, 1996.149, 2250.1824, 1828.1396, 998.10333, 5271.972, 2771.4346, 1720.697, 777.15497, 2890.6306]
2025-09-16 14:59:33,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [679.0, 370.0, 413.0, 347.0, 180.0, 1000.0, 527.0, 326.0, 148.0, 539.0]
2025-09-16 14:59:33,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2418.63) for latency 12
2025-09-16 14:59:33,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 8 seconds)
2025-09-16 15:01:29,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:01:33,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1419.50952 ± 690.658
2025-09-16 15:01:33,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2177.439, 555.8483, 1200.2664, 2376.7761, 1394.8015, 2293.0278, 421.0212, 608.838, 1500.8583, 1666.2179]
2025-09-16 15:01:33,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [407.0, 98.0, 237.0, 443.0, 247.0, 439.0, 76.0, 111.0, 317.0, 313.0]
2025-09-16 15:01:33,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 11 seconds)
2025-09-16 15:03:27,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:03:35,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2846.56665 ± 1665.425
2025-09-16 15:03:35,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5314.0635, 1193.7314, 4723.381, 2577.4065, 1363.5447, 1194.6394, 5343.9023, 1152.0662, 3601.7246, 2001.2064]
2025-09-16 15:03:35,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 243.0, 880.0, 481.0, 273.0, 230.0, 1000.0, 201.0, 673.0, 374.0]
2025-09-16 15:03:35,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2846.57) for latency 12
2025-09-16 15:03:35,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 1 second)
2025-09-16 15:05:39,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:05:49,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3440.60620 ± 1788.363
2025-09-16 15:05:49,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1052.4761, 5338.4517, 1200.5901, 5242.0225, 5234.2236, 2030.0353, 4856.7666, 1007.162, 4394.0176, 4050.3179]
2025-09-16 15:05:49,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 1000.0, 235.0, 1000.0, 1000.0, 395.0, 920.0, 192.0, 843.0, 775.0]
2025-09-16 15:05:49,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3440.61) for latency 12
2025-09-16 15:05:49,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 42 seconds)
2025-09-16 15:07:40,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:07:50,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3693.79224 ± 1786.963
2025-09-16 15:07:50,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [490.50153, 4683.7344, 3783.0786, 2131.724, 5400.041, 3535.4133, 5405.987, 5306.5127, 970.86664, 5230.0615]
2025-09-16 15:07:50,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 908.0, 682.0, 398.0, 1000.0, 658.0, 1000.0, 1000.0, 180.0, 1000.0]
2025-09-16 15:07:50,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3693.79) for latency 12
2025-09-16 15:07:50,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 43 seconds)
2025-09-16 15:09:41,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:09:50,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3154.32471 ± 1501.568
2025-09-16 15:09:50,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1458.1709, 463.95798, 2644.15, 4122.997, 2975.1292, 4275.719, 2918.9907, 2157.93, 5292.6284, 5233.5728]
2025-09-16 15:09:50,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [271.0, 83.0, 506.0, 776.0, 550.0, 808.0, 545.0, 394.0, 1000.0, 1000.0]
2025-09-16 15:09:50,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 34 seconds)
2025-09-16 15:11:50,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:11:57,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2310.45020 ± 1401.210
2025-09-16 15:11:57,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4556.494, 3152.9553, 669.89923, 3661.5974, 1130.5526, 1019.0293, 3111.5298, 1384.3787, 3755.4045, 662.66223]
2025-09-16 15:11:57,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [887.0, 638.0, 136.0, 729.0, 235.0, 200.0, 598.0, 262.0, 701.0, 117.0]
2025-09-16 15:11:57,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 43 seconds)
2025-09-16 15:13:52,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:14:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3825.11328 ± 1569.447
2025-09-16 15:14:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3482.8584, 5243.556, 3347.5012, 2087.6497, 5270.817, 5257.6587, 5268.6313, 1499.6749, 1491.8785, 5300.9097]
2025-09-16 15:14:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [675.0, 1000.0, 606.0, 404.0, 1000.0, 1000.0, 1000.0, 296.0, 288.0, 1000.0]
2025-09-16 15:14:03,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3825.11) for latency 12
2025-09-16 15:14:03,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 44 seconds)
2025-09-16 15:16:02,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:16:14,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3970.73120 ± 1986.021
2025-09-16 15:16:14,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5229.755, 5279.931, 5198.2163, 1118.3579, 5328.06, 5331.5933, 5204.1133, 1347.37, 403.56024, 5266.3535]
2025-09-16 15:16:14,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 204.0, 1000.0, 1000.0, 1000.0, 247.0, 70.0, 1000.0]
2025-09-16 15:16:14,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3970.73) for latency 12
2025-09-16 15:16:14,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 35 seconds)
2025-09-16 15:18:07,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:18:17,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3605.37354 ± 1840.681
2025-09-16 15:18:17,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5458.149, 2156.5007, 2456.4539, 5522.145, 5385.8174, 601.2857, 3364.0935, 989.89307, 4934.3076, 5185.088]
2025-09-16 15:18:17,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 398.0, 438.0, 1000.0, 1000.0, 105.0, 615.0, 204.0, 903.0, 935.0]
2025-09-16 15:18:17,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 32 seconds)
2025-09-16 15:20:25,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:20:37,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4167.16553 ± 1831.923
2025-09-16 15:20:37,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5111.314, 5063.4897, 461.1466, 4695.8027, 5050.488, 5117.6167, 567.97144, 5256.753, 5175.378, 5171.694]
2025-09-16 15:20:37,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 82.0, 914.0, 1000.0, 1000.0, 105.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:20:37,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (4167.17) for latency 12
2025-09-16 15:20:37,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 47 seconds)
2025-09-16 15:22:26,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:22:37,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3848.89014 ± 1971.268
2025-09-16 15:22:37,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5307.6157, 1346.1587, 5301.308, 5289.7617, 883.49286, 5448.7593, 3910.852, 5179.444, 523.8923, 5297.6196]
2025-09-16 15:22:37,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 257.0, 1000.0, 1000.0, 157.0, 1000.0, 728.0, 1000.0, 92.0, 1000.0]
2025-09-16 15:22:37,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 32 seconds)
2025-09-16 15:24:33,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:24:44,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3813.10596 ± 1554.510
2025-09-16 15:24:44,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1912.233, 5280.7407, 5251.0244, 5199.1714, 2682.7595, 5265.881, 5305.0312, 3549.2231, 1150.1619, 2534.8347]
2025-09-16 15:24:44,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [371.0, 1000.0, 1000.0, 1000.0, 515.0, 1000.0, 1000.0, 682.0, 223.0, 478.0]
2025-09-16 15:24:44,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 24 seconds)
2025-09-16 15:26:45,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:26:57,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3778.12891 ± 1504.268
2025-09-16 15:26:57,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1357.875, 3673.2786, 1687.121, 5152.3096, 5035.884, 5118.5786, 1848.0887, 5101.806, 3746.8118, 5059.537]
2025-09-16 15:26:57,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 735.0, 327.0, 1000.0, 1000.0, 1000.0, 361.0, 1000.0, 731.0, 1000.0]
2025-09-16 15:26:57,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 17 seconds)
2025-09-16 15:28:59,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:29:10,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3700.27466 ± 1912.908
2025-09-16 15:29:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2428.7036, 5352.3447, 1064.7705, 1454.5055, 5349.999, 5385.659, 5315.027, 762.9139, 4518.257, 5370.567]
2025-09-16 15:29:10,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [449.0, 1000.0, 202.0, 287.0, 1000.0, 1000.0, 1000.0, 138.0, 852.0, 1000.0]
2025-09-16 15:29:10,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 10 seconds)
2025-09-16 15:31:07,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:31:18,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3476.77661 ± 2216.445
2025-09-16 15:31:18,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5318.5293, 147.48056, 5292.638, 5244.134, 1147.3931, 5222.501, 389.50977, 1499.1637, 5222.247, 5284.1704]
2025-09-16 15:31:18,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 28.0, 1000.0, 1000.0, 209.0, 1000.0, 71.0, 275.0, 1000.0, 1000.0]
2025-09-16 15:31:18,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
